Music Popularity Predictor

How do song metrics impact popularity?

Statistics 810, also known as "Mathematical Statistics for Data Scientists," is a course in the Master's in Data Science degree program at Michigan State, taught by Dr. Paul Speaker.

Much of the coursework involves hands-on application of the statistical material learned in class, implemented in R. The final assignment was a partner project using Spotify track data, including each song's popularity rating and a number of numerical and categorical characteristics.

The goal of the project was to develop a model that predicts song popularity using simple statistics, feature engineering, data visualization, and analytics. My partner, Chris Grandy (M.S. Data Science candidate), and I completed exploratory analysis and concluded that the data were largely random with respect to popularity, and that a few categories of raw data were the best predictors available in this seemingly random dataset. We used a 70%/30% train-test split to test our conclusions. The model was built in R, using the following packages:

  • ggplot2
  • dplyr
  • corrplot
  • tidyverse
  • MASS
  • plotly
  • patchwork
  • caTools
The source of the data provided is found here.
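
The 70%/30% train-test split mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `tracks` data frame is a synthetic stand-in with made-up values, and the split uses base R's `sample()` so the snippet runs without extra packages. The original analysis may instead have relied on caTools, which appears in the package list.

```r
# Synthetic stand-in for the Spotify track data; every value is made up.
set.seed(810)
n <- 200
tracks <- data.frame(
  popularity  = sample(0:100, n, replace = TRUE),
  energy      = runif(n),
  speechiness = runif(n),
  key         = factor(sample(0:11, n, replace = TRUE))
)

# Draw 70% of the row indices at random for training.
train_idx <- sample(seq_len(n), size = round(0.7 * n))

# Training rows are the sampled ones; test rows are everything else.
train <- tracks[train_idx, ]
test  <- tracks[-train_idx, ]

cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Holding the test rows out until the end means any accuracy figure reflects songs the model has never seen.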

The full report and exploration are shown below and can be read in detail to follow our process. Our final model uses energy, key, and speechiness to predict song popularity. The project was a phenomenal lesson that not all data contain connections strong enough to support a truly successful model. It would be ideal to continue the investigation beyond the course timeline.
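
As a rough illustration of a model built on energy, key, and speechiness, the sketch below fits an ordinary linear regression with `lm()` and scores it on held-out rows. The model family is an assumption, since the summary above does not state which one the final model used, and the data are synthetic with made-up values. Treating `key` as a categorical factor is also an assumption.

```r
# Synthetic stand-in for the Spotify data; every value below is made up.
set.seed(810)
n <- 500
tracks <- data.frame(
  energy      = runif(n),
  speechiness = runif(n),
  key         = factor(sample(0:11, n, replace = TRUE))
)
# Weak signal plus heavy noise, to mimic a hard-to-predict target.
tracks$popularity <- 40 + 10 * tracks$energy - 15 * tracks$speechiness +
  rnorm(n, sd = 20)

# 70%/30% train-test split, matching the proportions in the write-up.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- tracks[train_idx, ]
test  <- tracks[-train_idx, ]

# Ordinary least squares on the three predictors named in the write-up.
fit  <- lm(popularity ~ energy + key + speechiness, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error on held-out rows, next to a mean-only baseline.
rmse          <- sqrt(mean((test$popularity - pred)^2))
baseline_rmse <- sqrt(mean((test$popularity - mean(train$popularity))^2))
cat("model RMSE:", round(rmse, 2),
    " baseline RMSE:", round(baseline_rmse, 2), "\n")
```

Comparing the model's error with the mean-only baseline is one way to see how little the predictors add when the target is mostly noise.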