Chart Longevity and Stardom Prediction using Spotify Data

This project predicts how long a song stays in Spotify's top 200 charts and classifies artists as one-hit wonders or stars, using data from Spotify's top 200 playlists from 2017 to 2023. It combines general variables such as artist names and nationalities with song-level metrics such as danceability and points accumulated.

Installation

The full script you need is located in the spotify_analysis.ipynb notebook. To run the project, please install the following libraries:

gdown
pandas
numpy
scikit-learn (imported as sklearn)
matplotlib
seaborn
statsmodels
joblib
imbalanced-learn (imported as imblearn)

You can install these libraries using pip:

pip install gdown pandas numpy scikit-learn matplotlib seaborn statsmodels joblib imbalanced-learn

Dataset

The dataset used in this project is stored in a public Google Drive file. The notebook is configured to download the dataset automatically.
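As a rough illustration of what the automatic download could look like, here is a minimal sketch built on gdown. The file ID, the output filename `spotify_charts.csv`, and the helper names `drive_download_url` and `fetch_dataset` are all placeholders assumed for this example; the notebook's actual download cell may differ.

```python
from pathlib import Path


def drive_download_url(file_id: str) -> str:
    """Build a direct-download URL for a public Google Drive file."""
    return f"https://drive.google.com/uc?id={file_id}"


def fetch_dataset(file_id: str, output: str = "spotify_charts.csv") -> Path:
    """Download the dataset once and reuse the local copy afterwards.

    `file_id` and `output` are hypothetical: substitute the ID and
    filename used in the notebook.
    """
    path = Path(output)
    if not path.exists():
        # gdown is only needed when a download actually happens.
        import gdown

        gdown.download(drive_download_url(file_id), str(path), quiet=False)
    return path
```

Calling `fetch_dataset("<your-file-id>")` a second time returns the cached file without touching the network.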

Project Structure

The project consists of two main tasks:

Regression Task: Predicting the longevity of a song in the charts
Classification Task: Predicting whether an artist will be a star or a one-hit wonder
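To make the two targets concrete, the sketch below derives both from a tiny chart table with pandas. The column names (`date`, `song_id`, `artist`) and the labeling rule (an artist with at least two charting songs counts as a star) are assumptions made for illustration, not the definitions used in the notebook.

```python
import pandas as pd


def chart_longevity(charts: pd.DataFrame) -> pd.Series:
    """Regression target: number of distinct chart dates per song."""
    return charts.groupby("song_id")["date"].nunique().rename("days_on_chart")


def label_artists(charts: pd.DataFrame, min_songs_for_star: int = 2) -> pd.Series:
    """Classification target: 'star' or 'one-hit wonder' per artist.

    The threshold is a hypothetical rule chosen for this example.
    """
    songs_per_artist = charts.groupby("artist")["song_id"].nunique()
    is_star = songs_per_artist.ge(min_songs_for_star)
    return is_star.map({True: "star", False: "one-hit wonder"}).rename("label")


# Toy chart data: one row per song per day on the chart.
charts = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-02", "2023-01-03",
                 "2023-01-01", "2023-01-02", "2023-01-01"],
        "song_id": ["s1", "s1", "s1", "s2", "s2", "s3"],
        "artist": ["A", "A", "A", "A", "A", "B"],
    }
)

longevity = chart_longevity(charts)
labels = label_artists(charts)
print(longevity)
print(labels)
```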

Regression Task

The regression task involves the following steps:

Data preprocessing
Exploratory data analysis
Model implementation and evaluation

The best-performing model for the regression task was Linear Regression, with an average prediction error of around 3 days of chart longevity per song.
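The following sketch shows one way such an error figure can be computed: fit scikit-learn's `LinearRegression` on a training split and report the mean absolute error on held-out data. It runs on synthetic data, and treating the reported "average error" as mean absolute error is an assumption; the numbers it prints say nothing about the project's actual results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the song features and days-on-chart target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coefs = np.array([3.0, -2.0, 1.0])
y = 10.0 + X @ true_coefs + rng.normal(scale=0.5, size=200)

# Hold out a test set so the error is measured on unseen songs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
mae_days = mean_absolute_error(y_test, model.predict(X_test))
print(f"Mean absolute error: {mae_days:.2f} days")
```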

Classification Task

The classification task involves the following steps:

Data preprocessing
Exploratory data analysis
Model implementation and evaluation

The best-performing model for the classification task was the improved XGBoost classifier, which achieved 71% accuracy once the target classes had been balanced.
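A minimal sketch of balancing the target before training is shown below. To keep it dependency-light, it swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost and a hand-rolled random oversampler (the hypothetical helper `oversample_minority`) for imbalanced-learn, so it illustrates the workflow rather than the notebook's exact pipeline. The data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate rows until every class matches the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        if count < target:
            X_cls = resample(
                X_cls, replace=True, n_samples=target, random_state=random_state
            )
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in for the star / one-hit-wonder labels.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Balance only the training split so the test set keeps its real distribution.
X_bal, y_bal = oversample_minority(X_train, y_train)

clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy: {accuracy:.2%}")
```

Resampling after the split avoids leaking duplicated rows into the test set, which would inflate the reported accuracy.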

Usage

Clone the repository
Install the required libraries
Run the Jupyter notebook

The notebook will download the dataset from the provided Google Drive link and perform the analysis.

Results

The project demonstrates that machine learning models can predict both how long a song stays on the charts and whether an artist becomes a star, using data from Spotify. Spotify could use these insights to identify rising stars and predict the chart durations of their hits, enabling smarter, dual-focused marketing efforts.

Future Work

Potential areas for expansion include enriching the dataset with social media metrics and broader market trends, and exploring more complex algorithms.

Report

If you would rather not read through the code, the final report for the project is in the spotify_analysis.pdf file.

License

This project is open-source and available under the MIT License.

Acknowledgements

We would like to thank Spotify for providing the dataset used in this project.

References

The project report includes a list of references that were used in the research and development of this project.

Feel free to contribute to the project by submitting pull requests or opening issues for any bugs or feature requests.
