- One of the metrics that Spotify uses to characterise its songs is valence. Valence measures, on a scale from 0.0 to 1.0, how happy (positive) or sad (negative) a particular song sounds.
- The calculation of valence was developed by The Echo Nest, which was acquired by Spotify in 2014.
- Although the valence of each song is publicly available, how it is calculated remains largely a mystery. Some information can be found here, but it is not very specific.
- In this project, we use machine learning methods to build a predictive model that approximates the valence calculation.
The analysis was run in Jupyter Notebook. Jupyter Notebook 6.4.4 with Python 3.9.0 is known to work.
Additional packages required for the project to run are:
All the packages above can be installed with the `pip install` command.
Two data sources were used:
- Spotify's Web API: Spotify offers numerous metrics for every song through its API. Specifically, the Get Tracks' Audio Features and Get Track's Audio Analysis endpoints were used.
- Spotify-Data 1921-2020 from Kaggle: this dataset was used to obtain the Spotify IDs of many songs.
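The Kaggle IDs feed the API calls. As a minimal sketch (not the code in `data_preparation`), the requests could be built as below. This assumes you already hold a valid OAuth access token (`token`) and relies on the Web API's limit of 100 IDs per Get Tracks' Audio Features call. The helper names `chunk`, `audio_features_requests` and `audio_analysis_request` are hypothetical. The requests are only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
AUDIO_ANALYSIS_URL = "https://api.spotify.com/v1/audio-analysis/{id}"
# Get Tracks' Audio Features accepts at most 100 comma-separated IDs per call.
MAX_IDS_PER_REQUEST = 100


def chunk(ids, size=MAX_IDS_PER_REQUEST):
    """Split a list of track IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def audio_features_requests(track_ids, token):
    """Build (but do not send) one GET request per batch of up to 100 IDs.

    `token` is assumed to be a valid Spotify Web API access token.
    """
    headers = {"Authorization": f"Bearer {token}"}
    return [
        # safe="," keeps the commas between IDs unescaped in the query string
        Request(f"{AUDIO_FEATURES_URL}?{urlencode({'ids': ','.join(batch)}, safe=',')}",
                headers=headers)
        for batch in chunk(track_ids)
    ]


def audio_analysis_request(track_id, token):
    """Build (but do not send) the GET request for one track's audio analysis."""
    return Request(AUDIO_ANALYSIS_URL.format(id=track_id),
                   headers={"Authorization": f"Bearer {token}"})
```

Each request could then be sent with `urllib.request.urlopen` and its body parsed with `json.load`. Batching matters because audio features can be fetched 100 tracks at a time, whereas audio analysis needs one call per track.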
Important note: some data files obtained from Spotify's API were too large to be uploaded. If you want to access these files without running the corresponding code, please contact the author.
This repository contains four notebooks, each with its own purpose:
- data_preparation: contains the code used to collect and prepare the data used by the other notebooks.
- statistics: contains statistical analyses performed to better understand the correlation between various features and valence.
- non_nn_predictive: contains the development of various predictive models other than neural networks.
- nn_predictive: contains the development of neural network models for valence prediction.
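The README does not say which correlation measure the statistics notebook uses. As one common choice, the sketch below computes the Pearson coefficient between each feature and valence and ranks the features by its absolute value. The function names and the dict-of-lists input layout are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rank_by_correlation(features, valence):
    """Return (feature_name, r) pairs sorted by |r| with valence, strongest first.

    `features` maps a feature name (e.g. "energy") to its per-song values,
    aligned with the per-song `valence` list.
    """
    scores = {name: pearson(values, valence) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Ranking by |r| treats strong negative correlations as just as informative as strong positive ones. Keep in mind that Pearson's r only captures linear relationships.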
The best results were achieved by the neural network trained on all the collected data and the metadata created from it. Its mean absolute error (MAE) on the test set was 0.0846.
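Valence lies between 0.0 and 1.0, so an MAE of 0.0846 means the predictions are off by roughly 0.085 on that scale, on average. The sketch below shows how MAE is computed. It also computes the MAE of a trivial model that always predicts the training-set mean, which gives a reference point for judging such a number. The helper names are hypothetical, and no baseline value from this project is implied.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - yhat_i|) over all n test songs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline_mae(y_train, y_test):
    """MAE of a model that always predicts the training-set mean valence."""
    mean_valence = sum(y_train) / len(y_train)
    return mean_absolute_error(y_test, [mean_valence] * len(y_test))
```

A useful model should clearly beat `mean_baseline_mae` on the same test split. `sklearn.metrics.mean_absolute_error` computes the same quantity if scikit-learn is already installed.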