Conducted by: Jingyu Chen, Yulin Hong, Yingxin Jiang, Yi Kuang, Xintong Li, Alice Liu, Yichen Wang, Kexin Wang
This project uses the Spotify dataset from Kaggle.
We also built a front-end web app for the project. To use it, visit https://8potify.netlify.app/
The data dictionary is as follows:
- id (Id of track generated by Spotify)
Numerical:
- acousticness (Ranges from 0 to 1)
- danceability (Ranges from 0 to 1)
- energy (Ranges from 0 to 1)
- duration_ms (Integer typically ranging from 200k to 300k)
- instrumentalness (Ranges from 0 to 1)
- valence (Ranges from 0 to 1)
- popularity (Ranges from 0 to 100)
- tempo (Float typically ranging from 50 to 150)
- liveness (Ranges from 0 to 1)
- loudness (Float typically ranging from -60 to 0)
- speechiness (Ranges from 0 to 1)
- year (Ranges from 1921 to 2020)
Dummy:
- mode (0 = Minor, 1 = Major)
- explicit (0 = No explicit content, 1 = Explicit content)
- genre_fit (1 = Does not fit the desired genre, 5 = Excellent fit)
- mood_fit (1 = Does not fit the desired mood, 5 = Excellent fit)
- age (Ranges from 20 to 29)
Categorical:
- key (All keys of the octave encoded as values ranging from 0 to 11, starting with C as 0, C# as 1, and so on…)
- artists (List of artists mentioned)
- release_date (Date of release, mostly in yyyy-mm-dd format; the precision of the date may vary)
- name (Name of the song)
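
The `key`, `mode`, and `release_date` encodings described above can be decoded with a small helper. The following is a minimal sketch, not the project's own code: the function names `key_name` and `parse_release_date` are hypothetical, the sharp-based note names beyond C# are assumed from standard pitch-class notation, and it is assumed that lower-precision dates appear as `yyyy` or `yyyy-mm`.

```python
from datetime import date
from typing import Tuple

# Note names for the `key` column (0 = C, 1 = C#, ...).
# Assumption: sharps are used throughout, following standard pitch-class notation.
KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key: int, mode: int) -> str:
    """Turn the encoded key and mode into a readable label such as 'C# minor'."""
    if not 0 <= key <= 11:
        raise ValueError(f"key must be in 0..11, got {key}")
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 (minor) or 1 (major), got {mode}")
    return f"{KEY_NAMES[key]} {'major' if mode == 1 else 'minor'}"


def parse_release_date(raw: str) -> Tuple[date, str]:
    """Parse a release date of varying precision.

    Assumption: values look like 'yyyy', 'yyyy-mm', or 'yyyy-mm-dd'.
    Missing month or day parts default to 1, and the detected precision
    is returned alongside the date so it is not silently lost.
    """
    parts = [int(p) for p in raw.strip().split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected release_date format: {raw!r}")
    precision = ("year", "month", "day")[len(parts) - 1]
    year, month, day = parts + [1] * (3 - len(parts))
    return date(year, month, day), precision
```

For example, `key_name(1, 0)` gives `"C# minor"`, and `parse_release_date("1921")` gives January 1, 1921 together with the precision label `"year"`. Returning the precision separately keeps a year-only entry distinguishable from a song actually released on January 1.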