In this repository we implement Spotify's Generalist-Specialist score on the MovieLens dataset.
The code here accompanies the Medium article "Different Strokes for Different Folks at Spotify".
We recommend using Anaconda to create a virtual environment. Install the requirements via
conda install --yes --file requirements.txt
or, if using pip, run
pip install -r requirements.txt
To run the notebook, start a Jupyter Notebook session with the command jupyter notebook
The notebook should be able to download the MovieLens dataset and unzip it in the
current directory.
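If the automatic download fails, the same step can be done by hand. The sketch below is not the notebook's code: the URL points at the small MovieLens release on the GroupLens file server, which is an assumption about which version the notebook uses, and `download_and_extract` is a hypothetical helper name.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumption: the small MovieLens release. The notebook may use another version.
MOVIELENS_URL = "https://files.grouplens.org/datasets/movielens/ml-latest-small.zip"


def download_and_extract(url=MOVIELENS_URL, dest_dir="."):
    """Fetch a zip archive (unless already present) and unpack it into dest_dir.

    Returns the sorted list of member names contained in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local archive after the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())
```

Calling `download_and_extract()` with no arguments fetches the archive into the current directory and unpacks it there. A second call reuses the archive already on disk.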
Training the model and computing the generalist-specialist (GS) scores for all MovieLens users can be time-consuming.
The results of the Shannon entropy and GS scores for MovieLens users are shown here:
As seen from the histogram above, the majority of users have watched and rated a wide range of movies. The spike at the 1.0 bin is mainly due to users who have watched and rated only a single movie. Among users who watched and rated more than one movie, 857 have a GS score above 0.90 under our trained Word2Vec model.
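The spike at 1.0 follows from how the score is defined. The sketch below assumes the usual formulation: the GS score is the weighted average cosine similarity between each of a user's item vectors and the weighted centre of those vectors. It is a minimal pure-Python illustration, not the notebook's code. The function name `gs_score`, the optional `weights` argument, and the choice to return 0.0 for zero-length vectors are all assumptions made for this example.

```python
import math


def gs_score(vectors, weights=None):
    """Weighted mean cosine similarity between item vectors and their centre.

    vectors: list of equal-length embedding vectors (e.g. from Word2Vec).
    weights: optional per-item weights (e.g. rating counts); defaults to 1.
    """
    if not vectors:
        raise ValueError("need at least one item vector")
    if weights is None:
        weights = [1.0] * len(vectors)

    total = sum(weights)
    dim = len(vectors[0])

    # Weighted centre of the user's item vectors.
    centre = [
        sum(w * v[k] for v, w in zip(vectors, weights)) / total
        for k in range(dim)
    ]
    centre_norm = math.sqrt(sum(c * c for c in centre))

    def cosine_to_centre(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0 or centre_norm == 0.0:
            return 0.0  # assumption: treat degenerate vectors as dissimilar
        return sum(x * c for x, c in zip(v, centre)) / (norm * centre_norm)

    return sum(w * cosine_to_centre(v) for v, w in zip(vectors, weights)) / total
```

With a single movie, the centre coincides with that movie's vector, so `gs_score([[0.3, -1.2, 0.5]])` is 1.0 up to floating-point rounding, whatever the embedding. This is why single-movie users are excluded when counting specialists. Two orthogonal vectors such as `[[1, 0], [0, 1]]` give a lower score, reflecting a more generalist profile.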