dj3500/movielens-matrix-completion

MovieLens-1M matrix completion

The MovieLens-1M dataset has been widely used in experiments for machine learning papers. It contains approximately 1 million ratings of about 3,900 movies by 6,040 users. The usual experimental setup is to compute a low-rank completion of this user-movie rating matrix, which yields a low-dimensional feature vector for each user and for each movie. The dot product of a user vector with a movie vector then approximates the rating of that movie by that user, and the (possibly normalized) dot product of two movie vectors approximates the similarity of the two movies.
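As a small illustration of the two uses of the vectors described above, the sketch below computes a predicted rating as a user-movie dot product and a movie-movie similarity as a normalized (cosine) dot product. The vectors are made-up toy values of rank 3 rather than the script's rank 20, and the helper names `dot` and `cosine` are hypothetical.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product normalized by both vector lengths (cosine similarity)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


# Made-up rank-3 vectors (the script itself uses rank 20).
user_vec = [1.2, 0.4, -0.3]
movie_a = [2.0, 1.0, 0.5]
movie_b = [1.8, 1.1, 0.2]

predicted_rating = dot(user_vec, movie_a)  # approximates this user's rating of movie_a
similarity = cosine(movie_a, movie_b)      # in [-1, 1]; near 1 for similar movies
```

Whether to normalize is a modelling choice: the raw dot product also reflects vector length, while the cosine compares directions only.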

One particular area of application is submodular maximization. Given these vectors, we can define useful monotone submodular objective functions to build a personalized movie recommendation system. See the references below for examples.
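One common form of such an objective is facility location ([2] introduced the variant used in this line of work; its exact definition may differ from the one here). Assuming a nonnegative movie-movie similarity matrix, for instance clipped dot products of the movie vectors, the objective $f(S) = \sum_{m} \max_{s \in S} \mathrm{sim}(m, s)$ is monotone and submodular, so the standard greedy algorithm achieves a $(1 - 1/e)$ approximation under a cardinality constraint. The following is a minimal, unoptimized sketch; the function names and similarity values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def facility_location(selected: Sequence[int], sim: Matrix) -> float:
    """f(S) = sum over all movies m of max_{s in S} sim[m][s], with f(empty set) = 0."""
    if not selected:
        return 0.0
    return sum(max(row[s] for s in selected) for row in sim)


def greedy_select(sim: Matrix, k: int) -> List[int]:
    """Pick up to k movies, each time adding the one with the largest marginal gain."""
    selected: List[int] = []
    for _ in range(min(k, len(sim))):
        current = facility_location(selected, sim)
        best, best_gain = -1, float("-inf")
        for cand in range(len(sim)):
            if cand in selected:
                continue
            gain = facility_location(selected + [cand], sim) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        selected.append(best)
    return selected


# Made-up nonnegative, symmetric similarities: movies 0-2 are alike, movie 3 is different.
sim = [
    [1.0, 0.9, 0.8, 0.0],
    [0.9, 1.0, 0.6, 0.1],
    [0.8, 0.6, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]
print(greedy_select(sim, 2))  # prints [0, 3]
```

After picking movie 0, the greedy step prefers the dissimilar movie 3 over a near-duplicate of movie 0, because the max in $f$ rewards covering movies that are not yet well represented.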

This small script, written in 2017 by Jakub Tarnawski (dj3500), reads the user-movie matrix from the MovieLens-1M dataset and computes its low-rank completion (which yields the user and movie vectors). The rank is set to 20.
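The script delegates the completion itself to fancyimpute, and this README does not say which solver or settings it uses. To show what a rank-$k$ completion computes, here is a swapped-in technique: plain stochastic gradient descent on the squared error over the observed entries, written in pure Python. It is not the script's method; the function names, learning rate, regularization, initialization, and epoch count are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]
Matrix = List[List[float]]


def complete_low_rank(ratings: Ratings, n_users: int, n_movies: int, rank: int = 20,
                      epochs: int = 1000, lr: float = 0.01, reg: float = 0.01,
                      seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit U (n_users x rank) and V (n_movies x rank) so that U[u] . V[m] ~ ratings[(u, m)].

    Plain SGD on squared error over the observed entries, with L2 regularization.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_movies)]
    observed = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)  # visit the observed entries in a new order each epoch
        for (u, m), r in observed:
            err = r - sum(a * b for a, b in zip(U[u], V[m]))
            for f in range(rank):
                uf, vf = U[u][f], V[m][f]  # use pre-update values for both factors
                U[u][f] += lr * (err * vf - reg * uf)
                V[m][f] += lr * (err * uf - reg * vf)
    return U, V


def rmse(ratings: Ratings, U: Matrix, V: Matrix) -> float:
    """Root-mean-square error of U . V^T on the observed entries only."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[m]))) ** 2
             for (u, m), r in ratings.items())
    return math.sqrt(se / len(ratings))


# Toy 4 x 4 rating matrix with four missing entries: (0, 2), (1, 1), (2, 0), (3, 1).
toy = {(0, 0): 5.0, (0, 1): 4.0, (0, 3): 1.0,
       (1, 0): 4.0, (1, 2): 1.0, (1, 3): 1.0,
       (2, 1): 1.0, (2, 2): 5.0, (2, 3): 4.0,
       (3, 0): 1.0, (3, 2): 4.0, (3, 3): 5.0}
U, V = complete_low_rank(toy, n_users=4, n_movies=4, rank=2)
```

After fitting, a missing entry such as (0, 2) is estimated as the dot product of U[0] and V[2]. A pure-Python loop like this is much slower than a vectorized library would be on the full million ratings; it only illustrates the objective being minimized.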

Usage

Make sure the following dependencies are installed:

  • Python 3.6 or later
  • scikit-learn
  • numpy
  • fancyimpute (installable by running pip install fancyimpute)

Then:

  • download the MovieLens-1M dataset
  • extract the ratings.dat file from the archive
  • download the prepare_movies.py script to the same directory
  • run it (python prepare_movies.py)
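The ratings.dat file stores one rating per line in the form UserID::MovieID::Rating::Timestamp. The sketch below shows one way to turn such lines into a sparse dictionary keyed by 0-based (user, movie) indices. Mapping an ID to ID minus 1 is an assumption for illustration, and prepare_movies.py may index users and movies differently; `parse_ratings` is a hypothetical name and the sample lines are made up.

```python
from typing import Dict, Iterable, Tuple


def parse_ratings(lines: Iterable[str]) -> Dict[Tuple[int, int], float]:
    """Map (user_index, movie_index) -> rating, using hypothetical 0-based indices.

    Each non-empty line must look like UserID::MovieID::Rating::Timestamp.
    """
    ratings: Dict[Tuple[int, int], float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        user_id, movie_id, rating, _timestamp = line.split("::")
        # Assumption: MovieLens IDs start at 1, so ID - 1 gives a 0-based index.
        ratings[(int(user_id) - 1, int(movie_id) - 1)] = float(rating)
    return ratings


# With the real file (extracted from the archive):
#     with open("ratings.dat") as f:
#         ratings = parse_ratings(f)
sample = ["1::10::5::978300000", "1::20::3::978300100", "2::10::4::978300200", ""]
ratings = parse_ratings(sample)
```

A dictionary keeps only the observed entries, which matters here because the full user-movie matrix is mostly empty.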

It should take a few minutes and produce two files, U.txt and VT.txt, in the same directory. U.txt holds the user matrix $U$ and VT.txt holds the transposed movie matrix $V^\top$, both in text format: the first line gives the number of rows and columns, and the matrix entries follow as space-separated decimal values. The idea is that the sparse MovieLens user-movie matrix $M$ satisfies $M \approx U \cdot V^\top$ on the entries that are present in $M$.
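Given the file format described above, the following sketch loads U.txt or VT.txt into a list of rows and evaluates one entry of $U \cdot V^\top$. It assumes that the header holds two integers and that the values are stored row by row (on one line or several); it also assumes that row $i$ of $U$ and column $j$ of $V^\top$ correspond to the $i$-th user and $j$-th movie in the script's own ordering, which this README does not spell out. The helper names are hypothetical, and the tiny inline matrices stand in for the real files.

```python
from typing import List

Matrix = List[List[float]]


def read_matrix(text: str) -> Matrix:
    """Parse a 'rows cols' header followed by rows * cols whitespace-separated values."""
    tokens = text.split()
    rows, cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    # Assumption: values are stored row by row (row-major order).
    return [values[i * cols:(i + 1) * cols] for i in range(rows)]


def predict(U: Matrix, VT: Matrix, user: int, movie: int) -> float:
    """Entry (user, movie) of U . V^T: row `user` of U times column `movie` of VT."""
    return sum(U[user][f] * VT[f][movie] for f in range(len(VT)))


# With the real files:
#     with open("U.txt") as f:
#         U = read_matrix(f.read())
#     with open("VT.txt") as f:
#         VT = read_matrix(f.read())
# Tiny stand-ins: 2 users, rank 2, 3 movies.
U = read_matrix("2 2\n1.0 2.0\n3.0 4.0\n")
VT = read_matrix("2 3\n1.0 0.0 2.0\n0.0 1.0 1.0\n")
```

Because VT.txt already stores $V^\top$, a prediction pairs a row of $U$ with a column of the loaded VT matrix, with no extra transpose.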

References

[1] is the MovieLens-1M dataset. [2] introduced a facility-location objective function that subsequent works use. [3-6] use this script in their experiments.

About

Script to compute a matrix completion for the MovieLens-1M dataset