Bryan Lewis shows how to use the IRLBA R package to do SVD on the Netflix Prize data set
Date:
David Smith, Revolutions Blog, May 31, 2011
One of the key data analysis tools that the BellKor team used to win the Netflix Prize was the Singular Value Decomposition (SVD) algorithm.
As a file on disk, the Netflix Prize data (a matrix of about 480,000 members' ratings for about 18,000 movies) was about 65 GB in size -- too large to be read directly into the standard in-memory data model of open-source R. But in the video below, Bryan Lewis shows us how to use the sparse Matrix object in R to efficiently store the data (about 99 million actual movie ratings) and the irlba package (which features a fast and efficient SVD algorithm for big data) to perform SVD analysis on the Netflix data in R.
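
To give a rough feel for the approach, here is a minimal sketch in R. It does not use the Netflix data and is not necessarily the code from the video: the sizes, the randomly generated ratings, and the variable names (n_members, n_movies, fit and so on) are all made up for illustration. It only assumes the Matrix and irlba packages are installed. The idea is to store only the observed (member, movie, rating) triplets in a sparse matrix, then ask irlba for a handful of the largest singular values and vectors rather than a full decomposition.

```r
library(Matrix)  # sparse matrix classes and sparseMatrix()
library(irlba)   # truncated SVD for large, sparse matrices

set.seed(1)

# Hypothetical toy dimensions, far smaller than the real data set.
n_members <- 1000
n_movies  <- 200
n_ratings <- 5000

# Pick distinct (member, movie) cells so that no rating is entered twice.
cells  <- sample(n_members * n_movies, n_ratings)
member <- (cells - 1) %% n_members + 1
movie  <- (cells - 1) %/% n_members + 1

# Made-up ratings on a 1 to 5 scale.
rating <- as.numeric(sample(1:5, n_ratings, replace = TRUE))

# Only the observed ratings are stored; the empty cells take no memory.
A <- sparseMatrix(i = member, j = movie, x = rating,
                  dims = c(n_members, n_movies))

# Compute only the 5 largest singular values and their singular vectors.
fit <- irlba(A, nv = 5)
```

After this runs, fit$d holds the five singular values, and fit$u and fit$v hold the corresponding left (member) and right (movie) singular vectors. Comparing object.size(A) with object.size(as.matrix(A)) on the toy data shows why the sparse representation matters as the matrix grows.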