David Smith, Revolutions Blog, May 31, 2011
One of the key data analysis tools that the BellKor team used to win the Netflix Prize was the Singular Value Decomposition (SVD) algorithm. As a file on disk, the Netflix Prize data (a matrix of about 480,000 members' ratings for about 18,000 movies) was about 65 GB in size, too large to read directly into the standard in-memory data model of open-source R. But in the video below, Bryan Lewis shows us how to use the sparse Matrix object in R to store the data efficiently (only the roughly 99 million actual movie ratings are kept, not the empty cells) and how to use the irlba package (which features a fast and efficient SVD algorithm for big data) to perform SVD analysis on the Netflix data in R.
Big Computing: Bryan Lewis's Vignette on IRLBA for SVD in R
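
To give a feel for the approach, here is a minimal sketch in R. It is not the code from the video: the ratings are simulated, and the toy dimensions and the names `n_members`, `n_movies`, `n_ratings`, and `ratings` are hypothetical stand-ins for the real Netflix data. It assumes the Matrix and irlba packages are installed, and uses only `sparseMatrix()` and `irlba()` from those packages.

```r
library(Matrix)  # sparse matrix classes
library(irlba)   # truncated SVD for large, sparse matrices

set.seed(1)

# Hypothetical toy dimensions standing in for the Netflix data
n_members <- 2000
n_movies  <- 300
n_ratings <- 20000

# Simulated (member, movie, rating) triplets; NOT real Netflix ratings
ratings <- data.frame(
  member = sample.int(n_members, n_ratings, replace = TRUE),
  movie  = sample.int(n_movies,  n_ratings, replace = TRUE),
  stars  = as.numeric(sample(1:5, n_ratings, replace = TRUE))
)

# Keep one rating per (member, movie) pair, because sparseMatrix()
# sums the values of duplicated index pairs
ratings <- ratings[!duplicated(ratings[, c("member", "movie")]), ]

# Only the observed ratings are stored; unrated cells take no memory
A <- sparseMatrix(i = ratings$member, j = ratings$movie, x = ratings$stars,
                  dims = c(n_members, n_movies))

# Compute only the 5 largest singular values and their vectors,
# rather than the full decomposition
s <- irlba(A, nv = 5)

print(s$d)       # the 5 largest singular values, largest first
print(dim(s$u))  # left singular vectors:  n_members x 5
print(dim(s$v))  # right singular vectors: n_movies  x 5
```

The point of the design is that neither step ever materializes the dense members-by-movies matrix: storage grows with the number of ratings, and the partial SVD asks only for the handful of leading singular triplets that the analysis needs.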