Interview with Simon Funk: Why SVD approach? (KDnuggets News 07:08, item 6, Features)

KDnuggets : News : 2007 : n08 : item6

Features

From: Gregory Piatetsky-Shapiro
Subject: Interview with Simon Funk: Why SVD approach?

GPS: Q3) What led you to choose SVD approach over other methods. You have explained it well in your blog post, but can you briefly summarize the SVD approach?

Simon Funk: The best way to understand SVD is probably in reverse: to look at how one re-constructs a data matrix from the singular vectors. Consider just a single column-vector A and corresponding row-vector B. If you multiply A by B you get a matrix as tall as A and as wide as B. Now if you have some target data matrix of the same size (say the Netflix movie-by-user ratings matrix) you can ask: What choice of A and B would make that reconstructed matrix as close to my target matrix as possible?

SVD is a mathematical trick for finding that optimal AB pair. It's really just that simple, and the only additional tidbit is that once you've found that first AB pair, you can repeat the process with the leftovers (the difference between the original target matrix and AB) to get a second pair of vectors, C and D, and so on, such that the target matrix T is approximated by: T = AB + CD + ..., with each successive term being progressively less significant due to the "biggest bite first" nature of the basic algorithm.

Looked at that way, it's clear that SVD is a particular way of modeling the data matrix, T. Assuming we trust the math to find the optimal parameters (A, B, ...), the question is how well does the model reflect the true process behind the data matrix? Here A is a vector over movies, and B over users, and the matrix AB has in each cell the movie value from A times the user value from B. So in effect we are saying there is some aspect or attribute which is measured for each movie by A, such as "how much Action does this movie have?", and for each user by B, such as "how much does this user like Action movies?", and we just multiply those together to get our AB estimate for how much each user would like each movie based only on the amount of Action. The model further says that the contributions from the different attributes are just added up linearly to make a final prediction. And that's it.

It's a very simple model, but it feels qualitatively correct to a first degree of approximation. And I had a very simple way to implement it using the incremental SVD method that I had come up with previously in my attempts to filter spam among other things, so I gave it a go. Honestly I only expected it to do about as well as Netflix's posted baseline--I was quite surprised when it did as well as it did!

continued in the next KDnuggets News 07:n09

What's this?

KDnuggets : News : 2007 : n08 : item6

PREVIOUS | NEXT