## Features
**From: Gregory Piatetsky-Shapiro**
**Subject: Interview with Simon Funk: Why SVD approach?**
*GPS: Q3) What led you to choose SVD approach over
other methods. You have explained it well in
your blog post,
but can you briefly summarize the SVD approach?*
**Simon Funk: **
The best way to understand SVD is probably in reverse: to look at how one
re-constructs a data matrix from the singular vectors. Consider just a single
column-vector A and corresponding row-vector B. If you multiply A by B you
get a matrix as tall as A and as wide as B. Now if you have some target
data matrix of the same size (say the Netflix movie-by-user ratings matrix)
you can ask: What choice of A and B would make that reconstructed matrix as
close to my target matrix as possible?
SVD is a mathematical
trick for finding that optimal AB pair. It's really just that simple, and
the only additional tidbit is that once you've found that first AB pair, you
can repeat the process with the leftovers (the difference between the original
target matrix and AB) to get a second pair of vectors, C and D, and so on,
such that the target matrix T is approximated by: T = AB + CD + ..., with
each successive term being progressively less significant due to the "biggest
bite first" nature of the basic algorithm.
Looked at that way, it's clear that SVD is a particular way of
*modeling* the data matrix, T. Assuming we trust the math to find
the optimal parameters (A, B, ...), the question is how well does the model
reflect
the true process behind the data matrix? Here A is a vector over movies,
and B over users, and the matrix AB has in each cell the movie value from
A times the user value from B. So in effect we are saying there is some
aspect or attribute which is measured for each movie by A, such as "how much
Action does this movie have?", and for each user by B, such as "how much
does this user like Action movies?", and we just multiply those together
to get our AB estimate for how much each user would like each movie
based only on the amount of Action. The model further says that the
contributions
from the different attributes are just added up linearly to make a final
prediction. And that's it.
It's a very simple model, but it feels qualitatively correct to a first
degree of approximation. And I had a very simple way to implement it using
the incremental SVD method that I had come up with previously in my attempts
to filter spam among other things, so I gave it a go. Honestly I only
expected
it to do about as well as Netflix's posted baseline--I was quite surprised
when it did as well as it did!
*continued in the next KDnuggets News 07:n09
*
What's this? |