KDnuggets : News : 2007 : n23 : item7


Subject: Anonymity of Netflix Prize Dataset Broken

Slashdot. "The anonymity of the Netflix Prize dataset has been broken by a pair of computer scientists from the University of Texas, according to a report from the physics arXivblog.

It turns out that an individual's set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the most popular 100 movies. So it's straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb) (abstract on the physics arxiv). The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details."


See the paper below.

(Note: In the paper, the authors applied their method to a few dozen IMDb users and were able to find a strong match for two of them, whose ratings of less popular movies were unusual. So the method in the paper seems far from breaking the anonymity of all Netflix Prize users, but it may break the anonymity of a very small number of them. Editor)

How To Break Anonymity of the Netflix Prize Dataset, by Arvind Narayanan, Vitaly Shmatikov

Abstract: We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge.

We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
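To make the idea of matching under imperfect background knowledge concrete, here is a minimal Python sketch. It is not the authors' algorithm or scoring function; it is a simplified illustration under stated assumptions. The function names (`score`, `best_match`), the tolerances, the rarity weighting, the `min_gap` threshold, and all of the sample data are hypothetical. The sketch counts approximately matching (rating, date) pairs, gives rarely rated movies more weight, and reports a match only when the best candidate stands out clearly from the runner-up.

```python
import math
import statistics
from datetime import date

def score(aux, record, popularity, rating_tol=1, date_tol_days=14):
    """Weighted count of auxiliary ratings that approximately match a record.

    aux and record map movie -> (rating, date). popularity maps
    movie -> number of raters (all values here are hypothetical).
    """
    total = 0.0
    for movie, (rating, day) in aux.items():
        if movie not in record:
            continue
        rec_rating, rec_day = record[movie]
        close_rating = abs(rating - rec_rating) <= rating_tol
        close_date = abs((day - rec_day).days) <= date_tol_days
        if close_rating and close_date:
            # Assumed weighting: rarely rated movies count for more.
            total += 1.0 / math.log(1 + popularity[movie])
    return total

def best_match(aux, dataset, popularity, min_gap=1.5):
    """Return the best record id, or None if it does not stand out."""
    ranked = sorted(
        ((score(aux, rec, popularity), rid) for rid, rec in dataset.items()),
        reverse=True,
    )
    if len(ranked) < 2:
        return None
    sigma = statistics.pstdev(s for s, _ in ranked)
    (best, best_id), (second, _) = ranked[0], ranked[1]
    # Assumed acceptance rule: the gap to the runner-up must be at least
    # min_gap standard deviations of all scores.
    if sigma == 0 or (best - second) / sigma < min_gap:
        return None
    return best_id

# Made-up data: three "anonymized" records and one public profile.
popularity = {"blockbuster": 100000, "obscure_a": 40, "obscure_b": 25}
dataset = {
    "u1": {"blockbuster": (5, date(2005, 3, 1)),
           "obscure_a": (2, date(2005, 3, 3)),
           "obscure_b": (4, date(2005, 4, 10))},
    "u2": {"blockbuster": (5, date(2005, 3, 2))},
    "u3": {"blockbuster": (4, date(2005, 6, 1)),
           "obscure_a": (5, date(2004, 1, 1))},
}
aux = {"blockbuster": (5, date(2005, 3, 2)),
       "obscure_a": (3, date(2005, 3, 5)),
       "obscure_b": (4, date(2005, 4, 12))}

match = best_match(aux, dataset, popularity)
print(match)
```

With this toy data the profile that includes two obscure titles is attributed to record `u1`, while a profile containing only the blockbuster rating matches two records equally well and is rejected. That mirrors the editor's observation that strong matches depend on unusual ratings of less popular movies.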



Copyright © 2007 KDnuggets.