Defeating Machine Learning: The IJCNN Social Network Challenge results

The winning team used de-anonymization research, significantly outperforming competitors who relied on machine learning / data mining approaches.

The IJCNN Social Network Challenge finished recently on Kaggle.

Participants were given about 7M contacts/edges from an online social network (Flickr) and had to predict whether each of a further 8,960 edges is true or false.

While the intention was that participants use machine learning, team IND CCA won in dramatic style by using de-anonymization research. The team included Arvind Narayanan, a co-author of a landmark paper on de-anonymizing the Netflix Prize dataset (Robust De-anonymization of Large Sparse Datasets), which likely led to the cancellation of the second Netflix Prize.

Here is Arvind Narayanan's description of how they won:

I myself work in computer security and privacy, and my specialty is de-anonymization. That explains why the other team members (Elaine Shi, Ben Rubinstein, and Yong J Kil) invited me to join them with the goal of de-anonymizing the contest graph and combining that with machine learning.
To clarify: our goal was to map the nodes in the training dataset to the real identities in the social network that was used to create the data. That would allow us to simply look up the pairs of nodes in the test set in the real graph to see whether or not the edge exists. There would be a small error rate because some edges may have changed after the Kaggle crawl was conducted, but we assumed this would be negligible.
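
As an illustration only (this is not the team's code), the lookup step described above could be sketched roughly as follows. The names `predict_edges`, `mapping`, `real_edges` and `fallback` are hypothetical, edges are assumed to be directed, and the neutral fallback score for unmapped nodes merely stands in for whatever a machine learning model would supply.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def predict_edges(
    test_pairs: Iterable[Edge],
    mapping: Dict[Node, Node],
    real_edges: Set[Edge],
    fallback: float = 0.5,
) -> List[float]:
    """Score each anonymized test pair by looking it up in the crawled graph.

    mapping    : anonymized node id -> real identity (only for nodes that
                 were successfully de-anonymized; hypothetical structure)
    real_edges : directed edges of the crawled real graph
    fallback   : score used when either endpoint is unmapped (assumption;
                 in practice a learned model would fill this gap)
    """
    scores = []
    for u, v in test_pairs:
        real_u = mapping.get(u)
        real_v = mapping.get(v)
        if real_u is None or real_v is None:
            # At least one endpoint could not be de-anonymized.
            scores.append(fallback)
        else:
            # Both endpoints are known: the answer is a direct lookup.
            scores.append(1.0 if (real_u, real_v) in real_edges else 0.0)
    return scores


if __name__ == "__main__":
    mapping = {1: "alice", 2: "bob", 3: "carol"}
    real_edges = {("alice", "bob")}
    print(predict_edges([(1, 2), (2, 1), (1, 4)], mapping, real_edges))
```

Pairs whose endpoints are both mapped get a definite answer, while the rest fall back to a neutral score, which is where combining the lookup with machine learning would come in.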

... By the time we crawled 1 million nodes we were hitting a 60-70% coverage of the 38k nodes in the test set.

Read more.