Million Song Dataset Challenge

You have the full listening history for 1M users and half of the history for 110K users; your goal is to predict the missing half.

The Million Song Dataset Challenge aims to be the best possible offline evaluation of a music recommendation system. Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling, even human oracles! Because the competition relies on the Million Song Dataset, its data is completely open: almost everything is known and possibly available.

What is the task in a few words? You have: 1) the full listening history for 1M users, 2) half of the listening history for 110K users (10K validation set, 100K test set), and you must predict the missing half. How much easier can it get?
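To make the setup concrete, here is a toy sketch of what "half of the listening history" could look like for a single user. The function name split_history, the random split, and the song identifiers are assumptions for illustration only; how the organizers actually split the data is not described here.

```python
import random


def split_history(songs, seed=0):
    """Split one user's songs into a visible half and a hidden half.

    Hypothetical helper: the random split is an assumption, not the
    challenge's documented procedure.
    """
    rng = random.Random(seed)
    shuffled = list(songs)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy history for one validation or test user (made-up song ids).
visible, hidden = split_history(["s1", "s2", "s3", "s4", "s5", "s6"])
```

A recommender only sees `visible` for such a user and is judged on how well it recovers `hidden`.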

The most straightforward approach to this task is pure collaborative filtering, but remember that there is a wealth of information available to you through the Million Song Dataset. Go ahead, explore! If you have questions, we recommend that you consult the MSD Mailing List.
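As a rough illustration of what pure collaborative filtering can mean here, the sketch below ranks unseen songs by how often they share a listener with the songs a user is already known to have played. It is one simple item co-occurrence variant, not the challenge's reference method; the helper names build_cooccurrence and recommend and the toy data are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count, for each ordered pair of songs, how many users played both."""
    cooc = defaultdict(Counter)
    for songs in histories.values():
        for a, b in permutations(set(songs), 2):
            cooc[a][b] += 1
    return cooc


def recommend(visible, cooc, k=3):
    """Rank songs outside `visible` by summed co-occurrence with it."""
    scores = Counter()
    for song in visible:
        for other, count in cooc[song].items():
            if other not in visible:
                scores[other] += count
    # Highest score first; ties broken by song id so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:k]]


# Toy stand-in for the full histories of the training users.
train = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["b", "c", "d"],
    "u4": ["a", "c"],
}
cooc = build_cooccurrence(train)
print(recommend({"a"}, cooc))  # ['b', 'c', 'd']
```

On the real data, the pairwise counts would need a sparse or pruned representation, since enumerating every pair per user grows quadratically with history length.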

For more information and to participate, visit www.kaggle.com/c/msdchallenge
