In the spirit of the 2010 KDD Cup, Grockit has just launched a new educational data mining competition on Kaggle. Students are presented with a series of problems to solve, and each response is either correct or incorrect. Given a sequence of item responses from a student, the task is to predict the accuracy of the student's response to a specified next question. The competition is being hosted on Kaggle at
We're very excited to be reaching out to the data mining community to try to improve outcomes for education, and we hope some of you will be interested in taking part. Cash prizes will also be awarded to the top three finishers. This is a data set from students studying for the GMAT, SAT, and ACT by answering practice questions at grockit.com -- we're hoping to find better ways of understanding students' abilities and the areas of knowledge that can be tested using these (mainly multiple-choice) questions. With that understanding, we hope to make our assessment better and help students really improve their learning.
Almost 5,000,000 item responses (with 17 data fields each) are included in the training set provided, covering responses from >175,000 students working through 6,000+ questions. Details on the dataset, the cash prizes, and the competition are all available at:
You are, of course, more than welcome to publish work based on your techniques and results. Thanks for reading, and we hope you check it out!
Ari Bader-Natal, Grockit