KDnuggets Home » News » 2010 » Feb » Kaggle: new platform data competitions  (  )

Kaggle: a platform for data-related competitions


 
  
Kaggle provides a competition platform for data-related contests, allowing companies, researchers, government and other organizations to expose their data to a wide range of analysts and techniques.


Kaggle is a new project that provides a competition platform for data-related contests. The platform allows companies, researchers, government and other organizations to expose their data to a wide range of analysts and techniques. Kaggle offers data professionals and researchers the opportunity to test their skills, try their techniques on interesting datasets and enhance their reputations.

Data-related competitions spur innovation

In many cases, competitions generate more innovation than an in-house research and development effort. In 2006, Netflix, an American DVD rental service, offered $1m to the analyst (or team) who could improve its recommendation algorithm by 10 per cent. (With Netflix, renters borrow DVDs and rate them; they are offered recommendations based on their past ratings and the past ratings of other renters with similar tastes.) A prize of $1m seems huge, but according to Netflix CEO Reed Hastings, an improvement of 10 per cent was worth "well in excess of $1m".

And it doesn't take a $1m prize to attract participants. Last year, the French telecommunications company Orange held a competition in conjunction with the ACM SIGKDD Conference, offering $10,000 worth of prizes to contestants who could predict which Orange customers were likely to switch providers, upgrade their plans or buy other Orange products. The contest attracted over 8,000 entries.

Competitions are a great way to find solutions to seemingly intractable problems or to improve models that appear to have reached their limit. It is rarely the case that any one organization has the best person to solve a given problem; releasing a modeling task to the world gives the organization the opportunity to tap a much wider talent pool, improving its chances of finding the best possible solution.

Data-related competitions can also unearth hidden gems. The most proficient analysts aren't always those with a slick haircut and a silver tongue (who are favored by the job interview process). Since data-related competitions are judged using objective criteria, they are a great way of gaining access to hidden talent.

Finally, data-related competitions are a useful interface between academia and industry. Organizations can post their problems and have researchers apply cutting-edge methods in an attempt to find the best solution. Competitions are also a great way for researchers to try their techniques on real-world problems.

Kaggle makes hosting data-related competitions easy

Kaggle is launching its first competition in early March and is looking to host others.

The platform lowers the barriers to hosting data-related competitions by allowing companies, researchers and other organizations to host contests without having to build their own infrastructure and worry about maintenance headaches such as downtime, software patching, slow download speeds and the security and privacy of competitors' information. Competition organizers can simply supply the details of their competition (including any relevant data and evaluation algorithm), preview the competition before launch and then have data professionals and researchers lodge solutions. Best of all, it costs nothing to host a competition on Kaggle (even bandwidth is covered).

The platform is easy to use. Competition organizers can simply paste pre-formatted content onto their pages using a what-you-see-is-what-you-get (WYSIWYG) editor. They can use the pages set aside for displaying a competition's rules, background, submission instructions, help, evaluation methodology and prizes. Kaggle also supports two additional fully customizable pages and permits competitions to have their own human-friendly URL (e.g. kaggle.com/mycompetitionURL).

When entering a competition, contestants can form teams and lodge submissions. Submissions can be validated to ensure that they have been formatted correctly. For numerical submissions, Kaggle can ensure that submissions have the correct number of rows and that all rows have allowable values. The platform can evaluate submissions instantaneously and add a team directly to a leaderboard. Kaggle currently offers root-mean-squared-error and area-under-the-ROC-curve evaluation methods, and its modular design allows competition-specific evaluation algorithms to be added.
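As a rough illustration of the checks and metrics described above, here is a minimal Python sketch. It is not Kaggle's code: the function names (validate_submission, rmse, auc, rank_leaderboard), the allowed value range and the leaderboard structure are all assumptions made for this example. It checks the row count and the allowable values of a numerical submission, scores it with root-mean-squared error or area under the ROC curve, and orders teams by score.

```python
import math


def validate_submission(values, expected_rows, low, high):
    """Return a list of problems; an empty list means the submission passes.

    Hypothetical check: the row count and the [low, high] range are
    assumptions standing in for whatever a competition host would configure.
    """
    problems = []
    if len(values) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(values)}")
    for i, v in enumerate(values):
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if not is_number or math.isnan(v) or not (low <= v <= high):
            problems.append(f"row {i}: value {v!r} is not allowed")
    return problems


def rmse(actual, predicted):
    """Root-mean-squared error: lower is better."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """Area under the ROC curve: higher is better.

    Computed as the share of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_leaderboard(team_scores, lower_is_better):
    """Order (team, score) pairs from best to worst for the chosen metric."""
    return sorted(team_scores.items(), key=lambda item: item[1],
                  reverse=not lower_is_better)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    submission = [0.1, 0.4, 0.35, 0.8]
    print(validate_submission(submission, expected_rows=4, low=0.0, high=1.0))
    print(auc(truth, submission))
    print(rank_leaderboard({"team_a": 0.75, "team_b": 0.9}, lower_is_better=False))
```

The pairwise AUC calculation is quadratic in the number of rows, which is fine for a sketch but would normally be replaced by a rank-based calculation for large files.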

In many cases, data-related competitions involve large data files. To ensure blistering download speeds regardless of how many users hit the site simultaneously, all data files are hosted on sophisticated cloud-based technology.

To test drive the Kaggle platform, visit demo.kaggle.com.

If you would like more information about the project, or if you're interested in hosting a competition on Kaggle, email anthony.goldbloom@kaggle.com.

