KDnuggets : News : 2005 : n11 : item3

Features

From: a concerned data-mining researcher
Date: 1 Jun 2005
Subject: Concerns on KDD Cup 2005

I have some serious concerns about KDD Cup 2005. This year's task is to predict a two-level categorization of web queries from their keywords. This is a very challenging task, and creative solutions would clearly have huge commercial value. What is disturbing is that the KDD Cup organizer requires contestants to submit their detailed algorithms as part of the entry requirements, and has established a very subjective "Creativity Award". Do the contestants retain the intellectual property (IP) rights to their inventions? How can the contestants be sure that their algorithms will not be used by commercial companies?

I suggest that the KDD Cup organizer withdraw the requirement to submit detailed algorithms and, as in previous KDD Cups, only invite the winners to publish their results in a suitable venue if they wish.

My second, minor concern is that the "training data" is so small (only 26 examples) compared with the number of classes (74 two-level categories) that the problem has little scientific value. Furthermore, both the training set and the hidden test set are labelled manually, so the labelling is not replicable. Data-mining researchers would therefore have little means of using the labelled data either for training or for validating their proposed methods.

A Concerned Data-mining Researcher

(Note: I have asked the KDD Cup organizers to respond to this concern. Editor)



Copyright © 2005 KDnuggets.