KDnuggets : News : 2001 : n17 : item13


From: Ronny Kohavi
Date: Sat, 11 Aug 2001 03:36:18 -0700
Subject: Availability of association dataset and real-world benchmark

We are announcing the availability of a real-world association dataset based on web views.
The data comes from the same site that was used for the KDD Cup 2000, but covers a longer period.
It is available at the bottom of http://www.ecn.purdue.edu/KDDCUP under the same click-through agreement (basically, use for non-commercial educational or research purposes is allowed).

In addition, we would like to share a benchmark paper comparing multiple association algorithms on this and several other real-world datasets.   The main contributions of the (likely-to-be-controversial) paper are:

  1. First objective evaluation and comparison of association rule algorithms on real datasets.
  2. Performance improvements to Apriori are mostly irrelevant because there is only a very narrow range of support levels where they matter.   Above this range, Apriori finishes fast enough; below this range, no algorithm can generate all associations.
  3. In the narrow range where performance differences are interesting, algorithms that were significantly faster than Apriori in previous work using artificial data did not run much faster on several real-world datasets (including the above donated dataset).  As a community, we may have overfitted our algorithms to the IBM artificial dataset.
  4. The IBM artificial dataset has very different characteristics from the real-world datasets we used.
  5. Authors of association algorithms concentrated on performance but did not always show correctness.  We found differences in the actual results of what is supposed to be an implementation of a sound and complete algorithm.

To remain objective, we did not include our own variant of an association generator.
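
As a rough illustration of points 2 and 5, the Python sketch below implements a minimal textbook Apriori and checks its output against exhaustive enumeration. It is not the code evaluated in the paper: the names apriori, brute_force and TOY are hypothetical, the transactions are made up rather than taken from the donated dataset, and min_support is assumed to be an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a in prev:
            for b in prev:
                u = a | b
                if len(u) == k and all(
                    frozenset(s) in prev for s in combinations(u, k - 1)
                ):
                    candidates.add(u)
        # Count each surviving candidate with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def brute_force(transactions, min_support):
    """Reference answer: test every non-empty subset of the item universe.
    Exponential in the number of items, so only usable on tiny inputs."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(1 for t in transactions if s <= t)
            if c >= min_support:
                result[s] = c
    return result


# Made-up transactions, for illustration only.
TOY = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C", "D"},
]

if __name__ == "__main__":
    for min_support in (4, 3, 2, 1):
        found = apriori(TOY, min_support)
        # Equal dicts mean no extra itemsets (soundness), no missing
        # itemsets (completeness), and identical support counts.
        agrees = found == brute_force(TOY, min_support)
        print(min_support, len(found), agrees)
```

On this toy input the number of frequent itemsets grows from 3 to 6, 7 and 15 as the threshold drops from 4 to 1, which hints at why output size, rather than algorithmic cleverness, dominates at low support. Comparing against a brute-force reference is only feasible on very small data; on real datasets a cross-check between independent implementations would be the practical analogue.
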
The paper and slides are available at http://www.ecn.purdue.edu/KDDCUP/ and

   - Zijian, Ronny, Llew

Copyright © 2001 KDnuggets.