KDnuggets News 00:21, item 17, Publications

Publications

From: Ivan Bruha bruha@cas.mcmaster.ca
Date: Mon, 23 Oct 2000 14:03:03 -0400 (EDT)
Subject: Report: KDD-2000 Workshop on Post-Processing

Post-Processing in Machine Learning and Data Mining:
Interpretation, Visualization, Integration, and Related Topics

:::::A WORKSHOP WITHIN::::

KDD-2000:
The Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining

20-23 August 2000, Boston, MA, USA

http://www.acm.org/sigkdd/kdd2000

Report

This workshop addressed an important aspect of Data Mining (DM) and Machine Learning (ML): post-processing and analyzing knowledge bases induced from real-world databases.

Results of a genuine ML algorithm, such as a decision tree or a set of decision rules, need not be perfect from the viewpoint of custom or commercial applications. It is well known that a concept description (knowledge base, model) discovered by an inductive process usually has to be processed by a post-pruning procedure. Most existing procedures evaluate the extracted knowledge, visualize it, or merely document it for the end user. They may also interpret the knowledge, incorporate it into an existing system, and check it for potential conflicts with previously derived knowledge (models). Post-processing procedures thus provide a kind of "symbolic filter" for noisy, imprecise, or "non-user-friendly" knowledge derived by an inductive algorithm.

Post-processing tools are therefore complementary to DM algorithms and always help them to refine the acquired knowledge. Usually, these tools exploit techniques that are not genuinely logical, e.g., statistics, neural nets, and others.

As for the workshop itself, there was one invited talk (A. Famili: "Post-processing: The real challenge") that provided an overview of post-processing. It discussed some typical applications of these techniques to real-world data and explained why we need and where we use the results of post-processing. Some examples from his past experience were given, too.

Fourteen research papers were submitted to this workshop. Each paper was reviewed by three members of the programme committee. After reviewing, eight of them were selected for publication, i.e. the acceptance rate was 57%.

In the first regular paper, Baesens and his colleagues explain the motivation for adding a post-processing phase to the association rule mining algorithm when it is plugged into the knowledge discovery in databases process. They focus on the processing of large sets of association rules.

Chung and Lui also deal with post-processing of association rules. They discuss the problem of mining association rules with multiple minimum supports. Their algorithm is applied in such a way that low-level rules have enough minimum support while high-level rules are prevented from combinatorial explosion.
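
To make the idea of multiple minimum supports concrete, here is a minimal Python sketch. It is not Chung and Lui's algorithm, which this report does not describe in detail. The taxonomy, the transactions, the per-level thresholds in MIN_SUPPORT_BY_LEVEL, and the convention that an itemset must meet the lowest threshold among its items are all illustrative assumptions. Leaf items get a lower threshold so that specific rules are not lost, while general categories need a higher one, which limits the number of frequent generalized itemsets.

```python
from itertools import combinations

# Hypothetical two-level taxonomy: leaf item -> parent category.
PARENT = {"skim milk": "milk", "whole milk": "milk",
          "rye bread": "bread", "white bread": "bread"}

# Assumed thresholds: level 1 = categories, level 2 = leaf items.
MIN_SUPPORT_BY_LEVEL = {1: 0.6, 2: 0.3}

# Made-up transactions, for illustration only.
TRANSACTIONS = [
    {"skim milk", "rye bread"},
    {"skim milk", "white bread"},
    {"whole milk", "rye bread"},
    {"skim milk", "rye bread"},
    {"whole milk"},
]

def level(item):
    """Leaf items are level 2, categories are level 1."""
    return 2 if item in PARENT else 1

def extend(transaction):
    """Add the parent of every leaf item so generalized items can be counted."""
    return transaction | {PARENT[i] for i in transaction if i in PARENT}

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def min_support(itemset):
    """Assumed convention: use the lowest threshold among the items."""
    return min(MIN_SUPPORT_BY_LEVEL[level(i)] for i in itemset)

def frequent_pairs(transactions):
    """Return {pair: support} for all pairs meeting their own threshold."""
    extended = [extend(t) for t in transactions]
    items = sorted(set().union(*extended))
    result = {}
    for a, b in combinations(items, 2):
        # Skip an item paired with its own parent: such pairs are redundant.
        if PARENT.get(a) == b or PARENT.get(b) == a:
            continue
        s = support({a, b}, extended)
        if s >= min_support({a, b}):
            result[(a, b)] = s
    return result

if __name__ == "__main__":
    for pair, s in frequent_pairs(TRANSACTIONS).items():
        print(pair, round(s, 2))
```

With a single threshold of 0.6, the leaf-level pair ("rye bread", "skim milk"), whose support is 0.4, would be discarded; with the level-dependent thresholds it is kept.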

Feng introduces two meta-learning methods, the combiner and the stacked generalizer, into the inductive algorithm CN4 with its six routines for processing unknown attribute values. This paper thus exhibits a knowledge combination technique.
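
The general idea behind combining classifiers at a meta level can be sketched as follows. This is not Feng's Meta-CN4. The two base classifiers are hypothetical stand-ins (for instance, for models built with different unknown-value routines), and the meta-learner is deliberately the simplest possible one: a lookup table from the tuple of base predictions to the most frequent true class on held-out data.

```python
from collections import Counter, defaultdict

# Hypothetical base classifiers; each maps an example to a class label.
def base_a(x):
    return "pos" if x[0] > 5 else "neg"

def base_b(x):
    return "pos" if x[1] > 5 else "neg"

BASE = [base_a, base_b]

# Made-up held-out examples (features, true class) used to train the meta level.
HOLDOUT = [((8, 8), "pos"), ((8, 2), "neg"), ((2, 8), "neg"), ((2, 2), "neg")]

def train_meta(base_classifiers, holdout):
    """Learn a table: tuple of base predictions -> most common true class."""
    votes = defaultdict(Counter)
    for x, y in holdout:
        key = tuple(clf(x) for clf in base_classifiers)
        votes[key][y] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

def predict(base_classifiers, meta_table, x):
    """Classify x by looking up the base predictions in the meta table."""
    key = tuple(clf(x) for clf in base_classifiers)
    if key in meta_table:
        return meta_table[key]
    # Unseen combination of base predictions: fall back to a majority vote.
    return Counter(key).most_common(1)[0][0]

if __name__ == "__main__":
    table = train_meta(BASE, HOLDOUT)
    print(predict(BASE, table, (9, 1)))
    print(predict(BASE, table, (7, 9)))
```

In this toy setting the meta level learns that the class is "pos" only when both base classifiers agree on "pos", something neither base classifier expresses on its own.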

The fourth paper (Franek and Bruha) introduces a new strategy that allows rule qualities to be modified (refined) during the classification of unseen objects. The refinement is carried out in a feedback loop so that it can be viewed as a post-processing procedure.

Ma, Wong, and Liu describe their system, which assists users in interpreting sets of association rules that are usually extremely large. It builds a hierarchical structure for easy browsing and then publishes this hierarchy via multiple web pages.

A. Prieditis introduces VizLearn, a visually interactive machine learning system. VizLearn lets a user visualize certain patterns at a glance that would otherwise be difficult to grasp using standard (non-visual) techniques.

Smid et al. present a new model for an intelligent tutoring system and discuss how to obtain and refine the data that specify this model.

Finally, Tan and Kumar present and compare various interestingness measures for association patterns that have been proposed in statistics, machine learning, and data mining. They also introduce a new metric and show that it is highly linear with respect to the correlation coefficient for many interesting association patterns.
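
For readers unfamiliar with such measures, the following Python sketch computes four standard ones (support, confidence, lift, and the phi correlation coefficient) for a pattern A -> B from simple counts. It does not reproduce Tan and Kumar's new metric, which this report does not specify; the function name measures and the example counts are illustrative assumptions.

```python
from math import sqrt

def measures(n_ab, n_a, n_b, n):
    """Interestingness of the pattern A -> B from transaction counts.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, respectively B
    n: total number of transactions
    """
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),
        # Phi: Pearson correlation between the 0/1 indicators of A and B.
        "phi": (p_ab - p_a * p_b) / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }

if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in measures(n_ab=30, n_a=40, n_b=50, n=100).items():
        print(f"{name}: {value:.3f}")
```

When A and B are statistically independent, lift equals 1 and phi equals 0, which is why both are commonly used to filter out uninteresting patterns that pass the support and confidence thresholds.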

The discussion within this workshop revealed the following:

Four papers (i.e., half of the accepted ones) dealt with association rules and their post-processing. This indicates that the topic is under intensive research.
Nevertheless, other areas of post-processing were also presented. Two papers discussed the evaluation of induced knowledge bases (rule qualities and interestingness measures). One paper addressed knowledge revision, one knowledge combination, and one visualization. Some papers also dealt with knowledge filtering.
There is a need in commercial applications for more robust post-processing methods since not only the databases but also the knowledge bases (models, rule sets) can reach extremely large sizes.

There were 18 registered participants and, to our knowledge, all workshop notes were sold.

Organizers

A. (Fazel) Famili (co-chair)
Editor-in-Chief, Intelligent Data Analysis
Institute for Information Technology
National Research Council of Canada
Ottawa, Ont.
Canada K1A 0R6
email: Fazel.Famili@iit.nrc.ca
http://www.iit.nrc.ca
phone: +1-613-9938554
fax: +1-613-9527151

Ivan Bruha (co-chair)
Dept. Computing & Software
McMaster University
Hamilton, Ont.
Canada L8S 4L7
email: bruha@mcmaster.ca
http://www.cas.mcmaster.ca/~bruha
phone: +1-905-5259140 ext 23439
fax: +1-905-5240340

Programme Committee

Petr Berka, Laboratory of Intelligent Systems, University of Economics, Prague, Czech Republic
email: berka@vse.cz, http://lisp.vse.cz/~berka

Marko Bohanec, Jozef Stefan Institute, Jamova 37, Ljubljana, Slovenia
email: marko.bohanec@ijs.si, http://www-ai.ijs.si/MarkoBohanec/mare.html

Ivan Bruha (co-chair)

A. (Fazel) Famili (co-chair)

W.F.S. (Skip) Poehlman, McMaster University, Hamilton, Canada
email: skip@church.cas.mcmaster.ca

Contents

Invited Talk

A. Famili: Post-processing: The real challenge

Regular Papers
They can be downloaded as a single tar.gz file

B. Baesens, S. Viaene, J. Vanthienen: Post-processing of association rules
(file "baesens.ps" as postscript file)

F. Chung, C. Lui: A post-analysis framework for mining generalized association rules with multiple minimum supports
(file "chung.zip" as zip file)

J.P. Feng: Meta-CN4 for unknown attribute values processing via combiner and stack generalization
(file "feng.doc" as MSWord doc file, or "feng.ps" as postscript file)

F. Franek, I. Bruha: Post-processing of qualities of decision rules within a testing phase
(file "franek.ps" as postscript file, or "franek.wpd" as WordPerfect doc file)

Y. Ma, C.K. Wong, B. Liu: Effective browsing of the discovered association rules using the web
(file "ma.ps" as postscript file)

A.E. Prieditis: VizLearn: Visualizing machine learning models and spatial data
(file "prieditis.ps" as postscript file)

J. Smid, P. Svacek, J. Smid: Processing user data for intelligent tutoring models
(file "smid.doc" as MSWord doc file)

P. Tan, V. Kumar: Interestingness measures for association patterns: A perspective
(file "tan.ps" as postscript file)
