Publications
From: Ivan Bruha bruha@cas.mcmaster.ca
Date: Mon, 23 Oct 2000 14:03:03 -0400 (EDT)
Subject: Report: KDD-2000 Workshop on Post-Processing
Post-Processing in Machine Learning and Data Mining:
Interpretation, Visualization, Integration, and Related Topics
:::::A WORKSHOP WITHIN::::
KDD-2000:
The Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining
20-23 August 2000, Boston, MA, USA
http://www.acm.org/sigkdd/kdd2000
Report
This workshop addressed an important aspect of Data Mining (DM) and
Machine Learning (ML): post-processing and analyzing knowledge bases
induced from real-world databases.
Results of a genuine ML algorithm, such as a decision tree or a set of
decision rules, need not be perfect from the viewpoint of custom or
commercial applications. It is well known that a concept description
(knowledge base, model) discovered by an inductive process usually has
to be processed by a post-pruning procedure. Most existing
procedures evaluate the extracted knowledge, visualize it, or merely
document it for the end user. Also, they may interpret the knowledge
and incorporate it into an existing system, and check it for potential
conflicts with previously derived knowledge (models). Post-processing
procedures thus provide a kind of "symbolic filter" for noisy,
imprecise, or "non-user-friendly" knowledge derived by an inductive
algorithm.
Thus, post-processing tools are complementary to the DM algorithms
and help them refine the acquired knowledge.
Usually, these tools exploit techniques that are not genuinely
logical, e.g., statistics, neural nets, and others.
As for the workshop itself, there was one invited talk (A. Famili:
"Post-processing: The real challenge") that provided an overview of
post-processing. It discussed some typical applications of these
techniques to real-world data and explained why we need and where we
use the results of post-processing. Some examples from his past
experience were given, too.
Fourteen research papers were submitted to this workshop. Each paper
was reviewed by three members of the programme committee. After
reviewing, eight of them were selected for publication, i.e. the
acceptance rate was 57%.
In the first regular paper, Baesens and his colleagues explain the
motivation for adding a post-processing phase to the association rule
mining algorithm when it is plugged into the knowledge discovery in
databases process. They focus on the processing of large sets of
association rules.
Chung and Lui also deal with post-processing of association
rules. They discuss the problem of mining association rules with
multiple minimum supports. Their algorithm is applied in such a way
that low-level rules have sufficient minimum support while a
combinatorial explosion of high-level rules is prevented.
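The idea of multiple minimum supports can be illustrated with a minimal
sketch. It is not the algorithm of Chung and Lui; the transactions, the
two-level taxonomy, the threshold values, and all function names below
are hypothetical and chosen only for illustration. Specific (low-level)
items get a lower support threshold so that rare but valid patterns
survive, while general (high-level) categories get a higher one so that
few of their combinations pass.

```python
from itertools import combinations

# Hypothetical toy data: each transaction lists low-level items.
TRANSACTIONS = [
    {"skim_milk", "white_bread"},
    {"skim_milk", "rye_bread"},
    {"whole_milk", "white_bread"},
    {"whole_milk", "white_bread"},
    {"skim_milk"},
]

# Hypothetical taxonomy: low-level item -> high-level category.
PARENT = {
    "skim_milk": "milk", "whole_milk": "milk",
    "white_bread": "bread", "rye_bread": "bread",
}

# Assumed thresholds: a low bar for specific items, a high bar for categories.
MIN_SUPPORT = {"low": 0.15, "high": 0.6}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_pairs(transactions, min_support):
    """Return all item pairs whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(frozenset(pair), transactions)
        if s >= min_support:
            result[pair] = s
    return result


def generalize(transactions):
    """Replace every item by its high-level category."""
    return [{PARENT[item] for item in t} for t in transactions]


low_level = frequent_pairs(TRANSACTIONS, MIN_SUPPORT["low"])
high_level = frequent_pairs(generalize(TRANSACTIONS), MIN_SUPPORT["high"])
print(low_level)
print(high_level)
```

In this toy data set, applying the high-level threshold of 0.6 to the
low-level items would discard every pair, which is the situation that a
single global minimum support cannot handle.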
Feng introduces two meta-learning methods, the combiner and the stacked
generalizer, into the inductive algorithm CN4 with its six routines for
processing unknown attribute values. This paper thus exhibits a
knowledge combination technique.
The fourth paper (Franek and Bruha) introduces a new strategy that
allows rule qualities to be modified (refined) during the classification
of unseen objects. The refinement is carried out in a feedback loop so
that it can be viewed as a post-processing procedure.
Ma, Wong, and Liu describe their system, which assists users in
interpreting usually extremely large sets of association rules. It
builds a hierarchical structure for easy browsing and then publishes
this hierarchy via multiple web pages.
A. Prieditis introduces VizLearn, a visually interactive machine
learning system. VizLearn lets a user visualize, at a glance, certain
patterns that would otherwise be difficult to grasp using standard
(non-visual) techniques.
Smid et al. exhibit a new model for an intelligent tutoring system and
discuss how to obtain data that specify this model, including their
refinement.
Finally, Tan and Kumar present and compare various interestingness
measures for association patterns that are proposed in statistics,
machine learning, and data mining. They also introduce a new metric
and show that it is highly linear with respect to the correlation
coefficient for many interesting association patterns.
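To make the notion of an interestingness measure concrete, the following
sketch computes a few standard measures for a rule A -> B from simple
counts. It does not reproduce the new metric of Tan and Kumar, which this
report does not define; the function name, its parameters, and the
example counts are assumptions made only for illustration.

```python
import math


def measures(n_ab, n_a, n_b, n):
    """Standard interestingness measures for the rule A -> B.

    n_ab: number of transactions containing both A and B
    n_a:  number of transactions containing A
    n_b:  number of transactions containing B
    n:    total number of transactions
    """
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    return {
        # How often A and B occur together.
        "support": p_ab,
        # Estimated probability of B given A.
        "confidence": p_ab / p_a,
        # Ratio of observed co-occurrence to that expected under independence.
        "lift": p_ab / (p_a * p_b),
        # Correlation (phi) coefficient of the two binary variables.
        "phi": (p_ab - p_a * p_b)
        / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }


# Hypothetical counts: 100 transactions, 50 with A, 60 with B, 40 with both.
m = measures(n_ab=40, n_a=50, n_b=60, n=100)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, a lift above 1 and a positive correlation coefficient
both indicate that A and B occur together more often than independence
would predict, although the measures differ in scale.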
The discussion within this workshop revealed the following:
- Four papers (i.e. half of the accepted ones) dealt with association
rules and their post-processing, which indicates that this topic is
under intensive research.
- Nevertheless, the other disciplines of post-processing were also
presented. Two papers discussed the evaluation of induced knowledge
bases (rule qualities and interestingness measures). One paper
talked about knowledge revision; one about knowledge combination;
one about visualization. Some papers also dealt with knowledge
filtering.
- There is a need in commercial applications for more robust
post-processing methods since not only the databases but also the
knowledge bases (models, rule sets) can reach extremely large sizes.
There were 18 registered participants and, to our knowledge, all
workshop notes were sold.
Organizers
A. (Fazel) Famili (co-chair)
Editor-in-Chief, Intelligent Data Analysis
Institute for Information Technology
National Research Council of Canada
Ottawa, Ont.
Canada K1A 0R6
email: Fazel.Famili@iit.nrc.ca
http://www.iit.nrc.ca
phone: +1-613-9938554
fax: +1-613-9527151
Ivan Bruha (co-chair)
Dept. Computing & Software
McMaster University
Hamilton, Ont
Canada L8S 4L7
email: bruha@mcmaster.ca
http://www.cas.mcmaster.ca/~bruha
phone: +1-905-5259140 ext 23439
fax: +1-905-5240340
Programme Committee
Petr Berka, Laboratory of Intelligent Systems, University of Economics, Prague,
Czech Republic
email: berka@vse.cz
http://lisp.vse.cz/~berka
Marko Bohanec, Jozef Stefan Institute, Jamova 37, Ljubljana, Slovenia
email: marko.bohanec@ijs.si
http://www-ai.ijs.si/MarkoBohanec/mare.html
Ivan Bruha (co-chair)
A. (Fazel) Famili (co-chair)
W.F.S. (Skip) Poehlman, McMaster University, Hamilton, Canada
email: skip@church.cas.mcmaster.ca
Contents
Invited Talk
A. Famili: Post-processing: The real challenge
Regular Papers
They can be downloaded as a single tar.gz file.
B. Baesens, S. Viaene, J. Vanthienen: Post-processing of association rules
(file "baesens.ps" as postscript file)
F. Chung, C. Lui: A post-analysis framework for mining generalized association
rules with multiple minimum supports
(file "chung.zip" as zip file)
J.P. Feng: Meta-CN4 for unknown attribute values processing via combiner and
stack generalization
(file "feng.doc" as MSWord doc file, or "feng.ps" as postscript file)
F. Franek, I. Bruha: Post-processing of qualities of decision rules within a
testing phase
(file "franek.ps" as postscript file, or "franek.wpd" as WPerfect doc file)
Y. Ma, C.K. Wong, B. Liu: Effective browsing of the discovered association
rules using the web
(file "ma.ps" as postscript file)
A.E. Prieditis: VizLearn: Visualizing machine learning models and spacial data
(file "prieditis.ps" as postscript file)
J. Smid, P. Svacek, J. Smid: Processing user data for intelligent tutoring
models
(file "smid.doc" as MSWord doc file)
P. Tan, V. Kumar: Interestingness measures for association patterns:
A perspective
(file "tan.ps" as postscript file)