KDnuggets : Newsletter : 1999 Issues : 99:03 Contents :

KDnuggets 99:03, item 4, Publications:


Date: 16 Jan 1999 09:56:58 
From: Hendrik Blockeel <Hendrik.Blockeel@cs.kuleuven.ac.be>
Subject: Two data mining PhD theses, K. U. Leuven, Belgium

FREQUENT PATTERN DISCOVERY IN FIRST-ORDER LOGIC, 
Ph.D. Dissertation of Luc Dehaspe
http://www.cs.kuleuven.ac.be/~ldh/publicaties/dehaspe98:phd.ps.gz

TOP-DOWN INDUCTION OF FIRST ORDER LOGICAL DECISION TREES (Hendrik Blockeel)
http://www.cs.kuleuven.ac.be/~ml/PS/blockeel98:phd.ps.gz

FREQUENT PATTERN DISCOVERY IN FIRST-ORDER LOGIC
-----------------------------------------------
Ph.D. Dissertation of Luc Dehaspe
Department of Computer Science
K.U.Leuven
http://www.cs.kuleuven.ac.be/~ldh/publicaties/dehaspe98:phd.ps.gz

We present a general formulation of the frequent pattern discovery
problem, where both the database and the patterns are represented in
some subset of first-order logic.

We discuss a unified representation in first-order logic that gives
insight into the blurred picture of the frequent pattern discovery
domain. Within the first-order logic formulation, a number of
dimensions appear that reconnect previously diverged settings.  We present
algorithms for frequent pattern discovery in first-order logic that
are well suited for exploratory data mining: they offer the
flexibility required to experiment with standard and, in particular,
novel settings not supported by special-purpose algorithms.
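To make the formulation concrete, here is a minimal illustrative sketch (not the thesis's actual system) of the central notion: the frequency of a first-order pattern is the fraction of examples for which some variable substitution makes every literal of the pattern true in that example's facts. The molecules, predicates, and matcher below are invented for illustration.

```python
# Sketch: frequency of a first-order pattern over a tiny relational
# database. A pattern is a list of literals; arguments starting with
# an uppercase letter are variables, the rest are constants.

def match(literals, facts, subst):
    """Try to extend `subst` so every literal unifies with some fact."""
    if not literals:
        return True
    pred, *args = literals[0]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(literals[0]):
            continue
        new = dict(subst)
        ok = True
        for a, v in zip(args, fact[1:]):
            if a[0].isupper():                # variable: bind or check
                if new.get(a, v) != v:
                    ok = False
                    break
                new[a] = v
            elif a != v:                      # constant mismatch
                ok = False
                break
        if ok and match(literals[1:], facts, new):
            return True
    return False

def frequency(pattern, examples):
    """Fraction of examples in which the pattern succeeds."""
    return sum(match(pattern, facts, {}) for facts in examples) / len(examples)

# Two toy "molecules" as sets of ground facts.
mol1 = [("atom", "a1", "c"), ("atom", "a2", "o"), ("bond", "a1", "a2")]
mol2 = [("atom", "b1", "c"), ("atom", "b2", "c"), ("bond", "b1", "b2")]

# Pattern: "a carbon atom bonded to an oxygen atom".
pattern = [("atom", "X", "c"), ("atom", "Y", "o"), ("bond", "X", "Y")]
print(frequency(pattern, [mol1, mol2]))   # 0.5: holds in mol1 only
```

A level-wise discovery algorithm would repeatedly extend patterns whose frequency stays above a threshold; this sketch only shows the frequency test at its core.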

We show how frequent patterns in first-order logic can be used as
building blocks for statistical predictive modeling, and demonstrate
the scientific and commercial potential of frequent pattern discovery
in first-order logic via an application in chemical toxicology, where
the task is to identify cancer-causing chemical substances.
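The "building blocks" idea above is often called propositionalisation: each frequent pattern becomes a boolean feature, turning relational examples into ordinary feature vectors that any statistical learner can consume. The sketch below uses a trivial subset-containment matcher, an assumption made only to keep it self-contained.

```python
# Sketch: frequent patterns as 0/1 features for predictive modeling.

def propositionalise(examples, patterns, matches):
    """Return one 0/1 feature vector per example."""
    return [[int(matches(p, ex)) for p in patterns] for ex in examples]

# Toy relational examples and patterns as sets of ground facts;
# `matches` is a stand-in for a real first-order pattern matcher.
examples = [
    {("atom", "c"), ("atom", "o"), ("bond",)},
    {("atom", "c"), ("bond",)},
]
patterns = [{("atom", "o")}, {("atom", "c"), ("bond",)}]
matches = lambda p, ex: p <= ex

print(propositionalise(examples, patterns, matches))
# [[1, 1], [0, 1]]
```

The resulting table can be handed to any standard classifier, which is what connects frequent pattern discovery to statistical predictive modeling.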

TOP-DOWN INDUCTION OF FIRST ORDER LOGICAL DECISION TREES
--------------------------------------------------------
Ph.D. Dissertation of Hendrik Blockeel
Department of Computer Science
K.U.Leuven

Construction of decision trees is a very popular induction method.
Until now, however, this method has been used mainly within the framework
of attribute-value learning (AVL).  This framework imposes a number of
constraints on the representation of data and hypotheses.  A more powerful
formalism, inductive logic programming (ILP), does not impose such
constraints and is therefore more generally applicable.  Unfortunately,
ILP is less mature than AVL and does not offer as many sophisticated and
specialized techniques.
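To make the contrast between the two frameworks concrete: attribute-value learning describes every example as one fixed-length tuple, while ILP allows a variable-size set of related facts, so structural information is kept rather than flattened. Both encodings below are invented for illustration.

```python
# AVL: every molecule must fit the same fixed slots.
avl_example = {"num_atoms": 3, "has_oxygen": True, "weight": 46.0}

# ILP: a molecule is a set of ground facts of varying size;
# which atom is bonded to which is represented directly.
ilp_example = [
    ("atom", "a1", "c"), ("atom", "a2", "o"), ("atom", "a3", "h"),
    ("bond", "a1", "a2"), ("bond", "a1", "a3"),
]

print(len(ilp_example))   # 5 facts; another molecule may have any number
```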

In this work we upgrade induction of decision trees and some related
sophisticated techniques from AVL to ILP.  To this end we first define a
relatively general form of induction that we call predictive clustering.
We demonstrate that a number of important tasks (classification, regression)
are special cases of predictive clustering.  Next, we discuss an algorithm
for induction of decision trees that generalizes over many existing 
algorithms for induction of classification or regression trees.
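One way to see classification and regression as special cases of predictive clustering is through the splitting heuristic: choose the split that minimises intra-subset variance of the target. Regression uses the numeric target directly; classification encodes labels as 0/1 indicator vectors and uses the very same criterion. The sketch below illustrates that unification and is not the thesis's actual implementation.

```python
# Sketch: one variance-based split criterion for both tasks.

def variance(targets):
    """Mean squared distance to the centroid (targets are vectors)."""
    n, dim = len(targets), len(targets[0])
    mean = [sum(t[d] for t in targets) / n for d in range(dim)]
    return sum(sum((t[d] - mean[d]) ** 2 for d in range(dim))
               for t in targets) / n

def split_quality(left, right):
    """Weighted intra-subset variance; lower means a better split."""
    n = len(left) + len(right)
    return (len(left) * variance(left) + len(right) * variance(right)) / n

# Regression targets: scalars wrapped as 1-d vectors.
print(split_quality([(1.0,), (1.1,)], [(5.0,), (5.2,)]))

# Classification targets: class labels as indicator vectors.
pos, neg = (1, 0), (0, 1)
print(split_quality([pos, pos], [neg, neg]))   # 0.0: a pure split
```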

In a second part we define decision trees in the context of first order logic,
which is the representation formalism used in ILP, and study their
properties.  Once these first order logical decision trees are defined and
understood, it becomes possible to apply the proposed technique for
predictive clustering within ILP.  We present an implementation of the
algorithm and evaluate it empirically.  It turns out that the resulting
program is competitive with state-of-the-art ILP systems; moreover,
it is more generally applicable and often faster.
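A first-order logical decision tree differs from an ordinary one in that each internal node tests a logical condition on the example rather than a single attribute value. The minimal sketch below uses a plain predicate over the example's facts as a stand-in for a first-order query; the tree and conditions are invented for illustration.

```python
# Sketch: classification with a tiny logical decision tree.

class Node:
    def __init__(self, test=None, yes=None, no=None, label=None):
        self.test, self.yes, self.no, self.label = test, yes, no, label

    def classify(self, example):
        if self.label is not None:           # leaf: return its class
            return self.label
        branch = self.yes if self.test(example) else self.no
        return branch.classify(example)

# Node test: "does the molecule contain a carbon-oxygen bond?"
has_co_bond = lambda ex: ("bond", "c", "o") in ex

tree = Node(test=has_co_bond,
            yes=Node(label="active"),
            no=Node(label="inactive"))

print(tree.classify({("bond", "c", "o")}))   # active
print(tree.classify({("bond", "c", "c")}))   # inactive
```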




Copyright © 1999 KDnuggets