Date: 16 Jan 1999 09:56:58
From: Hendrik Blockeel <Hendrik.Blockeel@cs.kuleuven.ac.be>
Subject: Two data mining PhD theses, K.U.Leuven, Belgium

FREQUENT PATTERN DISCOVERY IN FIRST-ORDER LOGIC
Ph.D. Dissertation of Luc Dehaspe
http://www.cs.kuleuven.ac.be/~ldh/publicaties/dehaspe98:phd.ps.gz

TOP-DOWN INDUCTION OF FIRST ORDER LOGICAL DECISION TREES
Ph.D. Dissertation of Hendrik Blockeel
http://www.cs.kuleuven.ac.be/~ml/PS/blockeel98:phd.ps.gz

FREQUENT PATTERN DISCOVERY IN FIRST-ORDER LOGIC
-----------------------------------------------
Ph.D. Dissertation of Luc Dehaspe
Department of Computer Science, K.U.Leuven
http://www.cs.kuleuven.ac.be/~ldh/publicaties/dehaspe98:phd.ps.gz

We present a general formulation of the frequent pattern discovery problem, where both the database and the patterns are represented in some subset of first-order logic. We discuss a unified representation in first-order logic that gives insight into the blurred picture of the frequent pattern discovery domain. Within the first-order formulation, a number of dimensions appear that reconnect previously diverged settings.

We present algorithms for frequent pattern discovery in first-order logic that are well suited to exploratory data mining: they offer the flexibility required to experiment with standard and, in particular, novel settings not supported by special-purpose algorithms. We show how frequent patterns in first-order logic can be used as building blocks for statistical predictive modeling, and demonstrate the scientific and commercial potential of frequent pattern discovery in first-order logic via an application in chemical toxicology, where the task is to identify cancer-causing chemical substances.

TOP-DOWN INDUCTION OF FIRST ORDER LOGICAL DECISION TREES
--------------------------------------------------------
Ph.D. Dissertation of Hendrik Blockeel
Department of Computer Science, K.U.Leuven
http://www.cs.kuleuven.ac.be/~ml/PS/blockeel98:phd.ps.gz

Construction of decision trees is a very popular induction method.
Until now, however, this method has been used mainly within the framework of attribute-value learning (AVL). This framework imposes a number of constraints on the representation of data and hypotheses. A more powerful formalism, inductive logic programming (ILP), does not impose such constraints and is therefore more generally applicable. Unfortunately, ILP is less mature than AVL and does not offer as many sophisticated and specialized techniques. In this work we upgrade the induction of decision trees, together with some related sophisticated techniques, from AVL to ILP.

To this end we first define a relatively general form of induction that we call predictive clustering, and demonstrate that a number of important tasks (classification, regression) are special cases of it. Next, we discuss an algorithm for the induction of decision trees that generalizes many existing algorithms for the induction of classification or regression trees.

In a second part we define decision trees in the context of first-order logic, the representation formalism used in ILP, and study their properties. Once these first-order logical decision trees are defined and understood, it becomes possible to apply the proposed technique for predictive clustering within ILP. We present an implementation of the algorithm and evaluate it empirically. The resulting program turns out to be competitive with state-of-the-art ILP systems; moreover, it is more generally applicable and often faster.
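For readers unfamiliar with the AVL starting point being upgraded here, the classical attribute-value method can be sketched in a few lines. The following is a minimal, hypothetical illustration of top-down decision-tree induction with information-gain splits (ID3-style); it is not the first-order algorithm of the thesis, and all function and variable names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, n_attrs):
    # Pick the attribute whose test maximizes information gain.
    base = entropy(labels)
    best_attr, best_gain = None, -1.0
    for a in range(n_attrs):
        # Partition the examples by the value of attribute a.
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[a], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        gain = base - remainder
        if gain > best_gain:
            best_attr, best_gain = a, gain
    return best_attr, best_gain

def build_tree(rows, labels, n_attrs):
    # Top-down induction: keep splitting until a node is pure
    # or no attribute yields any information gain.
    if len(set(labels)) == 1:
        return labels[0]                                  # pure leaf
    attr, gain = best_split(rows, labels, n_attrs)
    if gain <= 0:
        return Counter(labels).most_common(1)[0][0]       # majority-class leaf
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], ([], []))
        branches[row[attr]][0].append(row)
        branches[row[attr]][1].append(y)
    return {'attr': attr,
            'children': {v: build_tree(r, l, n_attrs)
                         for v, (r, l) in branches.items()}}
```

The upgrade described in the thesis amounts, roughly, to replacing the attribute-value tests chosen in best_split with first-order logical queries over a relational database of examples, while keeping the same top-down, divide-and-conquer skeleton.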
Copyright © 1999 KDnuggets