KDD Nugget 94:9, e-mailed 94-05-13

Contents:
 * G. Piatetsky-Shapiro, ISR: Microsoft success using neural network
   for direct marketing
 * B. Wuthrich, A draft of a manuscript on Knowledge Discovery
 * Y. Kodratoff, The comprehensibility manifesto and CFP for ECML'95
   workshop Industrial applications of ML and comprehensibility

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD), e.g. application
descriptions, conference announcements, tool reviews, information
requests, interesting ideas, clever opinions, etc.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Back issues, FAQ, and other KDD-related information are now available +
+ via Mosaic, URL http://info.gte.com/~kdd/ or                          +
+ by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README              +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you have something relevant to KDD, send it to kdd@gte.com ;
Add/delete requests to kdd-request@gte.com
-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quote of the Week ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Fri, 6 May 94
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: ISR: Microsoft success using neural network for direct marketing

The March 1994 issue of Intelligent Systems Report has an interesting
article entitled "Microsoft targets direct mail recipients with neural
network". Microsoft sends over 40 million pieces of direct mail to more
than 8 million registered users, usually in an attempt to get the users
to upgrade to a new version.
Although the first mailing is sent to everyone, the key is to send a
second, more appealing, mailing only to those most likely to respond.
Prior to using a neural network, an average mailing would get a response
rate of only 4.9%; using a neural net, the response rate increased to
8.2%, according to Microsoft's Jim Minervino. The application was
developed using Brainmaker, a neural network tool.

---------------------------------------

Date: Sat, 7 May 94 11:00:55 HKT
From: beat@cs.ust.hk (DR. BEAT WUTHRICH)
Subject: a preliminary TR on KDD available

Right now I am teaching a postgraduate course on "Knowledge Discovery in
Databases" at the Hong Kong University of Science and Technology.

--------------------
Abstract

This is a draft of a manuscript for a postgraduate course taught at the
Hong Kong University of Science and Technology in Spring 94. The course
gives an introduction to the young and fascinating field of knowledge
discovery in databases. The manuscript is suited for beginners, who can
leave out the more advanced sections, as well as for people who would
like to do research in this area. This manuscript is partly incomplete.

Table of Contents
1. Introduction
   1.1 Course Outline
   1.2 Basic Notions
   1.3 A Case Study
   1.4 Outlook
2. Rule Languages
   2.1 Propositional Rules and Decision Trees
   2.2 Datalog
   2.3 FQL*
3. Uncertainty
   3.1 Foundations of Probability Theory
   3.2 Other Approaches to Uncertainty
   3.3 Probabilistic Datalog
   3.4 Probabilistic FQL*
4. Time
   4.1 Foundations
   4.2 Temporal Datalog
   4.3 Temporal FQL*
   4.4 Probabilistic Temporal FQL*
5. Learning Propositional Rules and Decision Trees
   5.1 Generating Decision Trees
   5.2 Choosing a Test
   5.3 Generating Probabilistic Decision Trees
   5.4 Further Issues
6. Learning Datalog Rules
   6.1 Generating Datalog Rules
   6.2 Choosing a Specialization
   6.3 Further Issues
7. Learning Probabilistic Knowledge (basically references to papers)

----------
To get it:
1) ftp ftp.cs.ust.hk
2) login as: anonymous
3) cd pub/techreport/postscript
4) get tr94-2.ps.gz
   or
   get tr94-2A.ps.gz
   get tr94-2B.ps.gz

note:
- tr94-2.ps.gz is the full tech report, 84 pages long.
- tr94-2A.ps.gz and tr94-2B.ps.gz are the first and second parts,
  respectively, of the same tech report; decompressed, each takes less
  than 1 MB.

(note: to decompress, use gunzip tr94-2.ps.gz. If that does not work,
rename tr94-2.ps.gz to tr94-2.ps.z and try gunzip again. The report
takes 247503 bytes compressed, 1211745 bytes uncompressed, and 84 pages
if printed. GPS)

Unfortunately there are a couple of typos in the current version of this
partly incomplete manuscript. I apologize for that and I am working on
enhancements and improvements.

Dr. Beat Wuethrich
The Hong Kong University of Science and Technology
CS Dept (room 3512)
Clear Water Bay
Kowloon, Hong Kong

------------------------------------------

Date: Mon, 9 May 94 15:53:45 +0200
From: Yves.Kodratoff@lri.fr
Subject: The comprehensibility manifesto

Hi! Here is a version I would be delighted to see announced in the KDD
Nuggets!
Cheers
Yves

The comprehensibility manifesto
Yves Kodratoff (yk@lri.lri.fr)
an (unconventional) submission for an ECML'95 workshop on
Industrial applications of ML and comprehensibility

The importance of explanations and of the comprehensibility of the
results provided by an expert system or a machine learning (ML)
algorithm is by no means a new idea. To my knowledge, it has been around
since the 80's (see details below), but I am almost sure that others
realized its importance before. This old idea did not attract much
attention from a scientific community more interested in measuring the
complexity of the algorithms and the accuracy of their results than the
comprehensibility of the software and of the results.
This attitude can be explained by the fact that we have no precise
definition of what an explanation really is, and no way of measuring or
even analyzing what a "good" explanation is: comprehensibility is a
badly defined concept, presently not measurable. This state of affairs
seems to me unbearable now, in view of an analysis of the industrial
applications made of the various ML approaches. Every time one of our
favorite ML approaches has been applied in industry, the
comprehensibility of the results, though ill-defined, has been a
decisive factor in choosing it over an approach by pure statistical
means, or by neural networks. To confirm this opinion, consider that,
very recently, answering questions about the difference between ML and
the more application-oriented data mining, G. Piatetsky-Shapiro claimed
that "Knowledge discovery in Data Bases (KDD) is concerned with finding
*understandable* knowledge, while ML is concerned with improving
performance of an agent." Rather than discussing what properly belongs
to ML or not, let us ask the KDD community to join us.

This manifesto induces from these examples (and here is its weakness)
that a large number of industrial applications of ML will demand good
explanations, as long as the domain is understood by the experts. We are
well aware that mechanizing one stage of a complex process may not
require comprehensibility, but we claim that the whole process, as soon
as the decisions it helps to make are important, will require a high
level of comprehensibility: for the experts to validate the system, for
its maintenance, and for its evolution in view of changes in the
external world.

Now, let us deduce the consequences of our claim. The problem we are
left with is that we do not understand comprehensibility! This is why I
propose to stop fleeing the problem and to define comprehensibility as
an acknowledged research topic.
It is a hard problem, sure enough, but are we supposed to tackle only
the easy ones? Now that it seems to be well identified as an industrial
problem, we cannot, so to speak, "afford" to go on shunning it.

What kind of forces do we need to join in order to hope to find a
solution? We obviously need the MLists who developed the
symbolic/numeric systems able to generate understandable knowledge.
Notice that we cannot work in isolation from the users, the
industrialists, who know empirically what a good explanation is, or
rather what a bad one is, and who are the only ones able to attribute
scores to the results of our future experiments. Just as an example, the
explanatory value of special "nuggets" was introduced into the ML
community by P. Riddle because of her study of manufacturing at Boeing,
not to ease a tension internal to the ML field. The KDD community, cited
above, is obviously concerned.

We also need specialists in knowledge acquisition (KA), whose research
topic is how to help a user make his/her knowledge understandable to the
machine. They are thus used to working on the inverse of our problem,
and their experience in the topic will be invaluable. Specialists in
explanations for expert systems (ES) have already provided definitions
and taxonomies of explanations; they are the pioneers of the field:
there now exists a large body of workers who follow and deepen the ideas
that led Clancey to NEOMYCIN. Our problem would be, more specifically,
to define a measure of comprehensibility on the explanations generated
by their systems. Psychologists, and more particularly pedagogists,
should also be part of this game, since they are used to analyzing what
a student really understands from a set of explanations, that is, what
the internally generated explanations are for a human.
Another type of interesting knowledge should come from specialists in
the social sciences, who could help us define the social contexts in
which comprehensibility can take place. Finally, it is obvious that
statistics does not demand obscurity, and some efforts are being made to
ease the interpretation of statistical results. Those statisticians
interested in these efforts would be most welcome.

All this looks very much like a new theme for Cognitive Science, and we
must acknowledge that AI in general is deeply embedded in Cognitive
Science. Nevertheless, the ambiguous status of AI is again very typical
here, since there are still many problems, all relative to
comprehensibility, that are to be solved within the frame of Computer
Science.

Let us start by underlining a few important problems related to
Cognitive Science. Since comprehension is perhaps the most
context-dependent of all human activities, we cannot avoid taking
positions in the symbolic/situated cognition debate. Can we define
situated comprehensibility? Are we able to start an ontology of the
different contexts in which comprehension is possible? What is the exact
status of comprehensibility in a situated cognition? Do we believe that
the situated character of comprehension precludes communication, and
that we must thus conflate lack of comprehensibility with situatedness?
My personal answer is no, but it is clearly an important debate,
illustrated by the industrial applications that rejected neural networks
on the grounds of their lack of comprehensibility. Do we follow Clancey
in thinking that symbolic representations are simple shadows of what we
must explain? In our (symbolic) implementations, how can we evaluate the
loss due to symbolization, and how can we translate it to make it
understandable to the human expert? How could it be possible that the
situated knowledge representations generated during problem solving
combine efficiency and comprehensibility?
I would also like to insist on four issues related to Computer Science,
because they are sometimes hidden by other concerns.

The first person, to the best of my knowledge, to work on these topics
was R. S. Michalski, who stated a "comprehensibility postulate" in his
famous paper on the star methodology. This work calls for two remarks.
The first is that the star algorithm can well be perceived as a
statistical classification method in which comprehensibility has been
introduced as a constraint on the description obtained. This shows that
Michalski can be credited with being the first scientist to create a
program in which efficiency and comprehensibility are synthesized in the
same algorithm. This effort, which I think very important, stands
against several subsequent attempts to disconnect efficiency and
comprehensibility into different, and possibly even unrelated, modules.
At any rate, this choice should be discussed and explained. The second
remark is that when Michalski gives an overview of ML a few pages
earlier, co-authoring with others, he describes ML, surprisingly enough,
without the smallest hint of his own concept of comprehensibility. This
shows how shocking the idea of work that takes ill-defined
comprehensibility into account still is for some people.

The first to work on industrial applications of ML, D. Michie, often
stated in front of our community, for instance in his address at EWSL'87
in Bled, that one of the main features of ID3-like algorithms, as
opposed to the many statistical systems that also use information
compression, is their ability to generate easy-to-understand decision
trees. I remember also that at this meeting, I. Bratko argued that,
depending on the experts, decision trees might be more understandable
than the rules one extracts from them. All these are early examples of
the realization that comprehensibility is an essential factor for an ML
algorithm.
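Bratko's comparison of trees and rules can be made concrete with a small
sketch. The toy tree, its attribute names, and the helper below are my own
illustration (not taken from ID3 or any of the cited systems): an ID3-style
decision tree is flattened into one if-then rule per leaf, so the two
representations can be set side by side and judged for comprehensibility.

```python
# A tiny hand-built decision tree in the ID3 style: an internal node is
# (attribute, {value: subtree}); a leaf is just a class label string.
# The "play tennis"-like data here is purely illustrative.
tree = ("outlook",
        {"sunny":    ("humidity", {"high": "no", "normal": "yes"}),
         "overcast": "yes",
         "rainy":    ("windy", {"true": "no", "false": "yes"})})

def tree_to_rules(node, conditions=()):
    """Flatten a decision tree into a list of (conditions, label) rules,
    one rule per leaf, by accumulating the tests along each path."""
    if isinstance(node, str):                     # leaf: emit one rule
        return [(conditions, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules

for conds, label in tree_to_rules(tree):
    body = " AND ".join(f"{attr} = {val}" for attr, val in conds)
    print(f"IF {body} THEN play = {label}")
```

Each printed rule reads independently of the others, which is exactly the
property Bratko questioned: the rules lose the tree's shared structure, and
whether that gain in locality is a gain in comprehensibility depends on the
expert reading them.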
As stated above, this has been confirmed many times by subsequent
industrial applications. From the research point of view, it underlines
that comprehensibility-decreasing changes in the representation should
be carefully considered before acceptance. A thorough discussion of the
importance of learning hyperrectangles, which obviously lead to
understandable results, is needed, together with a look at the possible
ways to make understandable other approaches that use other shapes to
cover the examples. People who used diagonals or ellipses have always
justified their approach by an increase in accuracy. It is not at all
certain that these shapes always kill comprehension; it is probable that
a representational change is needed, in such a way that it will lead to
even better further understanding. More generally, all people concerned
with changes of representation or the invention of new predicates, for
instance people working on constructive induction, should also be
interested in our proposal.

Another topic of interest should be "knowledge-architectured" neural
networks (NN) à la Shavlik, who has shown very neatly that introducing
knowledge to build the network, and to compute its initial weights and
activation thresholds, not only increases accuracy but also helps
subsequent interpretation of the learned NN by rules containing n-of-m
conditions. Even more easily, genetic algorithms (GA) can be tuned in
such a way that the strings of bits that are learned are easily
translated back into meaningful information.

A last example I would like to cite, not yet acknowledged as linked to
comprehensibility, is the effort to avoid absurd classifications that
recognize an irrelevant item as belonging to a class. Such is, for
example, W. Emde's reaction to the old car recognized as having the
measles by a knowledge-based system. Even if the system is supposedly
equipped with the best explanatory mechanisms, it would have a hard time
explaining this result in any convincing way.
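One simple way to guard against such absurd classifications is a reject
option. The sketch below is my own toy illustration of the idea, not
Emde's actual mechanism: a prediction is rejected when the item's
attribute values fall outside the ranges ever observed for the predicted
class, so a car can never be diagnosed with the measles.

```python
# Hedged sketch (the mechanism and all names are my own illustration):
# reject a prediction when an attribute of the item lies outside the
# range observed for the predicted class during training.
def predict_with_reject(item, classify, seen_ranges):
    """Return the classifier's label, or "reject" if the item looks
    absurd for that label (an attribute out of its observed range)."""
    label = classify(item)
    for attr, value in item.items():
        lo, hi = seen_ranges[label][attr]
        if not (lo <= value <= hi):
            return "reject"          # absurd: attribute out of range
    return label

# Toy setup: "measles" was only ever observed on wheel-less patients.
seen = {"measles": {"age": (0, 90), "wheels": (0, 0)}}
classify = lambda item: "measles"    # a deliberately over-eager classifier

print(predict_with_reject({"age": 40, "wheels": 0}, classify, seen))
print(predict_with_reject({"age": 25, "wheels": 4}, classify, seen))
```

The second call rejects the old car instead of diagnosing it, which
illustrates the trade-off discussed next: every rejection that removes an
absurd answer may also remove a correct one, lowering raw accuracy.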
Emde's example shows well that there are measurable quantities other
than accuracy, here the number of falsely recognized items in the test
set, that capture some amount of comprehensibility. From Emde's
preliminary experiments it is clear that decreasing the amount of false
recognition may also dramatically decrease accuracy. What are we to
choose, accuracy or no false recognition? Is it possible to preserve
accuracy in some way? What is the best architecture that would allow us
to get alternately accurate, or non-absurd, results (both the exclusive
and the inclusive "or" are intended)?

Similarly, let us cite G. Nakhaeizadeh's results at Daimler-Benz. I know
that he was not at all inspired by the comprehensibility of results, but
by immediate industrial concerns. Yet he and his group devised a
cost-driven ID3 which avoids making false recognitions that would be
very expensive. Like Emde, he acknowledges that in some cases he obtains
huge decreases in accuracy when optimizing for low cost.

As you can see, the community is not really empty-handed when facing the
problem of understandable learning, and I am convinced that we shall
very soon be able to find objective definitions of what
comprehensibility is, with the help of the users to judge our results,
and with the combined forces of ML, KDD, KA, explanatory ES,
knowledge-intensive NN and GA, pedagogy, sociology, and of statisticians
eager to communicate better with their users.

This is why I invite all interested parties to join a workshop on
"Industrial applications of ML and comprehensibility" that I plan to
organize next to ECML'95 in Heraklion. Would it not be a beautiful
symbol for a new science to come into existence so near to Knossos,
where the Labyrinth was built a few years ago?
Before sending papers, send me your view of the problem of
comprehensibility, or your industrial experience, and how you could
contribute to the workshop, even if you cannot join physically (we have
to set up a programme committee, define topics, evaluation criteria,
etc.). The topic of the workshop should be essentially an in-depth
discussion of new industrial applications from the point of view of
comprehensibility, and of the experimental settings by which we could
start measuring the value of an explanation, and the comprehensibility
of a string of symbols. This includes all kinds of discussions relative
to the definition of what an explanation is, and how to evaluate the
comprehensibility of an explanation. Optimists can even start thinking
about which kinds of theories we should use to take comprehensibility
into account: the hunt for "probably approximately comprehensible"
learning is open!