KDD Nugget 94:9, e-mailed 94-05-13

Contents:
 * G. Piatetsky-Shapiro, ISR: Microsoft success using neural network
   for direct marketing
 * B. Wuthrich, A draft of a manuscript on Knowledge Discovery
 * Y. Kodratoff, The comprehensibility manifesto and CFP for ECML'95
   workshop Industrial applications of ML and comprehensibility

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD), e.g. application
descriptions, conference announcements, tool reviews, information
requests, interesting ideas, clever opinions, etc.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Back issues, FAQ, and other KDD-related information are now available +
+ via Mosaic, URL http://info.gte.com/~kdd/ or                          +
+ by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README              +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you have something relevant to KDD, send it to kdd@gte.com ;
Add/delete requests to kdd-request@gte.com
-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quote of the Week ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Fri, 6 May 94
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: ISR: Microsoft success using neural network for direct marketing

The March 1994 issue of Intelligent Systems Report has an interesting
article entitled "Microsoft targets direct mail recipients with neural
network". Microsoft sends over 40 million pieces of direct mail to more
than 8 million registered users, usually in an attempt to get the users
to upgrade to a new version.
Although the first mailing is sent to everyone, the key is to send a
second, more appealing, mailing only to those most likely to respond.
Prior to using a neural network, an average mailing would get a response
rate of only 4.9%; using a neural net, the response rate increased to
8.2%, according to Microsoft's Jim Minervino. The application was
developed using Brainmaker, a neural network tool.

---------------------------------------

Date: Sat, 7 May 94 11:00:55 HKT
From: beat@cs.ust.hk (DR. BEAT WUTHRICH)
Subject: a preliminary TR on KDD available

Right now I am teaching a postgraduate course on "Knowledge Discovery in
Databases" at the Hong Kong University of Science and Technology.

--------------------
Abstract

This is a draft of a manuscript for a postgraduate course taught at the
Hong Kong University of Science and Technology in Spring 94. The course
gives an introduction to the young and fascinating field of knowledge
discovery in databases. The manuscript is suited for beginners, who can
leave out the more advanced sections, as well as for people who would
like to do research in this area. This manuscript is partly incomplete.

Table of Contents
1. Introduction
   1.1 Course Outline
   1.2 Basic Notions
   1.3 A Case Study
   1.4 Outlook
2. Rule Languages
   2.1 Propositional Rules and Decision Trees
   2.2 Datalog
   2.3 FQL*
3. Uncertainty
   3.1 Foundations of Probability Theory
   3.2 Other Approaches to Uncertainty
   3.3 Probabilistic Datalog
   3.4 Probabilistic FQL*
4. Time
   4.1 Foundations
   4.2 Temporal Datalog
   4.3 Temporal FQL*
   4.4 Probabilistic Temporal FQL*
5. Learning Propositional Rules and Decision Trees
   5.1 Generating Decision Trees
   5.2 Choosing a Test
   5.3 Generating Probabilistic Decision Trees
   5.4 Further Issues
6. Learning Datalog Rules
   6.1 Generating Datalog Rules
   6.2 Choosing a Specialization
   6.3 Further Issues
7. Learning Probabilistic Knowledge (basically references to papers)

----------
To get it:
1) ftp ftp.cs.ust.hk
2) login as: anonymous
3) cd pub/techreport/postscript
4) get tr94-2.ps.gz
   or
   get tr94-2A.ps.gz
   get tr94-2B.ps.gz

note:
- tr94-2.ps.gz is the full tech report, 84 pages long.
- tr94-2A.ps.gz and tr94-2B.ps.gz are the first and second parts,
  respectively, of the same tech report; decompressed, each takes less
  than 1 MB.

(note: to decompress, use gunzip tr94-2.ps.gz. If that does not work,
rename tr94-2.ps.gz to tr94-2.ps.z and try gunzip again. The report
takes 247503 bytes compressed, 1211745 bytes uncompressed, and 84 pages
if printed. GPS)

Unfortunately there are a couple of typos in the current version of this
partly incomplete manuscript. I apologize for that and I am working on
enhancements and improvements.

Dr. Beat Wuethrich
The Hong Kong University of Science and Technology
CS Dept (room 3512)
Clear Water Bay
Kowloon, Hong Kong

------------------------------------------

Date: Mon, 9 May 94 15:53:45 +0200
From: Yves.Kodratoff@lri.fr
Subject: The comprehensibility manifesto

Hi! Here is a version I would be delighted to see announced in the KDD
Nuggets!
Cheers
Yves

The comprehensibility manifesto
Yves Kodratoff (yk@lri.lri.fr)
an (unconventional) submission for an ECML'95 workshop on
Industrial applications of ML and comprehensibility

The importance of explanations and of the comprehensibility of the
results provided by an expert system or a machine learning (ML)
algorithm is by no means a new idea. To my knowledge, it has been around
since the 80's (see details below), but I am almost sure that others
realized its importance before. This old idea did not attract much
attention from a scientific community more interested in measuring the
complexity of the algorithms and the accuracy of their results than the
comprehensibility of the software and of the results.
This attitude can be explained by the fact that we have no precise
definition of what an explanation really is, and no way of measuring or
even analyzing what a "good" explanation is: comprehensibility is a
badly defined concept, presently not measurable. This state of affairs
seems to me unbearable now, in view of an analysis of the industrial
applications made of the various ML approaches. Every time one of our
favorite ML approaches has been applied in industry, the
comprehensibility of the results, though ill-defined, has been a
decisive factor in choosing it over an approach by pure statistical
means, or by neural networks. To confirm this opinion, consider that,
very recently, answering questions about the difference between ML and
the more application-oriented data mining, G. Piatetsky-Shapiro claimed
that "Knowledge discovery in Data Bases (KDD) is concerned with finding
*understandable* knowledge, while ML is concerned with improving
performance of an agent." Rather than discussing what properly belongs
to ML or not, let us ask the KDD community to join us.

This manifesto induces from these examples (and here is its weakness)
that a large number of industrial applications of ML will demand good
explanations, as long as the domain is understood by the experts. We are
well aware that mechanizing one stage of a complex process may not
require comprehensibility, but we claim that the whole process, as soon
as the decisions it helps to make are important, will require a high
level of comprehensibility: for the experts to validate the system, for
its maintenance, and for its evolution in view of changes in the
external world.

Now, let us deduce the consequences of our claim. The problem we are
left with is that we do not understand comprehensibility! This is why I
propose to stop fleeing the problem and to define comprehensibility as
an acknowledged research topic.
It is a hard problem, sure enough, but are we supposed to tackle only
the easy ones? Now that it seems to be well identified as an industrial
problem, we cannot, so to speak, "afford" to go on shunning it.

What kind of forces do we need to join in order to hope to find a
solution? We obviously need the MLists who developed the
symbolic/numeric systems able to generate understandable knowledge.
Notice that we cannot work in isolation from the users, the
industrialists, who know empirically what a good explanation is, or
rather what a bad one is, and who are the only ones able to attribute
scores to the results of our future experiments. Just as an example, the
explanatory value of special "nuggets" was introduced into the ML
community by P. Riddle because of her study of manufacturing at Boeing,
not to ease a tension internal to the ML field. The KDD community, cited
above, is obviously concerned.

We also need specialists in knowledge acquisition (KA), whose research
topic is how to help a user make his/her knowledge understandable to the
machine. They are thus used to working on the inverse of our problem,
and their experience in the topic will be invaluable. Specialists in
explanations for expert systems (ES) have already provided definitions
and taxonomies of explanations; they are the pioneers of the field:
there now exists a large body of workers who follow and deepen the ideas
that led Clancey to NEOMYCIN. Our problem would be, more specifically,
to define a measure of comprehensibility on the explanations generated
by their systems. Psychologists, and more particularly pedagogists,
should also be part of this game, since they are used to analyzing what
a student really understands from a set of explanations, that is, what
the internally generated explanations are for a human.
Another type of interesting knowledge should come from specialists in
the social sciences, who could help us define the social contexts in
which comprehensibility can take place. Finally, it is obvious that
statistics does not demand obscurity, and some efforts are being made to
ease the interpretation of statistical results. Those statisticians
interested in these efforts would be most welcome.

All this looks very much like a new theme for Cognitive Science, and we
must acknowledge that AI in general is deeply embedded in Cognitive
Science. Nevertheless, the ambiguous status of AI is again very typical
here, since there are still many problems, all relative to
comprehensibility, that are to be solved within the frame of Computer
Science.

Let us start by underlining a few important problems related to
Cognitive Science. Since comprehension is perhaps the most
context-dependent of all human activities, we cannot avoid taking
positions in the symbolic/situated cognition debate. Can we define
situated comprehensibility? Are we able to start an ontology of the
different contexts in which comprehension is possible? What is the exact
status of comprehensibility in a situated cognition? Do we believe that
the situated character of comprehension precludes communication, and
that we must thus conflate lack of comprehensibility with situatedness?
My personal answer is no, but it is clearly an important debate,
illustrated by the industrial applications that rejected neural networks
on the grounds of their lack of comprehensibility. Do we follow Clancey
in thinking that symbolic representations are simple shadows of what we
must explain? In our (symbolic) implementations, how can we evaluate the
loss due to symbolization, and how can we translate it to make it
understandable to the human expert? How could it be possible that the
situated knowledge representations generated during problem solving
combine efficiency and comprehensibility?
I would also like to insist on four issues related to Computer Science,
because they are sometimes hidden by other concerns.

The first person, to the best of my knowledge, to work on these topics
was R. S. Michalski, who stated a "comprehensibility postulate" in his
famous paper on the star methodology. This work calls for two remarks.
The first is that the star algorithm can well be perceived as a
statistical classification method in which comprehensibility has been
introduced as a constraint on the description obtained. This shows that
Michalski can be credited with being the first scientist to create a
program in which efficiency and comprehensibility are synthesized in the
same algorithm. This effort, which I think very important, stands
against several subsequent attempts to disconnect efficiency and
comprehensibility into different, and possibly even unrelated, modules.
At any rate, this choice should be discussed and explained. The second
remark is that when Michalski gives an overview of ML a few pages
earlier, co-authoring with others, he describes ML, surprisingly enough,
without the smallest hint of his own concept of comprehensibility. This
shows how shocking the idea of work that takes ill-defined
comprehensibility into account still is for some people.

The first to work on industrial applications of ML, D. Michie, often
stated in front of our community, for instance in his address at EWSL'87
in Bled, that one of the main features of ID3-like algorithms, as
opposed to the many statistical systems that also use information
compression, is their ability to generate easy-to-understand decision
trees. I remember also that at this meeting, I. Bratko argued that,
depending on the experts, decision trees might be more understandable
than the rules one extracts from them. All these are early examples of
the realization that comprehensibility is an essential factor for an ML
algorithm.
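Bratko's comparison of trees and rules can be made concrete with a small
sketch. The toy tree, its attribute names, and the helper below are my own
illustration (not taken from ID3 or any of the cited systems): an ID3-style
decision tree is flattened into one if-then rule per leaf, so the two
representations can be set side by side and judged for comprehensibility.

```python
# A tiny hand-built decision tree in the ID3 style: an internal node is
# (attribute, {value: subtree}); a leaf is just a class label string.
# The "play tennis"-like data here is purely illustrative.
tree = ("outlook",
        {"sunny":    ("humidity", {"high": "no", "normal": "yes"}),
         "overcast": "yes",
         "rainy":    ("windy", {"true": "no", "false": "yes"})})

def tree_to_rules(node, conditions=()):
    """Flatten a decision tree into a list of (conditions, label) rules,
    one rule per leaf, by accumulating the tests along each path."""
    if isinstance(node, str):                     # leaf: emit one rule
        return [(conditions, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules

for conds, label in tree_to_rules(tree):
    body = " AND ".join(f"{attr} = {val}" for attr, val in conds)
    print(f"IF {body} THEN play = {label}")
```

Each printed rule reads independently of the others, which is exactly the
property Bratko questioned: the rules lose the tree's shared structure, and
whether that gain in locality is a gain in comprehensibility depends on the
expert reading them.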
As stated above, this has been confirmed many times by subsequent
industrial applications. From the research point of view, it underlines
that comprehensibility-decreasing changes in the representation should
be carefully considered before acceptance. A thorough discussion of the
importance of learning hyperrectangles, which obviously lead to
understandable results, is needed, together with a look at the possible
ways to make understandable other approaches that use other shapes to
cover the examples. People who used diagonals or ellipses have always
justified their approach by an increase in accuracy. It is not at all
certain that these shapes always kill comprehension; it is probable that
a representational change is needed, in such a way that it will lead to
even better further understanding. More generally, all people concerned
with changes of representation or the invention of new predicates, for
instance people working on constructive induction, should also be
interested in our proposal.

Another topic of interest should be "knowledge-architectured" neural
networks (NN) à la Shavlik, who has shown very neatly that introducing
knowledge to build the network, and to compute its initial weights and
activation thresholds, not only increases accuracy but also helps
subsequent interpretation of the learned NN by rules containing n-of-m
conditions. Even more easily, genetic algorithms (GA) can be tuned in
such a way that the strings of bits that are learned are easily
translated back into meaningful information.

A last example I would like to cite, not yet acknowledged as linked to
comprehensibility, is the effort to avoid absurd classifications that
recognize an irrelevant item as belonging to a class. Such is, for
example, W. Emde's reaction to the old car recognized as having the
measles by a knowledge-based system. Even if the system is supposedly
equipped with the best explanatory mechanisms, it would have a hard time
explaining this result in any convincing way.
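One simple way to guard against such absurd classifications is a reject
option. The sketch below is my own toy illustration of the idea, not
Emde's actual mechanism: a prediction is rejected when the item's
attribute values fall outside the ranges ever observed for the predicted
class, so a car can never be diagnosed with the measles.

```python
# Hedged sketch (the mechanism and all names are my own illustration):
# reject a prediction when an attribute of the item lies outside the
# range observed for the predicted class during training.
def predict_with_reject(item, classify, seen_ranges):
    """Return the classifier's label, or "reject" if the item looks
    absurd for that label (an attribute out of its observed range)."""
    label = classify(item)
    for attr, value in item.items():
        lo, hi = seen_ranges[label][attr]
        if not (lo <= value <= hi):
            return "reject"          # absurd: attribute out of range
    return label

# Toy setup: "measles" was only ever observed on wheel-less patients.
seen = {"measles": {"age": (0, 90), "wheels": (0, 0)}}
classify = lambda item: "measles"    # a deliberately over-eager classifier

print(predict_with_reject({"age": 40, "wheels": 0}, classify, seen))
print(predict_with_reject({"age": 25, "wheels": 4}, classify, seen))
```

The second call rejects the old car instead of diagnosing it, which
illustrates the trade-off discussed next: every rejection that removes an
absurd answer may also remove a correct one, lowering raw accuracy.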
Emde's example shows well that there are measurable quantities other
than accuracy, here the number of falsely recognized items in the test
set, that capture some amount of comprehensibility. From Emde's
preliminary experiments it is clear that decreasing the amount of false
recognition may also dramatically decrease accuracy. What are we to
choose, accuracy or no false recognition? Is it possible to preserve
accuracy in some way? What is the best architecture that would allow us
to get alternately accurate, or non-absurd, results (both the exclusive
and the inclusive "or" are intended)?

Similarly, let us cite G. Nakhaeizadeh's results at Daimler-Benz. I know
that he was not at all inspired by the comprehensibility of results, but
by immediate industrial concerns. Yet he and his group devised a
cost-driven ID3 which avoids making false recognitions that would be
very expensive. Like Emde, he acknowledges that in some cases he obtains
huge decreases in accuracy when optimizing for low cost.

As you can see, the community is not really empty-handed when facing the
problem of understandable learning, and I am convinced that we shall
very soon be able to find objective definitions of what
comprehensibility is, with the help of the users to judge our results,
and with the combined forces of ML, KDD, KA, explanatory ES,
knowledge-intensive NN and GA, pedagogy, sociology, and of statisticians
eager to communicate better with their users.

This is why I invite all interested parties to join a workshop on
"Industrial applications of ML and comprehensibility" that I plan to
organize next to ECML'95 in Heraklion. Would it not be a beautiful
symbol for a new science to come into existence so near to Knossos,
where the Labyrinth was built a few years ago?
Before sending papers, send me your view of the problem of
comprehensibility, or your industrial experience, and how you could
contribute to the workshop, even if you cannot join physically (we have
to set up a programme committee, define topics, evaluation criteria,
etc.). The topic of the workshop should be essentially an in-depth
discussion of new industrial applications from the point of view of
comprehensibility, and of the experimental settings by which we could
start measuring the value of an explanation, and the comprehensibility
of a string of symbols. This includes all kinds of discussions relative
to the definition of what an explanation is, and how to evaluate the
comprehensibility of an explanation. Optimists can even start thinking
about which kinds of theories we should use to take comprehensibility
into account: the hunt for "probably approximately comprehensible"
learning is open!