KDD Nuggets -- October 20, 1993

Contents:
 * John Major: Interestingness
 * Gregory Piatetsky-Shapiro: preprint of a paper available
 * rec.humor.funny: Dangers of data mining (you can fall into a shaft)

The KDD Nuggets is an informal list for the dissemination of information relevant to Knowledge Discovery in Databases (KDD), such as announcements of conferences/workshops, tool reviews, application success/failure stories, interesting ideas, outrageous opinions, etc.

If you have such a contribution, please email it to kdd%eureka@gte.com

Mail requests to be added/deleted also to kdd%eureka@gte.com.
(Note: If you receive this message you are on the KDD Nuggets list!)

--------------------------------------------------
From: ustrvcdh@ibmmail.COM (John Major)

> From: Wray Buntine
> For my part, I think "interestingness" is best explained using
> the language of decision theory, i.e. utility theory and
> probability theory.

I'm not sure how this covers "interestingness," but it is the right approach to define "value." Also, there are some "interesting" ramifications.

Say a decision process scores 1 util for each correct classification and has a success rate, therefore expected utility, of "E" per decision. KDD comes up with a rule {or other "knowledge product"} applicable in "A" of the cases, with accuracy F > E. Assuming independence, a new, augmented decision process will have expected utility AF+(1-A)E = E+A(F-E). Therefore, the new rule adds value A(F-E), right?

Not so fast. First, will the new rule be used accurately in the field? If fully automated, perhaps, but if not, F-E may not be so high. More, will the rule be used at all? {Is A nonzero?} Is it too complex to be understood? Is it too counterintuitive to be accepted by management? {I've seen both.}

This shows the relevance of some features other than performance -- like complexity and novelty. Through decision theory, they have a chance of entering the analysis in a coherent way.

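
[Editorial sketch, not part of the original message.] The arithmetic above can be checked with a few lines of Python. The function names and the sample numbers are assumptions chosen for illustration; only the symbols E, A, F and the formula AF+(1-A)E = E+A(F-E) come from the message.

```python
def augmented_expected_utility(e: float, a: float, f: float) -> float:
    """Expected utility per decision, at 1 util per correct classification.

    e: baseline success rate (E in the message)
    a: fraction of cases in which the rule is applied (A)
    f: accuracy of the rule on those cases (F)
    Assumes independence, as the message does: A*F + (1 - A)*E.
    """
    for name, value in (("e", e), ("a", a), ("f", f)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return a * f + (1.0 - a) * e


def rule_value(e: float, a: float, f: float) -> float:
    """Value added by the rule over the baseline: A*(F - E)."""
    return augmented_expected_utility(e, a, f) - e


if __name__ == "__main__":
    # Hypothetical figures: nominal performance as measured during discovery.
    nominal = rule_value(e=0.80, a=0.25, f=0.90)
    # Hypothetical field conditions: the rule is applied less often and
    # less accurately than the nominal figures suggest.
    in_field = rule_value(e=0.80, a=0.10, f=0.85)
    print(f"nominal value added:  {nominal:.3f}")
    print(f"in-field value added: {in_field:.3f}")
```

With these made-up figures the nominal gain is 0.25 x (0.90 - 0.80) = 0.025 utils per decision, while the in-field gain is 0.10 x (0.85 - 0.80) = 0.005. If the rule is never used (A = 0) the gain is zero regardless of F, which is the point of the "Not so fast" paragraph.
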
To get off the ground, one would like to know about the accuracy and efficiency of humans interpreting knowledge structures. Does anyone know of any work in this area since Collins and Quillian's 1969-1972 papers?

John A. Major
Travelers
(* JMAJOR, John Major, NHRS/21MS, 7-0104, 10/14/93, 01:02p *)

--------------------------------------------------
From: Gregory Piatetsky-Shapiro (gps@gte.com)

A compressed postscript file of a preprint of the paper:

  C. Matheus, P. Chan, G. Piatetsky-Shapiro,
  Systems for Knowledge Discovery in Databases,
  Special issue on Learning and Discovery in Databases,
  IEEE Transactions on Knowledge and Data Engineering, Dec 1993

can be obtained by anonymous ftp

% ftp ceylon.gte.com
Connected to ceylon.gte.com.
220 ceylon.gte.com FTP server
Name (ceylon.gte.com:): anonymous
331 Guest login ok, send ident as password.
Password:
230 Guest login ok, access restrictions apply.
ftp> cd pub/matheus/kd
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> get kdd.ps.Z
200 PORT command successful.
150 Opening BINARY mode data connection for kdd.ps.Z (115569 bytes).
226 Transfer complete.
local: kdd.ps.Z remote: kdd.ps.Z
115569 bytes received in 0.64 seconds (1.8e+02 Kbytes/s)
ftp> quit
221 Goodbye.

--------------------------------------------------
Dangers of data mining (you can fall into a shaft)

From rec.humor.funny:

From New Scientist, 28 August 93, Feedback column:

"The National Westminster Bank admitted last month that it keeps personal information about its customers -- such as their political affiliation -- on computer. But now Computer Weekly reveals that a financial institution, sadly unnamed, has gone one better and moved into the realm of personal abuse. The institution decided to mailshot 2000 of its richest customers, inviting them to buy extra services. One of its computer programmers wrote a program to search through its databases and select its customers automatically.
He tested the program with an imaginary customer called Rich Bastard. Unfortunately, an error resulted in all 2000 letters being addressed "Dear Rich Bastard". The luckless programmer was subsequently sacked."