KDD Nuggets -- September 23, 1993

Contents:
* Wray Buntine: "interestingness" should be defined via decision theory
* Jerzy W. Grzymala-Busse: Re: KDD success
* Sandra Oudshoff: tools for information harvesting?

The KDD Nuggets is an informal list for the dissemination of information
relevant to Knowledge Discovery in Databases (KDD), such as announcements of
conferences/workshops, tool reviews, application success/failure stories,
interesting ideas, outrageous opinions, etc.

If you have such a contribution, please email it to kdd%eureka@gte.com

Mail requests to be added/deleted also to kdd%eureka@gte.com.
(Note: If you received this message you are on the list!)

--------------------------------------------------
Date: Thu, 12 Aug 93 12:09:15 PDT
From: Wray Buntine

> We should elaborate some clear definitions of fundamental terms of KDD
> (e.g. "pattern"), clarify some vague concepts (e.g. "interestingness of
> finding"), identify the main subgroups of applications (discovery in
> databases (relational, time series ...), scientific discovery, image
> database discovery ...) and describe the (methodical) differences between
> these subgroups.

Sounds great, but I just hope this doesn't end up creating yet another
jargon. So far we have machine learning, statistics, neural networks,
statistical physics, information theory and computational learning theory
people addressing learning-type problems, each with their own vocabulary.
So I'd hope some unification is possible.

For my part, I think "interestingness" is best explained using the language
of decision theory, i.e. utility theory and probability theory. This is
certainly the case for what I've seen at KDD'91 and '93. There is a
well-developed theory AND practice of decision theory, so we don't need to
reinvent the wheel here, and it certainly will lead to a cleaner
specification of "interesting" plus derivation of matching learning
algorithms.
Wray Buntine

------------------------------------------------------
Date: Thu, 9 Sep 93 14:57:05 CDT
From: Jerzy@cs.ukans.edu
Cc: "Jerzy W. Grzymala-Busse"
Subject: Re: KDD success

A prototype of an expert system to assess preterm labor risk for pregnant
women was developed recently by a team under the general guidance of
M. Van Dyne (Intellidyne, Inc.), who was also responsible for development of
the main system. J. Grzymala-Busse (U. of Kansas) has been responsible for
rule induction from databases using the LERS system (Learning from Examples
based on Rough Sets). L. Woolery (U. of Missouri) has served as the domain
expert and analyzed the data statistically. The project was funded by the
National Institutes of Health, grant number 1 R43 NR02899-01A1.

In the project, three databases (provided by a local perinatal center and two
national home uterine monitoring companies) containing information about
preterm and fullterm delivery were processed. Each database was divided in
half: 50% of the entities were used for rule induction by LERS and the
remaining 50% for validation. The sizes of the databases are as follows:
52 variables and 3204 entities (first database), 77 variables and 2436
entities (second database), and 82 variables and 7224 entities (third
database).

Rules generated by LERS from the databases were combined into a single rule
base in a prototype expert system. Each rule was given a priority score.
Because of low accuracy rates, rules generated from the third database were
not included in the prototype expert system. Data from each of the three
databases were tested on the prototype expert system's combined rule base to
determine the final predictive accuracy rates.

Currently used manual tools to assess preterm labor risk have 17-38%
accuracy in determining preterm risk (McLean, Walters & Smith, 1993).
The system developed in the first phase of the project has the following
accuracies for predicting preterm or fullterm delivery: 89% (first database),
59% (second database), and 53% (third database).

---------------------------------------------------
From: oudshoff@sun032.research.ptt.nl (Sandra Oudshoff)
Subject: tools for information harvesting?
Organization: PTT Research, Groningen, The Netherlands
Date: Thu, 9 Sep 1993 09:18:00 GMT

Dear fellow netters,

Our AI group here is very interested in buying a commercial product that can
be used for information harvesting, i.e. extracting rules or trends
automatically from (large) amounts of data. We have heard about "Information
Harvesting" from the company with the same name.

If you have any information about / experience with similar tools, I would
greatly appreciate it if you shared that with me through e-mail. I'll
summarize on the net if there is interest.

Any and all info or help will be greatly appreciated.

Thanks in advance,
Sandra Oudshoff
e-mail to: a.m.oudshoff@research.ptt.nl
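
---------------------------------------------------
Illustrative sketch: the split-half validation described in the
"Re: KDD success" message above (half of each database used for rule
induction, the other half held out for validation) might look roughly like
this in Python. This is only an illustration under stated assumptions: the
names split_half, induce_majority_rule and accuracy are hypothetical, the toy
data is made up, and the majority-class rule is a trivial placeholder, not
LERS and not the project's actual procedure.

```python
import random
from collections import Counter


def split_half(entities, seed=0):
    """Shuffle a copy of the entities and split it into two halves."""
    shuffled = list(entities)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def induce_majority_rule(training):
    """Placeholder for rule induction (NOT LERS): always predict the
    most common outcome seen in the training half."""
    counts = Counter(label for _, label in training)
    majority = counts.most_common(1)[0][0]
    return lambda features: majority


def accuracy(rule, validation):
    """Fraction of held-out entities whose outcome the rule predicts correctly."""
    correct = sum(1 for features, label in validation if rule(features) == label)
    return correct / len(validation)


# Hypothetical toy data: (features, outcome) pairs, invented for illustration.
# Only the count (3204) mirrors the size quoted for the first database.
toy = [({"id": i}, "preterm" if i % 4 == 0 else "fullterm") for i in range(3204)]

train, held_out = split_half(toy)
rule = induce_majority_rule(train)
acc = accuracy(rule, held_out)
print(len(train), len(held_out), acc)
```

The fixed seed only keeps the split reproducible. A real evaluation would
replace the placeholder with an actual rule induction step and report
accuracy on the held-out half, as the message describes.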