KDD Nuggets #5 -- October 28, 1993

Contents:
* Interestingness: Robert Demolombe, Darrell Conklin, GPS
* Douglas H. Fisher: new AI/Stats list created
* Michael Brodie: Economist: AI is revolutionizing the Credit Business

Requests:
* Richard Forsyth: machine learning in geographic/spatial databases?
* Douglas H. Fisher: hierarchical clustering?

The KDD Nuggets is an informal list for the dissemination of information
relevant to Knowledge Discovery in Databases (KDD), such as announcements
of conferences/workshops, tool reviews, application success/failure
stories, interesting ideas, outrageous opinions, etc. If you have such a
contribution, please email it to kdd%eureka@gte.com

Mail requests to be added/deleted also to kdd%eureka@gte.com.

-- Gregory Piatetsky-Shapiro

--------------------------------------------------
From demolombe@tls-cs.cert.fr Wed Oct 20 13:19:46 1993
Subject: Interestingness

There is a large body of work in logic on the concepts of "relevance",
"topic", "subject matter", "aboutness", and "relatedness". We are working
on the definition of a logic for reasoning about links between a sentence
and topics. The initial motivation was to help users retrieve
information. In that context, "interesting" topics are defined as those
topics that are related to the query, and an "interesting" additional
answer is defined as additional information related to these interesting
topics.

Is this related to "interestingness"? Are there possible applications of
this work to KDD, in the sense that interesting topics may be used to
focus the search on a restricted set of rules or patterns? I am
definitely not a specialist in KDD; who could give me an answer?

If anyone is interested, we have a preliminary version of a paper
entitled "Reasoning about 'is about'", available on request.

Robert Demolombe

---------------
From: conklin@qucis.queensu.ca (Darrell Conklin)
Date: Wed, 27 Oct 93 10:18:51 EDT
Subject: On "interestingness"

Techniques for knowledge discovery include conceptual clustering and its
incremental counterpart, concept formation. These techniques typically
describe objects by sets of attribute/value pairs (features) and group
similar objects together into a hierarchy of concepts. Concepts are
represented by intensional definitions, which capture recurrent patterns
of features. There are many (sometimes conflicting) evaluation methods
for the "interestingness" of a concept, including "rediscovery",
predictiveness of features given concept membership, ability to compress
data, and so on.

Here is another idea. Consider a concept C: we can ask, is there some
subset S of the features in C that is highly predictive of the others
(the features in C-S)? That is, the concept C could be useful for
inference if

    P(C-S | S) = P((C-S) & S) / P(S) = P(C) / P(S)

is sufficiently high, and the regularity or concept C has been observed
a sufficient number of times. (I leave the definition of "sufficient" to
a KDD theorist.) The expression above can easily be evaluated --- without
a scan through the whole database --- if both C and S are discovered
concepts with an attached frequency-of-occurrence field. A similar
technique has been used by Rooman and Wodak (Nature 335, 1988) and in my
own research to evaluate discovered associations between protein sequence
and structure.

Concept formation systems offer a reasonable technique for uncovering
regularities in the data, while constraining the search space over
possible regularities.
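
To make the arithmetic concrete, here is a minimal sketch in Python
(illustrative only; the stored-count representation and the names are
assumptions, not from any particular concept formation system):

    # P(C)/P(S) reduces to count_C/count_S, since the database size N
    # cancels: (count_C/N) / (count_S/N) = count_C/count_S.
    def predictiveness(count_C, count_S, min_support=5):
        """Estimate P(C-S | S) from stored concept frequencies.

        count_C: times the full concept C has been observed
        count_S: times the feature subset S has been observed
        min_support: a placeholder notion of "sufficient" evidence
        """
        if count_C < min_support or count_S == 0:
            return None                 # too rare to trust, or S unseen
        return count_C / count_S        # no scan of the database needed

    # Example: C observed 40 times, S observed 50 times, so S predicts
    # the remaining features of C with estimated probability 0.8:
    print(predictiveness(40, 50))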
---------------
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Interestingness is subjective

In practical applications, the interestingness of a piece of knowledge
is usually related to whether that knowledge can lead to some useful
action. Thus, objective (syntactic, statistical, information-theoretic,
logical, etc.) measures of interestingness are an important but not the
only component of overall interestingness. Another, subjective,
component is necessary.

In our current system, which looks for key changes in health care data,
that subjective component is the degree of *discretion* over a
particular finding. Thus, an increase in costs due to normal pregnancies
is not very interesting, since there is no action item. An increase in
costs due to premature babies is very interesting, since there are
well-known prevention techniques.

We get from experts the values of discretion for the basic elements in
the knowledge base. Then the interest of any combination of these
elements can be computed as a simple, but domain-dependent, function of
the interest of the basic elements.
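
A minimal sketch of one way such a combination could be computed (the
discretion values and the combining function below are illustrative
assumptions; the actual function is domain-dependent):

    # Hypothetical discretion scores elicited from experts for basic
    # knowledge-base elements (values invented for illustration).
    DISCRETION = {
        "normal pregnancy": 0.1,   # no real action item
        "premature birth":  0.9,   # known prevention programs
    }

    def interest(elements, objective_score):
        """Weight an objective measure (e.g., the size of a cost
        increase) by the highest discretion among the finding's basic
        elements; one simple, domain-dependent choice."""
        subjective = max(DISCRETION.get(e, 0.0) for e in elements)
        return objective_score * subjective

    # The same cost increase scores very differently depending on
    # whether anything can be done about it:
    print(interest(["normal pregnancy"], 100.0))   # 10.0
    print(interest(["premature birth"], 100.0))    # 90.0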
--------------------------------------------------
From: Michael Brodie
Subject: AI is revolutionizing the Credit Business

An article in The Economist, Sept. 25, 1993, describes how AI is used
to identify cardholder preferences, interests, and qualifications. The
system is claimed to use a 19,000-item rulebook and will eventually be
used to process all transactions.

--------------------------------------------------
From: dfisher@vuse.vanderbilt.edu (Douglas H. Fisher)
Subject: AI/Stats list

A new mailing list for those interested in AI and Statistics has been
created. Requests to be added to this mailing list should be directed
to ai-stats-request@watstat.uwaterloo.ca

Organization of the 1995 International AI and Statistics Workshop has
begun. Look for announcements in this and other newsgroups.

Doug Fisher, General Chair

------------------------------------------------------------
----------------- Requests ---------------------------------

From: RS_FORSYTH@cv.uwe.ac.uk
Subject: machine learning in geographic/spatial databases

hello out there,

i have a student looking into applications of
learning/induction/discovery algorithms to remote-sensing & geographic
databases. we have only tracked down 3 or 4 refs so far (all but one in
the KDD-93 proceedings). is anyone out there well up on the state of
that particular art? if so i'd greatly appreciate info on where to
look, whom to contact & so forth.

thanks, richard forsyth. (UWE Bristol, UK)

--------------------------------------------------
From: dfisher@vuse.vanderbilt.edu (Douglas H. Fisher)
Subject: query

The forms of iterative optimization in clustering that I am familiar
with begin with some initial clustering, and then iteratively move
single objects around in search of a better clustering according to
some objective measure. I have built a system that forms an initial
hierarchical clustering, and then moves top-down through the hierarchy,
at each level `reclassifying' entire clusters (subtrees) in search of a
better partition. This top-down pass terminates at the leaves, where
single objects are reclassified in the global hierarchical structure.
In general, several top-down passes may be necessary before the
hierarchical clustering `stabilizes'.

If you know of published work along similar lines, either similar
systems or work related to the more general issue of reclassifying
object sets (versus single objects), then please send me citations at
dfisher@vuse.vanderbilt.edu

P.S. I already know of one piece of related work by Nevins at Georgia
State.

Thank you,
Doug Fisher
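
For concreteness, here is a rough sketch of the control structure
described in the query above (an illustrative reconstruction, not
Fisher's actual system; the two-level structure, the variance
objective, and all names are placeholder assumptions):

    # Reclassify entire clusters (object sets) rather than single
    # objects. For brevity the "hierarchy" is only two levels deep:
    # top-level groups, each holding clusters of 1-D points. The
    # objective is total within-group variance (lower is better).

    def within_variance(groups):
        total = 0.0
        for group in groups:
            points = [x for cluster in group for x in cluster]
            if points:
                m = sum(points) / len(points)
                total += sum((x - m) ** 2 for x in points)
        return total

    def reclassify_clusters(groups):
        """Make repeated passes, tentatively moving each whole cluster
        to every other group and keeping a move only if the objective
        improves; stop when a full pass changes nothing (the clustering
        has `stabilized')."""
        changed = True
        while changed:
            changed = False
            for i, group in enumerate(groups):
                for cluster in list(group):
                    best_j, best = i, within_variance(groups)
                    for j in range(len(groups)):
                        if j == i:
                            continue
                        group.remove(cluster)        # tentative move
                        groups[j].append(cluster)
                        score = within_variance(groups)
                        if score < best:
                            best_j, best = j, score
                        groups[j].remove(cluster)    # undo the move
                        group.append(cluster)
                    if best_j != i:                  # commit best move
                        group.remove(cluster)
                        groups[best_j].append(cluster)
                        changed = True
        return groups

    # The cluster [11, 12] starts in the wrong group and is moved as a
    # unit, not one object at a time:
    print(reclassify_clusters([[[1, 2], [11, 12]], [[10, 13]]]))
    # -> [[[1, 2]], [[10, 13], [11, 12]]]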