KDD Nugget 94:14, e-mailed 94-07-26

Contents:
 * G. Piatetsky-Shapiro, new in KD Mine, http://info.gte.com/~kdd/
 * David Page (in ML-list), Inductive Learning Competition
 * LA Times, Any data in the computer can be used against you
 * CFP: Special Issue of the AI Journal on Empirical AI
 * Ross Quinlan: Revised Version of C4.5

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD, also known as Data
Mining), e.g. application descriptions, conference announcements, tool
reviews, information requests, interesting ideas, clever opinions, etc.
It has been coming out about every two to three weeks, depending on the
quantity and urgency of submissions.

Back issues of nuggets, a catalog of data mining tools, useful
references, an FAQ, and other KDD-related information are now available
at the Knowledge Discovery Mine, URL http://info.gte.com/~kdd/
or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail contributions to kdd@gte.com
Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PC Magazine (July 1994) reports the results of its contest for the best
new name for the Information Superhighway. With runners-up like
Algorebahn, Byteway, and Route 100010, the winner was Kevin Kwaku, who
suggested that while the Information Superhighway is a bad name, it
could be a great acronym, standing for "Interactive Network For
Organizing, Retrieving, Manipulating, Accessing, and Transferring
Information On National Systems, Unleashing Practically Every
Rebellious Human Intelligence, Gratifying Hackers, And Yahoos."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------
From: G. Piatetsky-Shapiro (gps@gte.com)
Subject: new in KD Mine URL http://info.gte.com/~kdd
Date: Thu, 14 July 1994
----------
In Other Information Servers

COSMIC's Program Catalog (Cybernetics Section) -- programs developed by
NASA. Contains pointers to many useful systems ...

SIGART (ACM Special Interest Group on Artificial Intelligence)
information is available on-line. The site also has a nice list of
other AI-related resources.
----------
In Siftware

Updated info for BBN Cornerstone, IDIS
----------
In Homepages

Finally, a homepage for the GTE Laboratories KDD project.

-----------------------------
Date: Tue, 19 Jul 94 22:59:19 BST
From: David.Page@comlab.ox.ac.uk
Subject: Inductive Learning Competition

TO THE INTERNATIONAL ML COMMUNITY: NEW EAST-WEST CHALLENGE

Donald Michie, Stephen Muggleton, David Page and Ashwin Srinivasan
Oxford University Computing Laboratory, UK.

How do today's inductive inference algorithms stack up against human
brains? We here announce an inductive theory formation challenge, in
the form of 3 competitions.

(1) Readers are invited to induce rules from a set of 20
train-descriptions developed from Ryszard Michalski's classic
presentation of 5 Eastbound and 5 Westbound trains more than 10 years
ago. The 10 new trains originate from Stephen Muggleton's pseudo-random
train-generator, coded in Prolog and outputting trains encoded as
Prolog facts. These were subjected to filtering and class-labelling
sufficient to ensure that at least two moderately simple classifying
theories lie hidden in the final 20 trains. By kind donation of Oxford
University Press, the simplest theory submitted, whether of human or
machine authorship, wins a copy of Richard Gregory's handsome "Oxford
Companion to the Mind" (35 Pounds Sterling, US$49.95).
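(To give readers a feel for the task: the competition data are
distributed as Prolog facts, but the flavor of a "classifying theory"
can be sketched in a few lines of Python. The encoding and attribute
names below are invented for illustration and are not the competition's
actual representation; the rule shown is Michalski's well-known classic
theory for his original 10 trains, "if a train has a short, closed car,
then it is eastbound".)

```python
# Illustrative sketch only: each train is a list of cars, each car a
# dict of attributes. The attribute names are made up for exposition.

def eastbound(train):
    """Michalski's classic theory for the original 10 trains:
    a train is eastbound iff it has a short, closed car."""
    return any(car["length"] == "short" and car["closed"]
               for car in train)

# Two toy trains (not from the competition data):
train_a = [{"length": "short", "closed": True,  "load": "triangle"},
           {"length": "long",  "closed": False, "load": "rectangle"}]
train_b = [{"length": "long",  "closed": False, "load": "circle"}]

print(eastbound(train_a))  # True: the first car is short and closed
print(eastbound(train_b))  # False: no short closed car
```

The competition, of course, asks for theories induced from the data,
not hand-written ones; the point here is only what a "moderately
simple classifying theory" looks like.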
(2) Competition 2 is for sub-symbolic learning, based on a predictive
performance criterion rather than explicit theory formation. For this,
the prize is a free copy of "Machine Learning, Neural and Statistical
Classification" (eds. D. Michie, D.J. Spiegelhalter and C.C. Taylor,
Ellis Horwood Series in Artificial Intelligence), 1994 (39.95 Pounds
Sterling, US$67.95).

(3) In Competition 3 each of 5 subtasks takes the same 5 trains vs. 5
trains format as in (1), with the difference that each subtask was
generated randomly and pre-classified arbitrarily.

Further details in the form of a compressed tar file are obtainable at:

   URL      = ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/trains.tar.Z
   FTP site = ftp.comlab.ox.ac.uk
   FTP file = pub/Packages/ILP/trains.tar.Z

------------------------------
Date: Thu 21 Jul 94 00:33:29-PDT
From: Ken Laws
Subject: Any data in the computer can be used against you

(this is an extract from THE COMPUTISTS' COMMUNIQUE, Full Moon Edition
-- GPS)

... Federal pretrial discovery rules introduced in 12/93 require
companies to hand over a list of all available [and relevant?]
electronic data and to refrain from deleting any. "Anything you put in
a computer can and will be used against you in a court of law."
Companies should limit the number of saved email messages, and should
warn employees about forwarding to outsiders (or even to other
employees). [Leslie Helm, LA Times, 6/16/94. Chaos Corner, 6/22/94.]
(That goes against the grain, doesn't it?) ...
------------------------------
Date: Mon, 25 Jul 94 16:22:31 -0500
From: David Hart
To: alife@cognet.ucla.edu, ai-ed@sun.com, ail-l@austin.onu.edu,
    ai-medicine@medmail.Stanford.EDU, cbr-med@cs.uchicago.edu,
    ai-stats@watstat.uwaterloo.ca, DAI-List@mcc.com,
    genetic-programming@cs.stanford.edu, ml@ics.uci.edu,
    ir-l%uccvma.bitnet@vm1.nodak.edu, nl-kr@cs.rpi.edu,
    siggen@black.bgu.ac.il, empiricists@csli.stanford.edu,
    lantra-l%finhutc.bitnet@cunyvm.cuny.edu, corpora@nora.hd.uib.no,
    qphysics@cs.washington.edu, vision-list@teleos.com, kdd@gte.com
Subject: CFP: AIJ Special Issue Devoted to Empirical AI
Reply-to: dhart@cs.umass.edu

                            Call for Papers

          Special Issue of the Artificial Intelligence Journal
             Devoted to Empirical Artificial Intelligence

Editors: Paul Cohen (cohen@cs.umass.edu) and
         Bruce Porter (porter@cs.utexas.edu)

We are looking for papers that characterize and explain the behaviors
of systems in task environments. Papers should report results of
studies of AI systems, or new techniques for studying systems. The
studies should be empirical, by which we mean "based on observation"
(not exclusively "experimental," and certainly not exclusively
statistical hypothesis testing).
Examples (some of which are already in the AI literature) include:

 - A report of performance comparisons of message-understanding
   systems, explaining why some systems perform better than others in
   some task environments
 - A study of commonly-used benchmarks or test sets, explaining why a
   simple algorithm performs well on many of them
 - A study of the empirical time and space complexity of an important
   algorithm or sample of algorithms
 - Results of corpus-based machine-translation projects
 - A paper that introduces a feature of a task that suggests why some
   task instances are easy and others difficult, and tests this claim
 - Theoretical explanations (with appropriate empirical backing) of
   unexpected empirical results, such as constant-time performance on
   the million-queens problem
 - A statistical procedure for comparing performance profiles such as
   learning curves
 - A resampling method for confidence intervals for statistics
   computed from censored data (e.g., due to cutoffs on run times)
 - A paper that postulates (on empirical or theoretical grounds) an
   equivalence class of systems that appeared superficially different,
   providing empirical evidence that, on some important measures,
   members of the class are more similar to each other than they are
   to nonmembers.

The empirical orientation will not preclude theoretical articles; it
is often difficult to explain and generalize results without a
theoretical framework. However, the overriding criterion for papers
will be whether they attempt to characterize, compare, predict,
explain and generalize what we observe when we run AI systems.

This is an atypical special issue because many of us think there is
nothing special about empirical AI. It isn't a subfield or a
particular topic, but rather a methodology that applies to many
subfields and topics. We are concerned, however, that despite the
scope of empirical AI, it might be underrepresented in the pages of
the Artificial Intelligence Journal.
This special issue is an experiment to find out: if the number of
submitted, publishable papers is high, then we may conclude that the
Journal could publish a higher proportion of such papers in the
future, and this issue might be inaugural rather than special.

Three principles will guide reviewers: papers should be interesting,
they should be convincing, and in most cases they should pose a
question or make a claim. A paper might be unassailable from a
methodological standpoint, but if it is an unmotivated empirical
exercise (e.g., "I wonder, for no particular reason, which of these
two algorithms is faster"), it won't be accepted. In the other corner,
we can envision fascinating papers devoid of convincing evidence.
Different interpretations of "convincing" are appropriate at different
stages of projects and for different kinds of projects; for example,
the standards for hypothesis testing are stricter than those for
exploratory studies, and the standards for new empirical methods are
of a different kind, pertaining to power and validity. If, however,
the focus of a paper is a claim, then convincing evidence must be
provided.

Deadline: Jan. 10, 1995.

Please contact either of the editors as soon as possible to tell us
whether you intend to submit a paper, and include a few lines
describing the paper, so we can gauge the level of interest and the
sorts of work we'll be receiving.

Request: Due to the broad nature of this call, it will be difficult to
reach all potential contributors. So, please tell a friend...

The Editorial Board for this issue includes: B. Chandrasekaran,
Eugene Charniak, Mark Drummond, John Fox, Steve Hanks, Lynette
Hirschman, Adele Howe, Rob Holte, Steve Minton, Jack Mostow, Martha
Pollack, Ross Quinlan, David Waltz, Charles Weems.
***
Dave Hart
UMass, Amherst
dhart@cs.umass.edu

------------------------------
Date: Wed, 20 Jul 1994 15:12:03 +1000
From: Ross Quinlan
Subject: Revised Version of C4.5

There have been several small changes (minor bug fixes and
improvements) since the code was published in 1992. If you have
Release 5 (i.e. the disk from Morgan Kaufmann), you can obtain the
altered files by anonymous ftp from ftp.cs.su.oz.au, directory pub/ml,
file patch.tar.Z. The file Modifications summarizes the changes since
Release 5.

Needless to say, it is advisable to retain the old files until you are
satisfied with Release 6!

Ross Quinlan

------------------------------