KDD Nugget 94:14, e-mailed 94-07-26

Contents:
 * G. Piatetsky-Shapiro, new in KD Mine, http://info.gte.com/~kdd/
 * David Page (in ML-list), Inductive Learning Competition
 * LA Times, Any data in the computer can be used against you
 * CFP: Special Issue of the AI Journal on Empirical AI
 * Ross Quinlan: Revised Version of C4.5

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD, also known as Data
Mining), e.g. application descriptions, conference announcements, tool
reviews, information requests, interesting ideas, clever opinions, etc.
It has been coming out about every two to three weeks, depending on the
quantity and urgency of submissions.

Back issues of nuggets, a catalog of data mining tools, useful
references, an FAQ, and other KDD-related information are now available
at the Knowledge Discovery Mine, URL http://info.gte.com/~kdd/
or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail contributions to kdd@gte.com
Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PC Magazine (July 1994) reports the results of its contest for the best
new name for the Information Superhighway. With runners-up like
Algorebahn, Byteway, and Route 100010, the winner was Kevin Kwaku, who
suggested that while the Information Superhighway is a bad name, it
could be a great acronym, standing for "Interactive Network For
Organizing, Retrieving, Manipulating, Accessing, and Transferring
Information On National Systems, Unleashing Practically Every
Rebellious Human Intelligence, Gratifying Hackers, And Yahoos."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------
From: G. Piatetsky-Shapiro (gps@gte.com)
Subject: new in KD Mine URL http://info.gte.com/~kdd
Date: Thu, 14 July 1994
----------
In Other Information Servers

COSMIC's Program Catalog (Cybernetics Section) -- programs developed by
NASA. Contains pointers to many useful systems ...

SIGART (ACM Special Interest Group on Artificial Intelligence)
information is available on-line. The site also has a nice list of
other AI-related resources.
----------
In Siftware

Updated info for BBN Cornerstone, IDIS
----------
In Homepages

Finally, a homepage for the GTE Laboratories KDD project.

-----------------------------
Date: Tue, 19 Jul 94 22:59:19 BST
From: David.Page@comlab.ox.ac.uk
Subject: Inductive Learning Competition

TO THE INTERNATIONAL ML COMMUNITY: NEW EAST-WEST CHALLENGE

Donald Michie, Stephen Muggleton, David Page and Ashwin Srinivasan
Oxford University Computing Laboratory, UK.

How do today's inductive inference algorithms stack up against human
brains? We here announce an inductive theory formation challenge, in
the form of 3 competitions.

(1) Readers are invited to induce rules from a set of 20
train-descriptions developed from Ryszard Michalski's classic
presentation of 5 Eastbound and 5 Westbound trains more than 10 years
ago. The 10 new trains originate from Stephen Muggleton's pseudo-random
train-generator, coded in Prolog and outputting trains encoded as
Prolog facts. These were subjected to filtering and class-labelling
sufficient to ensure that at least two moderately simple classifying
theories lie hidden in the final 20 trains. By kind donation of Oxford
University Press, the simplest theory submitted, whether of human or
machine authorship, wins a copy of Richard Gregory's handsome "Oxford
Companion to the Mind" (35 Pounds Sterling, US$49.95).
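(To give readers a feel for the task: the competition data are
distributed as Prolog facts, but the flavor of a "classifying theory"
can be sketched in a few lines of Python. The encoding and attribute
names below are invented for illustration and are not the competition's
actual representation; the rule shown is Michalski's well-known classic
theory for his original 10 trains, "if a train has a short, closed car,
then it is eastbound".)

```python
# Illustrative sketch only: each train is a list of cars, each car a
# dict of attributes. The attribute names are made up for exposition.

def eastbound(train):
    """Michalski's classic theory for the original 10 trains:
    a train is eastbound iff it has a short, closed car."""
    return any(car["length"] == "short" and car["closed"]
               for car in train)

# Two toy trains (not from the competition data):
train_a = [{"length": "short", "closed": True,  "load": "triangle"},
           {"length": "long",  "closed": False, "load": "rectangle"}]
train_b = [{"length": "long",  "closed": False, "load": "circle"}]

print(eastbound(train_a))  # True: the first car is short and closed
print(eastbound(train_b))  # False: no short closed car
```

The competition, of course, asks for theories induced from the data,
not hand-written ones; the point here is only what a "moderately
simple classifying theory" looks like.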
(2) Competition 2 is for sub-symbolic learning, based on a predictive
performance criterion rather than explicit theory formation. For this,
the prize is a free copy of "Machine Learning, Neural and Statistical
Classification" (eds. D. Michie, D.J. Spiegelhalter and C.C. Taylor,
Ellis Horwood Series in Artificial Intelligence), 1994 (39.95 Pounds
Sterling, US$67.95).

(3) In Competition 3 each of 5 subtasks takes the same 5 trains vs. 5
trains format as in (1), with the difference that each subtask was
generated randomly and pre-classified arbitrarily.

Further details in the form of a compressed tar file are obtainable at:

   URL      = ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/trains.tar.Z
   FTP site = ftp.comlab.ox.ac.uk
   FTP file = pub/Packages/ILP/trains.tar.Z

------------------------------
Date: Thu 21 Jul 94 00:33:29-PDT
From: Ken Laws
Subject: Any data in the computer can be used against you

(this is an extract from THE COMPUTISTS' COMMUNIQUE, Full Moon Edition
-- GPS)

... Federal pretrial discovery rules introduced in 12/93 require
companies to hand over a list of all available [and relevant?]
electronic data and to refrain from deleting any. "Anything you put in
a computer can and will be used against you in a court of law."
Companies should limit the number of saved email messages, and should
warn employees about forwarding to outsiders (or even to other
employees). [Leslie Helm, LA Times, 6/16/94. Chaos Corner, 6/22/94.]
(That goes against the grain, doesn't it?) ...
------------------------------
Date: Mon, 25 Jul 94 16:22:31 -0500
From: David Hart
To: alife@cognet.ucla.edu, ai-ed@sun.com, ail-l@austin.onu.edu,
    ai-medicine@medmail.Stanford.EDU, cbr-med@cs.uchicago.edu,
    ai-stats@watstat.uwaterloo.ca, DAI-List@mcc.com,
    genetic-programming@cs.stanford.edu, ml@ics.uci.edu,
    ir-l%uccvma.bitnet@vm1.nodak.edu, nl-kr@cs.rpi.edu,
    siggen@black.bgu.ac.il, empiricists@csli.stanford.edu,
    lantra-l%finhutc.bitnet@cunyvm.cuny.edu, corpora@nora.hd.uib.no,
    qphysics@cs.washington.edu, vision-list@teleos.com, kdd@gte.com
Subject: CFP: AIJ Special Issue Devoted to Empirical AI
Reply-to: dhart@cs.umass.edu

                            Call for Papers

          Special Issue of the Artificial Intelligence Journal
             Devoted to Empirical Artificial Intelligence

Editors: Paul Cohen (cohen@cs.umass.edu) and
         Bruce Porter (porter@cs.utexas.edu)

We are looking for papers that characterize and explain the behaviors
of systems in task environments. Papers should report results of
studies of AI systems, or new techniques for studying systems. The
studies should be empirical, by which we mean "based on observation"
(not exclusively "experimental," and certainly not exclusively
statistical hypothesis testing).
Examples (some of which are already in the AI literature) include:

 - A report of performance comparisons of message-understanding
   systems, explaining why some systems perform better than others in
   some task environments
 - A study of commonly-used benchmarks or test sets, explaining why a
   simple algorithm performs well on many of them
 - A study of the empirical time and space complexity of an important
   algorithm or sample of algorithms
 - Results of corpus-based machine-translation projects
 - A paper that introduces a feature of a task that suggests why some
   task instances are easy and others difficult, and tests this claim
 - Theoretical explanations (with appropriate empirical backing) of
   unexpected empirical results, such as constant-time performance on
   the million-queens problem
 - A statistical procedure for comparing performance profiles such as
   learning curves
 - A resampling method for confidence intervals for statistics
   computed from censored data (e.g., due to cutoffs on run times)
 - A paper that postulates (on empirical or theoretical grounds) an
   equivalence class of systems that appeared superficially different,
   providing empirical evidence that, on some important measures,
   members of the class are more similar to each other than they are
   to nonmembers.

The empirical orientation will not preclude theoretical articles; it
is often difficult to explain and generalize results without a
theoretical framework. However, the overriding criterion for papers
will be whether they attempt to characterize, compare, predict,
explain and generalize what we observe when we run AI systems.

This is an atypical special issue because many of us think there is
nothing special about empirical AI. It isn't a subfield or a
particular topic, but rather a methodology that applies to many
subfields and topics. We are concerned, however, that despite the
scope of empirical AI, it might be underrepresented in the pages of
the Artificial Intelligence Journal.
This special issue is an experiment to find out: if the number of
submitted, publishable papers is high, then we may conclude that the
Journal could publish a higher proportion of such papers in the
future, and this issue might be inaugural rather than special.

Three principles will guide reviewers: papers should be interesting,
they should be convincing, and in most cases they should pose a
question or make a claim. A paper might be unassailable from a
methodological standpoint, but if it is an unmotivated empirical
exercise (e.g., "I wonder, for no particular reason, which of these
two algorithms is faster"), it won't be accepted. In the other corner,
we can envision fascinating papers devoid of convincing evidence.
Different interpretations of "convincing" are appropriate at different
stages of projects and for different kinds of projects; for example,
the standards for hypothesis testing are stricter than those for
exploratory studies, and the standards for new empirical methods are
of a different kind, pertaining to power and validity. If, however,
the focus of a paper is a claim, then convincing evidence must be
provided.

Deadline: Jan. 10, 1995.

Please contact either of the editors as soon as possible to tell us
whether you intend to submit a paper, and include a few lines
describing the paper, so we can gauge the level of interest and the
sorts of work we'll be receiving.

Request: Due to the broad nature of this call, it will be difficult to
reach all potential contributors. So, please tell a friend...

The Editorial Board for this issue includes: B. Chandrasekaran,
Eugene Charniak, Mark Drummond, John Fox, Steve Hanks, Lynette
Hirschman, Adele Howe, Rob Holte, Steve Minton, Jack Mostow, Martha
Pollack, Ross Quinlan, David Waltz, Charles Weems.
***
Dave Hart
UMass, Amherst
dhart@cs.umass.edu

------------------------------
Date: Wed, 20 Jul 1994 15:12:03 +1000
From: Ross Quinlan
Subject: Revised Version of C4.5

There have been several small changes (minor bug fixes and
improvements) since the code was published in 1992. If you have
Release 5 (i.e. the disk from Morgan Kaufmann), you can obtain the
altered files by anonymous ftp from ftp.cs.su.oz.au, directory pub/ml,
file patch.tar.Z. The file Modifications summarizes the changes since
Release 5.

Needless to say, it is advisable to retain the old files until you are
satisfied with Release 6!

Ross Quinlan

------------------------------