KDD Nugget 94:18, e-mailed 94-10-14

Contents:
 * J. Catlett, Data mining in dramatic literature
 * D. Fisher, CFP: Fifth workshop on AI and Statistics
 * G. Patnaik, CFP: Computational AI applications in Geophysical Sciences
 * G. Piatetsky-Shapiro, ComputerWorld: Parallel processing mines retail data
 * G. Piatetsky-Shapiro, AI Magazine published KDD-93 report
 * T. Mitchell, Machine Learning course available via WWW

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD, also known as Data
Mining), e.g. application descriptions, conference announcements, tool
reviews, information requests, interesting ideas, clever opinions, etc.
It has been coming out about every two to three weeks, depending on the
quantity and urgency of submissions.

Back issues of nuggets, a catalog of data mining tools, useful references,
FAQ, and other KDD-related information are now available at Knowledge
Discovery Mine, URL http://info.gte.com/~kdd/
or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail contributions to kdd@gte.com
Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
He had been eight years upon a project for extracting sunbeams out of
cucumbers, which were to be put in vials hermetically sealed, and let
out to warm the air in raw inclement summers.
   Jonathan Swift (1667-1745)
   _Gulliver's Travels_ (1726) ``A Voyage to Laputa, etc.'' ch.
5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Tue, 27 Sep 94 13:37 EDT
From: catlett@research.att.com (Jason Catlett)
Subject: Data mining in dramatic literature

If you see fit, the following could be included in KDD Nuggets.
Jason Catlett

Data mining in dramatic literature

A recent play ``The ghost in the machine'' by David Gilman centers on
discovery in databases. Specifically, one of its characters is a
musicologist studying a composition generated by computer using random
numbers (``chance operations'' in John Cage's terms) and modified by a
set of rules predetermined by the composer. The musicologist reports
that, after running programs that look for substrings occurring in other
databases, he found a long quote of the chorale ``A mighty fortress is
our God,'' against astronomical odds. Other topics of the play include
truth and morality. I found it entertaining but not statistically
satisfying.

It is playing until 9 October at the Perry St Theatre, 31 Perry St (west
of Seventh Av South), in Greenwich Village in New York City. Seats are
$15, showtimes Wed-Sat at 8pm and Sunday at 3pm.
Reservations: 212 522 1402.
Questions to The Barrow Group: 212 522 1421

Less directly related is Tom Stoppard's ``Arcadia'', which has had a long
run in London, but which I haven't yet seen produced in the US. The play
was completed shortly before the announcement of the proof of Fermat's
last theorem; much dialogue is devoted to this topic, as well as landscape
gardening, carnal embrace, computer simulation and AI speculations
reminiscent of Ada Lovelace. I enjoyed it.

--------------------------------------------
Date: Sat, 1 Oct 1994 10:42:24 +0600
From: dfisher@vuse.vanderbilt.edu (Douglas H. Fisher)
Subject: AI/Stats Workshop

Preliminary Call for Participation

Fifth International Workshop on
ARTIFICIAL INTELLIGENCE and STATISTICS

January 4-7, 1995
Ft.
Lauderdale, Florida

TECHNICAL and TUTORIAL PROGRAM:

This is the fifth in a series of workshops that has brought together
researchers in Artificial Intelligence and in Statistics to discuss
problems of mutual interest. To encourage interaction and a broad
exchange of ideas, there will be 20 discussion papers in single session
meetings over three days (Jan. 5-7). Two poster sessions will provide the
means for presenting and discussing the remaining research papers.
Attendance at the workshop is *not* limited to paper presenters.

The three days of research presentations will be preceded by a day of
tutorials (Jan. 4). The tutorial topics, presenters, and approximate
times are:

(1) Machine Learning                          9:00AM - 12:15PM
    (Dr. David Aha, Naval Research Lab)

(2) Statistical Methods for Inducing          9:00AM - 12:15PM
    Models from Data
    (Prof. Steffen Lauritzen, Aalborg U.)

(3) Probabilistic Models of Causality         2:00PM - 5:15PM
    (Prof. Glenn Shafer, Rutgers U.)

(4) Statistical Models for Function           2:00PM - 5:15PM
    Estimation and Classification
    (Prof. Trevor Hastie, Stanford U.)

Notes prepared by the tutorial presenters will be made available at the
Workshop.

LOCATION:

The 1995 Workshop will be held at

    Pier Sixty Six Resort & Marina
    2301 SE 17th Street Causeway
    Fort Lauderdale, Florida, 33316 USA.

    Phone: 800-327-3796 (outside Florida)
           305-525-6666
    Fax  : 305-728-3541

The hotel is a 22 acre resort located on the intracoastal waterway.
Available amenities include two pools, a 40 person hydrotherapy pool,
spa, tennis courts, a children's activity club, seven restaurants and
lounges, and water shuttle service to the beach. The Hotel is most
conveniently reached from Fort Lauderdale International Airport, which is
about 5-10 minutes by car/cab. The Hotel is approximately 45-60 minutes
by car from Miami International Airport.

The Resort is holding a block of rooms at the rate of $95 US dollars (for
single/double) until Dec. 10, 1994. Reservations should be made before
this date.
The block is held under the name `SOCIETY for ARTIficial Intelligence
and Statistics' (or SOCIETY ARTI).

REGISTRATION:

Registration for the Technical Program (plenary and poster sessions)
includes a proceedings of papers submitted by authors, continental
breakfasts each day of the technical program, and tentatively, two
lunches and one dinner. The Workshop offers student rates and an
early-registration discount. Registration rates and instructions can be
found on the Registration Form at the end of this Call. Registration for
tutorials can also be made in advance using the Registration Form.

PROGRAM COMMITTEE:

General Chair:    D. Fisher        Vanderbilt U., USA
Program Chair:    H. Lenz          Free U. Berlin, Germany
Members:          W. Buntine       NASA (Ames), USA
                  J. Catlett       AT&T Bell Labs, USA
                  P. Cheeseman     NASA (Ames), USA
                  P. Cohen         U. of Mass., USA
                  D. Draper        U. of Bath, UK
                  Wm. Dumouchel    Columbia U., USA
                  A. Gammerman     U. of London, UK
                  D. J. Hand       Open U., UK
                  P. Hietala       U. Tampere, Finland
                  R. Kruse         TU Braunschweig, Germany
                  S. Lauritzen     Aalborg U., Denmark
                  W. Oldford       U. of Waterloo, Canada
                  J. Pearl         UCLA, USA
                  D. Pregibon      AT&T Bell Labs, USA
                  E. Roedel        Humboldt U., Germany
                  G. Shafer        Rutgers U., USA
                  P. Smyth         JPL, USA
Tutorial Chair:   P. Shenoy        U. Kansas, USA

MORE INFORMATION:

For more information write dfisher@vuse.vanderbilt.edu or call
615-343-4111.
SPONSORS:

Society for Artificial Intelligence and Statistics
International Association for Statistical Computing

***********

Papers accepted for Technical Program
Fifth International Workshop on Artificial Intelligence and Statistics

PLENARY PAPERS

Almond, Schimert (MathSoft)
    Missing data models as meta-data
Brent, Murthy, Lundberg (Johns Hopkins U)
    Minimum description length induction for discovering morphemic
    suffixes
Buntine (NASA Ames)
    Software for data analysis with graphical models: basic tools
Chickering, Geiger, Heckerman (Microsoft)
    Learning Bayesian networks: search methods and experimental results
Cohen, Gregory, Ballesteros, St Amant (U Mass)
    Two algorithms for inducing structural equation models from data
Cooper (U Pitt)
    Causal discovery from observational data in the presence of
    selection bias
Cox (US West)
    Using causal knowledge to learn more useful decision rules from data
Decatur (Harvard U)
    Learning in hybrid noise environments using statistical queries
Elder (Rice U)
    Heuristic search for model structure
Gebhardt, Kruse (U Braunschweig)
    Learning possibilistic networks from data
Kasahara, Ishikawa, Matsuzawa, Kawaoka (Nippon TT)
    Viewpoint-based measurement of semantic similarity between words
Lubinsky (U Witwatersrand SA)
    Structured interpretable regression
Madigan, Almond (U Washington)
    Test selection strategies for belief networks
Malvestuto (U L'Aquila, IT)
    Derivation DAGs for inferring interaction models
Merz (U Cal Irvine)
    Dynamic learning bias selection
Pearl (UCLA)
    A causal calculus for statistical research with applications to
    observational and experimental studies
Riddle, Frenedo, Newman (Boeing)
    Framework for a generic knowledge discovery tool
Shafer, Kogan, Spirtes (Rutgers)
    A generalization of the Tetrad representation theorem
St Amant, Cohen (U Mass)
    Preliminary design for an EDA assistant
Yao, Tritchler (U Toronto)
    Likelihood-based causal inference

POSTER PAPERS

Aha, Bankert (NRL)
    A comparative evaluation of sequential feature selection algorithms
Ali, Brunk, Pazzani (U Cal Irvine)
    Learning multiple relational rule-based models
Almond (MathSoft)
    Hypergraph grammars for knowledge-based model construction
Anderson, Carlson, Westbrook, Hart, Cohen (U Mass)
    Tools for analyzing AI programs
Bergman, Rivest (MIT)
    Picking the best expert from a sequence
Blau (U Rochester)
    Ploxoma: Test-bed for uncertain inference
Breese, Heckerman (Microsoft)
    Probabilistic case-based reasoning
Burke (U Nevada)
    Comparing the prediction accuracy of statistical models and
    artificial neural networks in breast cancer
Catlett (ATT)
    Tailoring rulesets to misclassification cost
Chen, Yeh (National Chengchi U)
    Predicting stock returns with genetic programming
Cheng (U Cincinnati)
    Analysis and Application of the Generalized Mean-Shift Process
Cozman, Krotkov (CMU)
    Truncated Gaussians as tolerance sets
Cunningham (U Waikato)
    Textual data mining
De Vel, Li, Coomans (U James Cook, NZ)
    Non-Linear dimensionality reduction: A comparative performance study
DuMouchel, Friedman, Johnson, Hripcsak (Columbia U)
    Natural language processing of radiology reports
Esposito, Malerba, Semeraro (U degli Studi, IT)
    A further study of pruning methods in decision tree induction
Feelders, Verkooijen (U Twente, Netherlands)
    Which method learns most from the data?
Franz (CMU)
    Classifying new words for robust parsing
Gelsema (Erasmus U, The Netherlands)
    Abductive reasoning in Bayesian belief networks using a genetic
    algorithm
Harner, Galfalvy (West Virginia U)
    Omega-Stat: An environment for implementing intelligent modeling
    strategies
Heckerman, Shachter (Microsoft)
    A decision-based view of causality
Howe (Colorado St U)
    Finding dependencies in event streams using local search
Jenzarli (U Tampa)
    Solving influence diagrams using Gibbs sampling
John (Stanford U)
    Robust linear discriminant trees
Ketterlin, Gancarski, Korczak (U Louis Pasteur)
    Hierarchical clustering of composite objects with a variable number
    of components
Kim (Korea Adv. Inst. of Sci. and Eng.)
    An approach to fitting large influence diagrams
Kim, Moon (Syracuse U)
    Modeling life time data by neural networks
Kloesgen (German Nat. Rsch.)
    Learning from data: Pattern evaluations and search strategies
Larranaga, Murga, Poza, Kuijpers (U Basque, Spain)
    Structure learning of Bayesian networks by hybrid genetic algorithms
Lekuona, Lacruz, Lasala (U de Zaragoza, Spain)
    Graphical models for dynamic systems
Liu (U Kansas)
    Propagation of Gaussian belief functions
Martin (U Cal, Irvine)
    A hypergeometric null hypothesis probability test for feature
    selection and stopping
Martin (U Cal, Irvine)
    Evaluating and comparing classifiers: Complexity measures
Murthy (Johns Hopkins U)
    Statistical preprocessing of decision trees
Neufeld, Adams, Choy, Philip, Tawfik (U Saskatchewan)
    Part-of-speech tagging from small data sets
Oates, Gregory, Cohen (U Mass)
    Detecting complex dependencies in categorical data
Pazzani (U Cal Irvine)
    Searching for attribute dependencies in Bayesian classifiers
Provan, Singh (Inst. for Decision Systems Res.)
    Learning ``Predictively-Optimal'' Bayesian Networks
Risius, Seidelmann (Hahn-Meitner Inst)
    Combining statistics and AI in the optimization of semiconductor
    films for solar cells
Shenoy (U Kansas)
    Representing and solving asymmetric decision problems using
    valuation networks
Srkantan, Srihari (SUNY Buffalo)
    Data representations in learning
Sun, Qiu, Cox (US West)
    A hill-climbing approach to construct near optimal decision trees
Valtorta (U South Carolina)
    MENTOR: A Bayesian model for prediction and intervention in mental
    retardation
Young, Lubinsky (UNC)
    Learning from data by guiding the analyst: On the representation,
    use, and creation of visual statistical strategies

***********

Registration Form
Fifth International Workshop on Artificial Intelligence and Statistics

Participants may register on site.
To register in advance of the Workshop send this form and a check (in US
dollars) made to the order of

    **Society for Artificial Intelligence and Statistics**

in the appropriate amount to:

    Doug Fisher
    Department of Computer Science
    Box 1679, Station B
    Vanderbilt University
    Nashville, Tennessee 37235
    USA

Advance registration discounts apply if registration is received by
Dec. 1, 1994.

Name:        ________________________________________
Affiliation: ________________________________________
Phone:       ________________________________________
Fax:         ________________________________________
Email:       ________________________________________
Address:     ________________________________________
             ________________________________________
             ________________________________________

Technical Program -- check one:

____ Technical Program (regular, by Dec. 1, 1994):    $245
____ Technical Program (student, by Dec. 1, 1994):    $155
____ Technical Program (regular, after Dec. 1, 1994): $295
____ Technical Program (student, after Dec. 1, 1994): $195

                          Technical Program Subtotal: $____

Tutorial Program -- check applicable tutorials, if any. Note that the
tutorial times may conflict; to avoid a conflict, make at most one
selection from (1) and (2), and at most one selection from (3) and (4).

____ (1) Machine Learning
         ____ (regular, by Dec. 1):    $ 70
         ____ (student, by Dec. 1):    $ 45
         ____ (regular, after Dec. 1): $ 80
         ____ (student, after Dec. 1): $ 55

____ (2) Statistical Methods for Inducing Models from Data
         ____ (regular, by Dec. 1):    $ 70
         ____ (student, by Dec. 1):    $ 45
         ____ (regular, after Dec. 1): $ 80
         ____ (student, after Dec. 1): $ 55

____ (3) Probabilistic Models of Causality
         ____ (regular, by Dec. 1):    $ 70
         ____ (student, by Dec. 1):    $ 45
         ____ (regular, after Dec. 1): $ 80
         ____ (student, after Dec. 1): $ 55

____ (4) Statistical Models for Function Estimation and Classification
         ____ (regular, by Dec. 1):    $ 70
         ____ (student, by Dec. 1):    $ 45
         ____ (regular, after Dec. 1): $ 80
         ____ (student, after Dec. 1): $ 55

                           Tutorial Program Subtotal: $____

                        Technical and Tutorial Total: $____

***********

--------------------------------------------
From: gbp@cts.com (Gagan Patnaik)
Date: Sun, 2 Oct 1994 22:46:55 -0700
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: kdd@gte.com
Subject: CFP: Computational AI applications in Geophysical Sciences

Gregory, please post this CFP on KDD list. This announcement is also
being posted on relevant newsgroups on the USENET.

Regards,
Gagan

-----------------------------------------------------------------------

Symposium on the

APPLICATION OF ARTIFICIAL INTELLIGENCE COMPUTING IN GEOPHYSICS

Jointly sponsored by the
Int'l Association of Seismology and Physics of the Earth's Interior
(IASPEI) and the Society of Exploration Geophysicists (SEG, USA)

to be held under the auspices of the
XXI General Assembly of the
International Union of Geodesy and Geophysics (IUGG)
July 2 - 14, 1995
at Boulder, Colorado, USA

Hosted by the U. S. National Academy of Sciences
Organized by the American Geophysical Union (AGU)
and the University of Colorado at Boulder

Symposium Date: JULY 12, 1995 (Wednesday)
Abstract Submission Deadline: FEBRUARY 1, 1995

Papers in the form of ORAL or POSTER presentation are sought on all
aspects of Artificial Intelligence computing applications in Geophysical
Sciences including but not limited to,

    NEURAL COMPUTING
    FUZZY SET THEORY (SOFT COMPUTING)
    EVOLUTIONARY COMPUTING (GENETIC ALGORITHMS)
    AUTOMATED REASONING TECHNIQUES
    KNOWLEDGE-BASED SYSTEMS
    MACHINE LEARNING AND KNOWLEDGE ACQUISITION
    DATABASE MINING AND KNOWLEDGE DISCOVERY.

This symposium is one of several geophysical symposia and workshops
being held during the general assembly of the IUGG associations. For
this interdisciplinary symposium, abstract submissions will be accepted
for any topic related to Geophysical Sciences *with* computing
applications from the above broad definition of Artificial Intelligence
techniques.
The emphasis is on *applications* to Geophysical Problems related to the
Earth and its Environment. The Geophysical topics of interest for this
interdisciplinary symposium include but are not limited to,

    SOLID EARTH GEOPHYSICS, OCEAN SCIENCES, HYDROLOGY, METEOROLOGY
    AND SPACE-BASED TECHNOLOGIES APPLIED TO THE EARTH AND ITS ENVIRONMENT

For example, some of the problem areas from Solid Earth Geophysics are
Earthquake and Explosion Seismology, Petroleum Geophysics and Reservoir
Modeling (Resource Exploration and Production), Engineering Geophysics
(e.g., Seismic Hazard Assessment and Earthquake Engineering),
Environmental Geophysics (techniques utilized for subsurface
investigations and environmental remediation), and Mining Geophysics.

The common themes that bind all presentations in this symposium are the
Artificial Intelligence computing techniques applied to processing,
interpretation and management of scientific data from Geophysical
observation, simulation and modeling.

ABSTRACT SUBMISSION

Presentations of results on completed work, as well as work-in-progress,
are encouraged. At least one paper submitted by every author will be
accepted for either an oral or poster presentation. Additional papers
from the same author will also be considered.

One camera-ready copy and two additional copies of each abstract must be
submitted in the prescribed format, and received before the deadline
(February 1, 1995). Each abstract printed in the specified format should
clearly indicate the symposium code and title "Application of Artificial
Intelligence Computing in Geophysics". Abstract format, fees, and
instructions for Electronic submission will be announced shortly.

Mailing address to send (original + 2 copies):

    IUGG XXI General Assembly
    c/o American Geophysical Union
    2000 Florida Avenue, N.W.
    Washington, D.C. 20009

Please also send one copy directly to one of the conveners listed at the
end of this message (electronic mail preferred).

SOCIAL EVENTS and FREE CIRCULATION OF SCIENTISTS

The symposium as part of the IUGG General Assembly will be held at the
University of Colorado, Boulder, campus. Boulder, Colorado is located at
the base of the foothills of the Rocky Mountains. Many exciting social
events and geological field trips are being planned for participants and
accompanying persons. The past general assembly, IUGG-94, was held in
Wellington, New Zealand, and was attended by participants from more than
40 countries.

"The Organizing Committee fully supports the basic policy of
nondiscrimination and affirms the rights of scientists throughout the
world to adhere to or associate with international scientific activity
without restrictions based on nationality, race, color, age, religion,
political philosophy, ethnic origin, citizenship, language, or sex. The
Committee affirms its support of the Int'l Council of Scientific Unions
(ICSU) principle of nondiscrimination and endorses the guidelines by the
ICSU Standing Committee on the Free Circulation of Scientists".

REGISTRATION AND HOUSING

All participants are required to register and pay appropriate fees.
There will be reduced fees for students and doctoral candidates under
the age of 30. There will also be a charge for accompanying persons who
are not attending the scientific programs. Registration and fees
information will be provided in the next announcement.

There will be a number of Hotels to select from with room rates ranging
from $65 - $120 (U.S. dollars) per night (details in the next
announcement). Campus housing will also be available. The University of
Colorado at Boulder has set aside a large number of dormitory rooms.
Campus lodging includes breakfast and dinner and will cost roughly $50
(U.S. dollars) per night. Family housing accommodation and campus
parking will also be available.

FURTHER INFORMATION AND CONTACTS

Various announcements relating to IUGG-95 are being published in EOS (a
weekly publication of the American Geophysical Union; issues: April 5,
April 19, and August 23, 1994); excerpts from which are included in this
message. This message and additional information including the next
announcement will also be made available on the INTERNET.

Some information of general nature about the IUGG XXI General Assembly
may also be obtained by contacting the American Geophysical Union.
(Telephone: +1 202 462 6900, Fax: +1 202 328 0566,
Email: iugg_xxiga@kosmos.agu.org).

For *this symposium* related matters, or for further assistance, please
contact one of the conveners:

    Dr. Gagan B. Patnaik
    Advanced Geocomputing Technologies
    P.O.Box - 927477
    San Diego, CA 92192-7477
    Phone: +1 619 535 4840
    Fax:   +1 619 535 4890
    Email: gpatnaik@aip.org
           or, g.patnaik@ieee.org

    Dr. Fred Aminzadeh
    UNOCAL Corporation
    5460 East La Palma
    Anaheim, California 92817
    Phone: +1 714 693 6990
    Fax:   +1 714 693 5824
    Email: fred.aminzadeh@st.unocal.com

-----------------------------------------------------------------------
Date: Mon, 3 Oct 1994
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: ComputerWorld: Parallel processing mines retail data

The Sep 26, 1994 issue of ComputerWorld has an article by Charles
Babcock, entitled "Parallel processing mines retail data" (p. 6). The
article describes Wal-Mart's experience in using the AT&T Global
Information Systems massively parallel machine (formerly known as
Teradata). With that machine, Wal-Mart is able to load 20 million
transactions per day and also to support 2300 complex SQL queries that
mine the central database for information.

--------------------------------------------
Date: Tue, 11 Oct 1994
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: AI Magazine (Fall 1994) published KDD-93 report.

The article, by Gregory Piatetsky-Shapiro (GTE Laboratories),
Christopher Matheus (GTE Laboratories), Padhraic Smyth (JPL), and
Ramasamy Uthurusamy (GM Research Laboratories), is entitled:

    KDD-93: Progress and Challenges in Knowledge Discovery in Databases

It gives a long report on the KDD-93 workshop and presents our
viewpoints on the directions of KDD. A slightly earlier version was
published as KDD Nugget 93:7 (Nov 1993) and is in
http://info.gte.com/~kdd/nuggets/93/n7.txt

--------------------------------------------
Date: Thu, 29 Sep 94 8:10:28 EDT
From: Tom.Mitchell@cs.cmu.edu (via ML list)
Subject: Machine Learning course available

MACHINE LEARNING COURSE NOTES AVAILABLE ON MOSAIC

The lecture slides and syllabus for CMU's course on Machine Learning are
now available on the web. Feel free to use the slides or handouts in
your own courses if you find them helpful. This material is from the
fall 1994 course at CMU, offered to upper-level undergraduates and
graduate students.

Suggestions for improvements are solicited, as are any good homework
problems. (suggestions -> Tom.Mitchell@cmu.edu).

The URL is
http://www.cs.cmu.edu:8001/afs/cs.cmu.edu/usr/avrim/www/ML94/courseinfo.html

Tom Mitchell and Avrim Blum

------------------------------