KDD Nugget 94-7, mailed 1994/04/18

Contents:
 * G. Piatetsky-Shapiro,
     Time: Attack of the Data Miners
     Business Week: Gold Mine of Data in Customer Service
     ComputerWorld: Data is money, but people are special
     US Census Bureau is now on WWW at http://www.census.gov/
 * Tej Anand, AT&T Data Mining Conference
 * Larry Ai, TRW Smart Charts for Pharmaceuticals
 * Edwin Pednault, MDL workshop at ML/COLT 94
 * Roberto Zicari, CFP: Theory and Practice of Object Systems

********Quote of the week: *****************************************
Many business people ... want to learn more about and are in support
of [the information superhighway] because they don't want to end up
as roadkill.
                                        Al Gore, Feb 1994

Information superhighway into the home: turning couch potatoes into
couch commandos
          Louise Kehoe, Financial Times (London), 8 March 1994
**********************************************************************

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD), e.g. application
descriptions, conference announcements, tool reviews, information
requests, interesting ideas, clever opinions, etc.

If you have something relevant to KDD, send it to kdd@gte.com ;
Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ******************************
* All opinions expressed in this list are those of the writers (or the *
* moderator), and not necessarily of their respective employers.       *
************************************************************************

------------------------------------
Date: Mon, 18 Apr 94
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Reviews of articles on Data Mining

-- Time (April 11, 1994, p. 34-35) article "Attack of the Data Miners",
by John Skow, gives a largely unsympathetic portrayal of how the
quantitative analysts ("quants") came to dominate the trading on Wall
Street, especially in various "derivatives".

-- Business Week: Gold Mine of Data in Customer Service (March 21, 1994)
-- Business Week reports on how companies use their customer records
for improving service -- "it's proving a boon to design and marketing".
Competitive pressure forces companies to probe the gold mine of
customer-service data they have been gathering for so long. Massively
parallel computers from Teradata and other manufacturers are used to
seek out "faint but significant" patterns. Companies including
Whirlpool, AT&T, ConRail, and Otis all report significant results in
improving customer service.

-- Esther Dyson writes in ComputerWorld ("Data is Money, but people are
special", April 4, 1994, p. 33), arguing that people's information
privacy should be protected. While data mining has many potential
applications in improving sales, marketing, etc., publishing a data
"image" of a person, she writes, just like improperly publishing a
photo image of a non-public figure, is improper and can result in
financial damages.

Date: March 18, 1994
From: United States Bureau of the Census on What is New in Mosaic

The United States Bureau of the Census has opened an information server
on the internet. Please explore our service and tell us what you think.
Connect to our beta WWW site by pointing your client software to
http://www.census.gov/. We will support many services, including gopher
and ftp. We also plan to offer a majordomo mail server in the near
future. If you have problems, questions, suggestions, etc., send email
to gatekeeper@census.gov.

------------------------------------
Date: Sat, 02 Apr 94 07:40:56 EST
From: "Anand, Tej"
Subject: AT&T Data Mining Conference

An internal AT&T workshop on Knowledge Discovery in Databases was held
in San Diego on February 21 and 22. The attendance was fairly diverse,
including technical as well as sales & marketing associates.
There is a fair amount of interest in finding tools and techniques that
can help users deal with large amounts of data (in some cases,
terabytes). This was an exploratory workshop bringing together people
who might have needs or interests in this area. It was not clear that
the knowledge discovery systems in the marketplace addressed user needs
effectively, or that the discovery techniques being developed were
capable of dealing with really large volumes of data without
sophisticated pre-processing.

Usama Fayyad gave a presentation on work being done at JPL, and there
were software demonstrations of the Recon database mining tool from
Lockheed, the ReMind case-based reasoning tool, and Extract, a tool for
integrating disparate legacy data into a relational model.

------------------------------------
Date: Wed, 6 Apr 94 11:03:29 PDT
From: larry@esl.com (Larry Ai)
Subject: TRW Smart Charts for Pharmaceuticals

The following is a brief description of our product.

TRW Smart Charts for Pharmaceuticals is a competitive analysis tool
which will automate the laborious process of converting volumes of
pharmaceutical market data into meaningful tables, improving
pharmaceutical market analysis results and reducing associated costs.

TRW Smart Charts will help business planners and market analysts keep
pace with the increasing amounts of data brought by on-line access.
With TRW Smart Charts, users generate the tables they want in minutes,
not hours or days. There is no need to wait for another expert to
conduct an on-line search and compile the results into a usable format.

Government analysts and decision makers have benefited from the
automated text extraction, relational data base, and user interface
technology developed by ESL, a TRW company, for demanding applications.
Now TRW Smart Charts is putting those technologies to work to reduce
the information overload faced by analysts and decision makers in the
pharmaceutical industry.

-----------------------------------------------------------------------
Date: Wed, 6 Apr 94 16:33:57 EDT
From: epdp@big.att.com (Edwin Pednault)
Subject: MDL workshop at ML/COLT 94 conference

Workshop on Applications of Descriptional Complexity to
Inductive, Statistical, and Visual Inference

Sunday, July 10, 1994
Rutgers University
New Brunswick, New Jersey

Held in Conjunction with the Eleventh International Conference on
Machine Learning (ML94, July 11-13, 1994) and the Seventh Annual
Conference on Computational Learning Theory (COLT94, July 12-15, 1994).

Interest in the minimum description-length (MDL) principle is
increasing in the machine learning and computational learning theory
communities. One reason is that MDL provides a basis for inductive
learning in the presence of noise and other forms of uncertainty.
Another reason is that it enables one to combine and compare different
kinds of data models within a single unified framework, allowing a wide
range of inductive-inference problems to be addressed.

Interest in the MDL principle is not restricted to the learning
community. Inductive-inference problems arise in one form or another in
many disciplines, including information theory, statistics, computer
vision, and signal processing. In each of these disciplines,
inductive-inference problems have been successfully pursued using the
MDL principle and related descriptional complexity measures, such as
stochastic complexity, predictive MDL, and algorithmic probability.

The purpose of this workshop is twofold: (1) to provide an opportunity
to researchers in all disciplines involved with descriptional
complexity to meet and share results; and (2) to foster greater
interaction between the descriptional complexity community and the
machine learning and computational learning theory communities,
enabling each group to benefit from the results and insights of the
others.
To meet these objectives, the format of the workshop is designed to
maximize opportunities for interaction among participants. In addition,
a tutorial on descriptional complexity will be held prior to the
workshop to encourage broad participation. The tutorial and workshop
may be attended together or individually.

The topics of the workshop will include, but will not be limited to,

 - Applications of descriptional complexity to all forms of inductive
   inference, including those in statistics, machine learning, computer
   vision, pattern recognition, and signal processing.

 - Rates of convergence, error bounds, distortion bounds, and other
   convergence and accuracy results.

 - New descriptional complexity measures for inductive learning.

 - Specializations and approximations of complexity measures that take
   advantage of problem-specific constraints.

 - Representational techniques, search techniques, and other
   application and implementation related issues.

 - Theoretical and empirical comparisons between different
   descriptional complexity measures, and with other learning
   techniques.

WORKSHOP FORMAT

The workshop will be held on Sunday, July 10, 1994. Attendance will be
open. However, those who wish to attend should contact the organizers
prior to the workshop at the address below.

To maximize the opportunity for interaction, the workshop will consist
primarily of poster presentations, with a few selected talks and a
moderated wrap-up discussion. Posters will be the primary medium for
presentation. This medium was chosen because it encourages close
interaction between participants, and because many more posters can be
accommodated than talks. Both factors should encourage productive
interaction across a wide range of topics despite the constraints of a
one-day workshop.

Depending on the number and quality of the submissions, arrangements
may be made to publish a book of papers after the workshop under the
auspices of the International Federation for Information Processing
Working Group 14.2 on Descriptional Complexity.

SUBMISSIONS

Posters will be accepted on the basis of extended abstracts that should
not exceed 3000 words, excluding references (i.e., about six pages of
text, single spaced). Separate one-page summaries should accompany the
submitted abstracts. The summary pages of accepted abstracts will be
distributed to all interested participants prior to the workshop, and
should be written accordingly. Summaries longer than one page will have
only their first page distributed.

Six copies of each extended abstract and two copies of each summary
page must be received at the address below by May 18, 1994. Acceptance
decisions will be made by June 10, 1994. Copies of the summary pages of
accepted abstracts will be mailed to all those who submit abstracts and
to those who contact the organizers before the decision date.

Because we expect the audience to be diverse, clarity of presentation
will be a criterion in the review process. Contributions and key
insights should be clearly conveyed with a wide audience in mind.

Authors whose submissions are accepted will be expected to provide the
organizers with full-length papers or revised versions of their
extended abstracts when they arrive at the workshop. These papers and
abstracts will be used for the publisher's review. Authors may wish to
bring additional copies to distribute at the workshop.

IMPORTANT DATES

May 18     Extended abstracts due
June 10    Acceptance decisions made, summary pages distributed
July 10    Workshop

PROGRAM COMMITTEE

Ed Pednault (Chair), AT&T Bell Laboratories.
Andrew Barron, Yale University.
Ron Book, University of California, Santa Barbara.
Tom Cover, Stanford University.
Juris Hartmanis, Cornell University.
Shuichi Itoh, University of Electro-Communications.
Jorma Rissanen, IBM Almaden Research Center.
Paul Vitanyi, CWI and University of Amsterdam.
Detlef Wotschke, University of Frankfurt.
Kenji Yamanishi, NEC Corporation.

CONTACT ADDRESS

Ed Pednault
AT&T Bell Laboratories, 4G-318
101 Crawfords Corner Road
Holmdel, NJ 07733-3030
email: epdp@research.att.com
tel: 908-949-1074

-----------------------------------------------------------------

Tutorial on Descriptional Complexity and Inductive Learning

One of the earliest theories of inductive inference was first
formulated by Solomonoff in the late fifties and early sixties. It was
expanded in subsequent and, in some cases, independent work by
Solomonoff, Kolmogorov, Chaitin, Wallace, Rissanen, and others. The
theory received its first citation in the AI literature even before its
official publication. It provides a basis for learning both
deterministic and probabilistic target concepts, and it establishes
bounds on what is computationally learnable in the limit.

Over time, this theory found its way into several fields, including
probability theory and theoretical computer science. In probability
theory, it provides a precise mathematical definition for the notion of
a random sample sequence. In theoretical computer science, it is being
used among other things to prove lower bounds on the computational
complexity of problems, to analyze average-case behavior of algorithms,
and to explore the relationship between the succinctness of a
representation and the computational complexity of algorithms that
employ that representation.

Interest in the theory diminished in artificial intelligence in the mid
to late sixties because of the inherent intractability of the theory in
its most general form. However, research in the seventies and early
eighties led to several tractable specializations developed expressly
for inductive inference.
These specializations in turn led to applications in many disciplines,
including information theory, statistics, machine learning, computer
vision, and signal processing.

The body of theory as it now stands has developed well beyond its
origins in inductive inference, encompassing algorithmic probability,
Kolmogorov complexity, algorithmic information theory, generalized
Kolmogorov complexity, minimum message-length inference, the minimum
description-length (MDL) principle, stochastic complexity, predictive
MDL, and related concepts. It is being referred to collectively as
descriptional complexity to reflect this evolution.

This tutorial will provide an introduction to the principal concepts
and results of descriptional complexity as they apply to inductive
inference. The practical application of these results will be
illustrated through case studies drawn from statistics, machine
learning, and computer vision. No prior background will be assumed in
the presentation other than a passing familiarity with probability
theory and the theory of computation. Attendees should expect to gain a
sound conceptual understanding of descriptional complexity and its main
results.

The tutorial will be held on Sunday, July 10, 1994.

-----------------------------------------------------------------------
From: zicari@informatik.uni-frankfurt.de
Subject: TAPOS
Date: Mon, 11 Apr 94 23:59:49 MESZ

Call for Papers

Theory and Practice of Object Systems (TAPOS)
=============================================

Editors in chief:

Karl Lieberherr, Northeastern University, Boston, Massachusetts
Roberto Zicari, Johann Wolfgang Goethe University, Frankfurt, Germany

**************
Aims and Scope
**************

Theory and Practice of Object Systems is an archival, peer-reviewed
journal dedicated to publishing high-quality research results.
Papers will be selected primarily in areas of Object Technology,
including, but not limited to:

 - Programming Languages and Models
 - Foundations, Semantics, Type Theory
 - Database Management Systems and Database Languages
 - Concurrency
 - Distribution
 - Software Engineering and Software Development Tools and Environments
 - Formal Specification
 - Metrics and Evaluation
 - Analysis and Design Methods
 - Novel Applications
 - Operating Systems

Contributions in other areas of object-based computing are also
welcome.

Research contributions on these aspects will be collected under the
interdisciplinary umbrella of the object-oriented approach they have in
common rather than from the point of view of the parent discipline.

Theoretical papers should either break significant new ground or unify
and extend existing theories. Systems papers should emphasize the
underlying principles and important discoveries, backed up by
architectural and implementation details.

Published quarterly, Theory and Practice of Object Systems (TAPOS)
disseminates new but long-lasting concepts and high-quality results
useful to researchers and practitioners of object technology.

The main goal of TAPOS is to make a fundamental contribution to the
growth and consolidation of a scientific object community with high
intellectual standards. The journal is a service to the object
community in that it provides a forum for stringently refereed,
noteworthy, and relevant results.

*********************
How to submit a paper
*********************

The editors-in-chief encourage the submission of contributions from all
parts of the world. Five (5) copies of submitted articles should be
sent to one of the editors-in-chief. The editors-in-chief will assign
the article to an Associate Editor whose subject area expertise is
appropriate to the article's subject. Published papers will include the
name of the Associate Editor who managed the refereeing process.

A special transfer of copyright agreement, signed and executed by the
author, must be provided when an article is accepted. (If the article
is a work made for hire, the agreement must be signed by the employer.)
Copies of the copyright agreement may be obtained from the
editors-in-chief through e-mail.

The corresponding author will receive 25 free reprints. There is no
page charge to authors, unless color printing is requested.

Papers are processed with the understanding that they have not been
published, submitted or accepted for publication elsewhere.

Please submit your paper to either of the following addresses:

Professor Karl Lieberherr
Editor, TAPOS
Northeastern University
College of Computer Science
125 Cullinane Hall
Boston, MA 02115-9959
U.S.A.
lieber@CCS.neu.EDU

Professor Roberto Zicari
Editor, TAPOS
Johann Wolfgang Goethe-Universitaet
Fachbereich Informatik (20)
Robert Mayer Strasse 11-15
D-60325 Frankfurt am Main, Germany
zicari@informatik.uni-frankfurt.de

All other correspondence concerning reprints, subscriptions, etc.
should be sent to

John Wiley & Sons, Inc.
ATTN: S. Straub
605 Third Ave.
New York, NY 10158-0012
E-mail: sstraub@jwiley.com

******************
Associate editors:
******************

Professor Gul Agha
Department of Computer Science
1304 W Springfield Ave
University of Illinois
Urbana, IL 61801

Areas: concurrent programming languages, semantics, parallel computing
-----------------------------------------------------------
Dr. H. V. Jagadish
Computing Systems Research Laboratory, MH 2T204
AT&T Bell Laboratories
600 Mountain Ave.
Murray Hill, NJ 07974

Areas: object-oriented databases
-----------------------------------------------------------
Professor TSE Maibaum
Head, Department of Computing
Imperial College of Science Technology and Medicine
180 Queen's Gate
London SW7 2BZ
UK

Areas: formal methods, specification and implementation, concurrency
and real time, modularisation.
-----------------------------------------------------------
Dr. Jose Meseguer
Computer Science Laboratory
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025, USA

Areas: mathematical foundations of OOP, formal specification of OO
systems, declarative approaches to concurrent OOP
-----------------------------------------------------------
Professor Atsushi Ohori
Research Institute for Mathematical Sciences
Kyoto University
Sakyo-ku, Kyoto 606-01
Japan

Areas: type systems, data models, database programming language
-----------------------------------------------------------
Dr. Harold Ossher, H1-B26
IBM T. J. Watson Research Center
P. O. Box 704
Yorktown Heights, NY 10598

Areas: software composition, system structure, software development
environments, object-oriented languages
-----------------------------------------------------------
Dr. Remo Pareschi
Rank Xerox Research Centre
6, chemin de Maupertuis
F-38240 Meylan
France

Areas: concurrency and distribution, object-oriented logic programming
languages, object coordination schemas
-----------------------------------------------------------
Professor Michael I. Schwartzbach
Computer Science Department
Aarhus University
Ny Munkegade
DK-8000 Aarhus C
Denmark

Areas: type systems, semantics, theory, implementation.
-----------------------------------------------------------
Professor Mario Tokoro
Department of Computer Science
Keio University
3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223
Japan

Areas: Concurrent and Distributed Computation Models, Programming
Languages, Operating Systems, and MultiAgent Systems.
-----------------------------------------------------------
Professor Akinori Yonezawa
Dept. of Information Science
Faculty of Science
University of Tokyo
Hongo, Bunkyo-ku
Tokyo 113
Japan

Areas: concurrency, algorithms, language design, language
implementation.
-----------------------------------------------------------

Editorial board:

Abiteboul Serge, INRIA, Paris, France
Bertino Elisa, University of Milano, Milano, Italy
Bruce Kim, Williams College, Williamstown, MA
Cardelli Luca, Digital, Systems Research Center, Palo Alto, CA
Freeman-Benson Bjorn, Carleton University, Ottawa Canada
Gehani Narain, AT&T Bell Labs, Murray Hill, NJ
Ghezzi Carlo, Politecnico di Milano, Milano, Italy
Gutknecht Juerg, Swiss Federal Institute of Technology, Zurich, Switzerland
King Roger, University of Colorado, Boulder, CO
Koskimies Kai, University of Tampere, Tampere, Finland
Mandrioli Dino, Politecnico di Milano, Milano, Italy
Mitchell John, Stanford University, Palo Alto, CA
Palsberg Jens, Northeastern University, Boston, MA
Pirahesh Hamid, IBM Almaden, Almaden, CA
Reif John, Duke University, Durham, NC
Reuter Andreas, University of Stuttgart, Stuttgart, Germany
Scholl Marc, University of Ulm, Ulm, Germany
Soley Richard, Object Management Group, Framingham, MA
Zdonik Stanley, Brown University, Providence, RI

The latest version of this document can be obtained at any time by
sending mail to majordomo@ccs.neu.edu with the only contents line

    info tapos

==================================================================

Free Sample issue and Subscription Order Form

Theory and Practice of Object Systems

Please enter my subscription to Theory and Practice of Object Systems
Volume 1, 1995, 4 issues, ISSN 1074-3227 at the rate I have selected

Personal rate
  __ $60 US and Can.
  __ $80 Outside North America

Institutional rate
  __ $170 US and Can.
  __ $210 Outside North America

Prices include shipping, handling, and packing charges worldwide. Air
service included in the subscription price outside the U.S. Personal
rate subscriptions are available to individuals and must be prepaid.
Subscriptions are entered on a calendar-year basis only.
The latest issue, as well as all published issues of the current
volume, will be shipped after your payment is received.

__Please send me a FREE sample issue.

Method of Payment:
__Check enclosed. All checks must be drawn on a U.S. bank and payable
  to John Wiley & Sons.
__Purchase order enclosed.
__Charge my credit card  __MasterCard  __Visa  __American Express

Card Number _______________________________ Expiry _____

Signature______________________________________________

Name____________________________________

John Wiley & Sons, Inc.
ATTN: S. Straub
605 Third Ave.
New York, NY 10158-0012
E-mail: sstraub@jwiley.com

=============================================

Please add yourself to the TAPOS mailing list. You can
subscribe/unsubscribe to the list with two simple commands.

    subscribe tapos          Subscribe yourself.
    unsubscribe tapos        Unsubscribe yourself.

Commands should be sent in the body of an email message to
"majordomo@ccs.neu.edu". Commands in the "Subject:" line are NOT
processed.

If you have any questions or problems, please contact
"domomaster@ccs.neu.edu".

You can post a message to the entire list by sending mail to
"tapos@ccs.neu.edu".

==============================================
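
For readers who script their mail, the list commands above could be
wrapped as in the following minimal Python sketch. It only builds the
message and does not send it. The function name
build_majordomo_command and the sender address reader@example.org are
hypothetical; the server address and the three command strings are the
ones given in this announcement and are assumed to be still current.

```python
from email.message import EmailMessage

# Address and commands as quoted in the TAPOS announcement (assumed current).
MAJORDOMO_ADDRESS = "majordomo@ccs.neu.edu"
ALLOWED_COMMANDS = {"subscribe tapos", "unsubscribe tapos", "info tapos"}


def build_majordomo_command(command: str, sender: str) -> EmailMessage:
    """Build (but do not send) an email carrying one list command.

    Hypothetical helper: the command goes in the message body because
    commands on the "Subject:" line are not processed.
    """
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = MAJORDOMO_ADDRESS
    # Deliberately no Subject header; the body holds the single command.
    msg.set_content(command + "\n")
    return msg
```

Calling build_majordomo_command("subscribe tapos", "reader@example.org")
returns a message object that can be printed for inspection or handed
to whatever mail transport is available locally.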