KDD Nuggets 94:2, 1994-01-25

Contents:
* Sarab: KDD Successes in Europe
* GPS: Firms look at massively parallel computers for data mining;
  Market for data mining is estimated at $500M today (ComputerWorld).
* GPS: Spec. Issue of IEEE TKDE on Learning and Discovery in Databases
* Marcel Holsheimer -- Data Mining Report available

The KDD Nuggets is an informal moderated list for spreading
information relevant to Knowledge Discovery in Databases (KDD), e.g.
application descriptions, conference announcements, tool reviews,
information requests, interesting ideas, outrageous opinions, etc.

Contributions to kdd@gte.com; Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro
------------------------------------
From: Sarab
Subject: KDD Successes
Date: Tue, 25 Jan 94 14:46 BST

A magazine called Computing had an article on Database Mining in its
20 January issue. It covered three commercial products for database
mining:

  NetMap - developed by John Galloway, an Australian Economist/Engineer
  Logica's Data Mariner
  Matrica's 4Thought

The European Database Mining Community seems to prefer neural networks
for database mining, with most of the commercial products based on
neural networks.

About KDD successes:

1. Last year the British Army embarked on a major review of the Army's
career management structure. Using data from payroll, personnel and
pension files, NetMap helped to identify the essential groups in
personnel and reorganize the key structures of the Army Personnel
Centre. This helped reduce the personnel infrastructure from the
original 2600 in 11 separate organizations to 700-800 operated from
one site.

2. NetMap was used by the Dutch Police to round up a mafia ring of 90
narcotics and murder suspects.

3. NetMap was also used by the Serious Fraud Office in the UK to
highlight a mortgage fraud.

4. CIC Video is one of the major distributors of pre-recorded films to
shops and video stores.
Its marketing executives had long believed that there was a relation
between the success of a film in the cinema and its demand on video,
but they had great difficulty in generating empirical evidence. Using
4Thought, CIC was able to highlight relationships between some "very
obscure" pieces of data, allowing much more accurate prediction of
video sales.

5. The marketing department of Thomas Cook, the tour operator, used
data mining techniques to match the characteristics of prospective
travellers with preferred holiday destinations, enabling
better-targeted mailings.

---------------------------------
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Firms look at massively parallel computers for data mining
Date: 25 Jan 1994

The article by E. Booker, "Firms set to mine for supercomputing gold",
ComputerWorld, January 17, 1994, p. 65, reports that some of the
country's top retailing, financial, and transportation companies are
ready to become supercomputer users. Some of these (unnamed) companies
are interested in data mining, using massively parallel processing
(MPP) systems. The article quotes an IBM manager as saying that "Many
commercial users ... will soon be getting a terabyte of information a
day". David Frankel, director of technology at Smaby Group, Inc,
estimated that the market for Data Mining today is about $500 million.

---------------------------------
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Special issue on Learning and Discovery in Databases of
         IEEE Transactions on Knowledge and Data Engineering.
Date: 21 Jan 1994

Contents:

============ Regular papers ================

Guest Editors Introduction, N. Cercone and M. Tsuchiya
Systems for Knowledge Discovery in Databases,
  C. J. Matheus, P. Chan, and G. Piatetsky-Shapiro
Database Mining: A Performance Perspective,
  R. Agrawal, T. Imielinski, A. Swami
Abstract-Driven Pattern Discovery in Databases, V. Dhar and A. Tuzhilin
Inductive Learning in Deductive Databases, S. Dzeroski and N.
  Lavrac
Learning Transformation Rules for Semantic Query Optimization: A
  Data-Driven Approach, S. Shekhar, B. Hamidzadeh, A. Kohli, and M. Coyle

============ Concise papers =============

From Data Properties to Evidence, D. Bell
Inductive Database Relations, F. Bergadano
A Framework for Knowledge Discovery and Evolution in Databases,
  J. P. Yoon and L. Kerschberg
Induction of Rules Subject to a Quality Constraint: Probabilistic
  Inductive Learning, Ozden Gur-Ali and William Wallace

=========== Correspondence papers ============

Knowledge Discovery in Molecular Databases,
  D. Conklin, S. Fortier, and J. Glasgow
A History Approach for Automatic Metadata Inference in VLSI Design
  Database, Tzi-cker Chiueh and R. H. Katz
Methodologies and Experience in the Development and Maintenance of
  Predictive Models of Large Databases, B. R. Gaines and P. Compton
Discovery of Inexact Concepts from Structural Data,
  L. B. Holder and Diane J. Cook
Avoiding Misconstruals in Database Systems: A Default Logic Approach,
  A. S. Hemerly, M. A. Casanova and A. L. Furtado
Calculating Salience and Breadth of Knowledge, L. Rau

--------------------------------------------------
From: marcel@cwi.nl (Marcel Holsheimer)
Subject: Data Mining report available
Date: Tue, 18 Jan 1994 16:28:39 GMT

(note: this report is extensive -- 78 pages, but not very up to date.
It gives a general overview of various learning methods and looks in
detail at ID3, AQ15, CN2, DBLearn, Meta-Dendral, and RADIX/RX. Does not
cover any of the newer material, e.g. nothing about the 91-93 KDD
workshops.
-- Gregory Piatetsky-Shapiro)

The following report can be obtained by ftp:
_________________________________________________________________

DATA MINING
The Search for Knowledge in Databases

Marcel Holsheimer, Arno Siebes

Abstract

Data mining is the search for relationships and global patterns that
exist in large databases, but are `hidden' among the vast amounts of
data, such as a relationship between patient data and their medical
diagnosis. These relationships represent valuable knowledge about the
database and objects in the database and, if the database is a
faithful mirror, of the real world registered by the database.

One of the main problems for data mining is that the number of
possible relationships is very large, thus prohibiting the search for
the correct ones by simply validating each of them. Hence, we need
intelligent search strategies, as taken from the area of machine
learning.

Another important problem is that information in data objects is often
corrupted or missing. Hence, statistical techniques should be applied
to estimate the reliability of the discovered relationships.

The report provides a survey of current data mining research. It
presents the main underlying ideas, such as inductive learning, and
the search strategies and knowledge representations used in data
mining systems. Furthermore, it describes the most important problems
and their solutions, and provides a survey of research projects.

CR subject classification (1991): Database applications (H.2.8),
Information search and retrieval (H.3.3), Learning (I.2.6) concept
learning, induction, knowledge acquisition, Clustering (I.5.3)

keywords: database applications, machine learning, inductive learning,
knowledge acquisition, data summarization
_____________________________________________________________________

The report can be obtained by anonymous ftp:

& ftp ftp.cwi.nl
Name (ftp.cwi.nl:marcel): ftp
331 Guest login ok, send ident (your e-mail address) as password.
Password:
ftp> binary
ftp> cd pub/CWIreports/AA
ftp> get CS-R9406.ps.Z
ftp> bye
________________________________________________________________________
Marcel Holsheimer     | Centre for Mathematics and Computer Science (CWI)
phone +31 20 592 4134 | Kruislaan 413, Amsterdam, The Netherlands