KDD Nuggets 94:2, 1994-01-25
Contents: 
	* Sarab: KDD Successes in Europe
	* GPS: Firms look at massively parallel computers for data mining;
	  Market for data mining is estimated at $500M today (ComputerWorld).
	* GPS: Spec. Issue of IEEE TKDE on Learning and Discovery in Databases
	* Marcel Holsheimer  -- Data Mining Report available

The KDD Nuggets is an informal moderated list for spreading the
information relevant to Knowledge Discovery in Databases (KDD), e.g.
application descriptions, conference announcements, tool reviews, 
information requests, interesting ideas, outrageous opinions, etc.

Contributions to kdd@gte.com; Add/delete requests to kdd-request@gte.com 
 -- Gregory Piatetsky-Shapiro 

------------------------------------
From: Sarab <CBBR23@ujvax.ulster.ac.uk>
Subject: KDD Successes
Date: Tue, 25 Jan 94 14:46 BST

A magazine called Computing had an article on database mining in its issue of
the 20th of January. It covered three commercial products for database
mining:
		NetMap - developed by John Galloway, 
			an Australian Economist/Engineer
		Logica's Data Mariner
		Matrica's 4Thought

The European database mining community seems to prefer neural networks
for database mining, with most of the commercial products based on
neural networks. Some KDD successes:
	1. Last year the British Army embarked on a major review of
the Army's career management structure. Using data from payroll,
personnel and pension files, NetMap helped to identify the essential
groups in personnel and reorganize the key structures of the Army
Personnel Centre. This helped reduce the personnel infrastructure
from the original 2600 staff in 11 separate organizations to 700-800
operating from one site.

	2. NetMap was used by the Dutch Police to round up a mafia
ring of 90 narcotics and murder suspects.

	3. NetMap was also used by the Serious Fraud Office in the UK to
highlight a mortgage fraud.

	4. CIC Video is one of the major distributors of pre-recorded
films to shops and video stores. Its marketing executives had believed
for a while that there was a relation between the success of a film in
the cinema and its demand on video. But they had great difficulty in
generating empirical evidence.  Using 4Thought, CIC was able to
highlight relationships between some "very obscure" pieces of data,
allowing a much more accurate prediction of video sales.

	5. The marketing department of Thomas Cook, the tour operator,
used data mining techniques to match characteristics of prospective
travellers with preferred holiday destinations, enabling better-targeted
mailings.

---------------------------------
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject:  Firms look at massively parallel computers for data mining
Date: 25 Jan 1994 

The article by E. Booker, "Firms set to mine for supercomputing gold",
ComputerWorld, January 17, 1994, p. 65, reports that some of the country's
top retailing, financial, and transportation companies are ready to 
become supercomputer users.  Some of these (unnamed) companies are 
interested in data mining, using massively parallel processing (MPP)
systems. The article quotes an IBM manager saying that "Many commercial
users ... will soon be getting a terabyte of information a day". 

David Frankel, director of technology at Smaby Group, Inc, estimated
that the market for Data Mining today is about $500 million. 

---------------------------------
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Special issue on Learning and Discovery in Databases of 
IEEE Transactions on Knowledge and Data Engineering. 
Date: 21 Jan 1994 

Contents: 
============ Regular papers ================
Guest Editors Introduction, N. Cercone and M. Tsuchiya 

Systems for Knowledge Discovery in Databases, 
	C. J. Matheus, P. Chan, and G. Piatetsky-Shapiro

Database Mining: A Performance Perspective, 
	R. Agrawal, T. Imielinski, A. Swami	

Abstract-Driven Pattern Discovery in Databases, V. Dhar and A. Tuzhilin

Inductive Learning in Deductive Databases, S. Dzeroski and N. Lavrac

Learning Transformation Rules for Semantic Query Optimization: 
  A Data-Driven Approach, S. Shekhar, B. Hamidzadeh, A. Kohli, and M. Coyle

============ Concise papers =============
From Data Properties to Evidence, D. Bell

Inductive Database Relations, F. Bergadano

A Framework for Knowledge Discovery and Evolution in Databases, 
	J. P. Yoon and L. Kerschberg

Induction of Rules Subject to a Quality Constraint: Probabilistic
	 Inductive Learning, Ozden Gur-Ali and William Wallace
	
=========== Correspondence papers ============
Knowledge Discovery in Molecular Databases, 
	D. Conklin, S. Fortier, and J. Glasgow

A History Approach for Automatic Metadata Inference in VLSI
	Design Database, Tzi-cker Chiueh and R. H. Katz

Methodologies and Experience in the Development and Maintenance
	of Predictive Models of Large Databases, B. R. Gaines and P. Compton

Discovery of Inexact Concepts from Structural Data, 
	L. B. Holder and Diane J. Cook

Avoiding Misconstruals in Database Systems: A Default Logic Approach, 
	A. S. Hemerly, M. A. Casanova and A. L. Furtado

Calculating Salience and Breadth of Knowledge, L. Rau

--------------------------------------------------
From: marcel@cwi.nl (Marcel Holsheimer)
Subject: Data Mining report available
Date: Tue, 18 Jan 1994 16:28:39 GMT

(note: this report is extensive -- 78 pages, but not very up to date.
It gives a general overview of various learning methods and looks in
detail at ID3, AQ15, CN2, DBLearn, Meta-Dendral, and RADIX/RX.  Does not
cover any of the newer material, e.g. nothing about the 91-93 KDD workshops.
-- Gregory Piatetsky-Shapiro)

The following report can be obtained by ftp:
   _________________________________________________________________

                            DATA MINING

                The Search for Knowledge in Databases

                    Marcel Holsheimer, Arno Siebes


                              Abstract
Data mining is the search for relationships and global patterns that
exist in large databases, but are `hidden' among the vast amounts of
data, such as a relationship between patient data and their medical
diagnosis. These relationships represent valuable knowledge about the
database and objects in the database and, if the database is a
faithful mirror, of the real world registered by the database.

One of the main problems for data mining is that the number of
possible relationships is very large, thus prohibiting the search for
the correct ones by simply validating each of them. Hence, we need
intelligent search strategies, as taken from the area of machine
learning.

Another important problem is that information in data objects is often
corrupted or missing. Hence, statistical techniques should be applied
to estimate the reliability of the discovered relationships.

The report provides a survey of current data mining research. It
presents the main underlying ideas, such as inductive learning, and
the search strategies and knowledge representations used in data mining
systems. Furthermore, it describes the most important problems and
their solutions, and provides a survey of research projects.

CR subject classification (1991):
Database applications (H.2.8),
Information search and retrieval (H.3.3),
Learning (I.2.6) concept learning, induction, knowledge acquisition,
Clustering (I.5.3)

keywords: database applications, machine learning, inductive learning,
knowledge acquisition, data summarization
  _____________________________________________________________________
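
(An illustration of the search-space point made in the abstract; it is
not taken from the report. The sketch assumes a deliberately simple
pattern language: conjunctive rules in which every attribute is either
ignored or fixed to one of its values. The table shape, 20 attributes
with 5 values each, is hypothetical.)

```python
from math import prod


def count_conjunctive_rules(values_per_attribute):
    """Count conjunctions over a table's attributes.

    Each attribute is either left unconstrained or fixed to exactly one
    of its values, so an attribute with v values contributes (v + 1)
    choices. The count includes the empty conjunction.
    """
    return prod(v + 1 for v in values_per_attribute)


# Hypothetical table: 20 attributes, 5 possible values each.
candidates = count_conjunctive_rules([5] * 20)
print(f"{candidates:.2e} candidate rules")
```

Even under these modest assumptions the count is 6**20, roughly
3.7 x 10^15 candidates. Checking a million of them per second would
still take more than a century, which is why exhaustive validation is
ruled out.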

The report can be obtained by anonymous ftp:

& ftp ftp.cwi.nl
Name (ftp.cwi.nl:marcel): ftp
331 Guest login ok, send ident (your e-mail address) as password.
Password:
ftp> binary
ftp> cd pub/CWIreports/AA
ftp> get CS-R9406.ps.Z
ftp> bye
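
(For readers who would rather script the transfer, the session above
can be sketched with Python's standard ftplib module. This is only a
sketch: the function names fetch_report and download_from_cwi are made
up for illustration, and the host, directory and file name are simply
those given in this posting, which may no longer be reachable.)

```python
from ftplib import FTP


def fetch_report(ftp, remote_dir="pub/CWIreports/AA",
                 filename="CS-R9406.ps.Z", local_path=None):
    """Fetch `filename` from `remote_dir` over an already logged-in connection."""
    local_path = local_path or filename
    ftp.cwd(remote_dir)  # same as: cd pub/CWIreports/AA
    with open(local_path, "wb") as out:
        # retrbinary transfers in binary (image) mode, covering both
        # the "binary" and "get" steps of the interactive session.
        ftp.retrbinary("RETR " + filename, out.write)
    return local_path


def download_from_cwi():
    """Anonymous login to the host named in the posting, then fetch the report."""
    with FTP("ftp.cwi.nl") as ftp:
        ftp.login()  # no arguments means an anonymous login
        return fetch_report(ftp)
```

Calling download_from_cwi() performs the whole transfer; fetch_report
is kept separate so it can be reused with any open connection. The
file is a compressed PostScript document and still needs uncompressing
after the download.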

  ________________________________________________________________________
Marcel Holsheimer     | Centre for Mathematics and Computer Science (CWI)
phone +31 20 592 4134 | Kruislaan 413, Amsterdam, The Netherlands