KDD Nuggets Index

To KD Mine: main site for Data Mining and Knowledge Discovery.

To subscribe to KDD Nuggets, email to kdd-request

Past Issues: 1996 Nuggets, 1995 Nuggets, 1994 Nuggets, 1993 Nuggets

Data Mining and Knowledge Discovery Nuggets 96:24, e-mailed 96-07-30

News:

* U. Fayyad, KDD-96 Final Update and KDD-97 News

* GPS, Data Mining From Text -- Summary of Responses

* S. Streltsov, Data Mining on the Web: lifestyle comparison

Publications:

* J. Widom, SIGMOD record on-line

http://bunny.cs.uiuc.edu/sigmod/sigmod_record

Meetings:

* V. Aroca, Data Mining Conference, Barcelona, Spain, Oct 10, 1996

--
KDD Nuggets is a newsletter for the Data Mining and Knowledge Discovery community,
focusing on the latest research and applications.

Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).

Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I'm convinced a new kind of social responsibility is emerging
-- an imperative to be succinct. Just as we've had to curtail
our gaseous emissions in an increasingly smoggy world,
the information glut demands that we be more economical about
what we say, write, and post on-line. With time an ever
more valuable commodity, the long-winded are beginning to
resemble people who open their car door at a stoplight
to dump trash onto the street. -- David Shenk, Wired, 7/96.
[Thanks to Ken Laws' Computists list]


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Usama Fayyad (fayyad@MICROSOFT.com)
Subject: final update before KDD-96
Date: Fri, 26 Jul 1996 19:43:55 -0700

** The KDD-96 final program is accessible via the kdd-96 homepage, or
directly at:
http://www.research.microsoft.com/research/datamine/kdd96-program/kdd96-program.htm

The program looks very impressive, and we have packed it with
special events and presentations of very interesting papers.
Check it out!

** Plans for 1997: The steering committee had prolonged debates
on whether to collocate with SIGMOD, VLDB, AAAI, or some other
venue. The final decision on venue and conference chairs was
as follows:

>KDD-97 will continue to be an AAAI conference, but
>will go independent in terms of location. In our first
>attempt to collocate near a major conference in one
>of the other constituent communities, Statistics and
>Databases, we have decided to locate KDD-97 in the
>Los Angeles area (tentatively Newport Beach or Laguna
>Beach), right after the major ASA-97 conference in
>Anaheim. Proposed KDD-97 dates are August 13-15, 1997
>(Wed-Fri), which will overlap with the last 1.5 days of ASA.
>
> The new organization will consist of:
>
> 1. Three PC co-chairs representing AI, DBs, and Stats (respectively):
> David Heckerman (Microsoft Research, USA)
> Heikki Mannila (University of Helsinki, Finland)
> Daryl Pregibon (AT&T Research, USA)
>
> 2. Publicity Chair: Paul Stolorz (Jet Propulsion Laboratory, USA)
>
> 3. General Chair: Ramasamy Uthurusamy (General Motors, USA)
>

Again, thank you all for making KDD-96 a success. So far, the
early registration numbers have broken last year's attendance
(> 350 attendees), so I expect attendance will be very large.
Last year we had more on-site registrations than early registrations!
Let us hope this does not repeat this year: our plans were for
a max of 400 attendees!!!

Thanks again, on behalf of our program chairs, and I hope
to see you all in Portland.

Cheers,
Usama, KDD-96 general chair.


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 30 Jul 1996
From: Gregory Piatetsky-Shapiro (gps0@gte.com)
Subject: Data Mining from Text

Here are several summaries of work on data mining from text,
in response to the query from Jean-Luc SIMONI (simoni@tabarly.saclay.cea.fr)
in Nuggets 96:23.

----
Date: Thu, 18 Jul 1996 16:21:02 -0700 (MST)
From: RORWIG@BPA.ARIZONA.EDU
Subject: full-text document data mining

We are doing categorization of full-text documents using Hopfield nets,
Kohonen self-organizing maps, and genetic algorithms. While this may not
be dead-center data mining as you might interpret it, I feel it is a useful
approach.

To see some of the utility, we have demonstrations set up on
http://ai.bpa.arizona.edu. Hsinchun Chen is the principal investigator and
has several publications.

If I can answer any questions, please let me know.

--Richard Orwig, Ph.D.
rorwig@bpa.arizona.edu
----
Date: Fri, 19 Jul 1996 08:47:51 -0400
From: Cheong Yu (kcy@lexis-nexis.com)
Subject: Data Mining full-text documents

Jean-Luc,

Regarding your question from KDD Nuggets, I have found very few
research papers which deal specifically with full-text documents.
Here are two that deal with document classification:

Feldman, R. and Dagan, I. Knowledge Discovery in Textual Databases
(KDT). Proceedings of the First International Conference on Knowledge
Discovery and Data Mining (KDD-95). 1995.

Dagan, I. and Feldman, R. Keyword-Based Browsing and Analysis of
Large Document Sets. Proceedings of the Fifth Annual Symposium on
Document Analysis and Information Retrieval. 1996.

If you have any other references, please forward them to me, as we are
very interested in data mining full-text as well.

-- Cheong

----
From: Christopher W Clifton (clifton@linus.mitre.org)
Date: Fri, 19 Jul 1996 09:50:33 -0400
Subject: Data mining from full text

Ronen Feldman (http://www.cs.biu.ac.il:8080/~feldman/) has a
paper in this KDD (with Haym Hirsh of Rutgers in the U.S.A.). I think he
is primarily interested in categorization/classification of documents.
I'm also doing some work here at MITRE (with the idea of starting with natural
language tools to extract concepts from data, then mining those concepts).
Our work concentrates on rules (e.g. association rules, sequences)
among concepts (persons, locations, etc.) in the documents.
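
As a rough illustration of what an association rule among concepts can look like, here is a minimal Python sketch. It is not the MITRE method, which is unpublished; the documents, the concept names, the `pair_rules` helper, and both thresholds are invented for the example. It counts how often two concepts appear in the same document and keeps rules whose support and confidence clear the thresholds.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of concepts already extracted from each document.
docs = [
    {"Smith", "Boston"},
    {"Smith", "Jones", "Boston"},
    {"Jones", "Chicago"},
    {"Smith", "Boston", "Chicago"},
]


def pair_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return sorted (antecedent, consequent, support, confidence) tuples."""
    n = len(docs)
    # Number of documents mentioning each concept.
    single = Counter(c for d in docs for c in d)
    # Number of documents mentioning each unordered pair of concepts.
    pairs = Counter(
        frozenset(p) for d in docs for p in combinations(sorted(d), 2)
    )
    rules = []
    for pair, count in pairs.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        # Try the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


rules = pair_rules(docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  support={support} confidence={confidence}")
```

Only pairs are considered here to keep the sketch short; longer rules and sequences would need a fuller algorithm.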

Don't have anything published yet. I'll let you know when we do.

I'd be interested in hearing about your specific interests -- I
might be able to give you a few additional pointers if something
you say jogs my memory.

-Chris Clifton (clifton@mitre.org)


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: simon1@cgl.bu.edu (Simon Streltsov)
Subject: Data mining on the web: lifestyle comparison
Date: Sun, 21 Jul 1996 22:52:53 -0400 (EDT)

I uploaded a simple example that compares lifestyles in different cities
by counting patterns using Altavista: http://cad.bu.edu/mining.html

If you find it useful, please link to it from your web page.
Thanks. Simon

Here are some 'results':

Lifestyle comparison: data mining using Altavista

                      New     San                    Cambridge
                      York    Francisco    Boston    MA           Indianapolis

Fun                   1       2.3          2.8       0.75         1.3
Fun-to-work ratio     1       2            1.6       0.17         1.63
Data mining           1       1.75         1.75      3.06         0
Clinton/Dole ratio    2.86    6.12         2.7       20.67        2.45

Simon Streltsov

E-mail simon1@bu.edu               Department of Manufacturing Engineering
http://cad.bu.edu/go/simon.html    Boston University
Ph 617/353-4209                    15 St Mary Str
Fax 617/353-5548                   Boston MA 02215

Data mining using Web search engines

We compare lifestyles in several cities by analyzing easily available Web
information: we simply construct queries like 'Boston near fun' and compare
the number of hits. [Note: this requires using an Altavista Advanced Search query. GPS]
The results are not precise, but we get the overall picture.
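
The comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit counts below are invented placeholders (not the author's data), the helper names `build_query` and `relative_scores` are hypothetical, and scaling every score so that New York equals 1 is a guess based on how the first rows of the table appear to be normalized.

```python
def build_query(city, term):
    """Build a proximity query in the 'Boston near fun' style."""
    return f"{city} near {term}"


def relative_scores(values, baseline):
    """Scale each city's value so that the baseline city scores 1."""
    base = values[baseline]
    return {city: round(value / base, 2) for city, value in values.items()}


# Hypothetical hit counts, invented for illustration only.
fun_hits = {"New York": 1000, "Boston": 2500, "Cambridge MA": 500}
work_hits = {"New York": 2000, "Boston": 2500, "Cambridge MA": 4000}

# The queries that would be submitted to the search engine by hand.
queries = [build_query(city, "fun") for city in fun_hits]

# 'Fun' score of each city relative to New York.
fun_scores = relative_scores(fun_hits, "New York")

# Fun-to-work ratio per city, then scaled to the baseline in the same way.
fun_to_work = relative_scores(
    {city: fun_hits[city] / work_hits[city] for city in fun_hits},
    "New York",
)

print(queries)
print(fun_scores)
print(fun_to_work)
```

Raw hit counts also reflect how much Web content mentions each city at all, which is one reason the results are only indicative.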

The goal of this exercise is to demonstrate the opportunities - and
pitfalls! - of mining unstructured web data. Please contribute to the
discussion!

Executive summary:

Recommendation: Live & have fun in Boston while mining data in Cambridge.

* You can have fun in Boston and San Francisco, but Boston has more ties
to New York.
* Even hard work does not give New Yorkers a large salary - maybe they just
want more?!
* There is no other place like Cambridge if you want to work in data
mining, but they have no life.
* Is it because of their political views? No - liberals in San Francisco
have fun!


>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 24 Jul 1996 13:34:51 -0500
From: Jennifer Widom (widom@DB.Stanford.EDU)
Subject: (DBWORLD) SIGMOD Record on-line
Message-ID: (199607241834.NAA20352@ricotta.cs.wisc.edu)

I'm delighted to report that it has been decided recently that SIGMOD
Record will be made available on-line to the general public via the
World Wide Web. All portions of the Record from the September 1993
issue through the upcoming September 1996 issue that are available in
electronic form (primarily postscript) are now on-line, and all future
issues will be placed on-line as well. Unfortunately, at this time
the SIGMOD conference proceedings - which also comprise the June issue
of SIGMOD Record each year - are not available on-line.

The SIGMOD Record web site is now being maintained by SIGMOD Services
Chair Marie-Anne Neimat at HP Labs. The new URL is:
http://bunny.cs.uiuc.edu/sigmod/sigmod_record

Enjoy.
Jennifer Widom
Editor-in-Chief
SIGMOD Record


>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 29 Jul 1996 18:42:40 -0400
From: Gregory Piatetsky-Shapiro (gps0@eureka)
To: kdd
Subject: [varoca@ictnet.es: ]

------- Start of forwarded message -------
From: varoca@ictnet.es
Date: Tue, 23 Jul 1996 08:24:11 +0200

II CONFERENCE

DATA MINING

HOTEL PLAZA, Barcelona

Thursday 10th October 1996

MAIN EXPERIENCES IN EUROPE
TECHNOLOGIES - TOOLS - SERVICES

PROGRAMME - KEY CONTRIBUTIONS - SPONSORS - INFORMATION

-------------------------------------------------------------------------------

DATA MINING

How can the information contained in company databases be extracted? A new
generation of techniques and tools can now analyze different data sets in an
intelligent and 'automatic' way, in order to obtain useful knowledge in the
form of patterns, rules, and performance profiles.

* Customer Profiles per Product
* Sales and Demand Prediction
* Modeling of Industrial Processes
* Fraud Detection
* Final Product Quality Prediction
* Analysis of Supplier Quality Parameters
* Credit Scoring
* Target Identification in Marketing
* Profit Prediction per Customer / Operation

Objective:

* In recent years, companies have computerized their processes and
transactions. This has led to the creation of large databases of
historical information and data that normally do not generate advanced
knowledge.

* Today there are computerized methods based on machine learning
that allow us to extract knowledge from databases in the form of
models, profiles, classifications, predictions of how variables evolve,
decision support systems, hypothesis validation, ...

* This new type of software and learning system, based on neural networks,
rule induction, or genetic algorithms, has already proved its
usefulness in sectors such as finance and insurance, industry,
services, health, and the pharmaceutical industry.

-------------------------------------------------------------------------------

PROGRAMME

MORNING: TECHNOLOGIES / TOOLS / FOCUS GROUPS

09.00 Registration

09.30 Data Mining: Introduction - Technologies - Application Areas

Miquel Sellés. Director of the Information Technologies Center of the
Institut Català de Tecnologia

10.15 State of The Art in Data Mining and Knowledge Discovery in Databases

Willi Kloesgen. German National Research Center for IT

11.00 Data Mining for Data Owners and Users

Colin Shearer. Director of Integral Solutions Ltd. (UK)

11.45 Coffee

12.10 How to Approach Data Mining Projects: Resources / Investments /
Agents

David Nettleton. Director of TAD Sistemas

12.50 An Approach to Data Warehousing

Manuel Briso. Data Warehouse Project Office. Sun Microsystems

13.30 Lunch

AFTERNOON: APPLICATIONS AND EXPERIENCES

15.30 Data Mining in the Financial Sector

Kevin Moore. Business Analyst of NatWest Bank

16.10 The Experience of Data Mining in an Insurance Company

Evaristo Prieto. Director of Information Systems. Sanitas, S.A.

16.50 Data Mining in BT

Ken Totton. Manager of Data Mining Group. BT Laboratories (UK)

17.30 Coffee

17.50 A Predictive Application in Cement Quality Control

Luis Santapau. Process Engineer of Cementos Molins, S.A.

18.30 Round Table: How to Be Successful in Data Mining Applications?

Colin Shearer. Director of Integral Solutions Ltd. (UK)
Cristóbal Arenas. Oracle
Gener López. Operative Development Director of Bankpyme
David Nettleton. Director of TAD Sistemas
Evaristo Prieto. Information Systems Director Sanitas, S.A.
Christopher Hawkins. Finance Manager. Halfords
Ken Totton. Manager of the Data Mining Group. BT Laboratories (UK)

-------------------------------------------------------------------------------

KEY CONTRIBUTIONS

* Bankpyme
* BT Laboratories
* Cementos Molins
* Interministerial Commission of Science and Technology (CICYT)
* German National Research Center for IT
* Institut Català de Tecnologia
* Integral Solutions Ltd.
* Ministry of Industry and Energy
* NatWest Bank
* Oracle
* Sanitas, S.A.
* SUN Microsystems
* TAD Sistemas

-------------------------------------------------------------------------------

SPONSORS

* Ministry of Industry and Energy PEIN IV
* Interministerial Commission of Science and Technology (CICYT)

-------------------------------------------------------------------------------

INFORMATION

ORGANIZATION: Institut Català de Tecnologia

CONTENTS:

* Technologies Used in Data Mining: Possibilities and Performance.
* The Necessary Characteristics of a Data Mining Tool. How to Use Them. How
They Work.
* Application Areas of Data Mining Technology. Necessary Resources and
Investments.
* How to Approach a Project. The Integral Development of an Application.

Find the Answers to:

Technologies:

* What technologies are involved in 'Data Mining'?
* What will the outcome be?
* Are they easily applied? In which cases?

Tools:

* What should a Data Mining tool include?
* Could I use them in my company?
* How do they work?

Projects:

* In which areas could I apply Data Mining?
* Which resources should I dedicate?
* How much investment is necessary?
* How should I approach a project?

CONFERENCE ADDRESSED TO: Senior Managers and Technical Staff in the following areas:

* Marketing
* Commercial
* Planning/Finances
* R&D
* Production
* Information Systems/Computing In Laboratories
* Industrial Processes
* Quality

In the following sectors:

* Finance
* Insurance
* Industry
* Pharmaceutical Laboratories
* Hospitals - Health
* Public Administration

REGISTRATION:

                          Until 30-9-96         From 1-10-96
Conference registration   45.000 Ptas. + VAT    50.000 Ptas. + VAT
Documentation             10.000 Ptas. + VAT    10.000 Ptas. + VAT

Registration includes:

* Documentation.
* Coffee Breaks.
* Working Breakfast.
* Simultaneous Translation

For additional information or registration, please contact:

INTER-CONGRÉS, S.A.
Tel. 93-459.35.65
Fax. 93-459.44.68
E-Mail: Information and Registration (...@ictnet.es)

-------------------------------------------------------------------------------

The Institut Català de Tecnologia would appreciate your comments.
Please send your mail to: webmaster@ictnet.es
Copyright © 1996 Institut Català de Tecnologia


------- End of forwarded message -------


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~