* V. Aroca, Data Mining Conference, Barcelona, Spain, Oct 10, 1996
--
Data Mining and Knowledge Discovery community,
focusing on the latest research and applications.
Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).
Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.
-- Gregory Piatetsky-Shapiro (moderator)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I'm convinced a new kind of social responsibility is emerging
-- an imperative to be succinct. Just as we've had to curtail
our gaseous emissions in an increasingly smoggy world,
the information glut demands that we be more economical about
what we say, write, and post on-line. With time an ever
more valuable commodity, the long-winded are beginning to
resemble people who open their car door at a stoplight
to dump trash onto the street. -- David Shenk, Wired, 7/96.
[Thanks to Ken Laws' Computists list]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Usama Fayyad (fayyad@MICROSOFT.com)
Subject: final update before KDD-96
Date: Fri, 26 Jul 1996 19:43:55 -0700
The program looks very impressive, and we have packed it with
special events and presentations of very interesting papers.
Check it out!
** Plans for 1997: The steering committee had prolonged debates
on whether to collocate with SIGMOD, VLDB, AAAI, or some other
venue. The final decision on venue and conference chairs was
as follows:
>KDD-97 will continue to be an AAAI conference, but
>will go independent in terms of location. In our first
>attempt to collocate near a major conference in one
>of the other constituent communities, Statistics and
>Databases, we have decided to locate KDD-97 in the
>Los Angeles area (tentatively Newport Beach or Laguna
>Beach), right after the major ASA-97 conference in
>Anaheim. Proposed KDD-97 dates are August 13-15, 1997
>(Wed-Fri), which will overlap with the last 1.5 days of ASA.
>
> The new organization will consist of:
>
> 1. Three PC co-chairs representing AI, DBs, and Stats (respectively):
> David Heckerman (Microsoft Research, USA)
> Heikki Mannila (University of Helsinki, Finland)
> Daryl Pregibon (AT&T Research, USA)
>
> 2. Publicity Chair: Paul Stolorz (Jet Propulsion Laboratory, USA)
>
> 3. General Chair: Ramasamy Uthurusamy (General Motors, USA)
>
Again, thank you all for making KDD-96 a success. So far, the
early registration numbers have broken last year's attendance
(> 350 attendees), so I expect attendance will be very large.
Last year we had more on-site registrations than early registrations!
Let us hope this does not repeat this year: our plans were for
a max of 400 attendees!!!
Thanks again, on behalf of our program chairs, and I hope
to see you all in Portland.
Cheers,
Usama, KDD-96 general chair.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 30 Jul 1996
From: Gregory Piatetsky-Shapiro (gps0@gte.com)
Subject: Data Mining from Text
Here are several summaries of work on data mining from text,
in response to the query from Jean-Luc SIMONI (simoni@tabarly.saclay.cea.fr)
in Nuggets 96:23.
We are doing categorization of full-text documents using Hopfield nets,
Kohonen self-organizing maps, and genetic algorithms. While this may not
be dead-center data mining as you might interpret it, I feel it is a useful
approach.
To see some of the utility, we have demonstrations set up on http://ai.bpa.arizona.edu.
Hsinchun Chen is the principal investigator and
has several publications.
If I can answer any questions, please let me know.
Regarding your question from KDD Nuggets, I have found very few
research papers which deal specifically with full-text documents.
Here are two which deal with document classification:
Feldman, R. and Dagan, I. Knowledge Discovery in Textual Databases
(KDT). Proceedings of the First International Conference on Knowledge
Discovery and Data Mining (KDD-95). 1995.
Dagan, I. and Feldman, R. Keyword-Based Browsing and Analysis of
Large Document Sets. Proceedings of the Fifth Annual Symposium on
Document Analysis and Information Retrieval. 1996.
If you have any other references, please forward them to me, as we are
very interested in data mining full-text as well.
-- Cheong
----
From: Christopher W Clifton (clifton@linus.mitre.org)
Date: Fri, 19 Jul 1996 09:50:33 -0400
Subject: Data mining from full text
Ronen Feldman (http://www.cs.biu.ac.il:8080/~feldman/) has a
paper in this KDD (with Haym Hirsh of Rutgers in the U.S.A.). I think he
is primarily interested in categorization/classification of documents.
I'm also doing some work here at MITRE (with the idea of starting with natural
language tools to extract concepts from data, then mining those concepts).
Our work concentrates on rules (e.g. association rules, sequences)
among concepts (persons, locations, etc.) in the documents.
Don't have anything published yet. I'll let you know when we do.
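[Editorial illustration, not the MITRE method: a minimal sketch of mining pairwise association rules among concepts once they have been extracted from documents. The function name `pair_rules`, the `min_support` threshold, and the sample concept sets are all hypothetical and chosen only for illustration.]

```python
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) tuples.

    docs is a list of concept sets, one set per document. Only pairs of
    concepts are considered; this is a sketch, not a full Apriori miner.
    """
    n = len(docs)
    item_counts = Counter()
    pair_counts = Counter()
    for concepts in docs:
        unique = set(concepts)
        item_counts.update(unique)
        # Sort so each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(unique), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Confidence of a -> b is P(b | a), estimated from document counts.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical concept sets (persons, locations), not real extraction output.
docs = [
    {"Clinton", "Washington"},
    {"Clinton", "Washington"},
    {"Dole", "Kansas"},
    {"Clinton", "Boston"},
]
rules = pair_rules(docs, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Sequence rules would additionally need the order in which concepts appear, which this sketch ignores.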
I'd be interested in hearing about your specific interests -- I
might be able to give you a few additional pointers if something
you say jogs my memory.
-Chris Clifton (clifton@mitre.org)
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: simon1@cgl.bu.edu (Simon Streltsov)
Subject: Data mining on the web: lifestyle comparison
Date: Sun, 21 Jul 1996 22:52:53 -0400 (EDT)
I uploaded a simple example that compares lifestyles in different cities
by counting patterns using Altavista: http://cad.bu.edu/mining.html
If you find it useful, please link to it from your web page.
Thanks. Simon
Here are some 'results':
Lifestyle comparison: data mining using Altavista
                    New York  San Francisco  Boston  Cambridge MA  Indianapolis
Fun                 1         2.3            2.8     0.75          1.3
Fun-to-work ratio   1         2              1.6     0.17          1.63
Data mining         1         1.75           1.75    3.06          0
Clinton/Dole ratio  2.86      6.12           2.7     20.67         2.45
Simon Streltsov
Department of Manufacturing Engineering
Boston University
15 St Mary Str, Boston MA 02215
E-mail simon1@bu.edu
http://cad.bu.edu/go/simon.html
Ph 617/353-4209
Fax 617/353-5548
Data mining using Web search engines
We compare lifestyles in several cities by analyzing easily available Web
information - we simply construct queries like 'Boston near fun' and compare
the number of hits. [Note: this requires using Altavista Advanced Search query. GPS]
The results are not precise, but we get the overall picture.
The goal of this exercise is to demonstrate the opportunities - and
pitfalls! - of mining unstructured web data. Please contribute to the
discussion!
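[Editorial illustration: a minimal sketch of how hit counts could be turned into relative scores like those in the table. It assumes each 'city near topic' count is divided by the hit count for the city alone and then scaled so that a baseline city scores 1; the original page may normalize differently. The function name `relative_scores` and all counts are hypothetical placeholders, not real Altavista results.]

```python
def relative_scores(topic_hits, city_hits, baseline):
    """Scale per-city topic rates so that the baseline city scores 1.

    topic_hits: hits for the query '<city> near <topic>', keyed by city.
    city_hits:  hits for the city name alone, keyed by city (assumed
                normalization, used to offset how often a city is mentioned).
    """
    rates = {city: topic_hits[city] / city_hits[city] for city in topic_hits}
    base = rates[baseline]
    return {city: round(rate / base, 2) for city, rate in rates.items()}


# Placeholder counts for illustration only.
topic_hits = {"New York": 1000, "Boston": 600}
city_hits = {"New York": 100000, "Boston": 20000}

scores = relative_scores(topic_hits, city_hits, baseline="New York")
for city, score in scores.items():
    print(f"{city}: {score}")
```

Without the per-city normalization, cities that are mentioned more often on the Web would score higher on every topic, which is one of the pitfalls the exercise is meant to expose.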
Executive summary:
Recommendation: Live & have fun in Boston while mining data in Cambridge.
* You can have fun in Boston and San Francisco, but Boston has more ties
to New York.
* Even hard work does not give New Yorkers a large salary - maybe they just
want more?!
* There is no other place like Cambridge if you want to work in data
mining, but they have no life.
* Is it because of their political views? No - liberals in San Francisco
have fun!
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 24 Jul 1996 13:34:51 -0500
From: Jennifer Widom (widom@DB.Stanford.EDU)
Subject: (DBWORLD) SIGMOD Record on-line
Message-ID: (199607241834.NAA20352@ricotta.cs.wisc.edu)
I'm delighted to report that it has been decided recently that SIGMOD
Record will be made available on-line to the general public via the
World Wide Web. All portions of the Record from the September 1993
issue through the upcoming September 1996 issue that are available in
electronic form (primarily postscript) are now on-line, and all future
issues will be placed on-line as well. Unfortunately, at this time
the SIGMOD conference proceedings - which also comprise the June issue
of SIGMOD Record each year - are not available on-line.
How can the information contained in company databases be extracted? Today,
a new generation of techniques and tools can analyze different data units in an
intelligent and 'automatic' way, in order to obtain useful knowledge in the form
of patterns, rules, and performance profiles.
* Customer Profiles per Product
* Sales and Demand Prediction
* Modeling of Industrial Processes
* Fraud Detection
* Final Product Quality Prediction
* Analysis of Supplier Quality Parameters
* Credit Scoring
* Target Identification in Marketing
* Profit Prediction per Customer / Operation
Objective:
* In recent years, companies have computerized their processes
and transactions. This has led to the creation of large databases of
historical information and data that normally do not generate advanced
knowledge.
* Today, there are computerized methods based on machine learning
that allow us to extract knowledge from these databases in the form of
models, profiles, classifications, predictions of how variables evolve,
decision support systems, hypothesis validation, ...
* This new type of software and learning system, based on neural networks,
rule induction, or genetic algorithms, has already proved its
usefulness in sectors such as finance and insurance, industry,
services, health, and the pharmaceutical industry.
09.30 Data Mining: Introduction - Technologies - Application Areas
Miquel Sellés. Director of the Information Technologies Center of the
Institut Català de Tecnologia
10.15 State of the Art in Data Mining and Knowledge Discovery in Databases
Willi Kloesgen. German National Research Center for IT
11.00 Data Mining for Data Owners and Users
Colin Shearer. Director of Integral Solutions Ltd. (UK)
11.45 Coffee
12.10 How to Approach Data Mining Projects: Resources / Investments /
Agents
David Nettleton. Director of TAD Sistemas
12.50 An Approach to Data Warehouse
Manuel Briso. Data Warehouse Project Office. Sun Microsystems
13.30 Lunch
AFTERNOON: APPLICATIONS AND EXPERIENCES
15.30 Data Mining in the Financial Sector
Kevin Moore. Business Analyst. NatWest Bank
16.10 The Experience of Data Mining in an Insurance Company
Evaristo Prieto. Director of Information Systems. Sanitas, S.A.
16.50 Data Mining in BT
Ken Totton. Manager of Data Mining Group. BT Laboratories (UK)
17.30 Coffee
17.50 A Predictive Application in Cement Quality Control
Luis Santapau. Process Engineer of Cementos Molins, S.A.
18.30 Round Table: How to Be Successful in Data Mining Applications?
Colin Shearer. Director of Integral Solutions Ltd. (UK)
Cristóbal Arenas. Oracle
Gener López. Operative Development Director of Bankpyme
David Nettleton. Director of TAD Sistemas
Evaristo Prieto. Information Systems Director. Sanitas, S.A.
Christopher Hawkins. Finance Manager. Halfords
Ken Totton. Manager of the Data Mining Group. BT Laboratories (UK)
* Bankpyme
* BT Laboratories
* Cementos Molins
* Interministerial Commission of Science and Technology (CICYT)
* German National Research Center for IT
* Institut Català de Tecnologia
* Integral Solutions Ltd.
* Ministry of Industry and Energy
* NatWest Bank
* Oracle
* Sanitas, S.A.
* SUN Microsystems
* TAD Systems
* Technologies Used in Data Mining: Possibilities and Performance.
* The Necessary Characteristics of a Data Mining Tool. How to use them. How
they work.
* Application Areas of Data Mining Technology. Required Resources and
Investment.
* How to Focus a Project. The Integral Development of an Application.
Find the Answers to:
Technologies:
* What technologies are involved in 'Data Mining'?
* What will the outcome be?
* Are they easily applied? In which cases?
Tools:
* What should a Data Mining tool include?
* Could I use them in my company?
* How do they work?
Projects:
* In which areas could I apply Data Mining?
* Which resources should I dedicate?
* How much investment is necessary?
* How should I focus a project?
SESSIONS ADDRESSED TO: Senior Managers and Technical Staff in the following areas:
* Marketing
* Commercial
* Planning/Finances
* R&D
* Production
* Information Systems/Computing In Laboratories
* Industrial Processes
* Quality
In the following sectors:
* Finances
* Insurance
* Industry
* Pharmaceutical Laboratories
* Hospitals - Health
* Public Administration
REGISTRATION:
                           Until 30-9-96        From 1-10-96
Registration for Sessions  45.000 Ptas. + VAT   50.000 Ptas. + VAT
Documentation              10.000 Ptas. + VAT   10.000 Ptas. + VAT
The Institut Català de Tecnologia would appreciate your comments.
Please send your mail to: webmaster@ictnet.es
Copyright © 1996 Institut Català de Tecnologia