* Torgeir Dingsoeyr, Question: Integration between DM and CBR?
* Bharat Rao, Question: Clustering samples in high-dimensional
  BOOLEAN space
Publications:
* GPS: ComputerWorld: Data Mining for Fools Gold
Positions:
* Laurence Jacobs, Switzerland: Data Mining Jobs at Credit Suisse
Meetings:
* GPS, KDD-98 poster and last call for KDD-98 tutorial proposals
* Trish Carbone, Reminder: First Federal Data Mining Symposium,
  December 16-17, 1997, Washington, DC.
--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.
KD Nuggets frequency is 2-3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more is available at Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Vermonter's Guide to Computer Lingo
Excerpted from, of all things, a newsletter from the Cyberian Outpost
Main Frame: The part of the barn that holds the roof up.
From: Michael Beddows (mbeddows@kstream.com)
Subject: Applied Healthcare/Data Mine Cures Common Gold
Date: Wed, 3 Dec 1997 11:10:33 -0600
Applied Healthcare/Data Mine Cures Common Gold
December 3, 1997
Information Week via Individual Inc. : If you'd like to know who's most
susceptible to a certain illness, what treatments are most likely to
help, and how much it will cost to treat them, United HealthCare Corp.
can provide the answer, courtesy of data mining.
United HealthCare, a Minnetonka, Minn., health-care services provider
with a network of 13 million patients at affiliated U.S. hospitals and
health facilities, has been working on data mining, databases, and data
marts for nearly a decade. It has also acquired more than a dozen
health-care-related companies over the last few years, some of which
have data mining initiatives of their own. So about a year ago, United
HealthCare formed a separate subsidiary to provide information from its
data mining activities to both the company's business units and outside
customers.
That new subsidiary, Applied HealthCare Informatics, has already
generated $50 million in revenue and turned a profit. It sells data
mining and data warehousing services to more than 200 customers,
including the federal government, pharmaceuticals makers, medical-device
makers, corporate employee-benefit departments, and other health-plan
providers. For these customers, Applied HealthCare will run a data
warehouse on either its own computers or the customer's.
Creating Opportunity
The unit builds on the parent company's tradition, says Kevin Roche,
Applied HealthCare's CEO. 'The thrust of our innovation is taking United
HealthCare's health-care knowledge, experience, and tools and creating
an external business opportunity,' he says.
To do that, Applied HealthCare also generates reports on paper and
CD-ROM for clients, based on their data and research requests. Customers
can also access their data warehouses via dial-up modems. Down the line,
Applied HealthCare will make that data available to customers over the
Internet and through private Web sites built for individual customers.
The goal: to help clients make better business and clinical decisions,
and to make a profit doing so. 'Through our data mining, we can help an
employee-benefits organization figure out what it will cost to set up
wellness programs such as smoking cessation, and what the payback will
be,' says Bob Jahreis, VP of Applied HealthCare.
To date, the company's largest data warehouse is that of its parent. The
files contain more than 1 billion rows of code, say company officials.
Much of it includes statistics from United HealthCare's patient-claims
files. Other data in the warehouses include information from insurance
carriers, occupational health agencies, and government sources. This
wide range of samples and sources helps Applied HealthCare answer
queries.
At the heart of Applied HealthCare's data mining service are DB2
databases running on IBM mainframes and Oracle databases running on
Hewlett-Packard 9000 systems. Applied HealthCare is also evaluating
databases and tools from Sybase Inc. and Red Brick Systems Inc., says
Roche, and has a set of query tools developed in-house and built around
C++. 'We've been looking at off-the-shelf query tools, but most of them
aren't well-suited for the range of health-care data we're working
with,' Roche says. 'The key is to collect data that is reliable.'
While data mining isn't new to health care, 'most internal systems have
been built around patient billing systems, not clinical systems,' says
Ted Schaler, an analyst at Forrester Research and a former healthcare
software director. 'If a company can offer data mining services coming
in through the back door, combining clinical information as well as
financial-related data, that's a big benefit.' And a healthy one, too.
Date: Wed, 03 Dec 1997 15:58:51 +0000
From: Julian Clinton (julianc@isl.co.uk)
Subject: CRISP Data Mining Process Model Workshop
Site:
In July 1997, a consortium of leading data mining suppliers and major
industrial organizations made a significant move towards a standard
process model for data mining: CRoss-Industry Standard Process Model for
Data Mining (CRISP-DM).
The collaborative project CRISP-DM is driven by NCR Systems Engineering
(Denmark), Integral Solutions Ltd. (United Kingdom), Daimler-Benz
Aktiengesellschaft (Germany), and OHRA (Netherlands). This project is
partly funded by the European Community as part of the ESPRIT program.
The overall goal in CRISP-DM is the development of a standard process
model for data mining which is both industry-neutral and
tool-independent. The CRISP-DM process model will make data mining
projects faster, more efficient, more reliable, more manageable, and
less costly. The process model will also reduce skills needed to perform
data mining projects successfully.
One of the key factors to the success of the CRISP-DM initiative is the
CRISP-DM Special Interest Group (SIG). This group brings together
practitioners and end users across all industries in order to discuss
the needs for a standard process model for data mining, to share
experiences in performing data mining projects, and to contribute to the
development of a standard process model for data mining.
The 1st CRISP-DM SIG workshop took place in Amsterdam (Netherlands) on
November 20, 1997, and received excellent support. More than 20
participants from all over Europe and the US presented their views on
data mining as a process and their expectations for a standard data
mining process model. Workshop participants included representatives of
data mining vendors (e.g. Syllogic, Data Distilleries and Attar
Software), system suppliers (e.g. Cap Gemini and ICL Retail),
management consultancies (e.g. Deloitte & Touche and Price Waterhouse)
as well as large-scale industrial companies (e.g. British Telecom and
ABB).
In summary, the 1st CRISP-DM SIG workshop stressed the following points:
o data mining users are not always technology experts,
o a standard methodology for data mining must provide a framework for
capturing and re-using experiences, and for guiding data mining projects
at different levels of skills,
o business concerns are as important as technology aspects, and must be
addressed by any process model.
'This workshop has confirmed the need for a standard, cross-industry
process model,' said Jens Hejlesen of NCR, CRISP-DM project manager.
'There was overwhelming agreement that data mining needs a common
process model, and from the SIG members' input we are confident that our
work is going in the right direction. Most importantly, there is a
recognition that the data mining market needs such a standard now if we
are to see this technology adopted as infrastructure by Global 2000
companies.'
Members are still being recruited for the CRISP SIG, and further
workshops are planned during the next few months. An email discussion
forum will also be established and a newsletter will be published.
Anybody interested in joining the CRISP SIG should contact:
crisp@dbag.ulm.daimlerbenz.com
------------------------------------------------------------------------
Julian Clinton (julianc@isl.co.uk)
Integral Solutions Limited,
Berk House, Basing View, Basingstoke, Hants RG21 4RG, UK
Tel. +44 (0)1256 355899, Fax. +44 (0)1256 363467
URL.
Integration between Data Mining and Case-Based Reasoning
I am currently writing a diploma (small thesis) at the Norwegian
University of Science and Technology on integration of Data Mining and
Case-Based Reasoning. I would like to get in contact with people who have
experience in this field. I am aware of the work by Heckerman/Breese, Aha,
Faltings, and the work done at the University of Rostock, University of
Salford, University of Helsinki and at NEC Corporation, but am interested
in looking at more integrated prototypes/systems, especially systems with
'deep' integration, if any exist.
Yours,
Torgeir Dingsoyr
dingsoyr@idi.ntnu.no
____________________________________________________________________________
Torgeir Dingsoeyr
dingsoyr@idi.ntnu.no
Phone: +33 1 40 78 56 09

'You cannot simply bring together a country
that has over 265 kinds of cheese.'
  -- Charles de Gaulle
From: 'Rao, Bharat' (bharat@scr.siemens.com)
Subject: Clustering few samples in high-dimensional BOOLEAN space
Date: Thu, 4 Dec 1997 15:35:20 -0500
Hello,
I'm looking to cluster a dataset where the
a) data has high-dimensionality (50
b) relatively few samples ( M=O(n), and occasionally M < n)
c) and is completely Boolean (all variables are 0/1).
Can anyone point me to some existing implemented algorithms that
cluster Boolean data? (I am getting COBWEB & possibly AutoClass.)
Also, any pointers to work on constructive induction that may be
relevant for constructing new features to help clustering would be
appreciated.
Apologies if you have already seen this request on another mailing-list.
Thanks for any help,
Bharat
[Obviously clustering will be hard, and most likely
I will end up with a bunch of singleton clusters. But
I'd like to try running some existing algorithms on this
data, at least for benchmarking purposes, before trying
to develop new algorithms.]
R. Bharat Rao, E-mail:bharat@scr.siemens.com
[PGP WELCOME]
Adaptive Information & Signal Processing, Siemens Corporate Research
US Mail: 755 College Road East, Princeton, NJ 08540
Phones: (609)734-6531(O) (609)734-6565(F)
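
As a rough illustration of the kind of baseline the request above has in
mind, the sketch below clusters 0/1 vectors by average-linkage
agglomeration over Hamming distance. It is not COBWEB or AutoClass, and it
is not taken from the message; the function names, the toy rows, and the
max_dist threshold are all assumptions made for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length 0/1 tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(rows, max_dist):
    """Greedy agglomerative clustering of Boolean rows.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Hamming distance, and stops once that distance exceeds max_dist
    (an illustrative threshold, not a recommended value).
    """
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        total = sum(hamming(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # combinations() yields index pairs with i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        if dist(clusters[i], clusters[j]) > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy data: two tight pairs and one row that fits neither.
rows = [
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 1, 1, 1, 1),
    (1, 0, 1, 0, 1, 0),
]
clusters = average_linkage(rows, max_dist=1.5)
print(clusters)
```

With few samples in many dimensions, most pairwise distances tend to be
large, so a strict threshold will leave many rows as singletons, which is
the outcome the message anticipates.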
Date: Mon, 08 Dec 1997 17:37:00 -0500
From: GPS (gps)
Subject: ComputerWorld: Data Mining for Fools Gold
The Dec 1, 1997 issue of Computerworld featured a nice cover story by
Craig Steadman, entitled 'Data mining for fool's gold'
(see
for the on-line version).
The story added a dose of realism by emphasizing how easy it is, using
the available data mining tools, to find random and incorrect
patterns in data. One user was quoted as saying that perhaps only 20 of
the 2000 patterns found are actually new and useful.
While those twenty patterns could be quite valuable, and justify
the large investment companies like Chase Manhattan are
making in data mining projects, one needs to be careful to screen
the results and verify them before presenting them to the end user
as 'computer found truth'.
Such careful analysis and scrutiny is beyond the abilities or desires of
most users, which is what I meant by my quote in the article:
``Most users don't want a jet engine,
what they want is a chauffeur-driven car to take them from point
A to point B.''
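
To see why screening matters, here is a small simulation sketch (not from
the Computerworld article). It tests many purely random 0/1 'patterns'
against a purely random 0/1 target and counts how many agree or disagree
with the target often enough to look predictive. The row count, pattern
count, and agreement cutoff are illustrative assumptions.

```python
import random


def count_chance_patterns(n_rows=100, n_patterns=2000, min_agree=60, seed=0):
    """Count random patterns that appear to predict a random target.

    A pattern 'looks predictive' if it matches the target on at least
    min_agree rows, or on at most n_rows - min_agree rows (i.e. its
    negation matches that often). Nothing here carries real signal.
    """
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_patterns):
        pattern = [rng.randint(0, 1) for _ in range(n_rows)]
        agree = sum(p == t for p, t in zip(pattern, target))
        if agree >= min_agree or agree <= n_rows - min_agree:
            hits += 1
    return hits


spurious = count_chance_patterns()
print(f"{spurious} of 2000 purely random patterns look 'predictive'")
```

Every hit is noise by construction, so any tool that reports such
patterns without validating them on held-out data will hand the user
a pile of fool's gold.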
Date: Tue, 02 Dec 1997 19:45:35 -0500
From: 'A. (Fazel) Famili' (famili@ida-ij.com)
Subject: Intelligent Data Analysis Journal - Call for Papers
INTELLIGENT DATA ANALYSIS - AN INTERNATIONAL JOURNAL
====================================================
C A L L   F O R   P A P E R S
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An electronic, Web-based journal
Published by Elsevier Science
The first year of the Intelligent Data Analysis journal has been a
great success. This is a quarterly journal, published by Elsevier
Science Inc. The logfile statistics accumulated by Elsevier Science
show that the journal articles have been accessed by many people
all over the world. The journal will be subscription-based
starting January 1, 1998. Volume 2(1) will be on-line on January 15th,
1998.
The journal is offering a number of new features that are not
currently available in paper journals: (i) an alerting service
notifying subscribers of new papers in the journal, (ii) links to
large-scale data collections, (iii) links to secondary collections
of data related to material presented in the journal, (iv) the ability
to test new search mechanisms on the collection of journal articles,
based on Author, Subject or Title, (v) links to related bibliographic
material, and (vi) inclusion of 3-D objects and multiple color graphs.
We are also working on more features that will be announced in the
next issues of this journal. At the end of 1998, there will be a fully
searchable, archival CD-ROM containing all 1997 and 1998
Intelligent Data Analysis articles.
If you are interested in submitting a paper, please contact
the Editor-in-Chief, Dr. A. Famili (editor@ida-ij.com).
Please refer
to one of the above URL addresses to look at the articles in
Volume 1 of the IDA journal. This site also contains the journal
home page: Aims and Scope, Author Submission Guidelines, Related
Events, and more...
Best wishes,
Dr. A. Famili, Editor-in-Chief
famili@ida-ij.com

Annette Leeuwendal
a.leeuwendal@elsevier.com
From: Kalles Dimitris (Kalles.Dimitrios@cti.gr)
Subject: PhD thesis: Decision trees and domain knowledge
in pattern recognition
Date: Thu, 4 Dec 1997 10:49:47 +-200
I have finally managed to upload to our server selected parts of my
PhD thesis (submitted back in 1994) on decision trees and how domain
knowledge in the form of attribute dependencies may be exploited in
batch or incremental induction.
The material is available via the PhD link in the following URL:
You can browse the abstract and download all chapters (I could not
afford the time, as of yet, to upload a list of references, appendices
and source code).
The thesis also deals with domains of ambiguously valued attributes
and presents a viable pre-pruning variant and a study on a caching
method for speeding up induction. Most of the work in the thesis will
(hopefully) be the basis of forthcoming publications, as I hope to get
the results diffused and stir some interest. I would appreciate it if
anyone would cast a critical eye over it and provide comments or
advice as to how the work can be polished or extended.
Dr Dimitrios Kalles
R&D Engineer
Computer Technology Institute -
>>
>>
>> Automatic Learning Techniques in Power Systems
>>
>> by
>> Louis A. Wehenkel
>> University of Liege, Institut Montefiore, Belgium
>>
>> THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND
>> COMPUTER SCIENCE
>> Volume 429
>>
>> Automatic learning is a complex, multidisciplinary field of research
>> and development, involving theoretical and applied methods from
>> statistics, computer science, artificial intelligence, biology and
>> psychology. Its applications to engineering problems, such as those
>> encountered in electrical power systems, are therefore challenging,
>> while extremely promising. More and more data have become available,
>> collected from the field by systematic archiving, or generated through
>> computer-based simulation. To handle this explosion of data, automatic
>> learning can be used to provide systematic approaches, without which
>> the increasing data amounts and computer power would be of little use.
>>
>> Automatic Learning Techniques in Power Systems is dedicated to the
>> practical application of automatic learning to power systems. Power
>> systems to which automatic learning can be applied are screened and
>> the complementary aspects of automatic learning, with respect to
>> analytical methods and numerical simulation, are investigated.
>>
>> This book presents a representative subset of automatic learning
>> methods - basic and more sophisticated ones - available from
>> statistics (both classical and modern), and from artificial
>> intelligence (both hard and soft computing). The text also discusses
>> appropriate methodologies for combining these methods to make the best
>> use of available data in the context of real-life problems.
>>
>> Automatic Learning Techniques in Power Systems is a useful reference
>> source for professionals and researchers developing automatic learning
>> systems in the electrical power field.
>>
>> 1998, 320pp. ISBN 0-7923-8068-1 PRICE : US$ 122.00
From: Laurence Jacobs (ljacobs@kstream.com)
Subject: Switzerland: Data Mining Jobs at Credit Suisse
Date: Mon, 8 Dec 1997 12:53:37 -0600
Loyalty Based Management is a business strategy for identifying,
locating, obtaining, keeping and growing profitable customers and
productive employees. The LBM Project at Credit Suisse, Zurich,
Switzerland, involves the integration of several technologies and
management systems, such as:
- Data Warehousing
- Data Mining and Knowledge Discovery
- Campaign Management
Credit Suisse is looking for
Project Manager Data Mining/Senior Data Miner
Data Mining Specialist/Junior Data Miner
for work to be performed in Zurich, Switzerland.
As a Project Manager you are responsible for a serious buildup of a Data
Mining Team at Credit Suisse. You are responsible for developing productive
Mining Models with the latest toolsets in Data Mining, such as Darwin from
Thinking Machines.
These models are developed for Marketing Purposes and tested with pilot
campaigns. You will work together with external, international
Specialists (Knowledge Stream Partners) to guarantee that serious Knowledge
can be transferred to the Credit Suisse Data Mining Team.
As a Data Miner you have a Masters or Ph.D. degree in Computer Science
or another one of the hard sciences. You are experienced with the
application of technologies from Statistics or Artificial Intelligence
such as Decision Trees, Neural Networks and Nearest
Neighbor methods.
If you are interested and qualified, please contact Andreas Meier at
Credit Suisse at
100567.247@compuserve.com
Date: Mon, 08 Dec 1997 17:37:00 -0500
From: GPS (gps)
Subject: KDD-98 Poster and Call for Tutorial Proposals
Those of you who are members of AAAI have received a very artistic
KDD-98 poster, with a New York-inspired KDD-98 logo (in subway style),
a New York Times-style call for papers, and the data mining skyscraper.
See a small copy of the images at www.kdnuggets.com/meetings.html
I also want to remind all interested researchers that
KDD-97 featured an extremely strong and popular tutorial program at no
extra cost to conference registrants. Continuing the tradition
started with KDD-97, KDD-98 will also offer a free tutorial program on
KDD topics. The tutorials are a great way to quickly get acquainted
with various KDD themes. We will be able to present only a limited
number of tutorials, and the selection will be guided by the
perceived quality and relevance to the conference. If you are
interested in giving a tutorial, please send a proposal outlining the
material to be covered by Dec 15, 1997 to Padhraic Smyth,
smyth@ics.uci.edu
Gregory Piatetsky-Shapiro
(in my capacity as KDD-98 General Chair)
Date: Mon, 01 Dec 1997 13:32:54 -0500
To: dbworld@cs.wisc.edu,
gps
From: Trish Carbone (carbone@mitre.org)
Subject: First Federal Data Mining Symposium
Reminder! Don't forget to register for AFCEA International's First Federal
Data Mining Symposium, to be held December 16-17, 1997, at the J.W.
Marriott Hotel in Washington, DC. Speakers include government leaders from
COSPO, Customs, IRS, NSF and others, as well as industry and academia
leaders from Virtual Gold, NCR, Manning & Napier, George Mason University
and Jet Propulsion Lab. Exhibitors will be present as well (and we still
have some space, so sign up!).
For more information, the program, and on-line registration, see
Call for Papers
THE FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING
July 24-26, 1998
Madison, Wisconsin, USA
The Fifteenth International Conference on Machine Learning (ICML-98)
will be held at the University of Wisconsin, Madison from
July 24 to July 26, 1998. ICML-98 will be collocated with the Eleventh
Annual Conference on Computational Learning Theory (COLT-98) and the
Fourteenth Annual Conference on Uncertainty in Artificial Intelligence
(UAI-98). Seven additional conferences, including the Fifteenth National
Conference on Artificial Intelligence (AAAI-98), will also be held in
Madison (see
Submissions are invited that describe empirical, theoretical, and
cognitive-modeling research in all areas of machine learning.
Submissions that present algorithms for novel learning tasks,
interdisciplinary research involving machine learning, or innovative
applications of machine learning techniques to challenging, real-world
problems are especially encouraged.
The deadline for submissions is MARCH 2, 1998.
(An electronic version of the title page is due February 27, 1998.)
See
There are also three joint ICML/AAAI workshops being held July 27, 1998:
- Developing ML Applications: Problem Definition, Task Decomposition,
  and Technique Selection
- Learning for Text Categorization
- Predicting the Future: AI Approaches to Time-Series Analysis
The submission deadline for these WORKSHOPS is MARCH 11, 1998.
Additional details about the workshops are available via