* Charles Elkan, Walmart usage of association rules
* J. Banholzer, Question: Adaptive Data Mining Systems?
* I. Haimowitz, Data Mining at GE Corporate Research and Development
Publications:
* GPS, Data Mining and Knowledge Discovery, volume 1, number 3
* Michael Beddows, ComputerWorld on Industry-specific tools emerging
Positions:
* Fred J. Damerau, Natural Language Understanding at IBM, NY
* Wei Zhang, BOEING Applied Research Positions
Education:
* Ronny Kohavi, Training for Data Mining and Visualization using
  SGI's MineSet
--
readers happy holidays and exciting discoveries in the new year.
KDNuggets will be on vacation until Jan 5, 1998.
Gregory Piatetsky-Shapiro (editor).
Knowledge Discovery Nuggets (tm) is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.
KD Nuggets frequency is 2-3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Vermonter's Guide to Computer Lingo, continued ...
Excerpted from a newsletter from the Cyberian Outpost
*** The Predictive Toxicology Evaluation Challenge ***
Can an AI program contribute to true scientific discovery? An
area where this gauntlet has been thrown is that of understanding
the mechanisms of chemical carcinogenesis.
The U.S. National Institute of Environmental Health Sciences
(NIEHS) has carried out a large number of rodent carcinogenicity
tests. This has resulted in a large database of compounds classified
as either carcinogens or non-carcinogens. The
Predictive-Toxicology Evaluation project of the NIEHS provides an
objective way to compare carcinogenicity prediction methods.
The problem of predicting carcinogens presents a formidable challenge
to knowledge discovery programs. Important features of
this problem are:
* involvement in true scientific discovery;
* strong competition from methods used by chemists;
* participation in objective blind-trials; and
* independent evaluation of results by an expert chemist.
This problem has been accepted as an IJCAI-97 Challenge Paper.
Details of the PTE Challenge, and of how to enter your submissions,
are available at:
From: 'Charles P. Elkan' (elkan@cs.columbia.edu)
Subject: Quote from Walmart
Wal-Mart knows that customers who buy Barbie dolls (it sells one
every 20 seconds) have a 60% likelihood of buying one of three types of
candy bars. What does Wal-Mart do with information like that? 'I don't
have a clue,' says Wal-Mart's chief of merchandising, Lee Scott.
Source: Palmeri, Christopher.
Believe in yourself, believe in the merchandise.
Forbes v160, n5 (Sep 8, 1997):118-124.
I'm sure that association rules and unsupervised learning in general
have some good applications in business, but it's not always obvious what
they are.
Charles
[ Here is a challenge for the readers -- what are the possible uses of
such an association for Walmart?
E.g. if they want to sell more candy of type A,
can they do it by packaging it together with Barbie dolls?
Please email your ideas to gps@kdnuggets
and I will summarize.
-- GPS]
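For readers who want to experiment with an association like the one quoted above, here is a minimal sketch of how the confidence and lift of a rule such as "Barbie doll implies candy bar" could be computed. The baskets, item names, and function names are invented for illustration; this is not Wal-Mart data or any vendor's method, just the standard definitions applied to a toy example.

```python
# Toy illustration only: the baskets below are invented, not Wal-Mart data.
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def count_containing(baskets: List[Basket], items: Iterable[str]) -> int:
    """Number of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket)


def confidence(baskets: List[Basket], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Fraction of baskets with the antecedent that also hold the consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_antecedent = count_containing(baskets, antecedent)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return count_containing(baskets, antecedent | consequent) / n_antecedent


def lift(baskets: List[Basket], antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """Confidence divided by the baseline frequency of the consequent.

    A lift above 1 means the consequent is bought more often alongside the
    antecedent than it is bought overall.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_antecedent = count_containing(baskets, antecedent)
    n_consequent = count_containing(baskets, consequent)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent or consequent never occurs")
    n_both = count_containing(baskets, antecedent | consequent)
    return (n_both * len(baskets)) / (n_antecedent * n_consequent)


# Hypothetical baskets: 5 contain a Barbie doll, 3 of those also a candy bar.
BASKETS: List[Basket] = [
    frozenset({"barbie", "candy_bar"}),
    frozenset({"barbie", "candy_bar", "batteries"}),
    frozenset({"barbie", "candy_bar"}),
    frozenset({"barbie", "gift_wrap"}),
    frozenset({"barbie"}),
    frozenset({"candy_bar", "soda"}),
    frozenset({"soda"}),
    frozenset({"batteries"}),
    frozenset({"gift_wrap", "soda"}),
    frozenset({"batteries", "soda"}),
]

if __name__ == "__main__":
    print("confidence:", confidence(BASKETS, ["barbie"], ["candy_bar"]))
    print("lift:", lift(BASKETS, ["barbie"], ["candy_bar"]))
```

A 60% confidence by itself says little about what to do: if most shoppers buy candy anyway, the lift is close to 1 and the rule is uninformative. Comparing confidence against the baseline is one way to decide whether an association is worth acting on.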
Date: Tue, 09 Dec 1997 22:26:45 +0100
From: 'Joerg banholzer' (s_banhol@ira.uka.de)
Subject: Re: looking for adaptive Data Mining Systems
I am looking for adaptive data mining systems. Such systems should learn
while being used and so produce better results over time. They should also
take account of changes in the environment, for example changing habits of
people. Can you tell me which systems achieve these aims, or where I can
find out more about such systems?
Thanks,
Joerg Banholzer eMail: s_banhol@ira.uka.de
From: 'Haimowitz, Ira J (CRD)' (haimowitz@exc01crdge.crd.ge.com)
Subject: Data Mining at GE Corporate Research and Development
Date: Wed, 10 Dec 1997 21:30:25 -0500
The Information Technology Laboratory of the General Electric Research
and Development Center in Schenectady, NY has a growing group in data
mining and data warehousing. Our team utilizes techniques from multiple
disciplines to analyze GE's large business data sets. Our approaches
include multivariate statistics, machine learning, knowledge
representation, interactive OLAP development, and data warehousing.
Applications include target marketing for retail and insurance customers,
market research for equipment service, analysis of drivers for service
quality, portfolio risk management by outlier detection, and manufactured
product quality.
For more information, and for employment opportunities, please visit our
Web sites:
The third issue of Data Mining and Knowledge Discovery journal has been
published.
Contents:
Editorial, Usama Fayyad, pp. 237-239
Levelwise Search and Borders of Theories in Knowledge Discovery,
Heikki Mannila, Hannu Toivonen, pp. 241-258
Discovery of Frequent Episodes in Event Sequences,
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo, pp. 259-289
Adaptive Fraud Detection, Tom Fawcett, Foster Provost, pp. 291-316
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
Steven L. Salzberg, pp. 317-328
Data Mining and Knowledge Discovery has a number of good research
papers in the pipeline and several excellent issues are coming out
soon.
The journal is also looking for short (3-5 pages) papers describing
significant and successful deployed applications. The turnaround on reviewing
such papers is quick and it provides a good forum for practitioners to
document their work. Please see the journal homepages at
From: Michael Beddows (mbeddows@kstream.com)
Subject: Data Mining Offers New Challenges & New Rewards - Report
Date: Tue, 9 Dec 1997 12:10:22 -0600
Web:
Data Mining Offers New Challenges & New Rewards - Report
LONDON, ENGLAND, Newsbytes via Individual Inc. : According to a report
just released by Ovum, data mining is a new industry that is poised for
growth, and offers new challenges and opportunities for companies
seeking information.
The report, entitled 'Ovum Evaluates: Data Mining' claims that data
mining will become standard business practice by the time the Year 2001
rolls around, and will provide real benefits to early adopters.
Before you ask, Ovum defines data mining as the automated analysis of
large or complex data sets in order to discover significant patterns or
trends that would otherwise go unrecognized.
According to the report, the relative immaturity of the data mining
industry is no longer a reason for organizations to hold back on data
mining projects. One of the biggest barriers to the takeup of data
mining, the company claims, has been the relative immaturity of the
tools available.
In its report, Ovum points out that new product releases over the last
six months offer users significant improvement in terms of functionality
and usability.
The report claims that data mining brings a new degree of intelligence
and automation to the process of turning data into information for
competitive advantage. Although exciting, Ovum warns that data mining
can also be a confusing technology.
...
According to Ovum, key areas for improvement are: a better integration
of different data mining techniques and greater automation of the
modeling process; a more imaginative and informative presentation of
results, in order to make interpretation easier; support for knowledge
discovery as a process -- tools need to offer more explicit help for
users setting up and managing a data mining project; and more flexible
deployment options including support for ActiveX, Java, and HTML
(hypertext markup language).
According to Woods, the development of packaged applications based on
data mining technology is also key to mainstream acceptance.
'The first vendors -- working with value-added resellers and other
partners -- to deliver the right applications at the right price will
have an excellent opportunity to grab market leadership. There are
opportunities in areas as diverse as customer retention, network
management, Web site analysis, and data warehouse administration,' he
said.
'Ovum Evaluates: Data Mining' was authored by Eric Woods and Elisabeth
Kyral, and is available immediately from Ovum at UKP995 (US$1,700). The
report is claimed to be the result of nine months of intensive research
and contains detailed evaluations of 12 data mining tools from Angoss,
Datamind, IBM, ISL, Isoft, NeoVista, Pilot, SAS, Silicon Graphics, SPSS,
Syllogic, and Thinking Machines.
Ovum has issued a free white paper on data mining, and has published
this on the Web at
From: Michael Beddows (mbeddows@kstream.com)
Subject: Industry-specific tools emerging
Date: Tue, 16 Dec 1997 15:36:14 -0600
ComputerWorld via Individual Inc. : In an effort to increase marketplace
acceptance, vendors are focusing their latest generation data-mining
tools on specific applications or industry segments.
The reason: 'Corporations are finding it difficult to take raw
technologies and apply them to their business. They need solutions that
can be absorbed faster so they can see the [return on investment]
faster,' said A. J. Brown, vice president of marketing at DataMind Corp.
in Redwood City, Calif.
To hide some of the complexity and broaden the base of users beyond
statisticians, vendors are packaging application-specific code along
with the mining engines. That makes the systems faster to implement and
easier to learn.
Applications that vendors are targeting include database marketing and
fraud detection. Industries for which tailored packages have been
designed include retail, banking and telecommunications.
Because data-mining technology is complex, time-consuming and expensive,
getting a return on investment (ROI) -- and even accurate results -- can
be a lengthy process [CW, Dec. 1]. Data-mining tools use advanced
techniques in mathematics and artificial intelligence to uncover
patterns in data and develop predictive models. Those models are then
used to help solve business problems.
The targeted approach is new, and most of the products are new or
emerging.
Bank of America in San Francisco is evaluating a product geared toward
commercial banks. The product line, from HyperParallel, Inc., includes
application templates called Solution Frameworks and a mining engine
called Discovery.
HyperParallel has templates for the retail industry for functions such
as markdown management; for banking for functions such as fee tolerance;
and for telecommunications. Eight templates are available now; 15 others
will be available by the end of next year, according to the company.
good place to start
Although Bank of America has a 22-person database marketing department,
including eight statisticians, it is still interested in the
banking-specific templates. 'We will customize anything that Bank of
America buys, but a template is a good place to start,' said Chris
Kelly, vice president and manager of database marketing.
Kelly said the templates will cut deployment and training time. 'I like
the concept a lot,' he added.
Kelly already has experience with HyperParallel. This year, he
outsourced a project to the company to score the likelihood of each of 6
million customers' leaving the bank. Using a decision-tree type of
algorithm called induction, HyperParallel scored customers every couple
of months. Depending on customers' profitability, Bank of America can
then decide how much to spend to retain them.
Kelly said the bank has made $4 in profit for every $1 it has spent on
customer-retention strategies, though he declined to reveal the cost of
the program. He said he outsourced the data-mining portion of the
project because in-house statisticians didn't have time to do it.
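The article does not say how churn scores and profitability are combined into a retention budget. As a rough sketch of one way such a decision could be framed, the following assumes that each customer has a churn probability from some scoring model and an annual profit figure, and caps retention spending so that the expected profit saved per dollar spent meets a target ratio. The class, field names, and the default ratio of 4 (echoing the $4-per-$1 figure above) are all illustrative assumptions, not Bank of America's or HyperParallel's method.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # hypothetical model score between 0 and 1
    annual_profit: float      # hypothetical profit in dollars per year


def max_retention_spend(customer: Customer, target_return: float = 4.0) -> float:
    """Largest spend that still yields `target_return` dollars per dollar.

    Simplifying assumption: a retention offer fully prevents the churn it
    targets, so the expected profit at risk is probability times profit.
    """
    if not 0.0 <= customer.churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if target_return <= 0:
        raise ValueError("target_return must be positive")
    if customer.annual_profit <= 0:
        return 0.0  # nothing to protect for an unprofitable customer
    expected_loss = customer.churn_probability * customer.annual_profit
    return expected_loss / target_return


if __name__ == "__main__":
    customers = [
        Customer("A", churn_probability=0.5, annual_profit=400.0),
        Customer("B", churn_probability=0.1, annual_profit=2000.0),
        Customer("C", churn_probability=0.9, annual_profit=-50.0),
    ]
    for c in customers:
        print(c.customer_id, max_retention_spend(c))
```

In practice a retention offer succeeds only some of the time, so a more realistic version would multiply the expected loss by an estimated save rate.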
HyperParallel isn't the only data mining vendor developing niches. Both
DataMind and Unica Technologies, Inc. in Lincoln, Mass., target database
marketing. Magnify, Inc., based in Chicago, concentrates on fraud.
Knowledge Discovery One, Inc. in Austin, Texas, focuses on retailers
with its Retail Discovery Suite. And SAS Institute, Inc. in Cary, N.C.,
announced an alliance last month in which it will write interfaces
between its upcoming Enterprise Miner and a marketing campaign
management tool called ValEx from Exchange Applications, Inc. in Boston.
Fleet Financial Group, Inc., based in Boston, plans to deploy the
Enterprise Miner/ValEx combination. The banking and financial services
company will complete installation of ValEx in February. It already uses
the current-generation SAS tool and is a beta site for Enterprise
Miner.
'The goal is to build a fully integrated marketing promotion and
data-mining and analysis environment,' said Randall Grossman, senior
vice president of customer data management and analysis.
Using the current SAS tool, Grossman's team builds predictive models and
scores customers. They then must import the results into ValEx. When the
tools are integrated via application programming interfaces next year,
this process will be automated.
Grossman has projected a five-year ROI of 138% for a 70-person staff,
data warehouse and tools -- including online analytic processing, mining
and campaign management. The company invested $30 million in the
project.
some experience needed
However, Grossman, who has an academic background in economics, said he
is skeptical about mining vendors' claims that these tools are simple
enough for nonstatisticians. He said untrained business users might
spend lots of money on marketing campaigns based on misleading
correlations uncovered through data mining.
'You can get yourself in a lot of trouble,' Grossman said. 'I really
think it is important to understand the underlying models and variables.'
Hill advises clients to evaluate all segments of a mining product --
from the algorithms to user interfaces. 'You want to make sure the
product handles all phases well,' he said. Otherwise, getting an ROI
will be a frustrating experience.
Date: Wed, 10 Dec 97 08:06:46 EST
From: 'Fred J. Damerau (862-2214)' (DAMERAU@watson.ibm.com)
To: gps
************************************************************************
********** PROGRAMMING POSITION ******************
********** NATURAL LANGUAGE UNDERSTANDING GROUP ******************
********** IBM T. J. WATSON RESEARCH LABORATORY ******************
********** YORKTOWN HEIGHTS, NEW YORK ******************
************************************************************************
The Natural Language Understanding Group, Mathematical Sciences Dept.,
IBM Thomas J. Watson Research Center (Yorktown Heights, NY) has an
opening for a Research Associate/Programmer (M.S. level). This is
a temporary, renewable one-year position. Primary job responsibility
will be the design and development of industrial strength SW in the
areas of text analysis/mining. We are looking for someone who is
interested in building systems to be deployed in real world applications
or products, i.e., in bridging the gap between research prototype and
systems impacting the real world. There is a strong emphasis on
self-motivation, broad competence in computer science/computational
linguistics, team work/communication skills, creativity and execution,
and serious programming experience (see below). Although there are no
guarantees, we expect this area to grow and so for the right person,
there is opportunity for renewal of the contract (up to 3 years) or
transition to a regular position. Here's what we're looking for:
Qualifications:
The ideal candidate would have the following knowledge and experience.
Education: MA/MS in computer science or other field with background in
computer science.
Programming languages:
Knowledge and experience in C/C++ required; Java is a plus.
Specialized Background:
Experience in implementing machine learning algorithms and/or
natural language processing algorithms is a plus.
Operating systems:
Required: Familiarity with Windows95/NT and Unix/AIX.
System programming/API experience on these operating systems not required.
General Software Development:
Familiarity with issues of large scale software development, e.g.,
API design and use, creation and integration of DLLs/Libraries,
source code control systems etc.
Candidates should send resumes and supporting letters to:
Thomas Hampp
eMail: hampp@watson.ibm.com
phone: 914-945-1714
Date: Mon, 15 Dec 1997 11:44:44 -0800
From: zhangw@redwood.rt.cs.boeing.com
(Wei Zhang)
Subject: BOEING Applied Research Positions
The Information Management and Collaborative Technologies group in the
Applied Research and Technology division of the Boeing Company has
several openings for key technical contributors at various levels of
seniority. As part of an applied research organization, these
contributors will lead and participate in the assessment, definition,
adaptation, and deployment of advanced technologies into Boeing's
information processing environment. This environment includes a host
of very large, heterogeneous information sources from the business,
engineering, and manufacturing domains, providing a great source of
motivation for applied researchers who are interested in making a real
impact with the results of their work.
Desirable areas of expertise include, but are not limited to:
-- Information Dissemination in Hybrid Network Environments
-- Performance and Scalability of VLDB and Middleware systems
-- Data Mining and Warehousing
-- Information Mgmt in Asynchronous, Collaborative Environments
-- Workflow Management
QUALIFICATIONS
==============
Successful candidates possess a Ph.D. with a proven track record in
research or advanced development in the area of information management
and/or collaborative technologies. Ideal candidates have a balance of
research and practical development skills; however, exceptional
candidates with a very strong research record or advanced development
background are also encouraged to apply. Excellent communication
skills are important.
HOW TO APPLY AND LEARN MORE
===========================
Boeing can offer a competitive salary, comprehensive benefits, and the
satisfaction of contributing to the largest aerospace company in the
world. Please send your resume, including three references and salary
requirements, to:
Pamela Drew, Manager
Information Management and Collaborative Technologies Group
Attn: SEARCH
Boeing Shared Services Group
PO Box 3707, MS 7L-49
Seattle, WA 98124-2207
Applications will be accepted until the positions are filled.
Date: Tue, 9 Dec 1997 00:33:41 -0800
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
Subject: Training for Data Mining and Visualization using SGI's MineSet
Web:
We are pleased to announce an end-user level course for data mining and
visualization using Silicon Graphics' MineSet product. The course is
geared towards anyone interested in understanding data mining and
visualization through hands-on experience with MineSet.
By attending this course, you will understand:
1. Data mining and knowledge discovery.
2. The MineSet product, capabilities, and limitations.
3. How to use MineSet to solve your business problems
and maximize the value of your data.
4. The MineSet interfaces that allow building
applications around MineSet, web-launching, and deployment.
The three-day course is provided by Silicon Graphics' Customer
Education and will be held in Mountain View, CA starting Jan 20, 1998.
The classroom is set up with Silicon Graphics workstations to
facilitate hands-on training. The class costs $1125, but we are
offering a 50% discount for the first beta class, so the registration
fee is only $562.50. Two alpha classes have already been taught.
under training,
where you can also find more information.
Space for the beta class is very limited, so register early to ensure
your place.
For technical questions about the course, please send e-mail to
mineset@postofc.corp.sgi.com
For questions about registration and payment, please call
1.800.800.4744 (option 4) or fax to 650-932-0309.
What people who have taken the MineSet alpha course have said:
* 'SGI's MineSet provides the leading visualization capabilities of any
analytical tool in the data mining space -- and that is a powerful
advantage to communicating the results of data mining analysis. The
MineSet training focuses on taking students through the steps necessary
to understand a valuable business scenario with advanced analytics.
Marketing analysts will walk out of the course with a keen interest to
deploy a pilot solution immediately -- and an understanding of where
to start.'
-- John Miller, Emergent corporation
* 'The Mineset training course at Silicon Graphics put me face to face with
the engineers responsible for the product. The instructors were
knowledgeable, helpful and able to answer any question that came up. There
was a good balance of lectures and hands-on experience. I left feeling ready
to put Mineset to work on real business problems.'
-- Michael Berry, Naviant Technology Solutions
Co-author of Data Mining Techniques for Marketing, Sales, and
Customer Support.
* 'MineSet training exceeded our expectations of what we learned about the
tool. I really enjoyed meeting all of the SGI people. The
training was excellent and the knowledge of the instructors was
helpful especially with our specific questions.'
-- Jolene Hartman, Andersen Consulting
--
Ronny Kohavi, Engineering manager, MineSet.
Maximize the value of your data with data mining and visualization.
Date: Tue, 09 Dec 1997 08:32:33 -0800
From: Padhraic Smyth (smyth@sifnos.ics.uci.edu)
Subject: New Master's program at UC Irvine:
opportunities in KDD and data mining
Web:
New Masters Degree Program in Information and Computer Science at UC Irvine
The Department of Information and Computer Science has started a
new MS degree program including a concentration in Artificial
Intelligence that may interest readers of this group. Previously, the
department only accepted students into the Ph.D. program. Students in
the MS program can get considerable exposure to both the theory and
practice of Machine Learning and Data Mining. Faculty with research
interests in this area include:
Rina Dechter- Automated Reasoning, Constraint Networks, Bayesian Networks
Rick Granger- Neural Networks, Computational Neuroscience
Dennis Kibler- Machine Learning, Instance Based Learning, Prototype Learning
Rick Lathrop- Intelligent Systems in Molecular Biology, Machine Learning
Michael Pazzani- Machine Learning, Cognitive Science, Intelligent Agents
Padhraic Smyth- Probabilistic Learning, Data Mining, Pattern Recognition
In addition there are several other faculty with relevant interests
in areas such as Computer Science Theory, Human-Computer Interaction,
and Applied Multivariate Statistics.
The department offers a wide range of courses at the graduate
level. Specific courses relevant to machine learning and data mining
include:
- Machine Learning
- Probabilistic Learning: Theory and Algorithms
- Data Mining
- Network-Based Reasoning / Belief Networks
- Neural Networks
- Information Retrieval, Filtering and Classification
- Descriptive Multivariate Statistics
- Human Computer Interaction
Students also may take a wide range of courses outside the 'core'
areas of learning and data mining, including:
- Representations and Algorithms for Molecular Biology
- Software Engineering
- Mathematical Models in Cognitive Science
- User Interfaces
- Analysis of Algorithms
- Online Algorithms
The faculty have active research projects supported by NSF, ONR, AFOSR
and NASA, and are engaged in joint R&D projects with numerous industrial
sponsors on a wide variety of topics related to learning and
data mining.
Application material, including an online application, can be found
on the WWW at
On March 4-6, 1998, UCLA Extension will present the short course,
'Evolutionary Computation: Principles and Applications', on the UCLA
campus in Los Angeles.
The instructors are Melanie Mitchell, PhD, Research Professor,
Santa Fe Institute; Lawrence Davis, PhD, President, Tica Associates;
and Una-May O'Reilly, PhD, Research Fellow, AI Laboratory, MIT.
Each participant receives a copy of the book, 'An Introduction to
Genetic Algorithms', M. Mitchell (MIT Press 1996), and extensive
course notes.
This course introduces engineers, scientists, and other interested
participants to the burgeoning field of evolutionary computation.
Evolutionary computation--genetic algorithms, evolution strategies,
evolutionary programming, and genetic programming--is a collection
of computational techniques, inspired by biological evolution, to
enhance optimization, design, and machine learning. Such techniques
are increasingly used to great advantage in applications as diverse as
aeronautical design, factory scheduling, bioengineering, electronic
circuit design, telecommunications network configuration, and robotic
control.
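As a rough illustration of the genetic-algorithm idea described above, and not taken from the course materials, here is a minimal sketch that evolves bit strings toward the all-ones string (the toy "OneMax" problem). The problem, function names, and parameter values are assumptions chosen for brevity. It uses tournament selection, one-point crossover, bit-flip mutation, and keeps the best individual from each generation.

```python
import random
from typing import List, Tuple

Genome = List[int]


def one_max(genome: Genome) -> int:
    """Toy fitness function: the number of 1 bits in the genome."""
    return sum(genome)


def tournament(population: List[Genome], rng: random.Random) -> Genome:
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if one_max(a) >= one_max(b) else b


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.02, seed: int = 0) -> Tuple[Genome, List[int]]:
    """Run a simple genetic algorithm; return the best genome and the
    best fitness recorded for the initial population and each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    history = [one_max(best)]
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best one forward
        while len(next_population) < pop_size:
            parent1 = tournament(population, rng)
            parent2 = tournament(population, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation: each gene flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=one_max)
        history.append(one_max(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
    print("best genome:", best)
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases. Real applications replace `one_max` with a domain-specific fitness function, which is usually where most of the design effort goes.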
The course fee is $1395, which includes extensive course materials.
These materials are for participants only, and are not for sale.
For more information and a complete course description, please
contact Marcus Hennessy at:
This course may also be presented on-site at company locations.
From: 'Alex Kogan' (kogan@rutcor.rutgers.edu)
Date: Wed, 17 Dec 1997 12:03:47 -0500
Subject: Fifth International Symposium on AI and Mathematics
Web:
The International Symposium on Artificial Intelligence and Mathematics is
the fifth of a biennial series. Our goal is to foster interactions among
mathematics, theoretical computer science, and artificial intelligence.
The meeting includes paper presentations, invited speakers, and special topic
sessions. Topic sessions in the past have covered computational learning
theory, nonmonotonic reasoning, and computational complexity issues in AI.
(Cf., 1996 Symposium.)
The editorial board of the Annals of Mathematics and Artificial Intelligence
serves as the permanent Advisory Committee for the series.