Positions:
* Alwyn Barry, Research Assistant, Bristol, UK. Using GA for
the analysis of large consumer data sets
* Scott Clendaniel, Data Mining Position at Bank of Hawaii
Meetings:
* T. Carbone, First Federal Data Mining Symposium,
December 16-17, 1997, Washington, D.C.
--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.
KD Nuggets frequency is 2-3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the facts discovered from a survey of daily specials at
USA restaurants:
Cities with a Republican mayor serve
about 15 percent more red meat specials
than those with a Democrat. Cities
with a Democrat mayor tend to serve
more fried food, goat cheese and pesto
specials.
(thanks to Tom Fawcett)
Date: Wed, 8 Oct 1997 15:28:27 -0400
From: gps
(Gregory Piatetsky-Shapiro)
Subject: KDD-97 statistics
Here are some KDD-97 statistics (thanks to AAAI and Carol Hamilton).
Number of attendees: 577, excluding exhibitors and workshop only participants
Demographics: (349 respondents to survey)
Student 11%
Staff Scientist 5%
Research Scientist 21%
Engineer 6%
Programmer/Analyst 9%
Management 10%
Consultant 9%
Systems Analyst 2%
Administrator <1%
Project Leader 7%
Univ/Coll Educator 13%
Other 6%
Number of Exhibiting Companies: 17, plus 9 research demos and 49 poster
presentations
Date: Sun, 26 Oct 1997 19:43:37 -0500
From: Gregory Piatetsky-Shapiro (gps@kstream.com)
Subject: Data Mining Story in Washington Technology
URL:
Washington Technology (a biweekly supplement to the Washington Post)
published a good article by John Makulowich on Oct 23, 1997,
entitled 'Data Mining Developments Gain Attention'.
It discusses recent developments and trends in the data mining
community, and has interviews with myself (Gregory Piatetsky-Shapiro),
Gerard Montgomery, president and CEO of AbTech Corp.,
George John, Data Mining Guru at Epiphany Marketing Software,
and Mike Schiff, former executive at Oracle and now
with Current Analysis, and
Ted Meyer, chief technology officer of Fortner's.
Here are the first few paragraphs ...
----
October 23, 1997
Data Mining Developments Gain Attention
By John Makulowich
Nearly five years ago, 50 researchers who had taken part in a
Knowledge Discovery and Data Mining conference workshop
received Gregory Piatetsky-Shapiro's electronic newsletter,
Knowledge Discovery Nuggets, once a month.
The newsletter's readership has grown to 4,000. Its frequency has
increased to two to three times per month, and the community of
knowledge discovery and data mining professionals
has blossomed. And data mining, the use of statistical methods with
computers to uncover useful patterns inside databases, continues to
attract more and more attention in the business and scientific
communities.
'We are clearly moving into the next generation of data mining systems,
toward more and more embedded solutions. The early data mining efforts
were research driven; they did one data mining task,' said
Piatetsky-Shapiro, currently editor of the newsletter and director of
applied research at Knowledge Stream Partners, a data mining and
customer modeling company based in Chicago.
Site www.wtonline.com is free, but requires registration.
It is worth the trouble to register, since there are frequently good
technology-related articles.
From: Rob Gerritsen (rob@xore.com)
Subject: Exclusive Ore Inc., a new data mining company
Date: Wed, 8 Oct 1997 16:47:36 -0700
Exclusive Ore Inc. is pleased to announce that it and its Internet
site
are in business. Exclusive Ore Inc. was
founded specifically as a consulting and software services company
committed to innovation and excellence in data mining.
The founders of Exclusive Ore have had direct hands-on experience
with more than a dozen of the leading data mining products. As such,
Exclusive Ore is uniquely positioned to assist clients in the
selection and deployment of appropriate data mining tools and
technology. To data mining tool vendors Exclusive Ore can provide
added comparative perspective on features, user interface and
performance.
The Exclusive Ore Internet site contains a data mining product
features table that compares key features of approximately 15 of the
top products and has hot links directly from the features table to
each product. The site, which is still under construction, will also
have tutorials on various data mining technologies and problems,
illustrated with real examples that, at the same time, show how
various products work.
COMDEX Internet & Object World - in Frankfurt/Main
from 7 to 10 October, 1997:
Over 4,000 specialists and decision makers attended COMDEX Internet &
Object World in Frankfurt/Main to gain valuable insight into today's most
important revolution in the computer industry: The Internet and Object
Technology.
'The Internet as an agent of change in the way business is conducted in
the enterprise, and Objects as the underlying software foundation
technology' was the main theme for both the exhibits (October 8 to 9)
and the conference (October 7 to 10) at COMDEX Internet & Object World
Frankfurt '97.
In his keynote on October 8, Bill Gates, Chairman and CEO of Microsoft
Corporation, reinforced the importance of the Internet for the future of
the enterprise, and presented his vision of a 'Digital nervous system'
as the future infrastructure for conducting business.
92 exhibitors (78 in 1996), including leading companies such as Computer
Associates, Digital Equipment, IBM, Microsoft, Siemens Nixdorf, Software
AG, Sybase, and Sun Microsystems, presented their solutions, products and
services for the Internet and Object Technology on the 2,400 sqm exhibit
floor (2,000 sqm in 1996).
Over 840 conference delegates (780 in 1996) from 16 different countries
attended the four days of the COMDEX Internet & Object World Conferences,
to hear over 150 international speakers address issues such as 'The
Introduction of Object Technology in the Enterprise', 'The Business
Value of the Internet', and 'The Future of Electronic Commerce'.
51% of the conference delegates came from outside Germany (40% in 1996).
'When we designed COMDEX Internet & Object World Frankfurt, we wanted to
offer an international event where decision makers from all over the
world could gather important information and exchange experiences.
The results showed we have reached this goal,' says Professor Roberto Zicari,
Chair of the Advisory Board of COMDEX Internet & Object World Frankfurt.
COMDEX Internet & Object World Frankfurt presented two Awards this year.
In the evening of October 8, the First COMDEX Internet Applications Awards
were given to the best Internet Custom Applications. The winners were:
Frankfurt Airport Authority (Germany), Robert Bosch (Germany), and
Argentaria Bank (Spain).
On October 9, the fourth Object Applications Awards were given for best
applications using object technology. The winners were: Dresdner Bank
(Germany), debis Systemhaus (Germany), Swiss Telecom PTT (Switzerland), and
Electricité de France (France).
***
COMDEX Internet & Object World will return to Frankfurt/Main as part of a
comprehensive industry event, COMDEX Enterprise. COMDEX Enterprise will
focus on business solutions, applications, platforms and tools for the
enterprise.
COMDEX Enterprise will be held from 28 to 30 September, 1998 at the Messe
Frankfurt.
***
From: 'Jerome H. Friedman' (jhf@stat.Stanford.EDU)
Date: Fri, 17 Oct 1997 13:41:28 -0700 (PDT)
Subject: Bump Hunting in High-Dimensional Data
Jerome H. Friedman
Stanford University
Nicholas I. Fisher
CMIS - CSIRO, Sydney
ABSTRACT
Many data analytic questions can be formulated as (noisy) optimization
problems. They explicitly or implicitly involve finding simultaneous
combinations of values for a set of ('input') variables that imply
unusually large (or small) values of another designated ('output')
variable. Specifically, one seeks a set of subregions of the input
variable space within which the value of the output variable is
considerably larger (or smaller) than its average value over the
entire input domain. In addition it is usually desired that these
regions be describable in an interpretable form involving simple
statements ('rules') concerning the input values. This paper describes
a new procedure directed towards this goal based on the notion of
'patient' rule induction. This patient strategy is contrasted with the
greedy ones used by most rule induction methods, and semi-greedy ones
used by some partitioning tree techniques such as CART. Applications
involving scientific and commercial data bases are presented.
Keywords: noisy function optimization, classification, association,
rule induction, data mining.
Note: This postscript does not view properly on some older versions
of ghostview. It seems to print OK on nearly all postscript printers.
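The abstract above contrasts a 'patient' strategy with greedy rule induction but gives no algorithmic detail. The following is only a minimal sketch of one way such patient peeling might look, not the authors' procedure: the function names, the `alpha` and `min_support` parameters, and the synthetic data are all assumptions made for illustration. Each step trims a small fraction of the remaining points from one end of one input variable, keeping whichever trim leaves the highest mean output.

```python
import random
from statistics import mean


def patient_peel(X, y, alpha=0.1, min_support=0.1):
    """Shrink an axis-aligned box a little at a time (illustrative only).

    X is a list of input rows and y the matching outputs. Each step trims
    roughly an alpha-fraction of the points still inside the box from one
    end of one variable, choosing the trim that leaves the highest mean y.
    Returns (box, mean y inside the box); box is a list of (lo, hi) pairs.
    """
    n_vars = len(X[0])
    box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(n_vars)]
    inside = list(range(len(X)))

    # Stop once the box holds no more than min_support of all points.
    while len(inside) >= 2 and len(inside) > min_support * len(X):
        best = None  # (mean, variable index, new bounds, surviving points)
        k = max(1, int(alpha * len(inside)))
        for j in range(n_vars):
            vals = sorted(X[i][j] for i in inside)
            lo, hi = box[j]
            # Two candidate trims per variable: raise lo, or lower hi.
            for new_lo, new_hi in ((vals[k], hi), (lo, vals[-k - 1])):
                keep = [i for i in inside if new_lo <= X[i][j] <= new_hi]
                if 0 < len(keep) < len(inside):
                    m = mean(y[i] for i in keep)
                    if best is None or m > best[0]:
                        best = (m, j, (new_lo, new_hi), keep)
        if best is None:  # ties made every trim a no-op
            break
        _, j_best, bounds, inside = best
        box[j_best] = bounds
    return box, mean(y[i] for i in inside)


def make_demo_data(n=400, seed=0):
    """Hypothetical data: y is high only inside a small rectangle of inputs."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [(1.0 if 0.6 < a < 0.9 and 0.2 < b < 0.5 else 0.0) + rng.gauss(0, 0.1)
         for a, b in X]
    return X, y
```

Calling `patient_peel(*make_demo_data())` returns a box and the mean output inside it. The resulting box reads directly as a rule of the form "lo1 <= x1 <= hi1 and lo2 <= x2 <= hi2". A greedy method would instead split off large chunks of data at each step. A real implementation would also need a principled way to choose the final box size, which this sketch omits.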
From: 'George H. John' (gjohn@epiphany-ms.com)
Subject: New Book Announcement
Date: Wed, 8 Oct 1997 13:17:43 -0700
Now available! _Enhancements to the Data Mining Process_ by
George H. John, a doctoral dissertation from Stanford University with
lessons for data mining practitioners, researchers, and students
alike.
'The introduction should be required reading for all data
analysts... I couldn't put it down!'
-- Prof. Jerry Friedman, co-Inventor of CART
'Insightful, readable, and with a touch of humor'
-- Herb Edelstein, President, Two Crows Corporation
The book is organized around the data mining process, with each
chapter discussing a new method for handling one step, such as data
extraction, data cleaning, or data engineering. The bulk of the
dissertation is geared towards a technical audience, but the
introductory chapter should be readable by a wide audience. All of the
technical chapters begin with motivating examples, and with nearly 60
figures and tables, readers who wish to skip the mathematical formulas
should benefit from reading the technical chapters as well.
The thesis can be ordered from UMI at 800-521-0600, dissertation
#9723376. It is also available at my website, click on 'New Book'
from xenon.stanford.edu/~gjohn. From there you can also get the
abstract, table of contents, and advertisement for the book.
________________________________________________________________________
George H. John gjohn@epiphany-ms.com
xenon.stanford.edu/~gjohn
Data Mining Guru, Epiphany Marketing Software PhD, Stanford University
________________________________________________________________________
Date: Sat, 11 Oct 1997 20:30:39 -0400 (EDT)
From: quinlan@rulequest.com
(Ross Quinlan)
Subject: New Tool for Predicting Values (Regression)
URL:
RuleQuest Research has just released Cubist, a system for building
rule-based piecewise linear models. As with its companion tool
C5.0/See5, simplicity and intelligibility of models is emphasized.
Cubist runs on PCs (Windows 95/NT) and on a variety of Unix
platforms. A free evaluation licence is available.
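As a toy illustration of what a rule-based piecewise linear model can look like in general, here is a small sketch. It does not reproduce Cubist's model format or its learning algorithm: the `Rule` class, the `predict` function, the averaging of matching rules, and the example attribute `area` with its coefficients are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """'If the condition holds, predict with this linear model.'"""
    applies: Callable[[Dict[str, float]], bool]
    intercept: float
    coefs: Dict[str, float]

    def value(self, case: Dict[str, float]) -> float:
        return self.intercept + sum(c * case[name] for name, c in self.coefs.items())


def predict(rules: List[Rule], case: Dict[str, float], default: float = 0.0) -> float:
    """Average the linear models of all matching rules (a choice made for
    this sketch); fall back to a default when no rule matches."""
    matches = [r.value(case) for r in rules if r.applies(case)]
    return sum(matches) / len(matches) if matches else default


# Two hypothetical rules that together form one piecewise linear function.
RULES = [
    Rule(lambda c: c["area"] <= 100, 50.0, {"area": 2.0}),
    Rule(lambda c: c["area"] > 100, 150.0, {"area": 1.0}),
]
```

For example, `predict(RULES, {"area": 80.0})` uses the first rule's linear model and `predict(RULES, {"area": 120.0})` uses the second. Each rule stays readable on its own, which is the kind of intelligibility the announcement emphasizes.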
From: Alwyn Barry CSM Staff (Alwyn.Barry@csm.uwe.ac.uk)
Date: Fri, 17 Oct 1997 13:29:15 +0100
Subject: Research Assistant Vacancy, Bristol, UK.
Funded by the UK government's 'Teaching Company Directorate', the
University of the West of England and The Database Group are about
to start a research project to apply GA research to the analysis
of large consumer data sets.
A two year vacancy exists for a 'Teaching Company Associate' (Research
Assistant) who will carry out investigations leading to the production
of a prototype tool for the analysis of large consumer data sets.
Prospective candidates should have a first class or 2:1 degree in
Computer Science with a strong background in Statistics, and
preferably an MSc or MA in Statistics, Artificial Intelligence, or
another related subject area. There may be an opportunity for suitable
candidates to register for an M.Phil or PhD qualification.
The successful candidate will be based in Bristol working with
The Database Group (a respected UK-based Data Analysis company) under
the supervision of, and with access to The University of the West of
England.
For further details, email : Alwyn.Barry@csm.uwe.ac.uk
or write to :
Mrs Margret Needs,
TCS Administrator,
Faculty of Computer Studies and Mathematics,
The University of the West of England,
Coldharbour Lane,
Frenchay,
Bristol, UK.
BS16 1QY.
Date: Mon, 13 Oct 1997 02:31:56 -0400 (EDT)
From: Autumn1095@aol.com
(Scott Clendaniel)
Subject: Data Mining Position at Bank of Hawaii
The Bank of Hawaii, a division of Pacific Century Inc. (www.boh.com) is
seeking a full-time employee to help apply state-of-the-art knowledge
discovery and data mining approaches for a variety of real-world problems.
Specifically, the position will be responsible for using techniques such as
neural networks and Bayesian approaches to find models to optimize the level
of positive customer responses to offers for credit cards and installment
loans.
Further models will need to be developed to predict customer attrition, usage
levels, profitability, and likelihood to default on credit obligations. The
position will have a fair degree of freedom in identifying and selecting the
most successful algorithm for the problems at hand, and will be expected to
monitor the fields of artificial intelligence, statistics, and database
design to constantly upgrade their skill level. The applicant must be
willing to experiment and build methodologies in a champion/challenger format
to constantly improve the approaches taken.
Data sources will focus on behavior-driven variables, especially credit
bureau information and behavior patterns (including purchase and payment
activity). There will be a very rich data set to work with.
The successful applicant will have a background in two of the following three
areas: applied statistics, artificial intelligence/predictive modeling, and
marketing (especially direct response marketing). Emphasis on techniques
such as naive Bayes, Bayesian networks, back-propagation neural networks,
CART, and rule induction approaches will be a definite plus. A minimum of a
master's degree is required.
Outstanding communication skills are a prerequisite. The candidate needs to
be able to work with and train colleagues with virtually no data mining
experience, as well as to present complex work to senior management with a
similar lack of data mining background.
Experience with a major financial institution would be a definite plus. The
candidate must be willing to relocate, perhaps more than once, to Hawaii,
Arizona and/or Delaware.
Please send your resume, along with a sample of a paper you wrote on your
work, to autumn1095@aol.com,
or fax it to 808-693-1725 to the attention of
Scott Clendaniel, Vice President of Cardholder Services, Bank of Hawaii.
Date: Wed, 22 Oct 1997 21:57:59 -0500
From: Trish Carbone (carbone@mitre.org)
Subject: First Federal Data Mining Symposium
URL:
>>Reminder: If you wish to submit a paper to the Federal Data Mining
>>Symposium, please do so by 31 October 1997!
FIRST FEDERAL DATA MINING SYMPOSIUM
'TECHNICAL ADVANCES AND APPLICATIONS IN THE GOVERNMENT COMMUNITY'
December 16-17, 1997, J.W. Marriott Hotel, Washington, D.C.
AFCEA and the participating commands proudly present the first Federal
Data Mining Symposium. The theme for Data Mining is 'Technical Advances
and Applications in the Government Community,' emphasizing the need for
better and more automated methods of analysis. As the amount of data
being collected and stored increases dramatically, it becomes imperative
that we prepare for the future, not just within the Government Community
but also within any community wherein data mining preparation will be
helpful with decision-making in the future.
This event is for YOU if you are: a data user, analyst, administrator,
manager, developer, researcher, theoretician, or vendor who works in any
aspect of data mining or is interested in finding solutions to your
technical needs.
THE SIX MAJOR GOALS OF THE SYMPOSIUM:
* Exchange information and ideas on the role of data mining.
* Present requirements and proposed solutions.
* Provide discussions on the broad range of applicable technologies.
* Provide policy guidance applicable to DoD and civil agency
information resource managers.
* Identify and encourage service-unique and government-wide
knowledge.
* Stress the importance of this technology!
Stay up to date, share with your peers and learn what works and what
doesn't...
Sessions will cover the following critical areas:
USER REQUIREMENTS FOR BETTER ANALYSIS METHODS: DATA MINING TECHNIQUES
What requirements do government analysts and decision-makers have for
data mining? What kinds of decisions need to be made, and what data
contain that information? What are the time requirements for pulling out
the information? What types of data must be analyzed? How much
security or privacy must be assured?
APPLICATIONS OR NEW TOOLS TO SOLVE THE ANALYSIS PROBLEM
What new techniques are being used to address users' problems? Is the
concentration on statistical, machine learning, or visualization
techniques? Or perhaps the application requires a combination of
several. These discussions will describe how data mining techniques
were employed in actual applications.
LESSONS LEARNED
The question on everyone's mind is, 'Does this stuff really work? Is
there a payoff to be gained?' Hear what developers have learned from
building applications and applying the technology to solve user
requirements.
TOOLS AND TECHNOLOGY
There are many types of data mining techniques currently being
advertised. The employment of a technique is dependent on the type of
data, the type of information desired, and the structure of the data
being analyzed. What are the tools that are available? How successful
were they in addressing a need? How does one tool compare to another?
These discussions will contribute to an understanding of the different
types of tools that are available.
LEARN THE LATEST DEVELOPMENTS IN:
* User Requirements for Data Mining
* Applications for Data Mining
* Data Mining from Multimedia Data: Text/Imagery/Geospatial
* Lessons Learned from Constructing Data Mining Systems
* Solutions to Data Mining Problems: Noisy Data/Uncertain
Data/Incomplete Data/Dynamic Data
* Data Cleansing as part of the Data Mining Process
* Visualization as part of the Data Mining Process
* Validation and Verification of Discovered Knowledge
* Security/Privacy Concerns and Solutions
* Employment of Discovered Knowledge in Decision Support or
Other Systems
PROGRAM AT A GLANCE
Tuesday, December 16
-----------------------------
Registration & Continental Breakfast
7:30 - 8:30 AM
THEME ONE: 'Agency Perspectives on Data Mining and Data Warehousing'
THEME TWO: 'Agency Perspectives Leading to User Requirements'
Luncheon Address & Exhibits (included in the registration fee)
THEME THREE: 'Data Mining Solutions'
SPECIAL PANEL SESSION - Reprise of TechNet '97
Symposium Reception & Exhibits (included in the registration fee)
Wednesday, December 17
----------------------------------
Registration & Continental Breakfast
7:00 - 8:30 AM
THEME FOUR: 'Data Mining Applications'
THEME FIVE: 'Technical Issues in Data Mining'
Luncheon Address & Exhibits (included in the registration fee)
THEME SIX: 'Lessons Learned'
SPECIAL PANEL SESSION - 'Data Mining and the Government - Are They Ready
for a Relationship?' Session Chairman: Dr. Larry Kerschberg, Professor
of Information Sciences, George Mason University
*program subject to change
SPEAKERS:
KEYNOTE ADDRESS, December 16
Invited Speaker: Dr. Ruth David
Deputy Director for Science and Technology
Central Intelligence Agency
LUNCHEON ADDRESS, December 17
Invited Speaker: GEN Dennis J. Reimer, USA
Chief of Staff
United States Army
Speakers to date... With more to come!
* Ms. Valerie Boykin, Project Director, Enterprise Data Warehouse, U.S.
Customs Service
* Dr. Inderpal Bhandari, President, Virtual Gold, Inc.
* Mr. Tej Anand, Director of Knowledge Discovery Solutions, NCR
Corporation, Human Interface Technology Center
* Mr. Mike Weiner, CEO, Manning & Napier Information Services
DATA MINING EXPOSITION:
See the latest tools available for data mining! Also, if your company
would like to join as an exhibitor, call Christine Downing at
(703)631-6200 or (800)336-4583 ext. 6200 for additional information.
THE DATA MINING FACILITY
J.W. MARRIOTT HOTEL
The J.W. Marriott Hotel in Washington D.C., is the complete meeting and
convention site for the first Federal Data Mining Symposium, offering
registration, conference, exhibit, dining space and sleeping
accommodations - just seconds apart. AFCEA has set aside a block of
rooms - register by November 25, 1997 and mention you are with AFCEA.
CALL (800)228-9290 OR (202)393-2000.
Three easy ways to register...
* MAIL WITH PAYMENT TO:
AFCEA Events Department, AFCEA International Headquarters
4400 Fair Lakes Court, Fairfax, Virginia 22033-3899
* FAX WITH CREDIT CARD INFORMATION TO: (703)631-6133
Contact us for more information
Call: Michelle Japzon at (703)631-6126 or
(800)336-4583, ext. 6126
E-mail to: events@afcea.org
AFCEA supports the Americans with Disabilities Act of 1990. Attendees
with special needs should call (703)631-6126.
Registration fee includes two continental breakfasts, two luncheons, and
reception:
Early Bird Registration
by November 10, 1997             After November 11, 1997
___ AFCEA Member         $350    ___ AFCEA Member         $375
___ Non-Member           $375    ___ Non-Member           $400
___ Military/Government  $175    ___ Military/Government  $185
* TO JOIN AFCEA and take advantage of member discounts, check below or
call (800) 336-4583, or (703) 631-6100.
ICML-98: Call for Workshop and Tutorial Proposals
-------------------------------------------------
The Fifteenth International Conference on Machine Learning
(ICML-98) will be held at the University of Wisconsin,
Madison USA from July 24 to July 26, 1998. ICML-98 will be
co-located with the Eleventh Annual Conference on
Computational Learning Theory (COLT-98) and the Fourteenth
Annual Conference on Uncertainty in Artificial Intelligence
(UAI-98). Seven additional AI conferences, including the
Fifteenth National Conference on Artificial Intelligence
(AAAI-98), will also be held in Madison next summer (see
Since ICML is being co-located with AAAI, there will NOT be
a separate ICML workshop and tutorial program in 1998.
Instead, people interested in submitting ML-related workshop
or tutorial proposals should submit to the corresponding
AAAI program.
Members of the ML community are serving as AAAI workshop and
tutorial co-chairs, and they are aware of the plan to have
joint AAAI/ICML workshops and tutorials. Joint AAAI/ICML
workshops and tutorials will be scheduled for Monday, July 27,
1998, the day between the ICML and AAAI technical programs.
(AAAI has agreed to allow ICML attendees to attend AAAI
workshops and tutorials without requiring attendance at AAAI.)
Please note that the deadlines for these programs are near.
October 31, 1997 is the deadline for AAAI workshop proposals,
while November 14, 1997 is the deadline for AAAI tutorial
proposals.
For those who like to plan far ahead, the deadline for ICML
technical-paper submissions will be March 2, 1998. A
preliminary call for papers, as well as additional conference
information including copies of the AAAI calls for
tutorial and workshop proposals, is available at:
PS - As usual, my apologies to those who receive this
posting multiple times.
Date: Wed, 22 Oct 1997 20:20:33 -0400 (EDT)
From: S RUBIN (rubin@cps.cmich.edu)
Subject: IEEE SMC'98 San Diego Call for Mining Papers
As technical committee chair of six tracks in next year's IEEE SMC'98
Conference in San Diego (Oct. 11-14, 1998, Hyatt Regency La Jolla, San
Diego, California), I would like to invite you or any of your contacts to
submit a conference paper (max 6 pages).
Details pertaining to the conference itself can be had from the
conference secretariat at email: smc98@coewww.rutgers.edu
or,
SMC'98 SECRETARIAT
Department of Industrial Engineering
Rutgers University
96 Frelinghuysen Rd
Piscataway NJ 08854-8018
This request is for two copies of paper titles and an abstract of less
than 300 words. This year the SMC technical committee will open six new
data mining and knowledge discovery tracks (area 28) having two sessions
of five papers each per track. In recognition of the growing importance
of data mining, the six new tracks are:
1) Algorithmic Techniques for Mining Data
2) Fuzzy Techniques for Database Query
3) Applications of Mining to Expert Systems
4) Applications of Mining in Medicine
5) Data Mining in Government and Industry
6) Data Mining and Machine Learning
Surveys and/or case studies could also form the basis for papers in these
tracks. Please mail (1st class) two additional copies of your titles
and abstracts (of no more than 300 words) to the address that follows.
Note that you must also follow the directions on the preliminary
announcement and mail three copies of your abstract to the SMC'98
Secretariat. The requested two additional copies should be received by
December 08, 1997 at
Dr. Stuart H. Rubin
c/o SMC'98 Technical Committee
1604 Canterbury Trail #E
Mt. Pleasant MI 48858-4067
U.S.A.
Please indicate which of the six tracks above best describes your work.
Please allow an extra week for the forwarding of 1st class mail.
Email addresses: rubin@cps.cmich.edu
and jrsa@worldnet.att.com