News:
* GPS, KDD-97 Conference Report and KDD-97 evaluation
* Ismail Parsa, KDD-CUP-97 Summary
Publications:
* GPS, Byte 7/97 on Data Mining at your Desk
Positions:
* Laveen N. Kanal, Maryland: Intelligent Tutoring Systems Positions
* Tom Warden, Menlo Park, CA: Applied Research Position - Data Mining
Meetings:
* Yves Kodratoff, EMCSR-98 Symposium: Applications of Data Mining,
  April 14-17 1998, Vienna, Austria
--
KD Nuggets is a newsletter for the Data Mining and Knowledge Discovery
community, focusing on the latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meeting announcements short and provide
a URL for details.
KD Nuggets is published about 3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site.
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets).
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Surf Guru estimates that it'll take 10,958 years to browse the
roughly 80 million public pages online (for the typical user who
visits 20 pages a day).
Annette Hamilton, ZDNet AnchorDesk
Date: Thu, 21 Aug 1997
From: GPS (gps)
Subject: KDD-97 Conference Report and Evaluation
The Knowledge Discovery and Data Mining 1997 (KDD-97) conference, held
Aug 14-17 in Newport Beach, CA, was a great success, attracting over 600
people.
The participants were treated to 7 free tutorials from leading experts
on topics which included an introduction to data mining, data visualization,
text mining, OLAP, and statistical approaches. There were also a number
of invited talks and reports from related conferences.
Among the technical highlights were papers that presented a framework
for comparing different classifiers under different cost conditions,
insights into how to best combine different classifiers and how to
analyze popular approaches like bagging, and successful application
descriptions in areas such as molecular biology and climate analysis.
Another highlight of the conference was the KDD-CUP competition, ably
organized by Ismail Parsa (see next item).
A large number of companies exhibited their data mining systems and helped
foster good interaction between researchers and developers.
This conference also attracted a large number of statisticians from the
adjacent statistical meeting. KDD-98 will be held in New York, Aug 27-30,
and will be co-located with VLDB-98. Full details on KDD-98 will be announced
soon on KDnuggets and on the JPL and AAAI web sites.
If you attended KDD-97, please fill out the evaluation form and email it
to gps@kstream.com (and please put 'KDD-97 evaluation' in the Subject).
The comments will be kept anonymous and will help us
make the next conference even better!
On behalf of the Knowledge Discovery Cup committee, I am pleased to
announce the winners of this year's KDD-CUP.
The GOLD MINER award goes to two contestants this year:
Charles Elkan from University of California, San Diego
with his software BNB, Boosted Naive Bayesian Classifier;
and
Urban Science Applications, Inc.
with their software gain, Direct Marketing Selection System.
These two contestants jointly share 1st and 2nd place.
The BRONZE MINER award goes to the runner-up:
Silicon Graphics, Inc.
with their software MineSet.
The awards will be presented at KDD-97 in Newport Beach, CA on August
16 between 5pm and 6pm. The testing methodology will also be
presented during the ceremony.
Thank you for participating.
Ismail Parsa.
KDD-CUP-97 PROGRAM COMMITTEE
Vasant Dhar, New York University, NY, USA
Ronen Feldman, Bar-Ilan University, Ramat-Gan, Israel
Ismail Parsa, Epsilon Data Management, Burlington, MA, USA
Gregory Piatetsky-Shapiro, Knowledge Stream Partners, Cambridge, MA, USA
Performance Evaluation Criteria and Summary of Results:
-------------------------------------------------------
The contestants were evaluated based on their performance on the
validation data set. The following performance metrics were
considered:
a) Gains chart, i.e., lift table listing the cumulative percent of
responders recovered in the top quantiles of the file;
b) Receiver operating characteristics (ROC) curve analysis and the
area under the ROC curve;
c) Statistical tests, i.e., analysis of variance and various
correlational measures between the actual dependent variable and the
predicted probability estimate/score.
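For readers who want to compute metrics (a) and (b) on their own scored
files, here is a minimal Python sketch. It assumes each record carries a
model score (higher means more likely to respond) and a 0/1 response flag;
the function names gains_table and roc_auc are hypothetical, and this is
not the code the committee used. The gains table splits the score-sorted
file into equal quantiles (deciles by default); the AUC is computed as the
probability that a randomly chosen responder outscores a randomly chosen
non-responder, with ties counted as one half.

```python
from typing import List, Sequence


def gains_table(scores: Sequence[float], responded: Sequence[int],
                n_quantiles: int = 10) -> List[float]:
    """Cumulative percent of all responders recovered in the top k quantiles
    of the file (k = 1..n_quantiles), ranking records by descending score."""
    # Sort record indices by score, best prospects first (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_responders = sum(responded)
    n = len(order)
    cumulative = []
    for k in range(1, n_quantiles + 1):
        cutoff = n * k // n_quantiles  # number of records in the top k quantiles
        found = sum(responded[i] for i in order[:cutoff])
        cumulative.append(100.0 * found / total_responders)
    return cumulative


def roc_auc(scores: Sequence[float], responded: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (responder, non-responder) pairs in which the responder scores higher;
    tied pairs count one half."""
    pos = [s for s, y in zip(scores, responded) if y == 1]
    neg = [s for s, y in zip(scores, responded) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy validation file: 10 records, 4 responders (illustrative data only).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    print(gains_table(scores, responded))  # cumulative % by decile
    print(roc_auc(scores, responded))
```

The pairwise AUC loop is quadratic in file size, which is fine for
illustration; on a large validation file, a rank-based computation after a
single sort gives the same value.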
The results almost always indicated a 'photo finish' between the BNB
software and the gain software. The MineSet software was the
consistent runner-up, following the top two contestants with very
close scores.
Because the results were too close to call, we pursued additional
analyses by repeatedly sampling at random from the validation data
set and comparing the results. As the performance metric, we settled
on the gains chart, because the ROC curve analysis closely mirrored
its results. Final calls were made based on the combined performance
in the top 10 and 40 percent of the file. Performance in the top 10
percent is treated as a measure of precision, while performance in
the top 40 percent of the file reflects the stability and marketing
coverage criteria.
An overall performance metric based on the average cumulative percent
of responders recovered up to the 40th percentile of the validation
data set as a whole is listed in Table 1. Tables 2 and 3 list the
average performance in the top 10 and 40 percent of the files
repeatedly sampled at random from the validation data set.
------------------------------------
Table 1: Average Overall Performance
------------------------------------
                  Score*
------------------------------------
gain                99
BNB                 99
MineSet             97
------------------------------------
*Rounded to the nearest digit.
----------------------------
Table 2: Average Performance
in TOP 10% of File
----------------------------
              Score*
----------------------------
BNB            100
gain            97
MineSet         95
----------------------------
*Rounded to the nearest digit.

----------------------------
Table 3: Average Performance
in TOP 40% of File
----------------------------
              Score*
----------------------------
gain           100
BNB             98
MineSet         98
----------------------------
*Rounded to the nearest digit.
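The resampling procedure and summary metrics described above can be
sketched in Python as follows. The newsletter does not state the subsample
size, the number of rounds, whether sampling was with or without
replacement, or how raw percentages were scaled into the published scores,
so sample_frac=0.5, n_rounds=200, sampling without replacement, and raw
(unscaled) percentages are all assumptions here, as are the hypothetical
function names. The overall metric is read as the mean of the cumulative
capture at each decile up to 40 percent (10, 20, 30, 40); averaging at
every percentile would be an equally plausible reading.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def capture_at(scores: Sequence[float], responded: Sequence[int],
               top_percent: int) -> float:
    """Percent of all responders found in the top `top_percent` % of records,
    ranked by descending score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    cutoff = len(order) * top_percent // 100
    return 100.0 * sum(responded[i] for i in order[:cutoff]) / sum(responded)


def average_capture_to(scores: Sequence[float], responded: Sequence[int],
                       max_percent: int = 40, step_percent: int = 10) -> float:
    """Overall metric (assumed reading): mean cumulative capture at each cut
    point up to `max_percent`, i.e. 10/20/30/40 % with the defaults."""
    cut_points = range(step_percent, max_percent + 1, step_percent)
    values = [capture_at(scores, responded, p) for p in cut_points]
    return sum(values) / len(values)


def resampled_capture(scores: Sequence[float], responded: Sequence[int],
                      n_rounds: int = 200, sample_frac: float = 0.5,
                      seed: int = 0) -> Tuple[float, float]:
    """Mean top-10% and top-40% capture over repeated random subsamples
    drawn without replacement (assumed sampling scheme)."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    n = len(scores)
    k = int(n * sample_frac)
    top10: List[float] = []
    top40: List[float] = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)
        y = [responded[i] for i in idx]
        if sum(y) == 0:
            continue  # no responders drawn: capture is undefined, skip sample
        s = [scores[i] for i in idx]
        top10.append(capture_at(s, y, 10))  # precision criterion
        top40.append(capture_at(s, y, 40))  # stability/coverage criterion
    return mean(top10), mean(top40)
```

Because the subsample indices depend only on the seed and the file size,
scoring every contestant with the same seed (and the same record order)
evaluates all of them on identical subsamples, so differences reflect the
models rather than the random draws.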
Date: 7 Aug 1997 09:41:10 -0500 (EST)
From: GPS (gps)
Subject: Byte 7/97 on Data Mining at your Desk
URL:
The article, by Peter Hofland and Jim Utsler,
technology journalists at The Visual Consultancy Corporation in Amsterdam,
reviews software from Isoft, SAS, IVEE, Cognos, Information Discovery,
and WizWhy.
Date: Sun, 10 Aug 1997 20:20:04 -0700
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
To: KDD list (gps)
Subject: Paper: Data Mining using MLC++, A Machine Learning Library in C++
The journal version of the paper 'Data Mining using MLC++, A Machine
Learning Library in C++,' which received the IEEE Ramamoorthy best paper
award at Tools with AI '96, has been accepted to IJAIT, the International
Journal on AI Tools.
Date: Fri, 8 Aug 1997 10:34:23 -0400 (EDT)
From: kanal@cs.umd.edu (Laveen N. Kanal)
Subject: Maryland: Intelligent Tutoring Systems Positions
Two positions are open now for a project to develop/implement a
GENERIC AUTHORING TOOL KIT AND SHELL
for
INTELLIGENT TUTORING SYSTEMS
Position 1. M.S. or Ph.D. with a strong
background and experience in C++ and Windows95,
strong mathematical background and skills, and
experience or strong interest in Computer Aided Education and
Training. Knowledge of UNIX, Artificial Intelligence,
Neural Net and Fuzzy Logic techniques a plus.
Position 2. M.S. or Ph.D. in Engineering, Computer Science or
Operations Research, with a strong background
and experience in modeling non-linear phenomena with Artificial
Neural Systems; knowledge of dynamic programming and UNIX; and
experience or strong interest in Computer Aided Education and
Training. Knowledge of Case-Based Reasoning,
Fuzzy Logic/Engineering, C++, and Windows95 a plus.
Both positions require innovative, self-motivated individuals who
work well professionally in a team, write and speak well,
and interact well with co-workers and customers.
Send resume to L N K Corporation
6811 Kenilworth Ave, Suite 306
Riverdale, MD 20737-1333
Fax: (301) 927-7193
e-mail: nanak@lnk.com
On your resume, use code: ITS
L N K is an Equal Opportunity Employer
Laveen Kanal, Ph.D
Prof. Emeritus
Univ. of Maryland
President, L N K
kanal@cs.umd.edu
kanal@lnk.com
Date: Fri, 08 Aug 1997 16:52:30 -0600
From: Tom Warden (TWARD@allstate.com)
Subject: Menlo Park, CA: Applied Research Position - Data Mining
Following is a job description for an opening we currently have. No URL
is available.
Applied Research Position - Data Mining
The Allstate Research and Planning Center (ARPC), a unit of the Allstate
Insurance Companies, is forming a group to conduct data mining
research. The group's major objective is to evaluate, with a variety of
techniques, the company's large operational databases for significant
new information and relationships that can be utilized to improve
Allstate's profitability. Areas of interest to the company for this
research include claims fraud detection, underwriting models, pricing,
customer retention, and investment portfolio management.
The group will be composed of individuals with strong backgrounds in
data analysis, as well as individuals from within Allstate who possess
both insurance knowledge and quantitative skills. Allstate is an Industrial
Partner of the NCSA (National Center for Supercomputing Applications),
whose resources will be available for use by the group.
Qualified candidates will possess an advanced degree (Ph.D. preferred)
in computer science, mathematics, statistics, operations research or a
related field with a concentration in one or more of the following areas:
machine learning, artificial intelligence, data visualization, and/or
computational methods with very large datasets. Candidates should be
inquisitive, creative problem-solvers who are interested in formulating
and implementing solutions to complex business problems. They also
need to work well both independently and in a collaborative environment.
ARPC is located in Menlo Park, California. For over thirty years it has
served as Allstate's basic research facility, pioneering many
information-driven innovations. Allstate is a publicly traded Fortune 50
company with over $20 billion in revenues and $70 billion in assets.
Allstate is an Equal Opportunity Employer.
Please send resumes to:
Gary Kerr (gkerr@allstate.com)
or Tom Warden (tward@allstate.com)
Allstate Research & Planning Center
321 Middlefield Road
Menlo Park, CA 94025
Resumes may be faxed to: (415) 324-9347
Date: Tue, 19 Aug 1997 10:47:48 +0200 (MET DST)
From: Yves.Kodratoff@lri.fr (Yves.Kodratoff@lri.lri.fr)
Subject: EMCSR-98 Symposium: Applications of Data Mining
EMCSR 98 is a large international conference on Information Systems and
Cybernetics held in Vienna (Austria). It comprises several symposia, one of
which is devoted to 'Applications of Data Mining' (DM), chaired by Yves
Kodratoff.
The papers will be refereed by a panel of well-known specialists in
Visualization, Databases, Statistics (and Data Analysis), Machine
Learning, and Information Systems.
All topics relevant to DM are welcome and will be carefully reviewed,
but I would especially like to welcome papers that help bridge the gap
between the different components of the KDD community. The following
examples of such gap-bridging topics are by no means exhaustive.
1 - We seek application papers describing results obtained through a
strong interaction between the miners and the application field
specialist (describe how the interaction took place); applications
that have been facilitated by the understandability of the software
used (describe how understandability is achieved and why it has been
so useful); and applications that succeeded because they use clever
means of selecting interesting patterns (describe precisely how you
defined and measured interestingness).
The academic value of application papers is sometimes debatable. In
such cases, if at least one referee acknowledges the potential of the
submission, and if the author agrees, I will work with the author
(preferably by email) for as long as necessary to put the paper in an
acceptable form.
2 - Statistical packages and Neural Nets are great, but their results
are difficult to interpret. We are therefore looking for descriptions
of work from the statistics and neural network communities showing how
they improved the user-friendliness of their techniques: how they help
the field specialist better understand the results of statistical
packages, and how they help the user avoid wrong interpretations of
the results, in a somewhat automated manner.
3 - Database queries are usually well understood, but they are strictly
deductive. We are therefore looking for descriptions of work from the
database community showing how they introduced some kind of inductive
or uncertain reasoning into queries.
4 - Visualization is always helpful, but it tends to take for granted
that the user will find interesting patterns through visualization. We
are therefore looking for descriptions of work from the Visualization
community that carefully explain the link between the principles on
which their software relies and the interestingness of the patterns
visualized.
5 - Symbolic Machine Learning techniques tend to produce understandable
results, but they are hard to scale to large applications. We would
therefore welcome symbolic ML papers explaining how the authors scaled
their technique to a large application.
6 - All existing techniques tend to represent field knowledge
implicitly, in a form hardly comprehensible to the field specialist.
Attempts to use explicit representations (i.e., directly understandable
to the field specialist) or to provide understandable translations of
the encoded representation are also very welcome.
Please submit 4 copies of a FULL PAPER (NOT a summary) of max. 6 pages
(font size 10, double column).
The submission deadline is Oct. 17th, 1997.
Send all your submissions to Vienna: EMCSR 98, Oesterreichische
Studiengesellschaft fuer Kybernetik, A-1010 Wien 1, Schottengasse 3
(Austria).
If you would like to discuss your paper before submitting,
email me at YK@LRI.FR
Acceptance/rejection announced by: Dec. 5th 1997.
Last version due by: Jan 30th 1998.
For more information about the whole congress look at:
CALL FOR PARTICIPATION
7th IFIP 2.6 WORKING CONFERENCE ON DATABASE SEMANTICS (DS-7)
SEARCHING FOR SEMANTICS: DATA MINING, REVERSE ENGINEERING, ETC.
October 7-10, 1997
Leysin, Switzerland
-----------------------------------------------------------------------
The IFIP 2.6 Working Group has established a tradition
of highly appreciated Data Semantics (DS) conferences, where quality
is preferred over quantity. DS-7, the seventh in the series,
follows the successful format of the previous conferences: it will be a
four-day live-in working conference with limited attendance and
extensive time for presentations and discussions.
The topics for the 1997 DS Conference focus on those major problems that
enterprises are currently facing: the reverse engineering of old legacy
systems and applications, and the discovery of non-explicit knowledge
hidden in existing data stores. Both issues need to be dealt with
whenever database designers and application managers are committed to
reusing existing data, for performance and economic reasons.
Another major challenge for database/application designers is to be
able to complement enterprise data with data from external sources,
where the corresponding semantics is rarely fully available. Accessing
data via the Web is just one example of input from external autonomous
repositories.
More hot topics are addressed in the contributions listed below.
Demonstrations are also planned to illustrate some of the latest
products in this domain.
Conference Highlights:
We are pleased to host an invited talk by one of the leading figures in
data mining, Prof. JIAWEI HAN, from Simon Fraser University,
British Columbia, Canada.
An invited talk will also be given by Prof. LETIZIA TANCA, from the
Politecnico di Milano and University of Verona, Italy.
A tutorial on the contribution of natural language processing techniques
to text mining will be given by Dr. MARTIN RAJMAN, of the Swiss Federal
Institute of Technology.
Date: Sat, 9 Aug 1997 12:21:42 -0700 (PDT)
From: 'John R. Koza' (koza@CS.Stanford.EDU)
Subject: GP-98 PhD Student Workshop
FIRST CALL FOR GRADUATE STUDENT
PARTICIPATION IN A GENETIC PROGRAMMING
WORKSHOP AND PRESENTATION SESSIONS AT
GENETIC PROGRAMMING 1998 CONFERENCE (GP-98)
CHAIR: Una-May O'Reilly, MIT Artificial Intelligence Lab
PANEL (to date): David B. Fogel, Natural Selection Inc
David E. Goldberg, University of Illinois
John R. Koza, Stanford University
Una-May O'Reilly, MIT Artificial Intelligence Lab
DATE OF STUDENT WORKSHOP: Tuesday July 21, 1998
LOCATION: Memorial Union Building, 800 Langdon Street,
Madison, Wisconsin, USA (Same as site of the Genetic
Programming 1998 Conference)
DATES OF PRESENTATION SESSIONS: July 22 - 25
(Wednesday - Saturday), 1998 (During GP-98 conference)
DATE FOR SUBMISSIONS: Wednesday, January 21, 1998
(Same deadline as the GP-98 call for papers)
Date: Wed, 13 Aug 1997 18:08:29 -0400 (EDT)
From: Russell Greiner (greiner@scr.siemens.com)
Subject: Re: 5th AI and MATH Symposium, 1st CFP
URL:
Fifth International Symposium on
ARTIFICIAL INTELLIGENCE AND MATHEMATICS
---------------------------------------
January 4-6, 1998,
Fort Lauderdale, Florida
APPROACH OF THE SYMPOSIUM
-------------------------
The International Symposium on Artificial Intelligence and Mathematics
is the fifth of a biennial series. Our goal is to foster interactions
among mathematics, theoretical computer science, and artificial
intelligence.
The meeting includes paper presentations, invited speakers, and special
topic sessions. Topic sessions in the past have covered computational
learning theory, nonmonotonic reasoning, and computational complexity
issues in AI; this year, we also plan to include one on data mining.
INVITED TALKS will be given by
------------------------------
Robert Aumann (Hebrew University, Israel)
Joe Halpern (Cornell University)
Pat Hayes (University of West Florida)
Scott Kirkpatrick (IBM, Yorktown Heights)
William McCune (Argonne National Laboratory)
SUBMISSIONS
-----------
Authors must e-mail a short abstract (up to 200 words) in plain
text format to amai@rutcor.rutgers.edu
by SEPTEMBER 23, 1997,
and either e-mail postscript files or TeX/LaTeX source files
(including all necessary macros) of their extended abstracts
(up to 10 double-spaced pages) to
amai@rutcor.rutgers.edu
or send five copies to
Endre Boros
RUTCOR, Rutgers University
P.O. Box 5062
New Brunswick, NJ 08903 USA
to be received by SEPTEMBER 30, 1997. Authors will be notified of
acceptance or rejection by OCTOBER 31st, 1997. The final versions
of the accepted extended abstracts, for inclusion in the conference
volume, are due by NOVEMBER 30, 1997.
-----
For more information (including sponsors, programme committee, etc.),
please see