KDD Nuggets Index


To KD Mine: main site for Data Mining and Knowledge Discovery.
To subscribe to KDD Nuggets, email to kdd-request@gte.com
Past Issues: 1996 Nuggets, 1995 Nuggets, 1994 Nuggets, 1993 Nuggets


Data Mining and Knowledge Discovery Nuggets 96:33, e-mailed 96-10-25

News:
* GPS, Business Week on Data Mining and High Tech in Finance
http://www.businessweek.com/1996/44/b349914.htm
* P. Domingos, Data mining story in Newsweek
Publications:
* S. Minton, Quinlan JAIR article, Learning First-Order Definitions...
http://www.cs.washington.edu/research/jair/abstracts/quinlan96b.html
Siftware:
* R. Kohavi, MLC++ 2.0 now available at
http://www.sgi.com/Technology/mlc/
Positions:
* Gonzales Consulting, Data Mining in Mortgage Servicing Industry
* M. Bramer, Research Studentships at the University of Portsmouth, UK
Meetings:
* R. Gulliver, IMA Workshop on Data Mining and Industrial Applications,
Nov 18-20, 1996, Minneapolis
http://www.ima.umn.edu/hpc.f4.html
* D. Gordon, Call for workshop proposals at ICML-97,
http://cswww.vuse.vanderbilt.edu/~mlccolt/icml97/index.html

--
Nuggets is a newsletter for the Data Mining and Knowledge
Discovery community, focusing on the latest research and applications.

Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).

Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It gets harder the more you know. Because the more you find out, the
uglier everything seems. Frank Zappa (b. 1940)
thanks to Osmar Zaiane

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 21 Oct 1996 14:08:40 -0400
From: Gregory Piatetsky-Shapiro (gps0@gte.com)
Subject: BusinessWeek 10/28/96 story on Data Mining in finance

BusinessWeek offers this table on how to use high tech to woo customers:

-- Discern, via data warehouses, prospective customers' precise tastes
to predict future buying decisions
-- Analyze data on existing customers to determine which are most
profitable and concentrate on them
-- Bind good customers to the firm by customizing such services as
transaction executions and interaction with the company's own models
-- Get customers to buy more services through sophisticated mail and
telephone solicitations by using buying-pattern data
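The second item -- analyzing existing customers to find the most
profitable and concentrating on them -- is, at its core, a simple
aggregate-and-rank computation. A minimal sketch in Python; the
customers and amounts are invented for illustration:

```python
# Toy sketch: rank customers by total profit, then keep the top segment.
# All data here is invented for illustration.
transactions = [
    ("alice", 120.0), ("bob", 15.0), ("alice", 80.0),
    ("carol", 200.0), ("bob", 5.0), ("carol", 40.0),
]

# Aggregate profit per customer.
profit = {}
for customer, amount in transactions:
    profit[customer] = profit.get(customer, 0.0) + amount

# Rank most profitable first and keep the top two.
ranked = sorted(profit.items(), key=lambda kv: kv[1], reverse=True)
top_customers = [name for name, _ in ranked[:2]]
print(top_customers)  # ['carol', 'alice']
```

Real systems would of course pull these aggregates from a data
warehouse rather than an in-memory list, but the ranking logic is the same.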

See also a related story on
CAPITAL ONE: BURIED TREASURE IN CREDIT CARDS
at http://www.businessweek.com/1996/44/b349916.htm

and ON THE CUTTING EDGE: Finance firms duke it out
at http://www.businessweek.com/1996/44/b349914.htm


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: Data mining story in Newsweek
Date: Tue, 22 Oct 1996 22:20:12 -0700
From: 'Pedro M. Domingos' (pedrod@pacific.ICS.UCI.EDU)

Hi. This week's Newsweek has quite a dramatic account of how data mining has
taken over credit-card and other debt collection ('Dunning by the Numbers',
Oct. 28, p. 86).

Pedro Domingos



>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 21 Oct 96 16:49:07 PDT
From: Steve Minton (minton@ISI.EDU)
Subject: Quinlan JAIR article, 'Learning First-Order Definitions...',

Readers of this mailing list may be interested in the following article
which was just published by JAIR:

Quinlan, J.R. (1996)
'Learning First-Order Definitions of Functions',
Volume 5, pages 139-161.

Available in Postscript (528K) and compressed Postscript (156K).
For quick access via your WWW browser, use this URL:
http://www.cs.washington.edu/research/jair/abstracts/quinlan96b.html
More detailed instructions are below.

Abstract: First-order learning involves finding a clause-form
definition of a relation from examples of the relation and relevant
background information. In this paper, a particular first-order
learning system is modified to customize it for finding definitions of
functional relations. This restriction leads to faster learning times
and, in some cases, to definitions that have higher predictive
accuracy. Other first-order learning systems might benefit from
similar specialization.
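As a toy illustration of the setting the abstract describes -- the
notions of a relation, examples, and a clause-form definition, not
Quinlan's actual system -- one can check whether a candidate recursive
clause covers the positive examples and excludes the negative ones:

```python
# Toy illustration of first-order learning's basic notions (not the
# system from the paper). Target functional relation: last(List, X),
# meaning X is the last element of List.
positive = [((1, 2, 3), 3), ((7,), 7), ((4, 5), 5)]
negative = [((1, 2, 3), 1), ((4, 5), 4)]

def clause_covers(lst, x):
    # Candidate definition in logic-programming style:
    #   last(L, X) :- L = [X].                (base case)
    #   last(L, X) :- L = [_|T], last(T, X).  (recursive case)
    if len(lst) == 1:
        return lst[0] == x
    return clause_covers(lst[1:], x)

# A good definition covers all positives and none of the negatives.
assert all(clause_covers(l, x) for l, x in positive)
assert not any(clause_covers(l, x) for l, x in negative)
print("clause is consistent with the examples")
```

A first-order learner searches for such clauses automatically; the
functional-relation restriction studied in the paper narrows that search.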

The article is available via:

-- comp.ai.jair.papers (also see comp.ai.jair.announce)

-- World Wide Web: The URL for our World Wide Web server is
http://www.cs.washington.edu/research/jair/home.html
For direct access to this article and related files try:
http://www.cs.washington.edu/research/jair/abstracts/quinlan96b.html

-- Anonymous FTP from either of the two sites below.

Carnegie-Mellon University (USA):
ftp://ftp.cs.cmu.edu/project/jair/volume5/quinlan96b.ps
The University of Genoa (Italy):
ftp://ftp.mrg.dist.unige.it/pub/jair/pub/volume5/quinlan96b.ps

The compressed PostScript file is named quinlan96b.ps.Z (156K)

-- automated email. Send mail to jair@cs.cmu.edu or jair@ftp.mrg.dist.unige.it
with the subject AUTORESPOND and our automailer will respond. To
get the Postscript file, use the message body GET volume5/quinlan96b.ps
(Note: Your mailer might find this file too large to handle.)
Only one file can be requested in each message.

For more information about JAIR, visit our WWW or FTP sites, or
send electronic mail to jair@cs.cmu.edu with the subject AUTORESPOND
and the message body HELP, or contact jair-ed@ptolemy.arc.nasa.gov.


>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 15 Oct 1996 15:39:06 -0700
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
Subject: New Release of SGI MLC++


SGI MLC++ 2.0 is now available on our web page
http://www.sgi.com/Technology/mlc/

MLC++, the machine learning library in C++, is available both in
source code and in object code for Silicon Graphics (IRIX 5.3 or 6.2).
Because of the substantial effort that went into it at SGI, new
releases of MLC++ are restricted to research use.

MineSet 1.1, SGI's data mining and visualization product, now uses
MLC++ as the basis for its data mining, as part of a fully integrated
mining and visualization environment.


What's new in this release (for a longer version see our web pages):

- The distribution is compiled in FAST mode, which is about
30% faster.

- The utilities distribution now uses dynamically shared
objects, which save space.

- Persistent categorizers are now supported. Persistent
decision trees and Naive-Bayes are implemented. This allows
a categorizer to be saved and later read in.

- Decision trees were improved as follows:

- Decision trees now provide pruning in a way similar to C4.5.
The MC4 inducer defaults to a setting very similar
to C4.5's setting.

- Gain ratio is supported as a splitting criterion. This is
implemented exactly as in the C4.5 version (with all the hacks),
so that, except for unknown-value handling and tie-breaking, the
unpruned trees are the same.

- Improved output for MineSet(TM) Tree Visualizer.

- Naive-Bayes changes:

- Naive-Bayes now supports Laplace corrections.

- Naive-Bayes now outputs MineSet(TM) Evidence Visualizer format files.

- The biasVar utility has been added for the bias-variance
decomposition based on the Kohavi & Wolpert ICML-96 paper.

- NBTree, described in Kohavi, KDD-96, is available.

- Unlabelled instance lists are partially supported. The syntax
is to say ``nolabel'' in the names file.
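For readers unfamiliar with the Laplace correction mentioned above: it
replaces the raw frequency estimate count/total with
(count + 1)/(total + k), where k is the number of attribute values, so
that values unseen in training do not get probability zero. A minimal
sketch of the formula (not MLC++ code):

```python
# Sketch of Laplace-corrected probability estimates, as used in
# Naive-Bayes. This is just the standard formula, not MLC++ source.
def laplace_estimate(count, total, num_values):
    # Raw estimate count/total assigns probability 0 to unseen values;
    # (count + 1) / (total + num_values) avoids that while still
    # summing to 1 over all num_values attribute values.
    return (count + 1) / (total + num_values)

# An attribute with 3 possible values, over 10 training instances:
p_unseen = laplace_estimate(0, 10, 3)  # 1/13, not 0
p_seen = laplace_estimate(7, 10, 3)    # 8/13
print(round(p_unseen, 4), round(p_seen, 4))
```

Without the correction, a single unseen attribute value would zero out
the whole product of conditional probabilities in Naive-Bayes.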

--

Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)


>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 22 Oct 1996 16:18:40 -0700
From: Gonzales Consulting (gcs@henge.com)
Reply-To: gcs@henge.com
Organization: Gonzales Consulting

Washington, D.C.

This senior manager will provide direction to a major leader in the
mortgage servicing industry to build strategic applications that mine
large quantities of information. Results will be presented to a variety
of audiences in multiple formats and media. The successful candidate will:

Evaluate and select state-of-the-art hardware, software and data sets to
be used in corporate data mining activities.

Recommend and implement appropriate research methodologies including
neural networks, classification and regression, risk management,
memory-based reasoning, genetic algorithms, k-nearest neighbor, and
standard statistical analysis. Integrate data mining results into data
sets that support existing and new decision support systems.

Provide leadership in interpreting the results of multiple tools and
techniques. Recommend and implement innovative presentation techniques.

Develop strategies and tactics to design, develop, and implement
applications that support business needs using the selected tools and
techniques.

Ten years of experience developing and implementing large-scale financial
and behavioral scoring models is desired, as is a Ph.D. in a quantitative
discipline. Exceptional oral and written communication, project
management and assessment skills are required. Knowledge of the mortgage
finance industry is a plus. An excellent compensation and benefits
package is available, including a performance-based bonus, profit
sharing, major medical and a relocation allowance.

Please forward resume to:

GCS - Att. LO
Fax# - (303) 861-1780
e-mail - gcs@henge.com

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: 'Max Bramer' (bramerma@sis.port.ac.uk)
Organization: University of Portsmouth
Date: Fri, 25 Oct 1996 10:01:39 +0100
Subject: Research Studentships at the University of Portsmouth

UNIVERSITY OF PORTSMOUTH

DEPARTMENT OF INFORMATION SCIENCE

RESEARCH STUDENTSHIPS

Applications are invited for research studentships in the
Department of Information Science. These are internally-funded
posts available for three years to commence as soon as possible.

Research students will receive a bursary of £6,000 (£7,000
for students who are aged 25 or over at the time of appointment), in
addition to the payment of the University's tuition fees and standard
expenses for equipment, software, travel etc. The overall value of the
award is approximately £10,000 per annum. (The level of the bursary
quoted above is for session 1996/7. It is reviewed annually.)

Students are sought in the Department's main research areas:
Artificial Intelligence, Human-Computer Interaction, Software
Engineering and Computer-Aided Learning.

Applicants should normally have at least an upper second class
honours degree or the equivalent in computer science or another
relevant subject and must be citizens of one of the member states of
the European Union. They should be free to start by January 1997 at
the latest.

Applicants should send a brief CV plus outline details of their
proposed project to:

Professor M.A.Bramer
Department of Information Science
University of Portsmouth
Locksway Road
Milton
Southsea
PO4 8JF
England

to arrive by Friday November 22nd at the latest.

Please note that paper submissions, either to the above address or by
fax to 01705-844006, are preferred to email where possible.

For an informal discussion prior to application, contact
Professor Max Bramer, Head of Department, either by telephone
(01705-844444) or by electronic mail (bramerma@sis.portsmouth.ac.uk).

Further information about the department is available from the
World-Wide Web at http://www.sis.port.ac.uk
_______________________________________________________

Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444 Fax: +44-(0)1705-844006
email: bramerma@sis.port.ac.uk


>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: 'Robert Gulliver' (gulliver@ima.umn.edu)
Subject: Data Mining workshop Nov. 18-20 at IMA, Minneapolis
Date: Wed, 23 Oct 1996 09:18:33 -0500 (CDT)

see http://www.ima.umn.edu/hpc.f4.html
Workshop: Data Mining and Industrial Applications
November 18-20, 1996. Organizers: George Cybenko and Avner Friedman.

Data mining is becoming increasingly important in industries where one
would like to make decisions such as whether or not to mail a catalog,
how to maximize customer satisfaction, or what message to send over the
networks to specific groups of callers. The modeling issues combine
methods of pattern recognition, computer science and statistics.

Given a database, one would like to design partitions that give an
accurate description; feature analysis is required to determine which
variables carry the information; non-parametric techniques and neural
networks may be used to gain deeper insight. The goal of data mining is
predictive modeling that combines accuracy and insight.

This period of concentration will bring together researchers from
industry and academia in order to (i) identify current and future
problem areas, (ii) review the mathematical and statistical approaches
presently in use, and (iii) discuss and determine which research
directions would be most promising.


Monday, November 18

Talks today are in Lecture Hall EE/CS 3-180.

8:45 am   Registration and Coffee (Reception Room EE/CS 3-176)
9:15 am   Welcome and Orientation -- A. Friedman, R. Gulliver, G. Cybenko
9:30 am   George Cybenko (Dartmouth College): Introductory Remarks
10:00 am  Coffee Break (Reception Room EE/CS 3-176)
10:30 am  Chid Apte (IBM Watson Research Labs):
          Data Mining and its Industrial Applications

  Abstract: Recent advances in the areas of machine learning, pattern
  recognition, and statistics, coupled with the availability of
  high-volume data in business and industry, have given rise to a new
  level of interest in the applications of knowledge discovery and data
  mining. Using analytical methods for pattern extraction, clustering,
  and data modeling, it is possible to extract useful information from
  commercial data and put that information to new and innovative uses
  to further a business goal or strategy. The technical underpinnings
  of many of these techniques will be discussed, and some key
  industrial application scenarios highlighted.

2:00 pm   Daryl Pregibon (AT&T Bell Research): To be announced

Workshop discussion groups will meet at times to be arranged.

4:00 pm   IMA Tea (and more!) -- Vincent Hall 502 (The IMA Lounge)
          A variety of appetizers and beverages will be served.


Tuesday, November 19

Talks today are in Lecture Hall EE/CS 3-180.

9:00 am   Coffee (Reception Room EE/CS 3-176)
9:30 am   Mark Embrechts (Rensselaer Polytechnic Inst.): To be announced
11:00 am  Bala Iyer (IBM Santa Teresa Labs): To be announced
2:00 pm   Vipin Kumar (University of Minnesota):
          Parallel Data Mining Algorithms

  Abstract: During the last decade we have seen explosive growth in
  database technology and in the amount of data collected. Advances in
  data collection, the use of bar codes in commercial outlets, and the
  computerization of business transactions have flooded us with data
  and generated an urgent need to analyze it to extract useful
  information. Data mining is the efficient and possibly unsupervised
  discovery of interesting, useful, and previously unknown patterns
  from this data. Common patterns of interest include classification,
  associations, clustering, and sequential patterns. In this talk we
  present parallel algorithms to discover classification trees and
  association rules.

  We present two basic parallel formulations of a
  classification-rule-learning algorithm based on induction: one based
  on the Synchronous Tree Construction approach and the other on the
  Partitioned Tree Construction approach. We discuss the advantages and
  disadvantages of these methods and propose a hybrid method that
  combines their good features. We also discuss how to handle
  continuous attributes efficiently for this task.

  We also discuss two parallel formulations for the computation of
  association rules: the count distribution method and the data
  distribution method. The count distribution method scales with data
  size, but does not scale with main-memory usage. The data
  distribution method is intended to scale with both data size and main
  memory, but suffers from high communication overhead and duplicated
  work. We present a new technique that improves on the data
  distribution method: it scales with data size and main memory, avoids
  high communication overhead, and does not duplicate work.

  This is joint work with E. Han, G. Karypis, A. Srivastava and V. Singh.

Workshop discussion groups will meet at times to be arranged.
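The count distribution method mentioned in the abstract can be mimicked
sequentially: each "processor" counts the candidate itemsets over its
own data partition, and a single global reduction sums the local
counts. A sketch in plain Python (no real parallelism; the itemsets and
transactions are invented for illustration):

```python
# Sequential mimic of the count-distribution idea for association rules:
# each partition counts candidates locally; a reduction sums the counts.
partitions = [
    [{"milk", "bread"}, {"milk", "eggs"}],           # data on "processor" 0
    [{"bread", "eggs"}, {"milk", "bread", "eggs"}],  # data on "processor" 1
]
candidates = [frozenset({"milk", "bread"}), frozenset({"bread", "eggs"})]

def local_counts(transactions):
    # Count how many local transactions contain each candidate itemset.
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

# Global reduction: sum the per-partition counts (in the parallel
# version, this sum is the only communication step).
global_counts = {c: 0 for c in candidates}
for part in partitions:
    for c, n in local_counts(part).items():
        global_counts[c] += n

print({tuple(sorted(c)): n for c, n in global_counts.items()})
```

Because only the (small) count vectors are exchanged, not the data,
this is why the method scales with data size but is limited by the
memory needed to hold all candidates on every processor.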

Wednesday, November 20

Talks today are in Lecture Hall EE/CS 3-180.
Today's speakers will include the following:

Time TBA  Simon Kasif (Johns Hopkins University): To be announced
Time TBA  Gregory Piatetsky-Shapiro (GTE Laboratories, Waltham):
          Developing Industrial Data Mining and Knowledge Discovery
          Applications: an Overview of Issues

  Abstract: The rapid growth of business databases has overwhelmed
  traditional, interactive approaches to data analysis and has created
  a need for a new generation of tools for intelligent and automated
  discovery in data. This paper surveys some industrial applications of
  this emerging field, called data mining and knowledge discovery. We
  define the goals, tasks, and methods of data mining, examine the
  knowledge discovery process and the tools that support it, look at
  some representative applications, and discuss the challenges for a
  successful deployment of data mining applications and their adoption
  by business users.

Time TBA  Shunji Matsumoto (Fujitsu, Limited): To be announced
Time TBA  Alexander Morgan (General Motors R&D Center): To be announced

====================================================================
| Robert Gulliver                  | phone: (612) 624-6066         |
| IMA Associate Director           | fax: (612) 626-7370           |
| Professor of Mathematics         | e-mail: gulliver@ima.umn.edu  |
| 514 Vincent H, 206 Church St. SE | home: 20 Melbourne Ave. SE    |
| University of Minnesota          |       Minneapolis, MN 55414   |
| Minneapolis, MN 55455 USA        |       (612) 379-9103          |
====================================================================


>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: gordon@AIC.NRL.Navy.Mil
Date: Wed, 16 Oct 96 11:28:52 EDT
Subject: Call for Workshop Proposals

CALL FOR WORKSHOP PROPOSALS

Fourteenth International Conference on Machine Learning

July 12, 1997


Full and half day workshops will be held on Saturday, July 12,
1997 following the technical program of the Fourteenth
International Conference on Machine Learning (ICML-97).
Workshop proposals are invited in all areas of machine learning.
Workshop attendees will be required to register for the main ICML-97
conference. There will be an additional fee for workshop attendance.

Proposals for workshops should be a maximum of three (3) pages
in length, and should contain:

o A workshop title.
o A technical description of the workshop and its objectives.
o An explanation of why the workshop is currently of interest
to the machine learning community.
o A description of the workshop format, including invited speakers,
panels, and discussion sessions. Please include the length of
time planned for the workshop and a proposed schedule.
o A description of the review and paper selection process that
will be followed.
o The names, postal and email addresses, and phone numbers of
the workshop organizers, plus identification of the chair and/or
key workshop contact.
o A proposed limit, if any, on workshop attendance.
o A list of researchers and/or research groups that work in the
area and might be interested in attending the proposed workshop
(this list need not be included in the 3 page limit).

Please send workshop proposals and related inquiries, preferably
by electronic mail in plain ASCII text, to:

Diana Gordon
Workshop Chair
EMAIL: gordon@aic.nrl.navy.mil
Naval Research Laboratory, Code 5514
4555 Overlook Avenue, S.W.
Washington, D.C. 20375-5337 USA

Proposals should be sent as soon as possible, but must be received
by December 4, 1996. Notification of acceptance or rejection will
be mailed to the organizer by January 6, 1997. Descriptions of
accepted workshops will be made available via the World-Wide Web (see
address below). Organizers of accepted workshops will be responsible
for preparing and distributing a Call for Papers and Participation
for their workshops. The ICML-97 Workshops Chair requires a copy of
each accepted Workshop CFP by January 20, 1997 for display on
the ICML-97 WWW site. The selection of papers, participants, and
other workshop organizational matters such as the assembly of
camera-ready copy of the Workshop proceedings are the responsibility
of the Workshop organizers. ICML-97 will be responsible for local
arrangements (i.e., rooms, equipment), collection of workshop
registration fees, and printing and delivery of the workshop
proceedings. ICML-97 will reimburse Workshop organizers
for reasonable and limited costs (e.g., postage for submitting
camera-ready proceedings, copying).


TIMETABLE SUMMARY

Submission of workshop proposals: December 4, 1996
Notification of acceptance or rejection: January 6, 1997
Accepted workshop CFPs due: January 20, 1997
Camera-ready proceedings due: May 28, 1997 (tentative)
Workshops: July 12, 1997

--------------------------------------------------------------
More detailed information on the workshops and ICML-97 may be
found at the ICML-97 Web site:

http://cswww.vuse.vanderbilt.edu/~mlccolt/icml97/index.html

