--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meeting announcements short and provide
a URL for details.
KD Nuggets is published 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets).
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the entire works of Shakespeare. Now, thanks
to the Internet, we know this is not true.
--Professor Robert Silensky
(thanks to Sarah Hedberg for sending this)
[Kamran Parsaye asked me to include in KD Nuggets the following letter discussing
his reservations about the KD Cup (see www.kdnuggets.com/kd-cup.txt).
While I agree with his argument that there is a lot more to Knowledge Discovery
than clustering and classification, I think there is still significant value in the
present competition, as indicated by the large number of participants who
expressed interest in it. With lessons from KDD-Cup 97, a better competition
could be devised in the future. GPS]
Date: Tue, 10 Jun 1997 03:07:28 -0500 (CDT)
TO: Gregory Piatetsky-Shapiro, Editor KD Nuggets
FROM: Kamran Parsaye datamine@ix.netcom.com
(IDI)
RE: Emerging Standards and the KDD Cup
DATE: June 9, 1997
Following our email exchanges about the 1997 KDD-Cup,
I assume you know that I have a number of reservations about
the completeness and consistency of the logic used in
the KDD Cup-Document. As you pointed out in your message,
the 1997 Cup itself may be seen as a passing issue, but I
still think we should get the logic straight since it
may give rise to some form of 'de facto industry standard'
that may endure.
The data mining industry is in its early stages of formation
now and we should all strive for consistency and clarity.
This is particularly important because data mining has a
far more complicated theory than its other decision
support counterparts -- OLAP and query processing.
We all need to exercise caution when dealing with the
fundamental issues of data mining, and since KDD is a
publicly respected organization, I would like to
formally suggest a revision of the logic used in
the Cup-Document as posted in the last issue of
KD Nuggets. I have sketched the beginnings of my
'list of reservations' below, and I welcome your comments.
The first issue is that the Cup-Document seems to equate
knowledge discovery with simple attribute-based approaches
to clustering. As you well know, this is far from complete
or satisfactory. Towards the end of the document some
references are made to associations, time-series and
other issues, but not clearly enough to achieve
consistency. I do agree with your comment that
dealing with these other patterns would have taken a
lot of effort, but that does not change the fact that
ignoring them will leave a large gap in logic.
Affinity analysis, trend analysis, comparative analysis,
etc. are all essential to discovery and in my opinion
ignoring them is similar to selectively ignoring two
thirds of the periodic table of the elements in a basic
review of chemistry. These patterns are as fundamental
to the worlds of data and knowledge as the other two
thirds of the periodic table are fundamental to the
world of the elements.
The second (and effort-related) issue is that the
Cup-Document spends significant energy on
engineering details that should be considered
'absolute prerequisites' rather than criteria.
Of course, users expect that any reasonable system
should deal with both numbers and constants, should
access the database directly, and should run
client-server, etc. These prerequisites are absolutely
necessary, but will become trivial in a year's time, when
everyone has gotten around to engineering them.
Since they do not constitute a fundamental challenge,
focusing on them diverts attention from
the real issues. I suggest simply listing them in an
appendix as basic prerequisites so that the important
issues can be clarified.
The third issue is fundamental and has to do with
the most important aspect of knowledge discovery --
i.e. the need to produce 'correct' results.
Practically speaking, this has become a really serious
issue now that most of the world has discovered the
need for multi-dimensionality in decision support.
There is no question that much of the world's data
is multi-dimensional -- as of last year, 100% of the
Fortune 500 companies were using some form of
multi-dimensional data analysis system. And, as I
showed in my last article in Database Programming
and Design (February 1997), a set
of simple and widespread tables with about 20
records each can quickly lead many well-known
data mining approaches to confusion.
Hence 100% of the Fortune 500 companies are open to
confusion with the Cup-Document. If a Fortune 500
company cannot trust the results a system produces
from analyzing a simple 20-record sales
database, there is little point in discussing
the other issues.
Talking about Fortune 500 companies, the fourth issue
has to do with user-democracy. The Cup-Document, as is,
seems to reflect an 'analyst's view' of the world and
reads like a constitution written by-the-analyst and
for-the-analyst. The mass of business users seems to
have had no representative voice or vote in it --
it would be a safe bet to assume that over 90% of
the authors were analysts. This can be the topic of
a lengthy discussion that deserves its own forum --
but should not be ignored. This also implies a list
of follow-on issues that have to do with
democratic information distribution, the web, etc.
The fifth issue is fundamental and has to do with
'discovery power', as distinct from the first issue
above. I think the concept of what a system can
discover and tell a user about should receive more
attention in the Cup-Document. The discovery power
of a system (i.e. what the system can tell a user about)
directly impacts the benefits the user will receive.
Hence discovery power is one of the key issues for KDD
and deserves as much attention as almost any other
topic -- correctness coming first, of course.
I will discuss this in more detail in a forthcoming article
on Patterns of Knowledge that I will send you later, and
will also later post at
These five issues are just the beginnings of my
'list of reservations' about the current version of
the Cup-Document and a large number of other issues
have not even been mentioned yet -- with due
acknowledgment to the limitations of space and time.
The fact that we could discuss each of these in far
greater detail simply shows how much caution and
depth is needed in dealing with data mining at
a serious level.
I do hope that this partial 'list of reservations' will
begin a discussion that will shed light on the need for
a rich context of discourse for knowledge discovery.
Thanks,
Kamran.
Date: Tue, 10 Jun 1997 19:25:49 +0100
From: Alex Goodall (Alex@aiintelligence.com)
Subject: AI Information Bank now On-Line
**** Announcement ****
Apologies if you receive multiple copies of this from different sources.
Please distribute as you see fit.
THE AI INFORMATION BANK LAUNCHED
9th June 1997
-------------
AI Intelligence is pleased to announce that the AI Information Bank is
now available via
The Bank is set to become the most comprehensive resource on the Web
covering commercial Artificial Intelligence (AI). It lists products and
suppliers alphabetically, and includes pages covering specific
technologies, such as:
Knowledge-Based Systems, Data Mining, Neural Nets, Fuzzy Logic, Case-
Based Reasoning, Genetic Algorithms and more.
Many people we speak with are expressing the view that there is a
resurgence of commercial interest in AI and its associated technologies.
The timing for the launch of the Bank is therefore most appropriate.
The AI Information Bank was conceived and designed by Alex Goodall and
Charles Langley. It is being made available as a service from
AI Intelligence - publisher of the AI Watch newsletter and the
AI Perspectives reports.
If you are a supplier wishing to have your information displayed
in the Bank, please look at
Date: Sun, 08 Jun 1997 22:12:13 -0400
From: 'Michael J. A. Berry' (mjab@ent.mrj.com)
Subject: Data Mining Techniques for Marketing, Sales and Customer
Support
I believe that readers of this list may be interested in our recently
published book, 'Data Mining Techniques for Marketing, Sales and Customer
Support.' This book has just been published by John Wiley & Sons. The
primary audience is the technically literate marketing manager, but there is
much to interest data mining practitioners as well.
Date: Fri, 06 Jun 1997 15:12:24 +0200
From: Federico Pietro 357382/IF (federic@dei.unipd.it)
Subject: KDD and Data Mining in Italian
We are a team of students in Informatics Engineering at Padova
University. You may be interested in adding to KD Nuggets a work on
Data Mining and Knowledge Discovery in Italian that we did for a course in
Computer Networks. It contains an overview of KDD techniques, association
rule discovery, episode matching and web mining, in PostScript format.
There is also a list of links to the pages and articles we used.
[The following is a commercial announcement. GPS]
Date: Thu, 22 May 1997 15:12:57 -0500
From: 'J.P.Brown' (jpbrown@hal-pc.org)
Subject: An Upgrade
This is just a note to emphasize that there are some new ideas around.
Neither the sterile manipulation of data files, nor the knee-jerk
application of 19th Century statistical shortcuts will produce
management-friendly Conclusions and Recommendations.
The two things that are most necessary to give aid and comfort to
management are:
1: Convincing evidence that the information from the past is going
to be allowed to tell its tale, Objectively.
2: Reassurance that effective and continuing efforts are being made
to detect incipient Change.
The goal of absolute objectivity may be too much to expect in the
business world, but my basic principle is to use AutoClassification as
the first step in Business Analysis. This is a process which uses raw
data to 'predict' some key results (that you already know). The system,
and the classification which does this successfully, can then be used to
make real predictions.
Of course, every system to be used in this way needs an on-going Change
Alarm, so that necessary adjustments can be made. Management should
appreciate a system that is continually checking itself, and which would
also provide an objective early warning of any major disruption.
This is all part of SuperInduction which can be seen at
I would be glad to discuss (defend) this application.
J.P.
Date: Sun, 8 Jun 1997 21:39:30 -0700
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
Subject: Silicon Graphics' MineSet version 1.2
Silicon Graphics' MineSet version 1.2 Increases Functionality
with Web and PC Connectivity.
-------------------------------------------------------------
Silicon Graphics released MineSet(TM) version 1.2 on 3 June 1997.
MineSet 1.2 is the newest version of its flagship data mining suite of
integrated visual and analytical data mining tools. Users can now
interact with the full power of MineSet from their PCs using
OpenGL(R)-enabled X-servers, such as Hummingbird Communications'
Exceed3D. MineSet 1.2 also lets users launch MineSet tools from the
web on Silicon Graphics machines and on PCs with OpenGL X-servers,
providing an easier interface and pre-canned mining operations. MineSet enables users to discover
previously unknown patterns, hidden opportunities and trends by
extracting information from data. MineSet automatically mines the
data using powerful algorithms and allows analysis through intuitive,
multi-dimensional visual tools.
MineSet is unique in the industry because of its integration of data
access, data transformation, data mining and visual data mining. The
integration and ease of use measurably increase decision support
productivity by bringing exploratory data analysis methods to
analysts. Users such as brand managers, production managers, market
development managers and data analysts are empowered with the ability
to rapidly gain new insight, allowing them to easily transform
consumer, demographic and industry data into actionable strategic
decisions.
MineSet 1.2 adds three new capabilities:
1. Users can interact with the full power of MineSet through
OpenGL-enabled X-servers. On PCs, such capability is provided
by Hummingbird Communications' Exceed3D. Other workstation
vendors provide similar X-server solutions for their platforms.
2. MineSet visualizations can be launched through a web browser.
The visualizations run natively on SGI platforms or through an
OpenGL-enabled X-server.
3. Users can script mining and visualization operations.
The scripts can then be invoked by 3rd party applications.
and download a 30-day free copy of MineSet (under more information).
--
Ronny Kohavi (ronnyk@sgi.com)
Engineering Manager, Analytical Data Mining.
Date: Tue, 10 Jun 1997 11:46:27 +0100
From: Marco Ramoni (M.Ramoni@open.ac.uk)
Subject: Software Available: Bayesian Knowledge Discoverer (Beta)
BAYESIAN KNOWLEDGE DISCOVERER
Version 0.1 (Beta)
This is to announce the availability of the Beta release of Bayesian
Knowledge Discoverer (BKD) version 0.1.
BKD is a program designed to extract Bayesian Belief Networks (BBNs)
from (possibly incomplete) databases. It is based on a new estimation
method called Bound and Collapse and its extensions to model
selection.
We are looking for candidates for a position of 'professor-researcher' at
ESIEA Group's research center. The position will be held in Laval, a nice
French provincial town situated in the center of Brittany.
A PhD is required, teaching in French is compulsory, and research will be
on data mining.
The research is both applied (expertise in data mining applied to problems
of local industries) and pure, particularly text mining and/or computer
security (i.e., mining security backlogs).
Research work will be performed under the leadership of Yves Kodratoff.
Candidates should apply to the center's director: Mme A.M. KEMPF, ESIEA, 9,
rue Vésale, 75005 Paris, email: am.kempf@esiea.fr.
Date: 11 Jun 1997 16:04:15 +0000
From: 'Ed Babb' (Ed_Babb@parsys.co.uk)
Subject: JOB IN DATA MINING!
PARSYS is a leading European supplier of parallel systems and technology. They
are currently the lead partner in a large multinational ESPRIT project aimed at
building a parallel data mining file server and client. They are looking for
people interested in data mining systems and with experience of the enabling
technologies of user interfaces, databases and machine learning. Knowledge of C
programming is essential. Knowledge of PROLOG, JAVA and Visual Basic would be
useful.
At least a 2.1 degree in Computing, Artificial Intelligence or equivalent is
needed. In addition, several years' relevant experience is desirable. Salary up
to 35K pounds depending on experience.
Please post your CV stating current salary to: Ed Babb, PARSYS LTD, Boundary
House, Boston Road, Hanwell, London, W7 2QE, UK. Alternatively, email him at
ed@parsys.co.uk
if you wish to make any brief informal enquiries.
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF AUCKLAND
Research Studentship in Knowledge Discovery and Datamining
The Artificial Intelligence Group at the University of Auckland has a
vacancy for a
Research Assistant
Applications are invited for a PhD studentship, within the Artificial
Intelligence Group, at the Department of Computer Science, University
of Auckland, New Zealand. The three-year studentship is for the
investigation of making intelligent data analysis techniques usable by
novice data owners. The successful candidate will have a tax-free
scholarship of NZ$12,000 and will be expected to work on a
research project on 'Novice Tools for Knowledge Discovery'. Student
fees will also be covered for NZ, French, or German students. An
additional teaching fellowship of NZ$4800 (taxable) might also be
provided by the Computer Science Department.
The AI Group at Auckland conducts research into constraint
satisfaction, data mining, machine learning, planning, and spatial
reasoning.
Applicants should have at least a Masters in Computer Science or
related subject, with a good background in Artificial Intelligence
or Statistics. Please submit a CV as soon as possible, but
not later than 31 July 1997, to
Dr P Riddle,
Department of Computer Science,
University of Auckland,
Private Bag 92019, Auckland,
New Zealand.
Phone Dr Riddle on (64) (9) 373-7599 (ext 7093), fax at (64) (9)
373-7453, or send email to pat@cs.auckland.ac.nz
if you wish to make an informal enquiry.
General information on PhD studies at Auckland University can be found
at:
Date: Tue, 10 Jun 1997 16:16:49 +0200
From: David Leake (leake@cs.indiana.edu)
(by way of Enric Plaza i Cervera)
Subject: ICCBR-97 - 2nd Call for Participation
Second Call for Participation
ICCBR-97
Second International Conference on Case-Based Reasoning
Brown University
Providence, Rhode Island, July 25-27, 1997
IMPORTANT DEADLINES:
The regular registration deadline is June 18
The conference hotel and dormitory room blocks will be held until June 24
(Note that all hotel-style rooms at Brown are now sold out, but
dormitory rooms are still available.)
- Conference Overview
- Registration Form
- Schedule outline and list of accepted papers
- Travel and Accommodations Information
- Program Committee and Sponsors
From: Eric Horvitz (horvitz@MICROSOFT.com)
Subject: UAI '97 Conference and Full-Day Course Programs
Date: Friday, June 13, 1997 4:15 PM
Thirteenth Conference on Uncertainty in Artificial Intelligence
(UAI '97)
Model Reduction Techniques for Computing Approximately Optimal Solutions
for Markov Decision Processes
Thomas Dean, Robert Givan and Sonia Leach
Incremental Pruning: A Simple, Fast, Exact Algorithm for Partially
Observable Markov Decision Processes
Anthony Cassandra, Michael L. Littman and Nevin L. Zhang
Region-based Approximations for Planning in Stochastic Domains
Nevin L. Zhang and Wenju Liu
Break 10:30-11:00am
* Panel Discussion: 11:00am-12:00pm
Lunch 12:00-1:30pm
** Plenary Session IV: Foundations
1:30-3:00pm
Two Senses of Utility Independence
Yoav Shoham
Probability Update: Conditioning vs. Cross-Entropy
Adam J. Grove and Joseph Y. Halpern
Probabilistic Acceptance
Henry E. Kyburg Jr.
Estimation of Effects of Sequential Treatments By Reparameterizing
Directed Acyclic Graphs
James M. Robins and Larry Wasserman
Network Fragments: Representing Knowledge for Probabilistic Models
Kathryn Blackmond Laskey and Suzanne M. Mahoney
Correlated Action Effects in Decision Theoretic Regression
Craig Boutilier
A Standard Approach for Optimizing Belief-Network Inference
Adnan Darwiche and Gregory Provan
Myopic Value of Information for Influence Diagrams
Soren L. Dittmer and Finn V. Jensen
Algorithm Portfolio Design: Theory vs. Practice
Carla P. Gomes and Bart Selman
Learning Belief Networks in Domains with Recursively Embedded Pseudo
Independent Submodels
J. Hu and Yang Xiang
Relational Bayesian Networks
Manfred Jaeger
A Target Classification Decision Aid
Todd Michael Mansell
Structure and Parameter Learning for Causal Independence and Causal
Interactions Models
Christopher Meek and David Heckerman
An Investigation into the Cognitive Processing of Causal Knowledge
Richard E. Neapolitan, Scott B. Morris, and Doug Cork
Learning Bayesian Networks from Incomplete Databases
Marco Ramoni and Paola Sebastiani
Incremental Map Generation by Low Cost Robots Based on
Possibility/Necessity Grids
M. Lopez Sanchez, R. Lopez de Mantaras, and C. Sierra
Sequential Thresholds: Evolving Context of Default Extensions
Choh Man Teng
Score and Information for Recursive Exponential Models with Incomplete
Data
Bo Thiesson
Fast Value Iteration for Goal-Directed Markov Decision Processes
Nevin L. Zhang and Weihong Zhang
__________________________________________________________
Sunday, August 3, 1997
Invited talk IV: Gaussian processes - a replacement for supervised
neural networks?
David J.C. MacKay
8:20-9:20am
* Plenary Session V: Applications of Uncertain Reasoning
9:20-10:40am
Bayes Networks for Sonar Sensor Fusion
Ami Berler and Solomon Eyal Shimony
Image Segmentation in Video Sequences: A Probabilistic Approach
Nir Friedman and Stuart Russell
Lexical Access for Speech Understanding using Minimum Message Length
Encoding
Ian Thomas, Ingrid Zukerman, Bhavani Raskutti, Jonathan Oliver, David
Albrecht
Perception, Attention, and Resources: A Decision-Theoretic Approach to
Graphics Rendering
Eric Horvitz and Jed Lengyel
* Break 10:40-11:00am
* Panel Discussion: 11:00am-12:00pm
Lunch 12:00-1:30pm
** Plenary Session VI: Developments in Belief and Possibility
1:30-3:00pm
Decision-making under Ordinal Preferences and Comparative Uncertainty
D. Dubois, H. Fargier, and H. Prade
Inference with Idempotent Valuations
Luis D. Hernandez and Serafin Moral
Corporate Evidential Decision Making in Performance Prediction Domains
A.G. Buchner, W. Dubitzky, A. Schuster, P. Lopes, P.G. O'Donoghue, J.G.
Hughes, D.A. Bell, K. Adamson, J.A. White, J. Anderson, M.D. Mulvenna
Exploiting Uncertain and Temporal Information in Correlation
John Bigham
Break 3:00-3:30pm
** Plenary Session VII: Topics on Inference
3:30-5:00pm
Nonuniform Dynamic Discretization in Hybrid Networks
Alexander V. Kozlov and Daphne Koller
Robustness Analysis of Bayesian Networks with Local Convex Sets of
Distributions
Fabio Cozman
Structured Arc Reversal and Simulation of Dynamic Probabilistic Networks
If you have questions about the UAI '97 program, contact the UAI '97
Program Chairs, Dan Geiger and Prakash P. Shenoy. For other questions
about UAI '97, please contact the Conference Chair, Eric Horvitz.
UAI '97 Conference Chair
Eric Horvitz (horvitz@microsoft.com)
Microsoft Research, 9S
Redmond, WA, USA