Knowledge Discovery Nuggets 97:19, e-mailed 97-06-07

News:
* I. Parsa, KDD-97 Knowledge Discovery and Data Mining Tools Competition
* GPS, NY Times: Mining the Cosmos data for signs of extraterrestrials,

http://www.nytimes.com/library/cyber/surf/060497mind.html

* David Isherwood, AMEC Data Mining alliance,

http://www.attar.com/pages/amec.htm

Publications:
* U. Fayyad, Data Mining and Knowledge Discovery journal, issue 2

http://www.research.microsoft.com/datamine

* George Paliouras, PhD thesis on refinement of event recognition systems,

http://www.cs.man.ac.uk/csonly/cstechrep/Theses/Paliouras/thesis.html

* A. Freitas, Ph.D. thesis on KDD available,

http://cswww.essex.ac.uk/SystemsArchitecture/DataMining/alex/thesis.html

Meetings:
* Cristina Lopez, Data Warehousing Meeting, Aug 24-29, Boston

http://www.dw-institute.com

* Shirley, NSF Workshop on Mathematical Techniques to Mine Massive Data Sets
July 12-15, 1997, University of Illinois at Chicago

http://www.lac.uic.edu/m3d-chicago.html

* Gordon, ICML-97 workshop: ML APPLICATION IN THE REAL WORLD,
Nashville, TN, July 12th 1997,

http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html

--
KD Nuggets is a newsletter for the Data Mining and Knowledge Discovery
community, focusing on the latest research and applications.

Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meeting announcements short and provide
a URL for details.

To subscribe, see

http://www.kdnuggets.com/subscribe.html

KD Nuggets is published 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at

http://www.kdnuggets.com/

-- Gregory Piatetsky-Shapiro (editor)
gps

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Though this be madness, yet there is method in 't.
Shakespeare, Hamlet
(not commenting on the typical process of knowledge discovery)

Date: Sat, 7 Jun 1997 02:01:46 -0400
From: iparsa@epsilon.com (Ismail Parsa)
Subject: Final CfP: KDD-97 Knowledge Discovery and Data Mining Tools Competition
----------------------------------------------------------------------
FINAL CALL FOR PARTICIPATION

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97):

A Knowledge Discovery and Data Mining Tools Competition

to be held in conjunction with

THE THIRD INTERNATIONAL CONFERENCE ON
KNOWLEDGE DISCOVERY AND DATA MINING (KDD-97)

http://www-aig.jpl.nasa.gov/HyperNews/get/KDD97.html

----------------------------------------------------------------------

This year, for the first time, the KDD-97 organizers are holding a
Knowledge Discovery and Data Mining (KDDM) tools competition
(KDD-CUP-97) in conjunction with the 3rd International Conference on
Knowledge Discovery and Data Mining (KDD-97).

The Cup is open to all KDDM tool vendors, academics and corporations
with significant applications. All products, applications, research
prototypes and black-box solutions are welcome. If requested, the
anonymity of the participants and their affiliated companies/
institutions will be preserved. Our aim is not to rank the
participants but to recognize the most innovative, efficient and
methodologically advanced KDDM tools.

Attendance at the KDD-97 conference is not required to participate in
the Cup. Participants are required to demonstrate the performance of
their KDDM tool in one or both of the following areas:

1. Supervised Learning: Classification or Discrimination
2. Unsupervised Learning: Clustering or Segmentation

In the interest of time, the regression or prediction category and
descriptive modeling techniques, such as association rules, are not
included in the competition this year.

Both the registration deadline for the Cup and the release date for the
training and validation data set(s) are June 19, 1997. All participants
must send back the results along with a scoring code[1] by July 17th,
one month prior to the KDD-97 conference. The scoring code will be
used by the KDD-CUP-97 committee to independently validate the results.
Each participant will receive the committee's evaluation of their
performance by August 11, 1997.

The winners will be determined based on a weighted combination of
classification accuracy (or predictive power), software novelty (or
innovation), efficiency (people and CPU time) and the data mining
methodology employed. The top three performing tools in each category
will be awarded Gold Miner, Silver Miner and Bronze Miner awards and
will be listed on the KD Nuggets web site

http://www.kdnuggets.com

until the beginning of the KDD-98 conference, unless the participants
and their affiliated companies/institutions wish to remain anonymous.

[1] The scoring code is a stand-alone C or C++ callable program or
hard code that carries out all the steps required to implement
the learning algorithm outside the model building environment.
In addition to the numeric values of the weights, it also
includes preprocessing statements for treating missing values,
transforming/normalizing/standardizing inputs, etc. It is
ultimately used in computing the predicted value or output from
raw data outside the modeling environment. For example, for
decision tree algorithms, the preprocessing code along with the
'if-then-else' rules constitutes the scoring code.
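
As an illustration of the footnote, here is a minimal sketch of scoring
code for a small decision tree: preprocessing statements for missing
values and standardization, followed by the learned if-then-else rules.
The Cup asks for C or C++; Python is used here only for brevity, and the
feature names (age, income), imputation constants and split thresholds
are hypothetical, not taken from any KDD-CUP-97 data set.

```python
from typing import Optional, Tuple

# Hypothetical constants frozen from the model-building environment.
AGE_MEDIAN = 42.0         # replaces a missing age
INCOME_MEAN = 55_000.0    # replaces a missing income; also used to standardize
INCOME_STD = 20_000.0


def preprocess(age: Optional[float], income: Optional[float]) -> Tuple[float, float]:
    """Treat missing values and standardize inputs exactly as during training."""
    if age is None:
        age = AGE_MEDIAN
    if income is None:
        income = INCOME_MEAN
    income_z = (income - INCOME_MEAN) / INCOME_STD
    return age, income_z


def score(age: Optional[float], income: Optional[float]) -> int:
    """Apply the hypothetical learned if-then-else rules to one raw record.

    Returns 1 for the target class (e.g. 'responder') and 0 otherwise.
    """
    age, income_z = preprocess(age, income)
    if income_z > 0.5:
        return 1 if age < 60 else 0
    return 1 if age < 25 else 0
```

Every constant the model needs at scoring time is frozen into the code,
so it can be run on raw data with no access to the modeling environment.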

+-----------------+
| Important Dates |
+-----------------+

- June 19, 1997: Registration deadline and data set release date
- July 17, 1997: Participants turn in the results along with the
scoring code
- August 11, 1997: Individual performance evaluations sent to the
participants
- August 14, 1997: Public announcement during the KDD-97 conference
of the top three performing tools in each category.

+------------------------------+
| KDD-CUP-97 Program Committee |
+------------------------------+

Vasant Dhar, New York University, New York, NY, USA
Ronen Feldman, Bar-Ilan University, Ramat-Gan, ISRAEL
Ismail Parsa, Epsilon Data Management, Burlington, MA, USA
Gregory Piatetsky-Shapiro, Geneve Consulting Group, Cambridge, MA, USA

+---------------------+
| EVALUATION CRITERIA |
+---------------------+

A. CLASSIFICATION OR DISCRIMINATION CATEGORY

Predictive power, i.e., the classification accuracy of the resulting
model measured in terms of lift (the term 'lift' implies improvement
over random or no prediction), will be the primary evaluation criterion
in the classification category. The winner, however, will be selected
based on a weighted combination of predictive power and all of the
following:

1) Software Novelty/Innovation, e.g., unified approach to analyses
through the implementation of analytic metadata, integration of
data mining with data visualization, integration with other systems
in novel ways, user interaction, built-in intelligence, etc.

2) Efficiency, i.e., people and CPU time

3) KDD Methodology, including but not limited to:

- Data Archaeology, including but not limited to:

Data Hygiene (quality-control and cleaning)
Identify and eliminate noise

Preprocessing
Identify and eliminate constants
Identify and treat missing values
Identify (and treat) outliers
Identify (and treat) non-linearity
Identify (and treat) non-normality
Create derived features based on string-to-numeric conversions
Create derived features based on dates
Create derived features based on time series smoothing
Discretize or bin continuous features
Discretize or bin nominal features based on a criterion
Create derived features based on feature interactions
Create derived features based on transformations
Identify feature measurement scales: nominal, continuous, etc.

- Exploratory Data Analysis (EDA), including but not limited to:

Collinearity screening (elimination of redundant features)
Feature dimensionality reduction
Feature subset selection
Data visualization

- Model Development and Implementation, including but not limited to:

Application of data mining algorithm(s)
Evaluation of alternative algorithms and modeling technologies
Validation of results (to avoid over-fitting)
Interpretability of extracted patterns
Data visualization
Return on investment (ROI) or back-end analysis
Application of learned knowledge to the universe, i.e., scoring.
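
Lift, the primary criterion in this category, can be sketched as the
response rate among the records a model ranks highest divided by the
overall response rate. This is a minimal sketch assuming lift is
measured at a single depth (a fraction of the file, ranked by model
score); the depth and exact procedure the committee will use are not
specified in this announcement.

```python
from typing import Sequence


def lift_at(scores: Sequence[float], labels: Sequence[int], depth: float = 0.10) -> float:
    """Lift at a given depth: response rate in the top-ranked fraction / overall rate.

    scores: model scores, higher meaning more likely to be in the target class
    labels: actual outcomes, 1 for the target class and 0 otherwise
    depth:  fraction of records, ranked by score, that would be targeted
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("no records in the target class; lift is undefined")
    # Rank records by model score, highest first, and keep the top `depth` fraction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    # A random ordering (no prediction) has an expected lift of 1.0.
    return top_rate / overall_rate
```

With toy scores for ten records of which three respond, and all three
ranked in the top five, the top half has a response rate of 3/5 against
3/10 overall, so lift_at(scores, labels, 0.5) returns 2.0; at depth 1.0
the lift is always 1.0.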

B. CLUSTERING OR SEGMENTATION CATEGORY

In the clustering or segmentation category, the validity of the final
solution will be determined based on a combination of the relevant
items listed above and one or more of the following:

- External evaluation, i.e., using samples from known clusters

- Internal evaluation, i.e., using statistical or other measures
to characterize the goodness of fit of the clustering solution

- Replicability, i.e., using cross-validation samples

- Relative criteria, i.e., comparison of cluster solutions obtained
from alternative clustering algorithms applied to the same data
set.

Visualization of the final clustering solution will also be important.
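
Two of these criteria are easy to sketch. The example below assumes
purity as the external measure (agreement between the found clusters
and samples from known clusters) and the within-cluster sum of squared
distances to the centroid (SSE) as an internal goodness-of-fit measure;
the announcement does not name the measures the committee will apply,
so both choices are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def purity(assigned: Sequence[int], known: Sequence[str]) -> float:
    """External evaluation against samples from known clusters.

    For each found cluster, count how many members share its most common
    known label; purity is the total of those counts over all records
    (1.0 means every cluster contains a single known label).
    """
    by_cluster: Dict[int, Counter] = {}
    for cluster, label in zip(assigned, known):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(known)


def within_cluster_sse(points: Sequence[Point], assigned: Sequence[int]) -> float:
    """Internal evaluation: squared distance of every point to its cluster centroid.

    Lower values mean tighter clusters for the same number of clusters.
    """
    members: Dict[int, List[Point]] = {}
    for point, cluster in zip(points, assigned):
        members.setdefault(cluster, []).append(point)
    sse = 0.0
    for pts in members.values():
        dim = len(pts[0])
        centroid = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        sse += sum((p[d] - centroid[d]) ** 2 for p in pts for d in range(dim))
    return sse
```

Replicability could be checked in the same spirit by clustering several
cross-validation samples and comparing these scores across runs.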

+-----------------------+
| REGISTRATION BROCHURE |
+-----------------------+

To participate in the KDD-CUP-97, please complete the application form
below and send it in plain ASCII format to (e-mail preferred):

+-----------------------------+
| Ismail Parsa |
| Epsilon Data Management |
| 50 Cambridge Street |
| Burlington MA 01803 USA |
| |
| E-mail: iparsa@epsilon.com |
| Phone: (617) 273-0250*6734 |
| Fax: (617) 272-8604 |
+-----------------------------+

Detailed information regarding the rules of the competition will be
sent to the participants later.

---------------------------------- cut ---------------------------------

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97)

Registration Brochure

Competition category..........: (_) Classification or Discrimination
(check all that apply)          (_) Clustering or Segmentation

Will you attend the KDD-97
  conference..................: (_) Yes   (_) No

Would you like to sponsor this
  event? (terms/benefits to be
  determined).................: (_) Yes   (_) No

Name of software/product/tool/
  research prototype..........:

Status of software/product/
  tool/research prototype.....: (_) Alpha   (_) Beta   (_) Production

Release date of software/
  product/tool/research
  prototype (in YYMM format)..:

Platform availability.........: (_) PC   (_) Unix   (_) Mainframe
(check all that apply)          (_) Parallel environment   (_) Other

Built-in KDDM methodology/
  technology..................: (_) Graphical User Interface (GUI)
(check all that apply)          (_) Data Access
                                (_) Data Selection (sampling, etc.)
                                (_) Data Preprocessing
                                (_) Exploratory Data Analysis
                                (_) Link Analysis (Associations,
                                    Sequences, etc.)
                                (_) Clustering or Segmentation
                                (_) Time Series Analysis
                                (_) Classification or Discrimination
                                (_) Prediction or Regression
                                (_) Multiple Learned or Combined
                                    Models
                                (_) Data Postprocessing
                                (_) Data and Knowledge Visualization
                                (_) Other, specify: _______
                                    _______

Data mining algorithms........: (_) Supervised Neural Networks (MLP,
(check all that apply and           RBF, etc.)
 specify the algorithms)        (_) Statistical Methods (Logistic,
 ^^^^^^^^^^^^^^^^^^^^^^             OLS, MARS, PPR, GAM, Nearest
                                    Neighbors, etc.)
                                (_) Decision Trees (ID3, C4.5, CHAID,
                                    CART, etc.)
                                (_) Hybrid Systems (Neuro-fuzzy systems,
                                    GA optimized neural systems, etc.)
                                (_) Unsupervised Algorithms (Kohonen
                                    networks, K-means clustering, etc.)
                                (_) Case-Based Reasoning
                                (_) Associations and Sequence Discovery
                                (_) Other, specify: _______
                                    _______

Is your software/product/tool/
research prototype:

  Freeware....................: (_) Yes   (_) No
  Available for purchase......: (_) Yes   (_) No
    if 'yes' then
    Price (optional, in US$)..:
    Number of sites installed.:

Does your software/product/
  tool/research prototype
  have limitations, e.g.,
  number of variables and
  rows it can handle, etc.....: (_) No   (_) Yes, please specify: _______

Other relevant information....:

PRIMARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Fax Number....................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:

SECONDARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:

---------------------------------- cut ---------------------------------

Date: Wed, 4 Jun 1997
From: GPS (gps)
Subject: Mining the Cosmos?

URL:

http://www.nytimes.com/library/cyber/surf/060497mind.html

The New York Times Cyberedition Mind & Machines section (June 4, 1997)
published this interesting article.

by Ashley Dunn

Breaking Down the Search for Extraterrestrials With Distributed Computing.

Searching for signs of extraterrestrial life has been one of those
quixotic ventures on the fringes of science whose chances of success
with current technology are somewhere around slim to none.

Radio telescopes have been able to reach deep into the cosmos, but they can
analyze only a thin sliver of information for telltale energy spikes
that could be a million-year-old beacon signal from another
civilization, or possibly an alien version of 'I Love Lucy' leaking into
the cosmos.

Every second, gigabytes of data are thrown away in these
surveys (called SETI, for search for extra-terrestrial intelligence)
because there just isn't enough computing power to take more than a
rough cut at the whole universe.

This waste of data and the allure of searching for extraterrestrial
life were what attracted the attention of David Gedye, the director of
online games for Starwave in Seattle. Gedye was scraping his mind for
ways to involve the public in a mega-science project that would be fun,
educational and significant. It was a hobby project unrelated to his
work for Starwave, but one that had the potential to draw big numbers of
Net users.

Gedye, who had worked on distributed computing programs at
Sun Microsystems, saw SETI (Search for Extraterrestrial
Intelligence) as an appropriate project for global distributed
computing. It might never pay off in our lifetimes, like the
code-cracking efforts or prime number searches that have
sprouted on the Internet, but it was interesting and educational.
He named his project SETI@home.

He found at the University of Washington a like-minded professor of
astronomy, Woodruff Sullivan, who began to develop the idea of using
tens of thousands of personal computers around the world to help re-sift
the SETI data from the giant Arecibo radio telescope in Puerto Rico.

...
Full text available at:

http://www.nytimes.com/library/cyber/surf/060497mind.html

(Thanks to Michael Beddows for pointing out this article. GPS)

From: David Isherwood (disherwo@attar.co.uk)
Date: Tue, 3 Jun 1997 12:41:27 +0000
Subject: AMEC Data Mining alliance

The full text is available at:

http://www.attar.com/pages/amec.htm

AMEC signs strategic Data Mining alliance with Attar Software and Hart
Consultants.

The application of data mining in the oil and gas industries is being
pioneered by AMEC Process and Energy in association with Attar
Software and Hart Consultants. The three companies today signed an
alliance agreement at AMEC's offices in London. AMEC's Elliott
Cairnes explained: 'Data mining is an innovative approach to identify
and explain independent patterns between sets of variable data. The
resultant information can be used to improve the performance of oil
and gas processes. In addition, this technology is equally applicable
to a wider range of operational areas and provides a new opportunity
for AMEC to deliver significant client benefits.'

AMEC, Attar and Hart are now able to provide proven expertise in the
application of advanced data mining techniques to oil and gas
processes, to improve efficiency, provide understanding of complex
processes, analyse performance and help identify problems, potential
problems and opportunities in plant operation.

Data mining is especially suited to complex processes and issues, for
example where the underlying theory is not well understood. Examples
include the analysis of drilling data and subsequent well performance,
the analysis of oil in water, and similar problems. It can also be used to
learn how process experts, plant operators or other key personnel make
decisions. For example, the 'secrets' of the best shift can be learnt
from records of their actions in response to various scenarios.

AMEC Process and Energy Limited is part of AMEC p.l.c., the
international engineering, construction and development group. One of
the largest and most experienced companies in the North Sea, it is an
international market leader providing a service capability that spans
the full life cycle of offshore production facilities ranging from
conceptual design and asset maintenance through to life cycle cost
optimisation and decommissioning services. AMEC also enjoys a
significant international reputation for value adding engineering,
construction and maintenance of downstream related oil and gas
terminals, refinery, petrochemical and nuclear plants, process plant
and, in the environmental area, incineration and pyrolysis plants.

Attar is a provider of advanced software technology with over ten
years' experience in data analysis. Its recent advances in the
application of data mining using its Profiler software have been
pivotal in making the company the software technology partner in the
Pan European CRITIKAL project for large scale data mining, a project
that is part funded by the European Community with a combined
investment of $2 million.

Hart Consultants are specialist process and energy consultants,
providing innovative solutions to blue chip companies. They are
frequent advisors to Government Agencies on energy, and in
co-operation with Attar Software have pioneered the use of data mining
in organisations such as BP, ICI, Carlsberg Tetley and Cleveland
Potash.

For further information, please contact:

Jeremy McTeague
PR and communications executive
AMEC Process and Energy Limited
Tel: 44 (0)171 705 2561
jeremy.mcteague@golden.amec.co.uk

http://www.apel.co.uk

David Isherwood
Marketing Director
Attar Software Ltd
Tel: 44 (0)1942 608844
disherwood@attar.co.uk

http://www.attar.com

From: Usama Fayyad (fayyad@MICROSOFT.com)
Subject: Journal DMKD - issue 2
Date: Wed, 28 May 1997 11:34:12 -0700

Issue 2 of the new journal: Data Mining and
Knowledge Discovery has been finalized.
You can access the abstracts and full text of
the editorial at the journal's home page:

http://www.research.microsoft.com/datamine

Also, issue 1 is now available free on line
from Kluwer's web server. Links to Kluwer's
server are accessible via the above homepage
or directly at:

http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE

===================================
DATA MINING AND KNOWLEDGE DISCOVERY
Volume 1, issue 2
===================================
CONTENTS:
--------

Editorial
Usama Fayyad, editor-in-chief

----------------------------------------------
PAPERS
------
BIRCH: A New Data Clustering Algorithm and Its
Applications
Tian Zhang, Raghu Ramakrishnan, Miron Livny

Mathematical Programming in Data Mining
O. L. Mangasarian

A Simple Constraint-Based Algorithm for Efficiently
Mining Observational Databases for Causal Relationships
Gregory F. Cooper

----------------------------------------------
BRIEF APPLICATION SUMMARY
-------------------------
Visual Data Mining: Recognizing Telephone Calling Fraud
Kenneth C. Cox, Stephen G. Eick, Graham J. Wills,
and Ronald J. Brachman

================================================

Usama Fayyad
datamine@microsoft.com
For more information on the journal, CFP, and
to submit a paper, please see:

http://www.research.microsoft.com/datamine

Date: Sat, 31 May 97 17:47:00 BST
From: George Paliouras (paliourg@cs.man.ac.uk)
Subject: PhD thesis on refinement of event recognition systems

I have recently defended my PhD thesis, which is entitled:

Refinement of Temporal Constraints in an Event Recognition System
using Small Datasets

I attach the abstract.

If you would like to download the thesis, see:

http://www.cs.man.ac.uk/csonly/cstechrep/Theses/Paliouras/thesis.html

For more information about my work see:

http://www.cs.man.ac.uk/ai/George/mypage.html

George Paliouras

=============================================================================

Refinement of Temporal Constraints in an Event Recognition System
using Small Datasets

The central aim of this thesis is to develop novel approaches to the
representation and the refinement of event recognition models. The
event recognition system is viewed as a temporal expert system, which
searches for interesting patterns in a stream of temporally indexed
data. The format of the input stream is unusual in comparison to
standard work on event recognition, such as speech and sound
recognition. It consists of time-stamped events, rather than a set of
signal properties measured at fixed time intervals. This format has
only recently been studied in the area of temporal event recognition.

This thesis proposes a new graphical representation which facilitates
explicit modelling of time. The recognition model is a hierarchy of
events, each defined as a sequence of subevents. A distinction is made
between low-level events, used in the input data stream, and high-level
events, defined by the model. Each event definition in the model
constrains the duration and temporal association of subevents. This
approach naturally handles overlapping events, which have been
overlooked in event recognition systems that do not model time
explicitly.
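
The idea of an event defined as a sequence of subevents, with
constraints on duration and on the temporal association of subevents,
can be illustrated with a small matcher over a stream of time-stamped
events. This is a simplified, hypothetical sketch, not the graphical
representation developed in the thesis: it assumes each constraint is a
(min, max) bound on the gap between consecutive subevents plus a maximum
total duration, and the event names are invented.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    start: float
    end: float


@dataclass(frozen=True)
class EventDefinition:
    """Hypothetical high-level event: an ordered sequence of subevents."""
    name: str
    subevents: Tuple[str, ...]
    gaps: Tuple[Tuple[float, float], ...]  # (min, max) gap between consecutive subevents
    max_duration: float                    # bound on end of last minus start of first


def recognise(defn: EventDefinition, stream: Sequence[Event]) -> List[Event]:
    """Return every instance of `defn` in `stream` that satisfies its constraints.

    All combinations are enumerated, so overlapping instances are all reported.
    """
    if len(defn.gaps) != len(defn.subevents) - 1:
        raise ValueError("need one gap constraint between each pair of subevents")
    found: List[Event] = []

    def extend(chain: List[Event]) -> None:
        k = len(chain)
        if k == len(defn.subevents):
            if chain[-1].end - chain[0].start <= defn.max_duration:
                found.append(Event(defn.name, chain[0].start, chain[-1].end))
            return
        for ev in stream:
            if ev.name != defn.subevents[k]:
                continue
            if chain:
                lo, hi = defn.gaps[k - 1]
                if not lo <= ev.start - chain[-1].end <= hi:
                    continue
            extend(chain + [ev])

    extend([])
    return found
```

For example, with a definition requiring A followed by B after a gap of
0 to 2 time units and a total duration of at most 5, the stream
A(0,1), B(2,3), A(2.5,3.5), B(4,5) yields two instances, spanning 0 to 3
and 2.5 to 5, which overlap. Exhaustive enumeration is exponential in
the worst case; it is shown only to make the constraints concrete.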

Using this graphical representation, a novel method for refining the
temporal constraints of a model is presented. The refinement of the
model is based on a small training set, consisting of a sequence of
low-level events and the high-level events which should be recognised.
The small size of the data set does not allow the use of empirical
learning methods. Instead, a knowledge refinement approach is adopted,
which utilises the original model parameters to guide the refinement
process. This approach differs from standard knowledge refinement
methods, in that it can handle the temporal aspects of event
recognition. Particular emphasis is given to the association of
low-level to high-level events - information that is not provided in
the data set.

Two modes of refinement are examined in the thesis: full and partial
supervision. The former requires the provision of training information
for all of the high-level events in the model. This assumption is
relaxed under partial supervision, where training information is
provided only for the events at the highest level of the hierarchical
model. The issue that arises under partial supervision is the correct
distribution of the limited training information to all of the events
in the model.

The performance of the refinement method is evaluated on a real-world
problem: the thematic analysis of the humpback whale song. The song of
humpback whales has been extensively studied and analysed in the
biological literature and data has been collected, in the form of tape
recordings. An event recognition model is derived for the song and the
refinement method is applied using a small set of songs. The results of
the evaluation are very encouraging, showing that the system is able to
significantly improve an initially inaccurate model, even with the use
of very limited training data. This result suggests that the method is
suitable for structured hierarchical models, such as that of the
humpback whale song. Models of this type are used in a wide range of
other event recognition tasks, such as fault diagnosis and image
sequence analysis.

From: Freitas A A (freial@essex.ac.uk)
Date: Mon, 2 Jun 97 20:28:06 BST
Subject: Ph.D. thesis on KDD available on the web

Dear Dr. Piatetsky-Shapiro,

I would greatly appreciate it if you could announce in KD
Nuggets that the Ph.D. thesis titled:
'Generic, Set-Oriented Primitives to Support Data-Parallel
Knowledge Discovery in Relational Database Systems'
is now available at the URL:

http://cswww.essex.ac.uk/SystemsArchitecture/DataMining/alex/thesis.html

The abstract of the thesis is appended to this message.

Thanks,
Alex
===========================================================
Alex A. Freitas

University of Essex
Dept. of Computer Science
Wivenhoe Park, Colchester, CO4 3SQ,
United Kingdom
Tel.: (44) (1206) 87-3333 ext. 3803
Fax: (44) (1206) 87-2788
e-mail: freial@essex.ac.uk

http://cswww.essex.ac.uk/projects/res/freial/web/alex.html

==========================================================

-----------------------------------------------------------------
'Generic, Set-Oriented Primitives to Support Data-Parallel
Knowledge Discovery in Relational Database Systems.'

Abstract

Efficiency and scalability are crucial issues in Knowledge Discovery
in Databases (KDD), or Data Mining. This thesis addresses these issues
by proposing a set-oriented, primitive-based framework for KDD that
integrates three areas, namely: (a) Machine Learning and/or Statistics
- particularly the Rule Induction (RI) and the Instance-Based Learning
(IBL) paradigms; (b) Relational Database Systems; and (c) Parallel
Database Servers (PDS).

This integration is achieved by devising primitives (rather than
algorithms) that capture the core, time-consuming operations of KDD
algorithms and by exploiting data parallelism in the execution of these
primitives. This leads to a significant speedup in the execution of
KDD algorithms supported by the primitives.

Two major characteristics of the primitives proposed in this thesis
are their generality and their set-oriented nature. The primitives are
generic in the sense that they underpin the central activity of a number
of KDD algorithms. This is important, because there is no single 'best'
KDD algorithm for all application domains and databases. Moreover, the
set-oriented nature of the primitives paves the way for the efficient
exploitation of data parallelism on PDS.

The main contributions of this thesis are that it: (1) proposes a
set-oriented, primitive-based framework for KDD and identifies several
benefits of this framework (not only improved efficiency and scalability,
but also improved data re-use and software re-use, extensibility,
data-privacy control, etc.); (2) proposes generic, set-oriented primitives
for the RI and IBL paradigms; (3) shows how to use these primitives to
achieve a roughly linear speedup when executing data-parallel KDD
algorithms on PDS; (4) identifies some kinds of algorithms and some
input-parameter values of the proposed primitives that lead to an
efficient exploitation of data parallelism; and (5) proposes extensions
to the functionality of current PDS to improve the efficiency in the
execution of KDD algorithms.
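
To make the notion of a generic, set-oriented primitive concrete, the
sketch below implements a count-by-group operation that tallies
(attribute, value, class) combinations, the statistic a rule-induction
algorithm needs when scoring candidate rule conditions. This is an
illustration under stated assumptions rather than the primitives defined
in the thesis: the function names are hypothetical, and a thread pool
stands in for a parallel database server, with each partition counted
independently and the partial counts merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Row = Dict[str, str]


def count_partition(rows: Sequence[Row], attributes: Sequence[str], goal: str) -> Counter:
    """Count (attribute, value, class) combinations in one partition of the relation."""
    counts: Counter = Counter()
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr], row[goal])] += 1
    return counts


def count_by_group(rows: Sequence[Row], attributes: Sequence[str], goal: str,
                   n_partitions: int = 4) -> Counter:
    """Hypothetical data-parallel primitive: partition the rows, count each
    partition independently, then merge the partial counts.

    Threads only illustrate the partition/count/merge structure; in CPython
    they do not give a real CPU speedup for this pure-Python loop.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    partitions: List[Sequence[Row]] = [rows[i::n_partitions] for i in range(n_partitions)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(lambda part: count_partition(part, attributes, goal), partitions)
        for partial in partials:
            total.update(partial)  # merging counts is plain addition
    return total
```

Because partial counts merge by simple addition, the result is the same
for any partitioning, which is what makes this kind of primitive
amenable to data parallelism. In SQL terms it corresponds to a GROUP BY
with COUNT(*) over each attribute together with the class column.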

Date: Wed, 04 Jun 1997 10:24:06 -0700
From: Cristina Lopez (cristina@airmail.net)
Subject: Data Warehousing Meeting

Third Annual Leadership Conference
The Data Warehousing Institute (TDWI)
August 24-29, 1997
Hynes Convention Center, Boston, Massachusetts

Business and technology professionals interested in data warehousing must
attend TDWI's Third Annual Leadership Conference. Over 100 one-hour
presentations on hot topics for today's data warehousing and data access
professionals will be offered. For more information, log on to
www.dw-institute.com or contact Cristina Lopez, (972) 480-9458, x125.

From: shirley@math.uic.edu
Date: 3 Jun 1997 21:10:08 -0000
Subject: NSF Workshop on Mathematical Techniques to Mine Massive Data Sets

CALL FOR PARTICIPATION

NSF Sponsored Tutorial Workshop on
Mathematical Techniques to Mine Massive Data Sets

July 12-15, 1997

University of Illinois at Chicago

Chicago, Illinois

We are pleased to announce an NSF-sponsored four-day workshop entitled
'Mathematical Techniques to Mine Massive Data Sets' to be held on the
campus of the University of Illinois at Chicago on July 12-15, 1997.

The goal of the workshop is to introduce an invited group of mathematical
scientists to tutorial material related to the data mining of massive data
sets. Data mining is the automatic extraction and discovery of patterns,
associations, changes, anomalies, and significant structures in large data
sets. Large data sets generated by scientific, engineering, medical and
business applications are becoming increasingly common. Developing
algorithms which can uncover patterns in large data sets is an important
mathematical challenge.

If you would like to participate, please contact one of the organizers.
Some travel support is available. Graduate students are also
encouraged to participate.

---------------------------------------------------------------------------

Workshop

The workshop will be limited to 15 speakers and 30 invited participants.
The primary goals of the workshop are:

* To provide background and survey talks in order to introduce
mathematical scientists to some of the mathematical and statistical
techniques used in data mining.

* To provide mathematical scientists with a 'flavor' for data mining by
providing several case studies and some exposure to 'hands-on' demos
and systems.

* To begin a process to create a web-based digital library containing
material related to data mining for the mathematical community.

---------------------------------------------------------------------------

Structure of Workshop

The four day workshop will include tutorial lectures on:

* tree-based statistical techniques
* graphical Markov models
* neural nets
* model selection and model averaging
* combinatorial techniques
* logic-based techniques
* data mining applications.

In addition, there will be a few advanced lectures, software
demonstrations, and discussion sessions.

The invited lecturers currently include:

Michael Berry, University of Tennessee
Ron Coifman, Yale University
Herbert Edelsbrunner, University of Illinois at Urbana-Champaign
Michael Jordan, MIT
Heikki Mannila, University of Helsinki
John McCarthy, Stanford University
Vince Poor, Princeton University
J. Ross Quinlan, University of Sydney
Eric Ristad, Princeton University
Stuart Russell, University of California at Berkeley

Additional speakers are anticipated and will be included on our web site.
---------------------------------------------------------------------------

Co-organizers:

Robert Grossman, University of Illinois at Chicago and Magnify, Inc.
Simon Kasif, University of Illinois at Chicago

---------------------------------------------------------------------------

For more information, including registration and hotel
information, please see:

http://www.lac.uic.edu/m3d-chicago.html

or email:

m3d@lac.uic.edu

From: gordon@AIC.NRL.Navy.Mil
Date: Fri, 30 May 97 13:16:57 EDT
Subject: ICML-97 workshop: ML APPLICATION IN THE REAL WORLD:
Call for Participation
of the workshop on:

ML APPLICATION IN THE REAL WORLD:
METHODOLOGICAL ASPECTS AND IMPLICATIONS

at the
Fourteenth International Conference on Machine Learning (ICML-97)
Nashville, TN, July 12th 1997

WWW-page:

http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html

INTRODUCTION

The workshop 'ML Application in the Real World: Methodological Aspects and
Implications' will be held on Saturday, July 12, 1997 during the
Fourteenth International Conference on Machine Learning (ICML-97) which
will be co-located with the Tenth Annual Conference on Computational
Learning Theory (COLT-97) in Nashville, Tennessee from July 8 through July
12, 1997. This mailing lists the workshop objectives and format, the
program and registration guidelines. Further information is to be found on
the WWW pages of the Workshop and ICML.

WORKSHOP OBJECTIVES AND FORMAT

Applications of Machine Learning techniques to solve real-world problems
have gained much interest over the last decade, and recent years have
shown growing interest in the application process itself. In spite of
this attention, the ML application process still lacks a generally
accepted terminology, let alone commonly accepted approaches or
solutions. Several initiatives, both conferences and workshops, have
addressed this topic.

The workshop will emphasise the processes underlying the application of
ML in practice. Methodological issues, as well as issues concerning the
kinds and roles of knowledge needed for applying ML will form a major
focus of the workshop. It aims to build on some of the results of the
discussions at the ICML-95 workshop on 'Application of ML techniques in
practice' while moving toward a consensus on a methodology for applying
learning algorithms in practice.

The workshop is meant for scientists and practitioners who apply ML and
related techniques to solve problems in the real world. The workshop will
contain three invited lectures, held by speakers with industrial and
academic backgrounds. Five submitted papers will complete the workshop's
program. These papers cover several aspects of the application
process. In the afternoon, the workshop participants, the authors of
papers and the invited speakers will join in two working sessions,
aiming to discuss and define research goals from both the industrial
and academic points of view. The results of these discussions will be a
first step toward a comprehensive methodology for the
development and support of real-world applications of ML techniques.

Registration for this workshop is possible via the ICML registration
(see the ICML WWW page):

http://cswww.vuse.vanderbilt.edu/~mlccolt/icml97/index.html

After registering, participants are asked to fill in a short questionnaire.

For further information you are referred to the workshop and ICML WWW
pages. Additional questions can be sent to MLApplic.ICML@ato.dlo.nl
