Knowledge Discovery Nuggets 97:13, e-mailed 97-04-16

News:
* GPS, new address for subscribing to KD Nuggets,
subscribe
* G. Prisco, Query: Knowledge Discovery in Network Alarm Databases
Publications:
* J. Fuernkranz, AAI Spec Issue on First-Order Knowledge Discovery
in Databases,

http://www.ai.univie.ac.at/ilp_kdd/aai-si.html

* T. Anand, Review of 'Seven Methods for Transforming Corporate Data
into Business Intelligence' by Vasant Dhar and Roger Stein
* S. Kaski, Thesis on data exploration with SOMs available,

http://nucleus.hut.fi/~sami/thesis/thesis.html

Siftware:
* L. Zoob, SemioMap, the Discovery Search Application

http://www.semio.com

* S.D. BYERS, new version of ace.glm for Splus

http://lib.stat.cmu.edu/S/ace.glm

Positions:
* R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore

http://www.nsrc.nus.sg

* N. Dayanand, Manager of the Data Analysis and Applications group

http://www.think.com

Meetings:
* J. Komorowski, PKDD'97 -- Preliminary symposium program,

http://www.idt.ntnu.no/pkdd97/

* ICML-Colt, ICML-97/Colt-97 call for participation

http://cswww.vuse.vanderbilt.edu/~mlccolt/

* X. Wu, CFP: IEEE Knowledge and Data Engineering Exchange
Workshop (KDEX-97), Nov 3, 1997, Newport Beach, CA, USA

http://www.sd.monash.edu.au/kdex-97

* M. Smyth, Hinton -- Jordan Learning Methods course:
spaces still available,

http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

--
Knowledge Discovery Nuggets is a newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.

Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.

To subscribe, see

http://www.kdnuggets.com/subscribe.html

KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at

http://www.kdnuggets.com/

-- Gregory Piatetsky-Shapiro (editor)
gps
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 is not equal to 3 - not even for very large values of 2.
Grabel's Law


Date: Wed, 16 Apr 1997 09:41:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro (gps)
Subject: New address for subscribing to KD Nuggets -- subscribe

Thanks to many of you for the good words about Nuggets.
Last week I completed the transfer of the Nuggets server
(now called Knowledge Discovery Nuggets rather than KDD Nuggets
to emphasize the broader scope) to the kdnuggets.com site.

To subscribe, please email to subscribe

1-line message with

subscribe kdnuggets

(to unsubscribe, message should be unsubscribe kdnuggets)

See

http://www.kdnuggets.com/subscribe.html

for details.

Please address all submissions for Nuggets to gps ;
Email to the old Nuggets address kdd@gte.com will probably be forwarded to
gps for some time, but it is better to send email to the
new address.

-- GPS


Date: Mon, 14 Apr 97 12:48:49 PDT
From: Giuseppe Prisco (gprisco@rc0085.roma.tlsoft.it)
Subject: Knowledge Discovery in Switching Network Alarm Databases

We are interested in the application of KDD methods to a public switching
network alarm database. Our goal is to improve maintenance and severe alarm
prevention. Our research began by studying the experience of the TASA system
and its sequence analysis algorithm. Any help would be appreciated, in particular:

- suggestions, experiences etc.

- suggestions about (possibly free) software for searching for significant
sequences.

- contacts with any Italian University, in order to start a possible thesis
work on that topic.

Thank you
_________________________________________

Giuseppe Prisco - Software Analyst
Telesoft s.p.a SPR/SSCT
Via degli Agrostemmi, 30 S.Palomba - Roma 00040
tel 06/71035723

email Giuseppe.Prisco@tlsoft.it
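
As a rough illustration of what searching for significant sequences in an alarm log can mean, the following Python sketch counts how often one alarm type is followed by another within a fixed time window. The alarm names, the window length and the minimum count are all hypothetical, and this is not the TASA algorithm, only a simple windowed pair count.

```python
from collections import Counter

def frequent_followups(alarms, window, min_count):
    """Count ordered pairs (a, b) where alarm b occurs within `window`
    seconds after alarm a, and keep pairs seen at least `min_count` times.

    `alarms` is a list of (timestamp_seconds, alarm_type) tuples.
    """
    alarms = sorted(alarms)
    counts = Counter()
    for i, (t_a, a) in enumerate(alarms):
        for t_b, b in alarms[i + 1:]:
            if t_b - t_a > window:
                break  # sorted by time, so later alarms are too far away
            if a != b:
                counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical alarm log: (seconds since start, alarm type)
log = [
    (0, "LINK_DOWN"), (5, "HIGH_ERROR_RATE"),
    (100, "LINK_DOWN"), (104, "HIGH_ERROR_RATE"),
    (200, "POWER_WARNING"),
    (300, "LINK_DOWN"), (308, "HIGH_ERROR_RATE"),
]
patterns = frequent_followups(log, window=10, min_count=2)
print(patterns)
```

Pairs that survive the threshold are only candidates; whether they are significant for maintenance would still need checking against how often each alarm occurs on its own.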


Date: Tue, 01 Apr 1997 12:50:19 +0200
From: Johannes Fuernkranz (juffi@ai.univie.ac.at)

2nd Call For Papers
Applied Artificial Intelligence
Special issue on
First-Order Knowledge Discovery in Databases
(URL: http://www.ai.univie.ac.at/ilp_kdd/aai-si.html)

A recent MLnet Workshop, held at the ICML-96, focussed on a discussion of
the potential contribution of ILP for KDD. Information on the workshop
including a short summary and all accepted papers can be found at

http://www.ai.univie.ac.at/ilp_kdd/.

The general conclusion was that ILP can
be a valuable tool for data mining, its main advantages being the
expressiveness of first-order logic as a representation language and the
ability of many ILP systems to use strong language biases for restricting
the huge search space. ILP has a high flexibility in incorporating various
forms of background knowledge, which can be invaluable for large KDD tasks.
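
To make the point about expressiveness concrete, here is a small Python sketch, with made-up facts and predicate names, of a first-order rule that relates several rows of a table through a shared variable. An attribute-value (propositional) representation typically describes each row in isolation and cannot state such a rule directly. The sketch only evaluates a given rule; it does not learn one, and it is not taken from any ILP system.

```python
# Hypothetical background knowledge: parent(X, Y) facts.
parent = {("ann", "bob"), ("bob", "cid"), ("bob", "dee"), ("eve", "ann")}

def grandparent(parent_facts):
    """Evaluate the first-order rule
        grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    by joining the parent relation with itself on the shared variable Y.
    """
    return {(x, z)
            for (x, y1) in parent_facts
            for (y2, z) in parent_facts
            if y1 == y2}

derived = grandparent(parent)
print(sorted(derived))
```

A language bias in this setting would amount to restricting which such rule shapes (predicates, number of joins) a learner is allowed to consider.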

The special issue on 'First-Order Knowledge Discovery in Databases' of the
Applied Artificial Intelligence Journal will thus welcome papers that focus
on one or more of the following topics:

* Embedding ILP into the KDD process
* Necessary pre- and post-processing steps for real-world applications
* Interfacing ILP systems with database managers
* Scalability of ILP for real-world databases
* Criteria for quantifying the complexity of ILP problems
* Evaluation of gain and price of ILP versus propositional learning
* Non-classification learning and discovery in a first-order framework
* Benefits of using background knowledge and/or strong explicit biases
* Innovative real-world applications of ILP

Papers on related subjects are also welcome, but a strong focus on
applications and database issues is required for all submissions.

see

http://www.ai.univie.ac.at/ilp_kdd/aai-si.html

for full details on submissions.

Submission Deadline: April 30, 1997

[edited for space. GPS]


From: 'Anand, Tej' (TAnand@HITC.AtlantaGA.ncr.com)
Subject: book review for Nuggets
Date: Fri, 4 Apr 1997 16:58:14 -0500

Book Review: 'Seven Methods for Transforming Corporate Data into Business
Intelligence' by Vasant Dhar and Roger Stein,
(Prentice-Hall, 1997).

(see

http://www.prenhall.com/allbooks/be_0132820064.html

for more on this book. GPS)

It has been quite a while since I have been able to read a
technical/business book in its entirety, but recently I accomplished
this feat with 'Seven Methods for Transforming Corporate Data into
Business Intelligence' by Vasant Dhar and Roger Stein. Usually I am
unable to complete a technical/business book because either it is so
high-level (and abstract) that I cannot appreciate how the material
would apply to me, or it is so detailed that I am totally lost 'in the
trees'.

Seven Methods... is different. This short book starts off by providing
a framework for representing objectives and requirements for
'intelligent systems' (systems that embed AI techniques or systems
that explicitly represent knowledge) using a business oriented
vocabulary. This framework not only helps select the 'appropriate'
technique but also helps formulate the problem in a way that makes that
selection transparent. The business vocabulary helps explain the
selection to management and business types.

The book then describes seven data-intensive modeling techniques (tree
induction, analogical reasoning, fuzzy logic, rule-based systems,
neural nets, genetic algorithms, and OLAP) using the framework. While
these chapters are written to enable business-oriented people to get a
quick understanding of the techniques, they are also great for
technical folks because they can provide us knowledge about techniques
in which we are not experts. All techniques are treated with uniform
depth, which makes it a handy reference. The explanation of the
techniques is highly visual with almost every other page containing a
high quality graphic that explains how the techniques work. One
quibble: Chapter 10, titled Machine Learning, could have been more
aptly titled 'Tree Induction'.

The book ends with seven detailed (8-10 pages each) case studies of
successful applications of each of the techniques. Each case study is
described using the same framework. This is where the rubber meets the
road, and for the seven case studies selected the framework holds up
very well.

My only real complaint with this book is that it does not talk about using
multiple techniques together.

Btw: I felt this book was so well written that I promptly lent it to my
manager for weekend reading.

Disclaimer: Although we have never worked together, Roger Stein and I
for a brief time shared the same employer: Dun & Bradstreet, Roger at
Moody's and I at A.C. Nielsen. One of the case studies is about
Spotlight, a system with which I was associated.

-Tej Anand
NCR Corporation
Human Interface Technology Center


Date: Sun, 6 Apr 1997 21:54:10 +0300
From: Sami Kaski (sami@james.hut.fi)
Subject: Thesis on data exploration with SOMs available

The following Dr.Tech. thesis is available at

http://nucleus.hut.fi/~sami/thesis/thesis.html

(html-version)

http://nucleus.hut.fi/~sami/thesis.ps.gz

(compressed postscript, 300K)

http://nucleus.hut.fi/~sami/thesis.ps

(postscript, 2M)

The articles that belong to the thesis can be accessed through the page

http://nucleus.hut.fi/~sami/thesis/node3.html

Data Exploration Using Self-Organizing Maps

Samuel Kaski

Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200 (Rakentajanaukio 2C)
FIN-02015 HUT, Finland

Finding structures in vast multidimensional data sets, be they
measurement data, statistics, or textual documents, is difficult and
time-consuming. Interesting, novel relations between the data items
may be hidden in the data. The self-organizing map (SOM) algorithm of
Kohonen can be used to aid the exploration: the structures in the data
sets can be illustrated on special map displays.

In this work, the methodology of using SOMs for exploratory data
analysis or data mining is reviewed and developed further. The
properties of the maps are compared with the properties of related
methods intended for visualizing high-dimensional multivariate data
sets. In a set of case studies the SOM algorithm is applied to
analyzing electroencephalograms, to illustrating structures of the
standard of living in the world, and to organizing full-text document
collections.

Measures are proposed for evaluating the quality of different types of
maps in representing a given data set, and for measuring the
robustness of the illustrations the maps produce. The same measures
may also be used for comparing the knowledge that different maps
represent.

Feature extraction must in general be tailored to the application, as
is done in the case studies. There exists, however, an algorithm
called the adaptive-subspace self-organizing map, recently developed
by Kohonen, which may be of help. It extracts invariant features
automatically from a data set. The algorithm is here characterized in
terms of an objective function, and demonstrated to be able to
identify input patterns subject to different transformations.
Moreover, it could also aid in feature exploration: the kernels that
the algorithm creates to achieve invariance can be illustrated on map
displays similar to those that are used for illustrating the data
sets.
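
For readers unfamiliar with the self-organizing map mentioned in the abstract above, the following Python sketch shows the basic training loop on a tiny grid. It is a textbook-style simplification, not code from the thesis; the grid size, the learning-rate and radius schedules, and the toy data are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position whose weight vector is closest to x (squared Euclidean)."""
    return min(weights,
               key=lambda p: sum((wi - xi) ** 2 for wi, xi in zip(weights[p], x)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Train a small self-organizing map on a list of equal-length vectors.

    Returns a dict mapping grid position (r, c) to a weight vector.
    Learning rate and neighbourhood radius shrink linearly over time.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = 1.0 - step / steps
            lr = lr0 * frac
            radius = max(radius0 * frac, 0.5)
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2.0 * radius ** 2))  # neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull unit toward the sample
            step += 1
    return weights

# Hypothetical toy data: two well-separated clusters in the unit square.
rng = random.Random(1)
cluster_a = [[rng.uniform(0.05, 0.15), rng.uniform(0.05, 0.15)] for _ in range(30)]
cluster_b = [[rng.uniform(0.85, 0.95), rng.uniform(0.85, 0.95)] for _ in range(30)]
som = train_som(cluster_a + cluster_b, rows=3, cols=3)
unit_a = best_matching_unit(som, [0.1, 0.1])
unit_b = best_matching_unit(som, [0.9, 0.9])
print(unit_a, unit_b)
```

After training, samples from the two clusters should land on different map units, which is the property that makes the map usable as a display of structure in the data.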


Date: Thu, 10 Apr 1997 17:43:04 -0700
From: Laurie Zoob (lzoob@semio.com)
Subject: SemioMap, the Discovery Search Application

Semio Corporation, a newly formed start-up company, is using
computational semiotics to identify patterns and relationships in
text-based information on the internet and intranet. Using data
visualization, the relationships are automatically displayed in a
graphical, navigable map. There is a working alpha version/early beta
of the software at

http://www.semio.com.

The initial product is called
SemioMap, the Discovery Search Application. SemioMap is targeted toward
the corporate intranet market.

We are currently seeking data mining, knowledge discovery and
database-oriented companies as development partners. If you are interested in
receiving more information, please email me at lzoob@semio.com.

Best,
Laurie Zoob
Director, Business Development
--
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Laurie Zoob                       Phone: (415) 802-2943
Director Business Development     Fax:   (415) 802-2942
Semio Corporation                 Email: lzoob@semio.com
One Dolphin Drive                 http://www.semio.com
Redwood Shores, CA 94065
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


Date: Wed, 26 Mar 1997 13:07:39 -0800 (PST)
From: 'S.D. BYERS' (byers@stat.washington.edu)
Subject: new version of ace.glm

Dear Splus and GLM users,
I have written a new version of ace.glm for Splus and it is
now available in the S archive at Statlib at

http://lib.stat.cmu.edu/S/ace.glm

This simple function performs the ACE transformation detection
algorithm for generalized linear models using the weighted linear model
obtained from the GLM at convergence of the fitting algorithm.
It generalizes ace.logit, ACE for logistic regression.
A paper describing ace.logit and its uses can be found at

http://www.stat.washington.edu/tech.reports/raftery-richardson.ps

These functions can be powerful tools in Generalised Linear Modelling.
The new ace.glm will work for any GLM that has a family defined in Splus.
It will also work for any link function defined for these families.
Previously, ace.glm worked only for the canonical link function.
By default, ace.glm will pleasantly plot your ACE output if a graphics
device is open.
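
The "weighted linear model obtained from the GLM at convergence" can be illustrated for logistic regression. In the Python sketch below (not the S-Plus code of ace.glm; the data and function names are invented for illustration), iteratively reweighted least squares produces a working response z and weights w, and at convergence a weighted linear regression of z on x reproduces the GLM coefficients. A transformation method such as ACE could then be applied to that weighted linear model; the ACE step itself is not shown.

```python
import math

def weighted_line_fit(x, z, w):
    """Weighted least squares for z ~ b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1

def logistic_irls(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1 * x by iteratively reweighted least squares.

    Returns the coefficients plus the working response and weights used in
    the last iteration (the weighted linear model at convergence).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [m * (1.0 - m) for m in mu]  # IRLS weights for the logit link
        # Working response: linear predictor plus scaled residual.
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        b0, b1 = weighted_line_fit(x, z, w)
    return (b0, b1), z, w

# Hypothetical data: a binary outcome that becomes more likely as x grows.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
(b0, b1), z, w = logistic_irls(x, y)
print(round(b0, 3), round(b1, 3))
```

For a non-canonical link the weights and working response take a different form, which is presumably why supporting arbitrary links required the new version described above.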

I would like to hear about any use/abuse/errors that may arise.

Thanks,
Simon Byers,
University of Washington Statistics.
byers@stat.washington.edu


From: Robert Straughan (rob@nsrc.nus.sg)
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)

Staff Title: Group Leader - Senior Consultant, Commercial Applications
Date Required: 1 June 1997

Job Description: National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC). NSRC
currently facilitates services and solutions to the Singapore industry
in the field of Computer Aided Engineering, Chemical Applications and
Electronics. Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness. NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing. The Group Leader shall take
overall responsibility for promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and for
soliciting business. The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.

Skills Required: Minimum of a Master's degree, with specialisation in
Computer Science and Business Administration. At least 5 years'
experience at a financial institution or in retail marketing
within the field of Data Mining / Data Analysis. Extensive managerial
experience, in particular project management, business analysis and
negotiation skills. Strong knowledge of statistical analysis and of the
selection / building of appropriate modelling techniques to solve
business problems. A good understanding of the algorithms used in Data
Mining (neural networks, classification etc.). Previous use of the
IBM SP2 and of tools such as Intelligent Miner and Darwin, as well as
statistical packages such as SAS and SPSS.

Relocation assistance, allowances for housing, children's education and
transportation apply. Salary will be commensurate with qualifications
and experience.

You can obtain more details by contacting admin@nsrc.nus.sg or by visiting
our web site at

http://www.nsrc.nus.sg.

Resumes can be sent to:

Administration Manager
NSRC
89 Science Park Drive
The Rutherford #01-05/08
Singapore 118261


Date: Fri, 04 Apr 1997 14:41:09 -0500
From: Nalini Dayanand (nalini@think.com)
Subject: Job Announcement-Please post

THINKING MACHINES CORPORATION is a leading provider of knowledge discovery
software and services. TMC's high end datamining software suite enables
users to extract meaningful information from large databases. For more
information please see

http://www.think.com.

The company is seeking an
individual to join the development organization as Manager of the Data
Analysis and Applications group.

The manager of the data analysis and applications group will provide
leadership and individual contribution in the design, development and
deployment of data mining applications, prototypes and application
frameworks. Responsibilities include

* working with product marketing and clients to identify opportunities for
data mining applications
* providing leadership and individual contribution in requirements
definition and application/prototype/framework development
* organizing and managing a team of analysts, software engineers and
technology engineers responsible for the development of specific
applications/prototypes/frameworks
* providing feedback to the development organization on potential
enhancements to existing products

Experience in telecommunications and/or financial services is desirable
but not essential.

If your background and interests match these expectations, please send your
resume via fax, email or regular mail to

Nalini Dayanand
Thinking Machines Corporation
14 Crosby Drive
Bedford, MA 01730

Fax: (617) 276-0444
email: nalini@think.com


From: Jan Komorowski (Jan.Komorowski@idi.ntnu.no)
Subject: PKDD'97 -- Preliminary symposium program

PKDD'97 -- 1st European Symposium on Principles of Data Mining and
Knowledge Discovery, Trondheim, Norway, June 24-27, 1997. Preliminary
symposium program and registration information:

http://www.idt.ntnu.no/pkdd97/


Date: Thu, 10 Apr 97 15:04:39 CDT
From: mlccolt@vuse.vanderbilt.edu (ICML-COLT Administration)
Subject: COLT/ICML

Call for Participation

Tenth Annual Conference on          Fourteenth International
Computational Learning Theory       Conference on Machine Learning
(COLT-97)                           (ICML-97)

July 6-9                            July 8-11

COLT/ICML Tutorials on July 8
ICML-affiliated Workshops on July 12

Vanderbilt University
Nashville, Tennessee, USA

The organizers of COLT-97 and ICML-97 invite you to participate
in one or both of these conferences. In hopes of encouraging
interactions between the learning theory and machine learning
communities, the conferences are loosely coupled by joint
tutorials, a day of joint technical sessions, a joint banquet,
and otherwise through co-location at Vanderbilt University in
Nashville, Tennessee.

Find all the latest information about COLT-97 and ICML-97 at

http://cswww.vuse.vanderbilt.edu/~mlccolt/,

including lists
of papers to be presented, registration and housing material,
information on tutorials and workshops, invited speakers,
travel, and the like. You may also obtain registration and
housing material by writing to mlccolt@vuse.vanderbilt.edu.

--------------------

Registration costs and applicable dates are:

               Early             Late
               (until June 2)    (after June 2)

COLT           $140              $180
ICML           $140              $180
COLT/ICML      $240              $310

--------------------

Registration for one of three ICML-affiliated Workshops
on
(1) reinforcement learning,
(2) automata induction, grammatical inference, and language
acquisition, or
(3) machine learning application in the real world

is $25 until June 2, and $35 after June 2.

--------------------
ICML-97 acknowledges generous support from the Daimler-Benz
Corporation. COLT-97 acknowledges generous support from
AT&T and is held in cooperation with ACM SIGACT and SIGART.
Both conferences are sponsored by Vanderbilt University.


Date: Fri, 11 Apr 1997 11:03:04 +1000 (EST)
From: Xindong.Wu@fcit.monash.edu.au (Xindong Wu)
Subject: CFP: IEEE KDEX-97

1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
--------------------------------------------------------------------
Sponsored by the IEEE Computer Society and Co-located with
the 9th IEEE Tools with Artificial Intelligence Conference

November 3, 1997, Newport Beach, California, U.S.A.
===================================================

Call for Papers

The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases. The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals. The workshop is currently
scheduled for a one-day duration, but depending on the final program
it might be extended to a second day.

Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer. Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society. A selected number of the accepted papers will
possibly be expanded and revised for publication in the IEEE
Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the
International Journal of Artificial Intelligence Tools. Educational
material related to papers published in the IEEE-TKDE will be posted
on the IEEE-TKDE home page.

The theme of the workshop is 'AI MEETS DATABASES'. Topics of interest
include, but are not limited to:

- Computer supported cooperative processing and interoperable
systems
- Data sharing, data warehousing and meta-data management
- Distributed intelligent mediators and agents
- Distributed object management
- Dynamic knowledge
- Evaluation and measurement of knowledge and database systems
- High-performance issues (including architectures, knowledge
representation techniques, inference mechanisms, algorithms and
integration methods)
- Information structures and interaction
- Intelligent search, data mining and content-based retrieval
- Knowledge and data engineering systems
- Quality assurance for knowledge and data engineering systems
(correctness, reliability, security, survivability and
performance)
- Software re-engineering and intelligent software information
systems
- Spatio-temporal, active, mobile and multimedia data
- Emerging applications (biomedical systems, decision support,
geographical databases, Internet technologies and applications,
digital libraries, etc.)

All submissions should be limited to a maximum of 5,000 words. Six
hardcopies should be forwarded to the following address.

Xindong Wu (KDEX-97)
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia

Phone: +61 3 9903 1025
Fax: +61 3 9903 1077
E-mail: xindong@insect.sd.monash.edu.au

Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract. This cover page must accompany the paper.

************ I m p o r t a n t   D a t e s *****************
* 6 copies of full papers received by:  June 15, 1997      *
* acceptance/rejection notices:         July 31, 1997      *
* final camera-readies due by:          August 31, 1997    *
* workshop:                             November 3, 1997   *
************************************************************

Further Information
===================

WWW:

http://www.sd.monash.edu.au/kdex-97


From: Marney Smyth (marney@ai.mit.edu)
Subject: Hinton -- Jordan Learning Methods course : spaces still available
Date: Thu, 10 Apr 1997 07:38:25 -0400 (EDT)

some spaces still available ...

**************************************************************
*** ***
*** Learning Methods for Prediction, Classification, ***
*** Novelty Detection and Time Series Analysis ***
*** ***
*** Washington, D.C., May 2 -- 3, 1997 ***
*** ***
*** Geoffrey Hinton, University of Toronto ***
*** Michael Jordan, Massachusetts Inst. of Tech. ***
*** ***
**************************************************************

A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2 -- 3, 1997, at the Hyatt Regency on Capitol Hill, Washington
D.C. Space is available for up to 50 participants for the course.

The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data. These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection,
time series analysis, diagnosis, optimization, system identification
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.

(edited for space)

ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at

http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

Marney Smyth
E-mail: marney@ai.mit.edu
Phone: 617 258-8928
Fax: 617 258-6779
