News:
* GPS, new address for subscribing to KD Nuggets
* G. Prisco, Query: Knowledge Discovery in Network Alarm Databases
Publications:
* J. Fuernkranz, AAI Spec Issue on First-Order Knowledge Discovery
in Databases
* T. Anand, Review of 'Seven Methods for Transforming Corporate Data
into Business Intelligence' by Vasant Dhar and Roger Stein
* S. Kaski, Thesis on data exploration with SOMs available
--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
-- Gregory Piatetsky-Shapiro (editor)
gps
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 is not equal to 3 - not even for very large values of 2.
Grabel's Law
Date: Wed, 16 Apr 1997 09:41:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro (gps)
Subject: New address for subscribing to KD Nuggets -- subscribe
Thanks to many of you for the good words about Nuggets.
Last week I completed the transfer of the Nuggets server
(now called Knowledge Discovery Nuggets rather than KDD Nuggets,
to emphasize the broader scope) to the kdnuggets.com site.
To subscribe, please email to subscribe
1-line message with
subscribe kdnuggets
(to unsubscribe, message should be unsubscribe kdnuggets)
Please address all submissions for Nuggets to gps
Email to the old Nuggets address kdd@gte.com
will probably be forwarded to
gps
for some time, but it is better to send email to the
new address.
We are interested in the application of KDD methods to a public switching
network alarm database. Our goal is to improve maintenance and severe alarm
prevention. Our research started by studying the TASA system and its
sequence analysis algorithm. Any help would be appreciated, in particular:
- suggestions, experiences etc.
- suggestions about (possibly free) software for finding significant
sequences.
- contacts with any Italian University, in order to start a possible thesis
work on that topic.
Thank you
_________________________________________
Giuseppe Prisco - Software Analyst
Telesoft s.p.a SPR/SSCT
Via degli Agrostemmi, 30 S.Palomba - Roma 00040
tel 06/71035723
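Editor's note: as one possible starting point for the sequence search
asked about above, the Python sketch below counts ordered pairs of alarm
types in which the second alarm follows the first within a short time
window, and keeps the pairs seen often enough. It is a simplified
illustration of windowed sequence counting and is not the TASA algorithm
itself. The function name frequent_pairs, the alarm names, and the window
and support values are all invented for the example.

```python
from collections import Counter


def frequent_pairs(events, window, min_support):
    """Count ordered alarm pairs (a, b) where b follows a within `window`.

    events: list of (timestamp, alarm_type), assumed sorted by timestamp.
    Returns {(a, b): count} for pairs that occur at least `min_support` times.
    """
    counts = Counter()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break  # events are sorted, so later ones are even further away
            if a != b:
                counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical alarm log: (seconds since start, alarm type)
log = [
    (0, "LINK_DOWN"), (5, "HIGH_BER"),
    (60, "LINK_DOWN"), (63, "HIGH_BER"),
    (200, "FAN_FAIL"),
    (400, "LINK_DOWN"), (402, "HIGH_BER"),
]
result = frequent_pairs(log, window=10, min_support=2)
print(result)
```

On a real alarm database the pairs that survive the support threshold
would still need to be checked by a domain expert, and longer sequences
would call for a proper episode-mining method.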
A recent MLnet Workshop, held at the ICML-96, focussed on a discussion of
the potential contribution of ILP for KDD. Information on the workshop
including a short summary and all accepted papers can be found at
The general conclusion was that ILP can
be a valuable tool for data mining, its main advantages being the
expressiveness of first-order logic as a representation language and the
ability of many ILP systems to use strong language biases for restricting
the huge search space. ILP has a high flexibility in incorporating various
forms of background knowledge, which can be invaluable for large KDD tasks.
The special issue on 'First-Order Knowledge Discovery in Databases' of the
Applied Artificial Intelligence Journal will thus welcome papers that focus
on one or more of the following topics:
* Embedding ILP into the KDD process
* Necessary pre- and post-processing steps for real-world applications
* Interfacing ILP systems with database managers
* Scalability of ILP for real-world databases
* Criteria for quantifying the complexity of ILP problems
* Evaluation of gain and price of ILP versus propositional learning
* Non-classification learning and discovery in a first-order framework
* Benefits of using background knowledge and/or strong explicit biases
* Innovative real-world applications of ILP
Papers on related subjects are also welcome, but a strong focus on
applications and database issues is required for all submissions.
It has been quite a while since I have been able to read a
technical/business book in its entirety, but recently I accomplished
this feat with 'Seven Methods for Transforming Corporate Data into
Business Intelligence' by Vasant Dhar and Roger Stein. Usually I am
unable to complete a technical/business book because either it is so
high-level (and abstract) that I cannot appreciate how the material
would apply to me, or it is so detailed that I am totally lost 'in the
trees'.
Seven Methods... is different. This short book starts off by providing
a framework for representing objectives and requirements for
'intelligent systems' (systems that embed AI techniques or systems
that explicitly represent knowledge) using a business oriented
vocabulary. This framework not only helps select the 'appropriate'
technique but also helps formulate the problem in a way that makes
that selection transparent. The business vocabulary helps explain the
selection to management and business types.
The book then describes seven data-intensive modeling techniques (tree
induction, analogical reasoning, fuzzy logic, rule-based systems,
neural nets, genetic algorithms, and OLAP) using the framework. While
these chapters are written to enable business-oriented people to get a
quick understanding of the techniques, they are also great for
technical folks because they can provide us knowledge about techniques
in which we are not experts. All techniques are treated with uniform
depth, which makes it a handy reference. The explanation of the
techniques is highly visual with almost every other page containing a
high quality graphic that explains how the techniques work. One
quibble: Chapter 10, titled Machine Learning, could have been more
aptly titled 'Tree Induction'.
The book ends with seven detailed (8-10 pages each) case studies of
successful applications of each of the techniques. Each case study is
described using the same framework. This is where the rubber meets the
road, and for the seven case studies selected the framework holds up
very well.
My only real complaint with this book is that it does not talk about using
multiple techniques together.
Btw: I felt this book was so well written that I promptly lent it to my
manager for weekend reading.
Disclaimer: Although we have never worked together, Roger Stein and I
for a brief time shared the same employer: Dun & Bradstreet, Roger at
Moody's and I at A.C. Nielsen. One of the case studies is about
Spotlight, a system with which I was associated.
-Tej Anand
NCR Corporation
Human Interface Technology Center
Date: Sun, 6 Apr 1997 21:54:10 +0300
From: Sami Kaski (sami@james.hut.fi)
Subject: Thesis on data exploration with SOMs available
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200 (Rakentajanaukio 2C)
FIN-02015 HUT, Finland
Finding structures in vast multidimensional data sets, be they
measurement data, statistics, or textual documents, is difficult and
time-consuming. Interesting, novel relations between the data items
may be hidden in the data. The self-organizing map (SOM) algorithm of
Kohonen can be used to aid the exploration: the structures in the data
sets can be illustrated on special map displays.
In this work, the methodology of using SOMs for exploratory data
analysis or data mining is reviewed and developed further. The
properties of the maps are compared with the properties of related
methods intended for visualizing high-dimensional multivariate data
sets. In a set of case studies the SOM algorithm is applied to
analyzing electroencephalograms, to illustrating structures of the
standard of living in the world, and to organizing full-text document
collections.
Measures are proposed for evaluating the quality of different types of
maps in representing a given data set, and for measuring the
robustness of the illustrations the maps produce. The same measures
may also be used for comparing the knowledge that different maps
represent.
Feature extraction must in general be tailored to the application, as
is done in the case studies. There exists, however, an algorithm
called the adaptive-subspace self-organizing map, recently developed
by Kohonen, which may be of help. It extracts invariant features
automatically from a data set. The algorithm is here characterized in
terms of an objective function, and demonstrated to be able to
identify input patterns subject to different transformations.
Moreover, it could also aid in feature exploration: the kernels that
the algorithm creates to achieve invariance can be illustrated on map
displays similar to those that are used for illustrating the data
sets.
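Editor's note: for readers unfamiliar with the basic SOM algorithm
mentioned in the abstract above, here is a minimal Python sketch of it.
It is not the code used in the thesis and does not cover the
adaptive-subspace variant. The function names, the grid size, the linear
decay schedules, and the toy data are all assumptions chosen for
illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map unit, initialised randomly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy data scaled to [0, 1]: two loose groups of points.
data = [[0.1, 0.1], [0.15, 0.1], [0.9, 0.9], [0.85, 0.95]]
som = train_som(data, rows=3, cols=3)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

Each data item is mapped to its best-matching unit on the grid; plotting
those positions gives the kind of map display the abstract describes.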
Semio Corporation, a newly formed start-up company, is using
computational semiotics to identify patterns and relationships in
text-based information on the internet and intranet. Using data
visualization, the relationships are automatically displayed in a
graphical, navigable map. There is a working alpha version/early beta
of the software at
The initial product is called
SemioMap, the Discovery Search application. SemioMap is targeted toward
the corporate intranet market.
We are currently seeking data mining, knowledge discovery and
database-oriented companies as development partners. If you are interested in
receiving more information, please email me at lzoob@semio.com.
Best,
Laurie Zoob
Director, Business Development
--
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Laurie Zoob Phone: (415) 802-2943
Director Business Development Fax: (415) 802-2942
Semio Corporation Email: lzoob@semio.com
One Dolphin Drive
Redwood Shores, CA 94065
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Date: Wed, 26 Mar 1997 13:07:39 -0800 (PST)
From: 'S.D. BYERS' (byers@stat.washington.edu)
Subject: new version of ace.glm
Dear Splus and GLM users,
I have written a new version of ace.glm for Splus and it is
now available in the S archive at Statlib at
This simple function performs the ACE transformation detection
algorithm for generalized linear models using the weighted linear model
obtained from the GLM at convergence of the fitting algorithm.
It generalizes ace.logit, ACE for logistic regression.
A paper describing ace.logit and its uses can be found at
These functions can be powerful tools in Generalised Linear Modelling.
The new ace.glm will work for any GLM that has a family defined in Splus.
It will also work for any link function defined for these families.
Previously, ace.glm worked only for the canonical link function.
By default, ace.glm will pleasantly plot your ACE output if a graphics
device is open.
I would like to hear about any use/abuse/errors that may arise.
Thanks,
Simon Byers,
University of Washington Statistics.
byers@stat.washington.edu
From: Robert Straughan (rob@nsrc.nus.sg)
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)
Staff Title: Group Leader - Senior Consultant, Commercial Applications
Date Required: 1 June 1997
Job Description: National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC). NSRC
currently facilitates services and solutions to the Singapore industry
in the field of Computer Aided Engineering, Chemical Applications and
Electronics. Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness. NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing. The Group Leader shall take
overall responsibility in promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and to
solicit for business. The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.
Skills Required: Minimum Master's Degree. Specialisation within the
field of Computer Science and Business Administration. At least 5
years' experience in a financial institution or in retail marketing
within the field of Data Mining / Data Analysis. Extensive managerial
experience, in particular project management, business analysis and
negotiation skills. Strong knowledge of statistical analysis and
selection / building of appropriate modelling techniques to solve
business problems. A good understanding of the algorithms used in Data
Mining (neural networks, classification etc.). Previous use of the
IBM SP2 and tools such as Intelligent Miner and Darwin as well as
statistical packages such as SAS and SPSS.
Relocation assistance, allowances for housing, children's education and
transportation apply. Salary will be commensurate with qualifications
and experience.
You can obtain more details by contacting admin@nrsc.nus.sg
or by visiting our web site at
THINKING MACHINES CORPORATION is a leading provider of knowledge discovery
software and services. TMC's high end datamining software suite enables
users to extract meaningful information from large databases. For more
information please see
The company is seeking an
individual to join the development organization as Manager of the Data
Analysis and Applications group.
The manager of the data analysis and applications group will provide
leadership and individual contribution in the design, development and
deployment of data mining applications, prototypes and application
frameworks. Responsibilities include
* working with product marketing and clients to identify opportunities for
data mining applications
* providing leadership and individual contribution in requirements
definition and application/prototype/framework development
* organizing and managing a team of analysts, software engineers and
technology engineers responsible for the development of specific
applications/prototypes/frameworks
* providing feedback to the development organization on potential
enhancements to existing products
Experience in telecommunications and/or financial services is desirable
but not essential.
If your background and interests match these expectations, please send your
resume via fax, email or regular mail to
From: Jan Komorowski (Jan.Komorowski@idi.ntnu.no)
Subject: PKDD'97 -- Preliminary symposium program
PKDD'97 -- 1st European Symposium on Principles of Data Mining and
Knowledge Discovery, Trondheim, Norway, June 24-27, 1997. Preliminary
symposium program and registration information:
Tenth Annual Conference on         Fourteenth International
Computational Learning Theory      Conference on Machine Learning
(COLT-97)                          (ICML-97)
July 6-9                           July 8-11
COLT/ICML Tutorials on July 8
ICML-affiliated Workshops on July 12
Vanderbilt University
Nashville, Tennessee, USA
The organizers of COLT-97 and ICML-97 invite you to participate
in one or both of these conferences. In hopes of encouraging
interactions between the learning theory and machine learning
communities, the conferences are loosely coupled by joint
tutorials, a day of joint technical sessions, a joint banquet,
and otherwise through co-location at Vanderbilt University in
Nashville, Tennessee.
Find all the latest information about COLT-97 and ICML-97 at
including lists
of papers to be presented, registration and housing material,
information on tutorials and workshops, invited speakers,
travel, and the like. You may also obtain registration and
housing material by writing to mlccolt@vuse.vanderbilt.edu.
--------------------
Registration costs and applicable dates are:
             Early             Late
             (until June 2)    (after June 2)
COLT         $140              $180
ICML         $140              $180
COLT/ICML    $240              $310
--------------------
Registration for one of three ICML-affiliated Workshops
on
(1) reinforcement learning,
(2) automata induction, grammatical inference, and language
acquisition, or
(3) machine learning application in the real world
is $25 until June 2, and $35 after June 2.
--------------------
ICML-97 acknowledges generous support from the Daimler-Benz
Corporation. COLT-97 acknowledges generous support from
AT&T and is held in cooperation with ACM SIGACT and SIGART.
Both conferences are sponsored by Vanderbilt University.
1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
--------------------------------------------------------------------
Sponsored by the IEEE Computer Society and Co-located with
the 9th IEEE Tools with Artificial Intelligence Conference
November 3, 1997, Newport Beach, California, U.S.A.
===================================================
Call for Papers
The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases. The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals. The workshop is currently
scheduled for a one-day duration, but depending on the final program
it might be extended to a second day.
Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer. Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society. A selected number of the accepted papers will
possibly be expanded and revised for publication in the IEEE
Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the
International Journal of Artificial Intelligence Tools. Educational
material related to papers published in the IEEE-TKDE will be posted
on the IEEE-TKDE home page.
The theme of the workshop is 'AI MEETS DATABASES'. Topics of interest
include, but are not limited to:
- Computer supported cooperative processing and interoperable
systems
- Data sharing, data warehousing and meta-data management
- Distributed intelligent mediators and agents
- Distributed object management
- Dynamic knowledge
- Evaluation and measurement of knowledge and database systems
- High-performance issues (including architectures, knowledge
representation techniques, inference mechanisms, algorithms and
integration methods)
- Information structures and interaction
- Intelligent search, data mining and content-based retrieval
- Knowledge and data engineering systems
- Quality assurance for knowledge and data engineering systems
(correctness, reliability, security, survivability and
performance)
- Software re-engineering and intelligent software information
systems
- Spatio-temporal, active, mobile and multimedia data
- Emerging applications (biomedical systems, decision support,
geographical databases, Internet technologies and applications,
digital libraries, etc.)
All submissions should be limited to a maximum of 5,000 words. Six
hardcopies should be forwarded to the following address.
Xindong Wu (KDEX-97)
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia
Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract. This cover page must accompany the paper.
************ I m p o r t a n t   D a t e s *****************
* 6 copies of full papers received by:  June 15, 1997      *
* acceptance/rejection notices:         July 31, 1997      *
* final camera-readies due by:          August 31, 1997    *
* workshop:                             November 3, 1997   *
************************************************************
From: Marney Smyth (marney@ai.mit.edu)
Subject: Hinton -- Jordan Learning Methods course : spaces still available
Date: Thu, 10 Apr 1997 07:38:25 -0400 (EDT)
some spaces still available ...
**************************************************************
*** ***
*** Learning Methods for Prediction, Classification, ***
*** Novelty Detection and Time Series Analysis ***
*** ***
*** Washington, D.C., May 2 -- 3, 1997 ***
*** ***
*** Geoffrey Hinton, University of Toronto ***
*** Michael Jordan, Massachusetts Inst. of Tech. ***
*** ***
**************************************************************
A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2 -- 3rd, 1997, at the Hyatt Regency on Capitol Hill, Washington
D.C. Space is available for up to 50 participants for the course.
The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data. These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection,
time series analysis, diagnosis, optimization, system identification
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.
(edited for space)
ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at