--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.
KD Nuggets frequency is 2-3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rearranging the letters of 'Data Mining' gives:
A giant mind.
In mad giant.
Am giant din.
A tin, dim nag.
-- perhaps descriptions of Data Mining researchers?
GPS (thanks to Genius2000 anagram software)
Date: Sat, 20 Sep 1997 13:09:33 -0400
From: computerworld_weekly (cwflash_mail@computerworld.com)
Subject: COMPUTERWORLD on Data Mining the Ski Market
[here is finally the data mining project which has really good lifts.
GPS]
CORPORATE STRATEGIES
'Data mining lifts ski marketing'
Until American Skiing Co. started building a data warehouse and
data mining tools, its marketing efforts were headed downhill.
The ski resort conglomerate was having a hard time keeping track
of critical information -- as basic as who its customers were.
But under its new, centralized approach, American Skiing is able
to keep track of 2 million skiers, offer bonuses to repeat
customers and relate to every one of them on a more personalized
basis.
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
Date: Thu, 18 Sep 1997 23:11:23 -0700
Subject: Two large census datasets available from SGI
A year ago, we at Silicon Graphics created the 'adult' dataset, which
is now available at UCI. Thanks to Terran Lane, who interned here this
summer, we now have two larger files based on two years of real US
census data (unlike the previous dataset, these are not filtered to
adults only).
The files are challenging for scale-up experiments because the
training sets are larger than common UCI files: 101MB and 47MB respectively.
Date: Sat, 20 Sep 1997 17:27:16 -0400
Subject: CFP: J. of Computational Intelligence in Finance: Special issue on
Complexity and Dimensionality Reduction in Finance
From: Randall Caldwell
Journal of Computational Intelligence in Finance
Call for Papers
Special Issue on
'Complexity and Dimensionality Reduction in Finance'
The Journal of Computational Intelligence in Finance, a peer-reviewed
technical journal, published by Finance & Technology Publishing, is
seeking papers for review and publication in 1998 on 'Complexity and
Dimensionality Reduction in Finance'.
The Journal of Computational Intelligence in Finance publishes applied
research and practical applications of high quality that are based on
sound theoretical, empirical or quantitative analysis. It provides the
international forum for the convergence of the new multi-disciplined
field of computational intelligence in finance.
Papers published in the Journal are eligible for entry in an Annual
Essay Award Contest. The Editorial Advisory Board of the Journal
selects the best paper for which a cash award is presented each year.
SPECIAL TOPIC
Complexity and Dimensionality Reduction in Finance
PUBLICATION DATE
May 1998
PAPER SUBMISSION DEADLINE
December 15, 1997
SCOPE
In the broad sense, all intelligent perception and data
understanding seeks to reduce redundancy in data and, thus,
its complexity and dimensionality. This special issue of JCIF
focuses on a narrower scope: the theories, methods and
algorithms for mapping financial data from its original
representation into another form with reduced complexity and/or
dimensionality that appear beneficial to financial applications.
Of particular interest are techniques which can serve as
preprocessors to data-driven models and data mining technologies,
including those which address or utilize one or more of the
following: complexity and dimensionality characterization,
identification and analysis; data compression; feature extraction
techniques; regularity discovery; inductive reasoning; randomness
tests; algorithmic entropy; informational distance; minimal
description length; adaptive and nonlinear PCA and other alternatives
to standard forms of linear PCA; finite sequence statistics; variable
combining methods; data filtering; categorical versus continuously-
valued inputs; high-dimensional visualization analysis; and input
space reduction techniques.
MOTIVATION
In finance, we face an unavoidable dilemma: we want to collect and
use as much data as possible in its original form so that potentially
useful information is not lost, yet this often results in data with
high complexity and/or dimensionality that increases costs and reduces
performance. Despite this, the notion that more input data is better
persists.
The need for managing complexity and dimensionality arises from eroding
profit margins, diminishing arbitrage opportunities, lowered barriers to
entry, increasingly segmented markets, increased costs, and, in general,
the reduced performance (e.g., generalization ability) of tools applied,
such as data-driven models and data mining technologies. Thus, the topic
of this special issue represents very important areas of applied research
across multiple disciplines relevant to computational intelligence in
finance.
ABSTRACTS AND PAPERS
For submission requirements on this CFP and further details, see:
or contact: Editors, JCIF, P.O. Box 764, Haymarket, VA 20168, USA
or send inquiry to: ftpub@compuserve.com
Subject: Genetic Programming Book Announcement
Date: Tue, 23 Sep 1997 15:36:57 PDT
From: Wolfgang Banzhaf (banzhaf@ICSI.Berkeley.EDU)
Wolfgang Banzhaf Peter Nordin Robert E. Keller Frank D. Francone
Genetic Programming --- An Introduction
On the Automatic Evolution of Computer Programs and Its Applications
With a Foreword
by
John R. Koza
Publication date: November 1997
Joint publication by: Morgan Kaufmann Publishers, San Francisco
dpunkt.verlag, Heidelberg
Genetic programming addresses the problem of automatic programming,
namely the problem of how to enable a computer to do useful things
without instructing it, step by step, on how to do it. The rapid growth
of the field of genetic programming reflects the growing recognition
that, after half a century of research in the fields of artificial
intelligence, machine learning, adaptive systems, automated logic, expert
systems, and neural networks, we may finally have a way to achieve
automatic programming. Genetic programming is fundamentally different
from other approaches in terms of (i) its representation (namely,
programs), (ii) the role of knowledge (none), (iii) the role of logic
(none), and (iv) its mechanism (gleaned from nature) for getting to a
solution within the space of possible solutions.
FROM THE FIRST SECTION OF THE BOOK
Automated programming will be one of the most important areas of
computer science research over the next twenty years. Hardware
speed and capability has leapt forward exponentially. Yet software
consistently lags years behind the capabilities of the hardware. The
gap appears to be ever increasing. Demand for computer code
keeps growing but the process of writing code is still mired in the
modern day equivalent of the medieval ``guild'' days. Like swords in
the 15th century, muskets before the early 19th century and books
before the printing press, each piece of computer code is, today,
handmade by a craftsman for a particular purpose.
The history of computer programming is a history of attempts to move
away from the ``craftsman'' approach -- structured programming, object
oriented programming, object libraries, rapid prototyping. But each of
these advances leaves the code that does the real work firmly in the
hands of a craftsman, the programmer. The ability to enable computers
to learn to program themselves is of the utmost importance in freeing
the computer industry and the computer user from code that is obsolete
before it is released.
Wolfgang Banzhaf
Department of Computer Science
University of Dortmund
GERMANY
A belief network learning system is now available for download. It includes
a wizard-like interface and a construction engine.
Name: Belief Network Power Constructor
Version: 1.0 Beta 1
Platforms: 32-bit Windows systems (Windows 95/NT)
Input: A data set with discrete values in the fields (attributes) and
optional domain knowledge (attribute ordering, partial ordering, direct
causes and effects).
Output: A network structure of the data set.
Main Features:
1. Easy to use. It gathers the necessary input information through 5
simple steps.
2. Accessible. It supports most of the popular desktop database and
spreadsheet formats, including MS Access, dBase, FoxPro, Paradox, Excel
and text file formats. It also supports remote database servers such as
Oracle and SQL Server through ODBC.
3. Reusable. The engine is an ActiveX DLL, so you can easily integrate
it into your belief network, data mining or knowledge base system
for Windows 95/NT.
4. Efficient. The engine constructs belief networks by using conditional
independence (CI) tests. In general, it requires O(N^4) CI tests; when
the attribute ordering is known, it requires O(N^2) CI tests.
N is the number of attributes (fields).
5. Reliable. A modified mutual information calculation is used as the CI
test to make it more reliable when the data set is not large enough.
6. Supports domain knowledge. Complete ordering, partial ordering and
causes and effects can be used to constrain the search space and
therefore speed up the construction process.
7. Scalable. Running time is linear in the number of records.
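
As an illustration of what a mutual-information-based CI test looks like,
here is a minimal Python sketch. It is not the Power Constructor engine:
the announcement does not describe its "modified" calculation, so this
uses plain conditional mutual information estimated from counts. The
function names, the record layout (a list of dicts) and the 0.01
threshold are all assumptions made for this example.

```python
from collections import Counter
from math import log


def conditional_mutual_information(records, x, y, cond=()):
    """Estimate I(X; Y | C) in nats from discrete records (list of dicts).

    Plain plug-in estimate from counts; NOT the modified calculation
    mentioned in the announcement, whose details are not given.
    """
    n = len(records)
    c_xyz, c_xz, c_yz, c_z = Counter(), Counter(), Counter(), Counter()
    for r in records:  # a single pass over the records
        z = tuple(r[c] for c in cond)
        c_xyz[(r[x], r[y], z)] += 1
        c_xz[(r[x], z)] += 1
        c_yz[(r[y], z)] += 1
        c_z[z] += 1
    total = 0.0
    for (xv, yv, z), c in c_xyz.items():
        # p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ); the n's cancel
        total += (c / n) * log(c * c_z[z] / (c_xz[(xv, z)] * c_yz[(yv, z)]))
    return total


def is_independent(records, x, y, cond=(), threshold=0.01):
    """Treat X and Y as independent given cond if the estimate is small.

    The threshold is a hypothetical value chosen for this example.
    """
    return conditional_mutual_information(records, x, y, cond) < threshold


# Toy data for a chain A -> B -> C: each (a, b, c) combination appears
# w(a, b) * w(b, c) times, where w is 3 for equal values and 1 otherwise.
def w(u, v):
    return 3 if u == v else 1


records = [
    {"A": a, "B": b, "C": c}
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
    for _ in range(w(a, b) * w(b, c))
]

print(is_independent(records, "A", "C"))          # A and C, nothing given
print(is_independent(records, "A", "C", ("B",)))  # A and C given B
```

On this toy data A and C are dependent on their own but become
independent once B is given, which is the kind of evidence a CI-based
constructor uses to decide whether an edge between two attributes is
needed. Each test makes one pass over the records, which is consistent
with a running time linear in the number of records.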
Our company, Megaputer Intelligence Ltd. (MPI) based in Moscow, Russia, is
a leading provider of advanced data mining and decision support
solutions.
Below I provide some updated information about the company and products.
I would like to inform you that MPI has recently rolled out new data
mining systems:
-- PolyAnalyst 3.0 for Windows NT
-- PolyAnalyst Knowledge Server (Client/Server architecture).
PolyAnalyst 3.0 is a next generation automated knowledge discovery
system. PolyAnalyst utilizes the newest AI technology: Evolutionary
Programming and Symbolic Knowledge Acquisition. It presents discovered
knowledge explicitly as rules and algorithms or predicting tables.
PolyAnalyst achieves phenomenal results at major international contests
of systems for data mining. Among users of the system are securities
traders, bankers, doctors, and utility companies.
An evaluation copy of PolyAnalyst 3.0 for the Windows NT platform can
now be downloaded free of charge from our new Web site at
The corresponding user manual can be requested directly from
megaputer@glas.apc.org
Megaputer Intelligence recently opened an office in the USA:
tel: 812-325-3026
fax: 812-339-1646
Megaputer Intelligence
1518 E Fairwood Drive
Bloomington IN 47408
Sergei Ananyan
Megaputer Intelligence
Date: Thu, 25 Sep 1997 09:42:51 -0400
Subject: Research position at SmithKline Beecham Pharmaceuticals
From: Kenneth D Kopple @ SB_PHARM_RD
EXCEPTIONAL OPPORTUNITY IN CHEMINFORMATICS
At SmithKline Beecham Pharmaceuticals the application of combinatorial
chemistry and high-throughput screening is resulting in an
extraordinary increase in the numbers of compounds and corresponding
data being generated for drug discovery. We are expanding our
transnational Cheminformatics group to work closely with medicinal
chemists and screening scientists in the UK and US in the collection,
transfer, manipulation and exploitation of these data.
This expansion opens an opportunity (based at either our US or UK
state-of-the-art facilities) in Knowledge Discovery in Databases,
covering the development and application of tools to find
relationships within and among large chemical and biological databases:
GROUP LEADER - KNOWLEDGE DISCOVERY
Requirements include a PhD in physical, chemical, biological or
computer sciences or statistics, with at least 5 years' experience in
pattern recognition, machine learning or chemometrics and a proven
record of performance in chemical or biological database analysis.
Job Code H7-0273
As part of our commitment to attract and retain the best, SmithKline
Beecham provides a fully competitive salary/benefits/relocation
package. To be considered for this outstanding opportunity,
mail or e-mail your curriculum vitae, indicating job code, to the
address below. For more information on SmithKline Beecham, visit
our Web site at www.sb.com/careers.
We are an Equal Opportunity Employer.
SmithKline Beecham
Job Code H7-0273
P.O. Box 2646
Bala Cynwyd, PA 19004, USA
e-mail: smithkline@jwtworks.com.
Frankfurt -- August 1997. Bill Gates, Chairman & CEO, Microsoft
Corporation, will present a keynote address at COMDEX Internet & Object
World Frankfurt '97. Gates will focus on new trends in Internet and
Intranet Communications.
The Internet markets are exploding, and Dataquest analysts are
predicting a worldwide EDI market of US $1.9 billion in the year 2000.
The Internet will play a major role in the way business will be done in the
future. This challenge will be addressed in Bill Gates' keynote.
The keynote address will take place on Wednesday, October 8, 1997,
at the Sheraton Conference Center, Frankfurt (Airport). Admission is
free to all attendees of COMDEX Internet & Object World Frankfurt '97.
COMDEX Internet & Object World Frankfurt will take place on
October 7 - 10, 1997, in the Sheraton Conference Center Frankfurt (Airport),
Frankfurt/Main, Germany.
The show will consist of two conferences side by side and one combined
exhibition with over 100 exhibitors.
COMDEX Internet is the premier COMDEX event in Germany, and it focuses
on the business use of the Internet. Object World Frankfurt, now in its
sixth year, has experienced constant growth and is recognized as the most
important event for object technology in Europe.
Show management is expecting more than 4,000 visitors to the show.
The complete conference program of COMDEX Internet & Object World
Frankfurt '97 is available at:
The 10th European Conference on Machine Learning (ECML-98) will be
held in Chemnitz (formerly Karl-Marx-Stadt, near Dresden), Germany,
from April 21 to 24, 1998.
PROGRAM
The scientific program (April 21 - 23) will include invited talks,
presentations of accepted papers, summary and commenting sessions on
current and upcoming issues in machine learning, tutorials, an
industrial session as well as poster and demonstration
sessions. Friday, April 24, will be devoted to workshops.
Separate calls for proposals will be issued (please consult the ECML'98
web page or contact the ECML'98 chairpersons at ecml98@lri.fr for
details).
RELEVANT RESEARCH AREAS
Submissions are invited that describe empirical and theoretical
research in all areas of machine learning. In addition, papers from
related disciplines that deal with adaptive intelligence,
(semi-)automated knowledge acquisition, or (semi-)automated knowledge
organization are welcome.
Submissions that describe the application of machine learning methods
to real-world problems are encouraged, but such submissions should
speak of general issues of machine learning, perhaps illustrating
novel learning methods or demonstrating the utility of established
methods in previously unexplored settings.
IMPORTANT DATES
Submission deadline:        31 October 1997
Notification of acceptance: 13 January 1998
Camera ready copy:          9 February 1998
Conference:                 21-24 April 1998
IMPORTANT ADDRESS
Submitted papers and poster / demonstration descriptions should be
sent to:
Claire Nedellec and Celine Rouveirol (ECML'98)
LRI, Bat 490
Universite Paris-Sud
F-91405 Orsay Cedex FRANCE
E-mail: ecml98@lri.fr
Date: Fri, 19 Sep 1997 09:45:59 -0400
From: Trish Carbone (carbone@mitre.org)
Subject: CFP: AFCEA Federal Data Mining Symposium
FIRST FEDERAL DATA MINING SYMPOSIUM
J. W. Marriott Hotel in Washington, D.C.
December 16-17, 1997
On behalf of AFCEA and the participating commands, we are pleased to invite
you to the first Federal Data Mining Symposium!
The Federal Data Mining Symposium will spotlight the technical advances in
and applications of Data Mining in the government community. The need for
better and more automated methods of analysis is particularly important as
the amount of data being collected and stored increases dramatically.
Analysts must be knowledgeable about new types of techniques, both
statistical and artificial intelligence techniques, in order to better find
patterns, correlations, trends, and summaries in the wealth of data.
The major goals of the Symposium are to exchange information and ideas on
the role of data mining and present requirements and proposed solutions,
provide discussions on the broad range of applicable technologies, provide
policy guidance applicable to DoD and civil agency information resource
managers, and identify and encourage service-unique and government-wide
knowledge and use of this important technology.
The Federal Data Mining Symposium will focus on three overall areas:
* User requirements for better analysis methods, including data mining
  techniques
* Applications of data mining that have been constructed and fielded,
  both for structured data and for textual and other multimedia data
* Technology for addressing the requirements
Topics of interest include:
* User requirements for data mining
* Applications of data mining
* Data mining from multimedia data (e.g., text, imagery, geospatial)
* Lessons learned from constructing data mining systems
* Solutions to data mining problems (noisy data, uncertain data,
  incomplete data, dynamic data)
* Data cleansing as part of the data mining process
* Visualization as part of the data mining process
* Validation and verification of discovered knowledge
* Security/privacy concerns and solutions
* Employment of discovered knowledge in decision support or other systems
Data users, analysts, administrators, managers, developers, researchers,
theoreticians, and vendors are cordially invited to attend and to submit
papers for presentation at the Federal Data Mining Symposium. Papers will
be selected based on relevance to the conference and technical quality.
Selected papers will be presented at the conference and/or published in the
proceedings. Exhibit Space Available!
Call for Papers Due No Later Than - October 31, 1997
The CFP for the 'Research Methodology' track especially invites
papers and special session proposals dealing with 'automated data
analysis/data mining.' As chair of this track, I would like to see
presentations that facilitate an 'outbreak' of KDD work within the
academic Marketing community. Along with competitive submissions, I
also need volunteers to serve as reviewers for these submissions.