Data Mining and Knowledge Discovery Nuggets 96:8, e-mailed 96-03-01

Contents:
News:
* H. Wang, Communications Week (Feb 12) on Data Mining,
http://techweb.cmp.com/cw/021296/close596.htm
* U. Vere, Ultragem: New Company Offers Data Mining Services
http://www.ultragem.com
* A. Beitel, Director of Marketing needs help
Publications:
* AI-STAT list, Journal of Statistical Software
http://www.stat.ucla.edu/journals/jss/
Siftware:
* R. Kohavi, MLC++ tool at http://www.sgi.com/Technology/mlc/
* J. P. Brown, Superinduction Tool, http://www.hal-pc.org/~jpbrown
Meetings:
* B. Grossman, Call for Position Papers for Data Mining Meeting,
March 14-15, 1996, San Diego
http://nscp.uic.edu/Conf/sdsc.html
* D. Sleeman, ICML-96 workshop: Synergy between Scientific Knowledge
Discovery and Knowledge Discovery In Databases,
http://www.csd.abdn.ac.uk/~sleeman/cfp-iml96ws.html
* 2nd CFP: AAAI-96 Workshop on Integrating Multiple Learned Models,
http://www.cs.fit.edu/~imlm/
* I. Imam, CFP: AAAI-96 WS on INTELLIGENT ADAPTIVE AGENTS
http://www.mli.gmu.edu/~iimam/aaai96.html
* J. Oliver, Final CFP for ISIS, Information, Statistics and
Induction in Science, Melbourne, Australia, 20-23 Aug 1996,
http://www.cs.monash.edu.au/~jono/ISIS/ISIS.shtml

*** only 18 days to KDD-96 paper submission deadline ***
--
Nuggets is an electronic newsletter for the
Data Mining and Knowledge Discovery community,
focusing on the latest research and applications.

Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).

Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'A people who mean to be their own Governours, must arm themselves
with the power which knowledge gives.'
James Madison - 1822 (thanks to Susan Tafolla)


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 16 Feb 1996 10:14:01 -0500
From: hw22@xian.gte.com (Hongbin Wang)

from techweb newsletter

DATA GOLD MINE
~~~~~~~~~~~~~~
There's gold in that there data! With the data-mining tools available
today, users don't necessarily need to know what they're looking
for to find answers that save money or spawn ideas to grow a business.
A powerful new query ability lets users drill down through millions of
records to grab golden nuggets of information that, when combined,
yield creative answers to questions no one ever thought to ask.
This week CommunicationsWeek's closeup focuses on
'Data Warehousing: Mining for Data.'

Full text in http://techweb.cmp.com/cw/021296/close596.htm



By LOUIS CONNOR

Beer and diapers sell well together. Who knew?

If not for the data-mining power of Oracle7.1, a retailer in a
neighborhood populated with young families might never have known
that husbands on the way home from work often grab a six-pack of
beer as a post-diaper impulse buy. What vice president of marketing
would have thought to ask?

That's just the point, says Ken Jacobs, vice president of product
strategy for server technologies at Oracle Corp. in Redwood Shores,
Calif. With the data-mining tools available today, users don't
necessarily need to know what they're looking for to find answers that
save money or spawn ideas to grow a business.

Data-mining technology exemplifies the state of the art in
enterprise-wide database-access and analysis software. Very simply, it
is a powerful
query ability that lets users ask open-ended questions. Used with new,
high-powered data-setup tools, data-mining software enables network
administrators to build better indexes. It also gives database-access
mechanisms a very fast way to analyze hundreds of millions of records,
according to Praful Shah, Oracle's director of server technologies.
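
To make the 'beer and diapers' example concrete: in its simplest
form, this kind of open-ended discovery amounts to counting how often
pairs of items appear together in baskets and flagging pairs that
co-occur more often than chance alone would predict. A minimal sketch
in Python (the transactions and the threshold are invented for
illustration; this is not Oracle's implementation):

    from itertools import combinations
    from collections import Counter

    # Each transaction is the set of items in one market basket
    # (invented data for illustration).
    transactions = [
        {"beer", "diapers", "milk"},
        {"beer", "diapers"},
        {"bread", "milk"},
        {"beer", "diapers", "bread"},
    ]

    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for basket in transactions:
        item_count.update(basket)
        pair_count.update(combinations(sorted(basket), 2))

    # Lift > 1 means a pair co-occurs more often than independent
    # purchases would predict; 1.2 is an arbitrary cutoff.
    for (a, b), together in sorted(pair_count.items()):
        support = together / n
        lift = support / ((item_count[a] / n) * (item_count[b] / n))
        if lift > 1.2:
            print(a, "&", b, "support:", support, "lift:", round(lift, 2))

Here only 'beer & diapers' surfaces; on millions of records, this same
counting is what the data-mining engines must make fast.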

But these capabilities come at a price. 'The biggest thing is, don't
underestimate the time and effort it takes, and don't underestimate
how much it will cost you,' says Wayne Eckerson, data-warehousing
analyst at the Patricia Seybold Group, a consulting company in Boston.

A typical data-mining setup can run into the hundreds of thousands of
dollars. For instance, a one-year license for BaseSAS, an
enterprise-wide software system from SAS Institute that has a
substantial installed base among Fortune 500 companies, costs $2,000
per client, plus $25,000 for the server.

Complex Venture

Indeed, this particular venture is complicated in many ways, involving
many man-hours of development (though vendors offer substantial
support), complicated procedural steps and product choices. Users will
need scrubbing and cleansing programs, a way to move data and a
'metadata' creator--and that's before the querying and reporting tools
come into the picture. What's more, there is no single, high-powered
package that does it all.

'For some functions, there can be a steep learning curve for end
users, even after the network administrators work with us to set up
the system,' says Becky Brown, marketing manager for the
data-warehousing initiative at SAS Institute, Cary, N.C. 'The more
power users want, the more they must understand about the way the
systems work.'

Learning the often-used fourth-generation language, for instance, can
be especially tough, she says. 'But what they're able to see afterward
is unbelievable. Once they get to that level, there's nothing they
can't do--show four graphs on a screen at a time, hot-link data with
the graph. It's powerful stuff.'

MetroHealth Insurance Corp. in Roanoke, Va., decided to take the
plunge. The $8 billion health-insurance company decided that a
pictorial directory of health-care providers on a CD-ROM would help
its agents. Using MapInfo Professional from MapInfo Corp., Dan
Chinnok, MetroHealth's manager of geographic services, pulled 200,000
pieces of data from MetroHealth's data warehouse, threw the
information onto a CD, then programmed the data for geographically
oriented queries.

According to Chinnok, when an insurance agent in San Francisco pops in
the CD and keys in that city, a color-coded map appears to indicate
how densely populated the area is with health-care providers, with one
color showing areas with more than 100 providers and another to
indicate 50 to 100. MetroHealth sold 250 CDs as soon as they were
available, Chinnok says, because there's no other way to get at the
data. 'You'd have to flip through directory books otherwise, and the
stack could be 12 feet thick,' he says.

A limited-query version of MapInfo's query-and-report technology is
included as a pull-down menu option in Microsoft Corp.'s Excel
spreadsheet program. Analysts and developers agree that the deal with
Microsoft indicates an overall acclimation to data-warehousing
analysis among users of varying expertise. Mainstream use of such
products--or, at least, use of limited-query functions--will likely be
seen on the Internet within the year, observers say. They envision the
Internet as a remote repository where users can use a simple Web
browser to query searchable text.

A new business climate is fostering interest in data-mining
technology, analysts say. Strategic decisions (choosing the next
location for a toy store franchise), focused marketing (unearthing the
best prospects for a bank's new credit card) and customer tracking
(who's buying beer and diapers together and why?) go straight to the
bottom line of any company--and all can be ascertained through data
mining.

Although reports indicate that data warehousing's popularity isn't
limited to any specific industry, telephone and banking
companies--many of which are among Oracle's mainstay clients--are
prime examples of users benefiting from data mining, Oracle's Jacobs
says. After deregulation, both industries were left with masses of
data to analyze--intelligence that would later help them determine
where to expand their businesses. Jacobs cites U S West and Pacific
Northwest Bell Telephone Co. as examples.

'Imagine it from their perspective: to make a query and find out who
is calling whom, and who is being called by whom,' he says. 'It's a
marketer's dream.'

Take it a step further to include international marketing. Say
L.L. Bean, the sports outfitter, wants to compare European and
American sales. It will need both continents' sales figures in its
data store. Suppose the marketing vice president for L.L. Bean's
American buyers decides to massage the data to find out who buys what
and when. George Washington's birthday, for example, may be a time
when Americans are at home, flipping through catalogs or surfing the
Internet, and relaxed enough to want to buy mail-order
merchandise. But the same day in France would have no meaning. Why
should the vice president drill down through worldwide data when he
can create a self-contained American data mart for himself?

Remember, however, that not only are such tools expensive, but the
real cost of enterprise-wide data warehousing lies in the software and
integration requirements, analysts and vendors say.

'We see a lot of prospects who take a look at the $200,000 price of
our product and go away,' says Steve Feldman, vice president of
marketing at Vality Corp., a 'scrubber-and-cleanser' manufacturer in
Boston. 'Then, a year later, they come back, thinking the price is a
bargain.'

Feldman explains why. Vality's Integrity Data Reengineering tool
cleans and consolidates data, providing the data-quality foundation
for the warehouse. Consider a bank that has credit card accounts
stored in three different databases: The address for a single
cardholder can be different in all three. The scrubber looks at the
data character by character to decide which address is the most
recently entered. That one goes in the data warehouse; the others are
deleted.
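
A toy sketch of that consolidation step, assuming each source record
carries a last-updated date (the field names and dates are
hypothetical, and real scrubbers add the character-by-character fuzzy
matching this omits):

    from datetime import date

    # One cardholder's address as it appears in three source
    # databases (invented records).
    records = [
        {"account": "4417-01", "address": "12 Elm St.",
         "updated": date(1994, 5, 2)},
        {"account": "4417-01", "address": "12 Elm Street",
         "updated": date(1995, 11, 30)},
        {"account": "4417-01", "address": "8 Oak Ave.",
         "updated": date(1993, 1, 15)},
    ]

    def consolidate(recs):
        """Keep only the most recently entered record per account."""
        best = {}
        for r in recs:
            key = r["account"]
            if key not in best or r["updated"] > best[key]["updated"]:
                best[key] = r
        return list(best.values())

    # The 1995 address goes into the warehouse; the others are dropped.
    print(consolidate(records))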

'It gets down and dirty in the muck and mud of the data and fixes it,'
Feldman says. 'You can't do without it, because unless you concentrate
first on the data quality, your decision support system will
ultimately fail.'

Shrink-wrapped data-warehouse analysis tools are available for
thousands of dollars, not hundreds of thousands. But these are not
enterprise-wide solutions and they offer only limited analysis
capabilities, says Henry Morris, an analyst at International Data
Corp., a market researcher in Framingham, Mass.

'They're like the financial transaction-only packages Oracle offers,
or any number of smaller packages from Business Objects,' he says.

One highlight of the Business Objects Inc. package is its natural
language query capability; no Structured Query Language knowledge is
necessary. But it is hampered by its inability to perform the
multidimensional analyses that effective data mining demands--which
means users can't ask open-ended 'beer and diapers'
questions. Though limited, Business Objects is reportedly one of the
most popular packages around; it sells for $500 per user, plus $3,495
for the mandatory developer's module. Officials at Data General Corp.,
a reseller in Westborough, Mass., say DG sells a lot of these packages
to users that want to get into data warehousing slowly.

'We're starting to see that retailers, especially, have an interest in
them,' says DG product manager Howard Dernehl. One large New York
retailer now uses a prefabricated package to automate promotional
mailings to all customers of a store chain in location 'X' who bought
clothing type 'Y' at time 'Z.' The retailer, which requested
anonymity, uses an on-line analysis processing (OLAP) tool to identify
all the people who meet the criteria and then sends them a store
coupon. 'It's serious target marketing,' Dernehl says.

But some users who taste the power of these analysis and reporting
tools soon crave more. For many users, IDC's Morris says, it may be
more cost-effective in the long run to implement an enterprise-wide
system at the outset. Otherwise, making additions, changes and other
replacements to smaller packages can create quite a mess in terms of
implementation and troubleshooting.

If an enterprise-wide data warehouse is indeed the ultimate objective,
then once all necessary hardware is in place and the data has been
dumped, the first job is scrubbing and cleansing.

Consider translating a Digital Equipment Corp. system and a Unix
system and consolidating them into a single system. Scrubbers are
essential in this case--but even databases that run on the same system
in different offices tend to have conflicting or redundant data,
vendors and analysts say.

Scrubber definitions differ, however. Travis Richardson, director of
planning of the enterprise systems group at Apertus Technologies Inc.,
in Eden Prairie, Minn., calls Apertus' Enterprise/Integrator 3.0 an
integration tool; actually, it's a scrubber whose integration features
are emphasized for marketing reasons. Similarly, Vality's Feldman
describes Sunnyvale, Calif.-based Prism Solutions Inc.'s scrubber as
a mover, not a scrubber.

Ultimately, what makes a scrubber a scrubber is its ability to merge
redundant data, resolve conflicting data and integrate data from
incompatible systems. Any product that meets these criteria can scrub
and cleanse a database. What separates a good scrubber from a great
one is how many databases it supports, IDC's Morris says. The rest is
a matter of tweaking it for individual needs.

The next consideration is the metadata creator. Scrubbers lie
underneath the metadata layer, which contains the descriptive specs of
the source data, such as which database the regional sales figure came
from, according to SAS' Brown. Essentially, metadata gives business
users information about the source of the data they're looking at.

'Say you have a sales report and you query to get a consolidated
report that pulls from a lot of different places around the world,'
Brown says. 'Metadata tells you where each bit of information came
from, how old it is, whether it's a summarized value or an actual
value, whether the database it came from is reliable.'
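
A sketch of what one such metadata record might carry, following
Brown's examples (the field and source names are invented for
illustration):

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class ColumnMetadata:
        # Descriptive specs of the source data, per Brown's list.
        source_database: str   # which database the figure came from
        as_of: date            # how old the value is
        summarized: bool       # a summarized value or an actual value?
        source_reliable: bool  # is the originating database trusted?

    # Hypothetical lineage for a regional sales figure.
    regional_sales_meta = ColumnMetadata(
        source_database="EMEA_ORDERS",
        as_of=date(1996, 1, 31),
        summarized=True,
        source_reliable=True,
    )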

When drilling down for data, metadata is essential for open-ended
queries, she adds--so important to the future of data mining, in fact,
that SAS Institute has joined the Metadata Consortium and is devoting
resources toward developing a product that can create metadata; the
metadata creator is intended to complement SAS' sophisticated
drill-down tools.

SAS isn't the only vendor trying to get into the metadata game. 'We're
adding capabilities and are starting to encroach on the metadata
products' turf,' says Vality's Feldman. 'Eventually, we'll be able to
do all the metadata creation.'

Indeed, developers build sophisticated scrubbing tools, complete with
pattern recognition. Often, scrubbers will uncover information users
didn't even know was there.

'If these tools add the capability to do transformations on top of
creating the business tools, then you'll have one-stop shopping for
all your data preparation needs,' Seybold's Eckerson says. 'So, yes,
these two areas will start to tread on each other's turf.'

As scrubbers and metadata creators become more sophisticated, they
start to have an impact on how much the end-user drill-down tools can
do. 'People have a taste of what it can do now, so this year we'll see
people demanding more access to unstructured data,' Eckerson
predicts. Consider the acquisition of Illustra by Informix Software
Inc., he says, which combined the companies' technological know-how to
make the drill-down tools more intelligent. 'But that still is in
development. On the client side, you can already build apps to
integrate graphics and text, but you're still restricted by the tools
to go in and retrieve the data,' he says.

SAS' Brown adds: 'OK, it's a struggle to get intelligent data from a
client/server environment, and it's going to cost--but it's worth it.'

Fortunately, some costs related to data-mining efforts are
dropping. At the high end, where technological trends begin, advanced
enterprise-wide data-warehouse analysis is surging forward, spurred by
storage hardware costs that are falling so fast that between press
time and now, they've likely dropped again. A terabyte of storage
would have cost $10 million five years ago; now, it's less than $1
million, says IDC's Morris. Not that users will necessarily have to
buy a terabyte of storage to install a data-mining system--but, at the
very least, they will need to duplicate the storage capacity they
already have, so falling storage prices are a boon.

'Now, extensive data warehousing is within the scope of your
thinking--and your budget,' says Oracle's Jacobs. 'With hardware costs
down, to store data on disks, not on tape, is viable; so the option to
analyze it with all the new tools is now very inviting. It's not that
people didn't want to ask these open-ended questions before,
either. They did. They just couldn't get an answer.'

OLAP--or Not?

On another front, vendors are hotly debating whether it's better to
set up a relational OLAP structure or a multidimensional one. In a
relational structure, data is stored in a tabular format, permitting
ad hoc queries; navigation is effected by matching field values
between tables. In a multidimensional architecture, sets of cubes are
arranged in arrays, with subsets created according to category. For a
sales database, for example, this might mean arrays that show data by
geography, time and product. One 3-D geographical cube might indicate
that in the state of Georgia, Company 'ABC' had 'X' number of sales at
'Z' dollar value.

Multidimensional structures are often imperative for data-mining
querying, but some kinds of multidimensional queries are possible with
relational OLAP architectures.
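
The contrast can be sketched in a few lines (a toy illustration, not
any vendor's storage format): the relational form answers a question
by scanning rows and matching field values, while the multidimensional
form keys the same figures by category, so the question becomes a
direct lookup.

    # Relational: one fact table, navigated by matching field values.
    rows = [
        ("Georgia", "1995-Q4", "widgets", 120000),  # invented figures
        ("Georgia", "1995-Q4", "gadgets", 80000),
        ("Oregon", "1995-Q4", "widgets", 45000),
    ]
    georgia_total = sum(v for geo, qtr, prod, v in rows
                        if geo == "Georgia")

    # Multidimensional: the same data as a cube indexed by
    # (geography, time, product) -- a direct lookup, not a scan.
    cube = {
        ("Georgia", "1995-Q4", "widgets"): 120000,
        ("Georgia", "1995-Q4", "gadgets"): 80000,
        ("Oregon", "1995-Q4", "widgets"): 45000,
    }
    georgia_widgets = cube[("Georgia", "1995-Q4", "widgets")]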

'The whole hullabaloo about OLAP is simply that it lets users identify
trends or problems buried under the data,' Eckerson says. 'It is much
easier and more intuitive to use. It has built-in analysis functions,
like calculations--basically different ways of rolling up data and
looking at it and summarizing it.'

Some analysts see the architectural debate as a way for vendors to
differentiate their products. Yet if
relational architecture proves itself as a performance booster in a
client/server environment, there will be a natural evolution to it.

The last big trend in data warehousing--the Internet--will make use of
OLAP. 'The Internet will be used as a basic interface to a data
warehouse,' Eckerson says. 'Users will pull down existing reports and
custom queries. The Web replaces the front end and the browser becomes
your query tool.'

MapInfo is planning its foray into the Internet with geographical
information databases on a Web site that users can tap into for
pictorial map reports, says MapInfo CEO Brian Owen. Other companies
will follow, analysts say, in turn forcing the query tools in browsers
to advance as well.

It's quite a scene to conjure: Seated at your networked PC, armed only
with a mouse to drive your browser, you have a world of unstructured
databases at your fingertips. You seamlessly integrate all
available information about your industry into your WAN's data
warehouse, then drill down into it to unearth a mapped report of how
your company is faring competitively in the context of the world.



>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: New Company Offers Data Mining Services
To: kdd@gte.com
Date: Sat, 24 Feb 1996 17:21:25 -0800 (PST)
From: vere@ultragem.com

We would like to announce the formation of a new company, Ultragem Data
Mining. Ultragem will specialize in custom data mining services using
genetic algorithms. The company is located in the San Francisco Bay
area, and will conduct business globally over the Internet. The genetic
data mining technology was originally developed at a large corporation, where
it was quite successful. We were pleasantly surprised to discover that
the new technology actually achieved new performance records on a number
of machine learning benchmark problems. This finding motivated the formation
of Ultragem, so that the technology could be offered to a wider audience.
Additional information can be obtained from our web site at
http://www.ultragem.com.

Ursula Vere



>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path: (ABEITEL@mhs.on.com)
From: ABEITEL@mhs.on.com (Anne Beitel)
Date: 01 Mar 96 09:54:00
To: kdd%eureka@gte.com (KDD Nuggets Moderator), ABEITEL@mhs.on.com

Hello. I've visited the Knowledge Discovery and Data Mining Resources
page several times recently and found it very interesting. I'm still a
bit new to this field, and lost, frankly, and I was wondering if you
could help.

I am the director of marketing for a company that does a lot of
direct mail marketing. We have an in-house corporate database of more
than 300,000 people who have requested information or trials of our
software products, and a subset of whom have bought. We also collect a
lot of demographic information on these people, and maintain records
of how many times we've talked with them, their stage in the sales
cycle, etc.

We've built some statistical models to predict the likelihood of someone
becoming a customer, and we use those models to select prospect names
for renting. However, what we'd really like to do is 'mine' the
database for interesting correlations. In my mind, modeling assumes a
relationship, whereas mining FINDS the relationships. I'm looking for
a service company or organization to whom we could ship our database,
and whom we could pay to data-mine it. (I'd much rather find a service
company than buy software in-house and do it ourselves.) Do you have
any suggestions?

Thanks.

Anne Beitel
ON Technology Corporation
abeitel@on.com
(617) 692-3127 voice
(617) 374-1433 fax


>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From (@watstat.uwaterloo.ca:stamant@lindy.cs.umass.edu) Sat Feb 17 13:32:12 1996
To: ai-stats@watstat.uwaterloo.ca
Subject: Journal of Statistical Software

=====================================================================
Journal of Statistical Software
=====================================================================

The Journal of Statistical Software (JSS) will start accepting
submissions. JSS will be a fully electronic
peer-reviewed journal, freely available on the WWW. There is a large
(about 30 people) editorial board, and there is a not-so-large
technical (read HTML) staff (about 5 people at UCLA).

Submissions are basically manuals -- although the relative emphasis on
theory, applications, and usage instructions may vary. Software will
also be reviewed. It will be available by ftp from our servers.

We will continue to seek support from ASA, RSS, ISI and other
societies, and we will try to be listed in CIS and other indices. We
do accept advertisements to cover some of our costs.

For many other details such as motivation, discussion, guidelines for
authors, and so on, we refer you to

======================================================================
http://www.stat.ucla.edu/journals/jss/
======================================================================

Jan de Leeuw; UCLA Department of Statistics; UCLA Statistical Consulting
US mail: 8118 Math Sciences, 405 Hilgard Ave, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.ucla.edu
www: http://www.stat.ucla.edu/~deleeuw


>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 15 Feb 1996 14:54:56 -0800
From: ronnyk (Ronny Kohavi)
To: kdd@gte.com
Subject: MLC++ : Machine learning library in C++
Reply-to: ronnyk@engr.sgi.com


MLC++ is a machine learning library developed in C++.
MLC++ is public domain and can be used free of charge,
including use of the source code.

MLC++ contains common induction algorithms, such as ID3,
nearest-neighbors, naive-bayes, oneR (Holte), winnow, and decision
tables, all written under a single framework.

MLC++ also contains interfaces to common algorithms, such
as C4.5, PEBLS, IB1-4, OC1, CN2.

MLC++ also contains wrappers that can be placed around the above
algorithms. These include: feature selection, discretization filters,
automatic parameter setting for C4.5, bagging/combining classifiers,
and more.

Finally, MLC++ contains common accuracy estimation methods, such as
holdout, cross-validation, and bootstrap .632.
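
For readers new to these estimators, the sketch below shows what
k-fold cross-validation computes. It is generic Python written for
this note (MLC++ itself is C++, and this is not its interface):

    import random

    def cross_validated_accuracy(train, classify, data, k=10):
        """Split (example, label) pairs into k folds; train on k-1
        folds, test on the held-out fold, and average the k scores.
        Assumes len(data) >= k."""
        data = data[:]            # leave the caller's list untouched
        random.shuffle(data)
        folds = [data[i::k] for i in range(k)]
        scores = []
        for i in range(k):
            held_out = folds[i]
            training = [pair for j, fold in enumerate(folds)
                        if j != i for pair in fold]
            model = train(training)
            correct = sum(classify(model, x) == y for x, y in held_out)
            scores.append(correct / len(held_out))
        return sum(scores) / k    # mean accuracy over the k folds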

Interfaces to existing algorithms are not hard to create and
implementing new algorithms in MLC++ is possible with added benefits
(some procedures work only on induction algorithms implemented in
MLC++ as opposed to interfaced ones).

Object code for the MLC++ utilities is provided for Silicon Graphics
machines running Irix 5.3.

To contact us, send e-mail to: mlc@postofc.corp.sgi.com
Visit our web page at: http://www.sgi.com/Technology/mlc/

--

Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 26 Feb 96 16:36:17 -0800
From: jpbrown (jpbrown@hal-pc.org)
Organization: Ultimate Resources
Subject: Submitting SuperInduction for KDD listing

Name: SuperInduction
URL: http://www.hal-pc.org/~jpbrown
Discovery Methods: Graphics, Machine Learning, Neural Net.
Platform: Windows
Contact: J.P.Brown Ultimate Resources, Inc. 13631 Queensbury, Houston,
TX, 77079 USA. E-mail: jpbrown@hal-pc.org
Fax & Phone (713)461-7734
Status: Provided as a service.
Source of information: URL above
Updated: 1996-02-20 by jpbrown
Category in listing: Multistrategy Tools
Summary: Unprejudiced SuperInduction analysis will often reach unexpected
conclusions, but its decision support shows what really needs to be done
to succeed.



>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path: (gps0@ns.gte.com)
Date: Thu, 22 Feb 96 22:05:54 CST
From: grossman@matisse.eecs.uic.edu (Bob Grossman)
To: kdd@gte.com
Subject: Call for Position Papers for Data Mining Meeting
Reply-To: grossman@uic.edu
Cc: grossman@mail.eecs.uic.edu



Call for Position Papers & Participation

Workshop on Scientific Data Management, Mining,
Analysis and Assimilation

March 14-15, 1996

San Diego Supercomputer Center
La Jolla, California

For more details, see http://nscp.uic.edu/Conf/sdsc.html

Data Intensive Computing. As it becomes common for desktop machines to have
gigabytes of storage, for departmental machines to have terabytes of
storage, and for large organizations to have petabytes of storage,
algorithms and software for the management, mining, and analysis of data are
becoming critical. An integrated solution is needed that subsumes data
analysis within the data management infrastructure.

Management of large volumes of data is an important enabling technology for
both data analysis and data mining. Data analysis includes application of
statistical and clustering algorithms. Data mining is the extraction of
patterns, associations, and anomalies from data, and is emerging as an
important component of data analysis on scientific data sets. The need for
these capabilities is appearing in virtually all scientific disciplines,
including environmental science, bio-chemistry, astronomy, and high-energy
physics.

Problems in this emerging field usually involve numerically and
statistically intensive analyses or queries on large amounts of complex
data. The data is often heterogeneous and may be integrated from a variety
of sources. A unifying characteristic is the need to manipulate thousands of
data objects, whose aggregate size can exceed that of local disk caches.
Algorithms to support efficient management, retrieval, and analysis
are needed that scale from relatively small data sets, which can be
processed in situ, to the massive petabyte archives of scientific data
now being constructed.

Workshop. The purpose of the workshop is to bring together experts from
these fields, as well as practitioners from application areas, with an
interest in sharing ideas and insights. The first day of the workshop will
be devoted to surveys and overviews, and the second day to breakout groups
and the preparation of a summary report. Participation in the workshop
requires the submission of a short position paper on a topic or viewpoint
related to the themes of the workshop. The workshop is limited to 40
participants.

Topics of special interest include:

Data mining: pattern recognition, association algorithms, anomaly detection,
machine learning, classification and regression trees, knowledge discovery,
statistical learning, neural nets, Markov models, and related topics.

Data management: data warehousing architectures, architectures for scalable
data management systems, integration of databases and hierarchical storage
systems, scalable object stores, resource management for very large
databases and object bases, caching and migration within multi-level and
heterogeneous environments, integration of tape into data mining and data
management systems, integration of statistical and data management systems,
and related topics.

Data analysis: parallelizing statistical algorithms, statistical algorithms
on very large data sets, nearest neighbor algorithms, clustering algorithms,
and related topics.

More information can be found at: http://nscp.uic.edu or http://www.sdsc.edu
or by contacting one of the organizers:

Robert Grossman
Laboratory for Advanced Computing, M/C 249
University of Illinois at Chicago
851 South Morgan Street
Chicago, IL 60607
312 413 2176
312 996 1491 fax
grossman@uic.edu

Reagan Moore
San Diego Supercomputer Center
PO Box 85608
San Diego, CA 92186-9784
619 534-5073
619 534-5152 fax
moore@sdsc.edu


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Derek Sleeman (sleeman@csd.abdn.ac.uk)
Date: Thu, 29 Feb 1996 17:21:58 GMT
Subject: for KDD Nuggets

Call for Participation
SYNERGY BETWEEN
SCIENTIFIC KNOWLEDGE DISCOVERY and
KNOWLEDGE DISCOVERY IN DATABASES


Organizers:
Derek Sleeman
Computing Science Department
The University
ABERDEEN AB9 2UE
Scotland UK
Tele: +44 (0)1224 272288
FAX: +44 (0)1224 273422
email: sleeman@csd.abdn.ac.uk

Patricia Riddle
MS 7L-66
Boeing Computer Services
P.O. Box 24346
Seattle, WA 98124-0346
Tele: +1 206-865-3415
FAX: +1 206-865-2964
email: riddle@atc.boeing.com

PROGRAM COMMITTEE

Usama Fayyad (Microsoft, USA)
David Hand (Open University, UK)
Peter Karp (SRI, USA)
Willi Kloesgen (GMD, Germany)
Pat Langley (Stanford University, USA)
Patricia Riddle (Boeing Services, Seattle, USA)
Derek Sleeman (University of Aberdeen, UK)
Raul Valdes-Perez (CMU, USA)
Jan Zytkow (Wichita State University, USA)


Machine Learning methods have played a major role in both Scientific Discovery
and KDD (Knowledge Discovery in Databases) systems, but the subfields have
diverged in the last few years into separate communities. Our target audience
would be those working in scientific discovery and those working in knowledge
discovery in databases. The goal is to bring them together to share techniques
and approaches; to see where overlap exists and where synergy can be
exploited. Specifically, KDD might have advice to give the scientific
discovery community on dealing with large data-sets or visualization of
results. The scientific discovery community might have new techniques for
handling complex time-series data etc.

Workshop Structure:

Overview/Invited talks from both communities;
Talks from each community depending on quality of submissions;
Panel discussion - to stimulate interaction between the communities.

Length: One day



TYPES of SUBMISSION WE HOPE to RECEIVE:

Below we raise questions we'd like to see addressed:

Are there fundamental differences between the fields (e.g., does
Scientific Discovery usually work with historical data-sets and KDD
with contemporary ones)?

Are the data engineering or result summarization and display
techniques different between Scientific Discovery and KDD?

Are there differences between the type of data usually handled in
either field (numeric versus symbolic, more time sequences)?

Are there differences between the amount of evaluation either field normally
expects?


Below we give some guidance on the types of papers we'd like to receive:

- papers on KDD techniques which are thought to be of particular relevance to
the Scientific Discovery community;

- papers on Scientific Discovery techniques which are thought to be of
particular relevance to the KDD community;

- analyses of the differences between the techniques used in the two
subfields;

- Application of KDD techniques to Scientific/Technical tasks.




SUBMISSION OF WORKSHOP EXTENDED ABSTRACTS/PAPERS

Authors are invited to submit an extended abstract or a full paper by either
email or airmail. Papers should be written in English in single-column
format and should be limited to no more than eight (8) sides of A4
paper, including figures and references.

The first page should include title, author name(s), affiliation(s), and
mailing and email address(es), together with an abstract (15-20 lines), and
text.

The following information should be included in an accompanying
letter/message: full title of paper, presenting author's name, address,
telephone and fax numbers, and e-mail address.


TIMETABLE

Feb 26: calls out on Web
Apr 23: deadline for submissions
May 14: decisions about acceptance
Jun 4: final papers to organizers
Jun 11: camera-ready copy of workshop documents to Bari

For paper submissions and inquiries please contact:

Patricia Riddle
MS 7L-66
Boeing Computer Services
P.O. Box 24346
Seattle, WA 98124-0346
Tele: +1 206-865-3415
FAX: +1 206-865-2964
email: riddle@atc.boeing.com


Please address general inquiries about ICML96 to the Local Chair:

Floriana Esposito, University of Bari esposito@vm.csata.it
Dipartimento di Informatica Phone: (+39) 80 - 5443.264
Via Orabona 4, Fax: (+39) 80 - 5443.196
70125 Bari (ITALY)



ICML96 has its own page on the World-Wide Web at:
http://www.di.unito.it/pub/WWW/ICML96/home.html


For up-to-date information on this workshop see:
http://www.csd.abdn.ac.uk/~sleeman/cfp-iml96ws.html


>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Mon, 19 Feb 1996 22:12:15 -0500
From: IMLM Workshop (pkc) (imlm@tuck.cs.fit.edu)
Subject: 2nd CFP: AAAI-96 Workshop on Integrating Multiple Learned Models

*********************************************************************
Paper submission deadline: March 18, 1996
*********************************************************************

CALL FOR PAPERS/PARTICIPATION


INTEGRATING MULTIPLE LEARNED MODELS
FOR IMPROVING AND SCALING MACHINE LEARNING ALGORITHMS

to be held in conjunction with AAAI 1996
(collocated with KDD-96, UAI-96, and IAAI-96)
Portland, Oregon
August 1996


Most modern machine learning research uses a single model or learning
algorithm at a time, or at most selects one model from a set of
candidate models. Recently, however, there has been considerable
interest in techniques that integrate the collective predictions of a
set of models in some principled fashion. With such techniques often
the predictive accuracy and/or the training efficiency of the overall
system can be improved, since one can 'mix and match' among the
relative strengths of the models being combined.
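
As a minimal concrete instance of such integration, unweighted
majority voting over a set of learned classifiers is the baseline
that most of these techniques refine. A sketch in Python (the three
toy 'models' stand in for classifiers trained, say, on different
partitions of the data):

    from collections import Counter

    def majority_vote(models, x):
        """Each model predicts a class for x; the most common
        prediction wins."""
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    # Three hypothetical classifiers; two of the three say 'spam'.
    models = [lambda x: "spam", lambda x: "ham", lambda x: "spam"]
    print(majority_vote(models, {"subject": "FREE $$$"}))  # -> spam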

The goal of this workshop is to gather researchers actively working in
the area of integrating multiple learned models, to exchange ideas and
foster collaborations and new research directions. In particular, we
seek to bring together researchers interested in this topic from the
fields of Machine Learning, Knowledge Discovery in Databases, and
Statistics.

Any aspect of integrating multiple models is appropriate for the
workshop. However, we intend the focus of the workshop to be improving
prediction accuracies, and improving training performance in the
context of large training databases.

More precisely, submissions are sought in, but not limited to, the
following topics:

1) Techniques that generate and/or integrate multiple learned
models. In particular, techniques that do so by:

* using different training data distributions
(in particular by training over different partitions
of the data)
* using different output classification schemes
(for example using output codes)
* using different hyperparameters or training heuristics
(primarily as a tool for generating multiple models)

2) Systems and architectures to implement such strategies.
In particular:

* parallel and distributed multiple learning systems
* multi-agent learning over inherently distributed data

A paper need not be submitted to participate in the workshop, but
space may be limited so contact the organizers as early as possible if
you wish to participate.

The workshop format is planned to encompass a full day of half hour
presentations with discussion periods, ending with a brief period for
summary and discussion of future activities. Notes or proceedings for
the workshop may be provided, depending on the submissions received.


Submission requirements:

i) A short paper of not more than 2000 words detailing recent research
results must be received by March 18, 1996.

ii) The paper should include an abstract of not more than 150 words,
and a list of keywords. Please include the name(s), email
address(es), address(es), and phone number(s) of the author(s) on the
first page. The first author will be the primary contact unless
otherwise stated.

iii) Electronic submissions in postscript or ASCII via email are
preferred. Three printed copies (preferably double-sided) of your
submission are also accepted.

iv) Please also send the title, name(s) and email address(es) of the
author(s), abstract, and keywords in ASCII via email.



Submission address:

imlm@cs.fit.edu

Philip Chan
IMLM Workshop
Computer Science
Florida Institute of Technology
150 W. University Blvd.
Melbourne, FL 32901-6988
407-768-8000 x7280 (x8062)
407-984-8461 (fax)


Important Dates:

Paper submission deadline: March 18, 1996
Notification of acceptance: April 15, 1996
Final copy: May 13, 1996


Chairs:

Salvatore Stolfo, Columbia University sal@cs.columbia.edu
David Wolpert, Santa Fe Institute dhw@santafe.edu
Philip Chan, Florida Institute of Technology pkc@cs.fit.edu


General Inquiries:

Please address general inquiries to one of the chairs or send them to:

imlm@cs.fit.edu

Up-to-date workshop information is maintained on WWW at:

http://www.cs.fit.edu/~imlm/ or
http://cs.fit.edu/~imlm/


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 21 Feb 96 03:00:10 EST
From: iimam@aic.gmu.edu (Ibrahim Fahmi Imam)
Subject: CFP: AAAI-96 WS on INTELLIGENT ADAPTIVE AGENTS
The WEB address is:
http://www.mli.gmu.edu/~iimam/aaai96.html

C A L L F O R P A P E R S


AAAI-96 International Workshop on
Intelligent Adaptive Agents (IAA-96)

August 4-8, 1996, Portland, Oregon

In recent years, researchers from different fields have pushed toward
greater flexibility and intelligent adaptation in their systems. The
development of intelligent adaptive agents has been rapidly evolving
in many fields of science. Such systems should have the capability of
dynamically adapting their parameters, improve their knowledge-base or
method of operation in order to accomplish a set of tasks. This workshop
will focus on intelligent adaptation and its relationship to other
fields of interest.

Research issues that are of interest to the workshop include but are
not limited to:
1) Analyzing the role of adaptation in planning, execution monitoring, and
problem-solving;
2) Adaptive control in real-world engineering systems;
3) Analyzing the computational cost of adaptation vs. system robustness;
4) Controlling the adaptive process (what is the strategy? what is
needed? what is expected? etc.);
5) Adaptive mechanisms in an open agent society;
6) Adaptation in distributed systems;


The workshop seeks high-quality submissions in these areas. Researchers
interested in submitting papers should explain the adaptive process in
light of one or more of the issues presented above. Papers with
real-world applications are strongly encouraged.


Please send any questions to: Ibrahim F. Imam at
iimam@aic.gmu.edu

Program Committee Members

Jaime Carbonell, Carnegie Mellon University, USA
Gerald DeJong, University of Illinois at Urbana-Champaign, USA
Tim Finin, University of Maryland Baltimore County, USA
Brian Gaines, University of Calgary, Canada
Diana Gordon, Naval Research Laboratory, USA
Yves Kodratoff, Universite de Paris Sud, France
Ryszard Michalski, George Mason University, USA
Ashwin Ram, Georgia Institute of Technology, USA
Nigel Shadbolt, University of Nottingham, England
Reid Simmons, Carnegie Mellon University, USA
Walter Van de Velde, Vrije Universiteit Brussel, Belgium
Brad Whitehall, United Technologies Research Center, USA
Stefan Wrobel, GMD, Germany


Submission Information

Paper submissions should not exceed eight single-spaced pages, with
1-inch margins and 12pt font. The first page must show the title, authors'
names, full surface mail addresses, fax number (if possible), email
addresses, a short abstract (not exceeding 200 words), and a list of
key words (up to 5).

Electronic submissions are strongly encouraged and should be sent to
iimam@aic.gmu.edu
Otherwise, contact the workshop chair at (iimam@aic.gmu.edu) for mailing
arrangements. An extended version of the CFP can be found in:

http://www.mli.gmu.edu/~iimam/aaai96.html


Important Dates

Submission Deadline: March 18, 1996
Notification Date: April 15, 1996
Camera-Ready Due: May 13, 1996
Workshop: August 4, 1996



>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 20 Feb 1996 17:18:01 +1100
From: Jonathan Oliver (jono@cs.monash.edu.au)
To: kdd@gte.com
Subject: Final CFP for ISIS: Information, Statistics and Induction in Science

*** FINAL CALL FOR PAPERS ***

ISIS: Information, Statistics and Induction in Science
Melbourne, Australia, 20-23 August 1996


Conference Chair: David Dowe
Co-chairs: Kevin Korb and Jonathan Oliver

Invited Speakers:

Henry Kyburg, Jr. (University of Rochester, NY)
Marvin Minsky (MIT)
J. Ross Quinlan (Sydney University)
Jorma J. Rissanen (IBM Almaden Research, San Jose, California)
Ray Solomonoff (Oxbridge Research, Mass)

This conference will explore the use of computational modelling to
understand and emulate inductive processes in science. The problems
involved in building and using such computer models reflect
methodological and foundational concerns common to a variety of
academic disciplines, especially statistics, artificial intelligence
(AI) and the philosophy of science. This conference aims to bring
together researchers from these and related fields to present new
computational techniques for supporting or analysing scientific
inference and to engage in collegial debate over the merits and
difficulties underlying the various approaches to automating inductive
and statistical inference.

PROGRAM COMMITTEE

Hirotugu Akaike, Lloyd Allison, Shun-ichi Amari, Mark Bedau,
Jim Bezdek, Hamparsum Bozdogan, Wray Buntine, Peter Cheeseman,
Honghua Dai, David Dowe, Usama Fayyad, Doug Fisher,
Alex Gammerman, Clark Glymour, Randy Goebel, Josef Gruska,
David Hand, Bill Harper, David Heckerman, Colin Howson,
Lawrence Hunter, Frank Jackson, Max King, Kevin Korb,
Henry Kyburg, Rick Lathrop, Ming Li, Nozomu Matsubara,
Aleksandar Milosavljevic, Richard Neapolitan, Jon Oliver,
Michael Pazzani, J. Ross Quinlan, Glenn Shafer, Peter Slezak,
Ray Solomonoff, Paul Thagard, Neil Thomason, Raul Valdes-Perez,
Tim van Gelder, Paul Vitanyi, Chris Wallace, Geoff Webb,
Xindong Wu, Jan Zytkow.

Inquiries to:
isis96@cs.monash.edu.au
David Dowe: dld@cs.monash.edu.au
Kevin Korb: korb@cs.monash.edu.au or
Jonathan Oliver: jono@cs.monash.edu.au

Information is available on the WWW at:
http://www.cs.monash.edu.au/~jono/ISIS/ISIS.shtml

AREAS OF INTEREST

The following streams/subject areas are of particular interest to the
organisers:

Concept Formation and Classification.
Minimum Encoding Length Inference Methods.
Scientific Discovery.
Theory Revision.
Bayesian Methodology.
Foundations of Statistics.
Foundations of Social Science.
Foundations of AI.

CALL FOR PAPERS

Prospective authors should mail five copies of their papers to Dr. David
Dowe, ISIS chair. Alternatively, authors may submit by email to

isis96@cs.monash.edu.au.

Submitted papers should be in double-column format in 10-point font,
not exceeding 10 pages; we encourage authors to keep to fewer than 800
words per page. An additional page should display the title, author(s) and
affiliation(s), abstract, keywords and identification of which of the eight
areas of interest are most relevant to the paper.

(see http://www.cs.monash.edu.au/~jono/ISIS/ISIS.Area.Interest.html)

Refereeing will be blind; that is, this additional page will not be passed
along to referees. Accepted papers will be published in the conference
proceedings, so long as at least one author is in attendance.

Authors are encouraged to use the ISIS LaTeX style guide available at:

http://www.cs.monash.edu.au/~jono/ISIS/LatexGuide.

Papers should be sent to:

Dr David Dowe
ISIS chair
Department of Computer Science
Monash University
Clayton Victoria 3168
Australia
Phone: +61-3-9 905 5226
FAX: +61-3-9 905 5146

Email submissions must be in LaTeX (using the ISIS LaTeX style guide).
Authors should email both the postscript file and the LaTeX document to
isis96@cs.monash.edu.au.

DEADLINES

Submission (receipt) deadline: 11 March, 1996
Notification of acceptance: 10 June, 1996
Camera-ready copy (receipt) deadline: 15 July, 1996

CONFERENCE VENUE

ISIS will be held at the Old Melbourne Hotel, 5-17 Flemington Rd.,
North Melbourne.

The Old Melbourne Hotel is within easy walking distance of downtown Melbourne,
Melbourne University, many restaurants (on Lygon Street) and the Melbourne Zoo.
It is about fifteen to twenty minutes drive from the airport.

REGISTRATION

A registration form will be available at the WWW site
http://www.cs.monash.edu.au/~jono/ISIS/ISIS.shtml,
or by mail from the conference chair. Registration dates will be
considered met if the postmark is legible and on or before the date,
and airmail is used. Student registrations will be available at a
discount (but prices have not yet been fixed). Relevant dates are:

Early registration (at a discount): 3 June, 1996
Final registration: 1 July, 1996




>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~