--
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details. Submissions may be edited for space.
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'I can do 12 months work in 9 months, but not in 12 months'
Brij Masand
(commenting on the need to take a longer vacation).
[Note: I will be taking a much shorter vacation, July 3-20, and
will not be checking email or sending KD Nuggets during that time. GPS]
Date: Sun, 15 Jun 97 13:47:18 PDT
From: othar@Thinkbank.COM
(Othar Hansson)
Subject: Great Moments in Data Mining (#1 in a series)
Amazon.com now has a clustering feature. On many of their book-blurb
pages, they will point you to three books commonly purchased by readers
who purchased that book. Here's one such cluster that made me laugh:
> Only the Paranoid Survive : How to Exploit
> the Crisis Points That Challenge Every
> Company and Career
> by Andrew S. Grove
> ...
> Check out these titles! Readers who bought Only the Paranoid Survive
> also bought:
>
> The Dilbert Principle : A Cubicle's-Eye View of Bosses, Meetings,
> Management Fads & Other Workplace Afflictions; Scott Adams
> The Road Ahead; Bill Gates, et al
> Dogbert's Top Secret Management Handbook; Scott Adams
So there you have it: Dogbert, Dilbert, Bill and Andy Grove. $1 prize
to the best name for that cluster. 'Well-known business cartoon
characters'?
Thinkbank, Inc. [voice] +1 510.558.8800
1678 Shattuck Avenue, Suite 320 [ fax ] +1 510.558.8700
Berkeley, CA 94709-1631 othar@Thinkbank.COM
[P.S. I checked Amazon.com on June 30, and the clustering has changed
-- alas, the sparkling nugget of discovery may wash away in the torrent of
change ... GPS]
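The "readers who bought X also bought" feature described above can be approximated by counting co-purchases. The following is a minimal sketch, not Amazon.com's actual method (which the message does not describe); the function name `also_bought` and the order data are hypothetical and invented purely for illustration.

```python
from collections import Counter

def also_bought(orders, title, top_n=3):
    """Return up to top_n titles most often bought in the same order as `title`."""
    related = Counter()
    for basket in orders:
        items = set(basket)          # ignore duplicate copies within one order
        if title in items:
            related.update(items - {title})
    return [t for t, _ in related.most_common(top_n)]

# Hypothetical orders, invented for illustration only.
orders = [
    ["Only the Paranoid Survive", "The Dilbert Principle", "The Road Ahead"],
    ["Only the Paranoid Survive", "The Dilbert Principle"],
    ["Only the Paranoid Survive", "Dogbert's Top Secret Management Handbook"],
    ["The Road Ahead", "The Dilbert Principle"],
]

print(also_bought(orders, "Only the Paranoid Survive"))
```

Raw co-purchase counts favor bestsellers, which may be one reason such clusters look amusing; a production system would presumably normalize by each title's overall popularity.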
From: Ron Webb (ronw@apqc.org)
Subject: Data Mining Benchmarking Study By The APQC (www.apqc.org)
Date: Fri, 20 Jun 1997 17:44:28 -0500
KD Nuggets subscribers,
The American Productivity & Quality Center (APQC) is conducting a benchmarking study on transforming customer data into information. It will include a large component related to data mining. It will deal with learning how best practice organizations manage customer data and information. For a copy of the proposal for the study with the full scope delineated, go to www.apqc.org where you can view or download it.
In a nutshell the scope is:
Organizational enablers - what do organizations which manage customer data and information well have in place organizationally (culture, politics, 'soft' stuff, etc.) to enable a best practice status.
Technological enablers - this is the biggest piece of this study. How do best practice organizations
*gather and store customer data (warehousing)
*transform this data into information (mining)
*get this information to the correct person within the organization.
Leveraging customer information - how do best practice organizations leverage all these practices to impact the bottom line, customer retention, marketing efforts, etc.
The kick-off of the study is August 22, 1997 and it will end on December 16, 1997. Let me know if I can help you get more information.
Ron Webb
Project Manager
ronw@apqc.org
713-685-4634
Date: Thu, 19 Jun 1997 16:11:32 -0400
From: Mike Blundin (mblundin@cirrusrec.com)
Subject: Datasage (Cirrus Recognition) Press Release
BW1044 JUN 18,1997 4:58 PACIFIC 07:58 EASTERN
( BW)(DATASAGE)
Datasage raises $2.8 million in equity financing
from OneLiberty Ventures and Sigma Partners
Business/Technology Editors
BOSTON--(BUSINESS WIRE)--June 18, 1997-- Datasage, Inc., the
leader in production data mining solutions (formerly Cirrus
Recognition Systems) announced today that it has completed $2.8
million in equity financing to expand its efforts in marketing and
sales of its flagship product, Datasage(TM).
'We are very pleased to have OneLiberty Ventures and Sigma
Partners aboard,' said Datasage President and CEO, David Blundin.
'Top tier venture financing will allow us to rapidly build our sales
and marketing so we can expand our vision for production data mining
in the marketplace.'
'Data Mining technology is rewriting the book on how data is
valued and leveraged at the corporate level,' said John Mandile of
Sigma Partners.
'We are very excited about the opportunity to invest in Datasage
and their production data mining technology. The company has the
potential to be the leading vendor in this explosive new market,'
added Duncan McCallum of OneLiberty Ventures.
Datasage provides critical software architecture and data mining
tools that allow corporations to deploy data mining technology
against production data sources. So-called desktop or micro-mining
data mining tools enable analysts to discover new information in
subsets of corporate data. Datasage allows analysts and IT
departments to take those data mining models and seamlessly deploy
them against live production data sources. This allows corporations
to move beyond short term gains and realize the full strategic
business benefit of data mining technology. According to Blundin,
'Production data mining is particularly valuable for data-intensive
companies with many transactions or customers, such as large
retailers and grocers.'
'Finding patterns is only the beginning,' said John Lunny, Vice
President of Engineering. 'An analyst can often build excellent
models for customer and product behavior with only a few thousand
examples using a desktop data mining tool and a PC. But when he or
she goes back to rank a database of perhaps 25 million transactions,
they hit the data mining gap. Tools don't scale, data connections
are inadequate and the increase in computation makes it a major IS
project. Datasage(TM) fills that gap.'
'The greatest value inherent in a corporate data warehouse is
realized by a production data mining solution,' said Blundin.
'Unlike desktop data mining tools which may yield information during
a one-off ad hoc analysis, production data mining is a repeatable
process that delivers continuous value to an organization.'
According to the market research firm Meta Group, headquartered
in Stamford, Connecticut, the data mining segment of the decision
support market will grow from $120 million in 1996 to more than $800
million by 1998, and $4 billion by 2000, a compound annual growth
rate in excess of 250 percent.
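As a quick check on the arithmetic, the compound annual growth rate implied by these market figures can be computed directly. This sketch uses the standard definition, (end / start) ** (1 / years) - 1, and only the dollar figures quoted above; the helper name `cagr` is ours. Under that definition the figures work out to roughly 140 to 160 percent growth per year, so the "250 percent" in the release may follow a different convention, such as each year's market expressed as a percentage of the previous year's (that reading is an assumption).

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (1.0 means +100% per year)."""
    return (end / start) ** (1.0 / years) - 1.0

# Market size figures quoted in the release, in millions of dollars.
print(f"1996 to 1998: {cagr(120, 800, 2):.0%} per year")
print(f"1996 to 2000: {cagr(120, 4000, 4):.0%} per year")
```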
$2.8 Million in Equity Financing
OneLiberty Ventures and Sigma Partners co-managed the investment
in Datasage.
OneLiberty Ventures is a Boston-based, privately held venture
capital firm that focuses on start-up and early stage technology
investments. Formed in 1982 as Morgan, Holland Ventures, OneLiberty
has established three funds totaling more than $150 million in
committed capital. Recent investments include Brooks Fiber
Properties, Cerulean Technology, Cytyc, Corex Technologies,
Extraprise Group, Indus River Networks, Linguistic Technology,
Riverton Software, Satara Networks, and Vista Medical Systems.
Sigma Partners, with offices in Menlo Park, CA, and Boston, MA,
is a privately held venture capital partnership organized in 1984,
with $185 million under management in three funds. Sigma's
investments include Cascade Communications, Cerulean Technology,
Chipcom, Electronic Arts, FileNet, Global Village Communications
and Wellfleet Communications.
John Mandile of Sigma Partners and Duncan McCallum of OneLiberty
Ventures will be joining the Datasage board of directors. John has
15 years' experience in the high technology industry, most recently as
the president and CEO of Vermeer Technologies, Inc., the developers
of FrontPage, the leading Web authoring tool. Previously he was an
early principal at SQL Solutions, and following their acquisition by
Sybase, took responsibility for the new Systems Management Unit which
he grew to $55 million in 30 months. Currently he is a director of
FutureTense, Inc., Novera Software, Inc. and OnDisplay.
Before joining OneLiberty Ventures, Duncan was at Haemonetics
where his roles included Assistant to the President, Director of
Blood Bank Marketing, and Business Development Manager. Previously
he was a management consultant at the Boston Consulting Group and a
Senior Member of the Technical Staff and Program Manager at Draper
Laboratory. He holds BS and MS degrees from MIT and an MBA from
Harvard Business School. His current investments include Extraprise
Group and Cerulean Technology.
About Datasage, Inc.
Datasage, Inc., headquartered near Boston, MA, provides
comprehensive data mining software solutions that enable corporations
to turn raw data into business opportunity. Today corporate
databases and data warehouses are growing dramatically, to the point
where the wealth of customer and transaction data far outstrips
capability to effectively use it. Datasage data mining software
allows corporations to put data mining technology into full scale
production so they can quickly put their data to work and realize
payback on their growing data assets.
Datasage(TM) is the first software solution to meet the rigorous
demands of corporate data mining in a production environment. It
delivers the performance, scalability, reliability, integration and
architecture demanded by corporate IS departments for their critical
systems. Datasage(TM) is based on the innovative Database-Centric
Architecture(TM) which maintains robust, high speed data throughput
between data sources and data mining technology. In addition to
direct connectivity to industry standard RDBMS's such as Oracle,
Informix, DB2 and SQL Server, the architecture offers a
comprehensive set of open APIs that allow integration of new data
sources, incorporation of existing business logic, and best-of-breed
data mining algorithm selection. Datasage also includes advanced
data mining algorithms (neural networks, rule induction, genetic
algorithms, etc.) that are coupled with the architecture to allow the
algorithms to deliver peak performance. The high throughput enables
corporations to move from summary level data analysis to atomic level
data mining - mining the lowest level of transaction detail for
extremely accurate forecasting, customer scoring and anomaly
detection.
--30--mb/bos ls/bos
CONTACT: Michael Blundin, Datasage, Inc.
(617) 942-3600
mblundin@datasage.com
KEYWORD: MASSACHUSETTS
INDUSTRY KEYWORD: COMED COMPUTERS/ELECTRONICS
REPEATS: New York 212-752-9600 or 800-221-2462; Boston 617-236-4266 or
800-225-2030; SF 415-986-4422 or 800-227-0845; LA 310-820-9473
Date: Sat, 14 Jun 97 14:29:23 CDT
From: 'Gerry McKiernan' (JL.GJM@ISUMVS.IASTATE.EDU)
Subject: _Data Mining and Knowledge Discovery in MARC Databases_
_Data Mining and Knowledge Discovery in MARC Databases_
I am in the process of preparing a review article on
the application of data mining and knowledge discovery in
databases (KDD) to MARC record databases. These techniques
are efforts to identify 'hidden' information within large
data sets. It is my belief that there exist important, yet
overlooked, relationships within MARC records created through
the descriptive and subject cataloging process that have not
been as fully exploited as they might be. A good example would
be to identify significant works on a subject based upon
associations within records of a given publisher, author(s),
subject heading, and call number.
I am particularly interested in the application of Data
Mining and KDD as a potential enhancement to future online
public access systems (e.g., OPACs).
For a description of an associated project, folk are invited
to review my 4T9R(sm) URL. It contains not only a project
description but links to an excellent review article from
DBMS magazine and to the outstanding _KDNuggets Data Mining and
Knowledge Discovery Resource Center_ at its new URL. The URL for
4T9R9(sm) is
P.S. MARC is a bibliographic format standard that has been in use for
over a generation. It offers a means by which bibliographic entities (e.g.,
books) can be consistently described and the associated data elements used
in creating a variety of value-added databases (e.g., the local library
online catalog, or OPAC).
For examples of MARC records you may wish to search the Iowa
State University OPAC that is directly accessible at:
Date: Thu, 26 Jun 97 13:51:32 EDT
From: Sal Stolfo (sal@cs.columbia.edu)
Subject: WORKSHOP report: Towards the Digital Government of the 21st Century
Dear Colleague:
I would like to bring to your attention a recent report I've had
the pleasure of coauthoring with Herb Schorr of USC/ISI.
to see the homepage for the Workshop on R&D Opportunities in Federal
Information Services.
The homepage now has a link to the issued Workshop Report (in multiple
formats). A Press Release has also been issued.
We have submitted this report to several key Federal Government agencies,
and recently we presented the report to the Presidential Advisory Council
on HPCC/IT/NGI.
The report recommends that the Government fund a major new Applied Research
Program to develop pilot projects with Federal agencies to invent
the Digital Government for the citizens of the 21st Century.
A number of applied research opportunities are involved including
data mining over the huge collection of publicly available
federally-held data.
The effort has support and encouragement from a broad range of
Federal agencies, as well as the executive branch of government, and
we are now seeking broad support and involvement from the research community.
If you have any interest in this activity, set a bookmark and please
browse the web site routinely for further announcements about this effort.
Please also forward this message to any other person that you
believe may have an interest in this exciting opportunity.
This research community and others will be informed of the outcome of
this effort when it becomes known.
best regards
sal stolfo
Date: Thu, 19 Jun 1997 22:18:35 -0700
From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
Subject: MineSet Quicktime movies available
Silicon Graphics' MineSet is well known for its data visualization
capabilities: both for direct visualization and visualization
of the models built by the analytical engines using MLC++.
Two Crows Corporation, in their book Data Mining: Products,
Applications & Technologies (1997), wrote:
We really liked MineSet. The visualization tools, particularly,
are without parallel.
In Data Management Strategies, Premier issue (1997), Curt Hall wrote:
MineSet's data visualization provides the best means I've seen to
view and analyze generated rules, decision trees, and other models.
Most impressive is its 'fly-through navigational' paradigm that
displays decision trees in a well-organized 3D landscape format, so
you don't end up overwhelmed with a screen full of rules or decision
trees when you analyze large data sets.
We now provide voice-annotated QuickTime movies so you can see examples
of MineSet visualizations on any platform that has a QuickTime movie player:
under more information, or contact us at mineset@postofc.corp.sgi.com
--
Ronny Kohavi
Engineering Manager, Analytical Data Mining.
From: Abraham Meidan (Abraham@wizsoft.com)
Subject: WizRule 3
Organization: WizSoft Inc.
WizSoft Inc. has released WizRule ver 3. WizRule is a data auditing
application that reveals cases to be audited in the data. WizRule reads
the data, automatically reveals the rules that govern the data, and points
at the deviations from the set of all the discovered rules as suspected
errors.
WizRule contains 4 algorithms:
(1) An algorithm that reveals ALL if-then rules with no limit as to the
number of conditions. (This algorithm is similar to IBM's association rules
algorithm: the input and the output are the same, but WizRule's algorithm
is faster.)
(2) An algorithm that reveals mathematical formula rules, such as: Field A
= Field B - Field C * Field D.
(3) An algorithm that calculates the Level Of Unlikelihood of each case
that deviates from the discovered rules.
(4) An algorithm that reveals rules in the spelling of names, and points at
strings that deviate from these rules.
In its previous version, WizRule flagged each deviation from each rule as
a suspected error. This method produced many false alarms, i.e.,
deviations from rules that were not in fact errors. This problem has been
solved in the new version by calculating the level of unlikelihood of each
deviation. The level of unlikelihood signifies how unlikely a certain value
of a certain field is, in regard to the set of ALL the discovered rules and
the frequencies of the values. The higher the level of unlikelihood, the
higher the probability that the case is indeed an error.
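The general idea of discovering rules and then flagging deviations can be sketched briefly. The code below is not WizRule's algorithm, which the announcement does not describe in detail. It is a minimal sketch limited to single-condition rules ("if field A = x then field B = y"), and it uses plain rule confidence as a crude stand-in for the level of unlikelihood. The function names, thresholds, and records are all hypothetical.

```python
from collections import Counter, defaultdict

def find_rules(records, min_support=3, min_confidence=0.75):
    """Find rules of the form: if field_a == x then field_b == y."""
    rules = []
    fields = list(records[0].keys())
    for fa in fields:
        for fb in fields:
            if fa == fb:
                continue
            # For each value of fa, count the values of fb seen with it.
            by_value = defaultdict(Counter)
            for r in records:
                by_value[r[fa]][r[fb]] += 1
            for x, outcomes in by_value.items():
                total = sum(outcomes.values())
                y, hits = outcomes.most_common(1)[0]
                if total >= min_support and hits / total >= min_confidence:
                    rules.append((fa, x, fb, y, hits / total))
    return rules

def deviations(records, rules):
    """Yield (record index, rule, rule confidence) for records that break a rule."""
    for i, r in enumerate(records):
        for fa, x, fb, y, conf in rules:
            if r[fa] == x and r[fb] != y:
                yield i, (fa, x, fb, y), conf

# Hypothetical records, invented for illustration; index 4 holds a likely typo.
records = (
    [{"city": "Berkeley", "state": "CA"}] * 4
    + [{"city": "Berkeley", "state": "CO"}]
    + [{"city": "Boston", "state": "MA"}] * 3
)

for index, rule, conf in deviations(records, find_rules(records)):
    print(index, rule, round(conf, 2))
```

Scoring each deviation against all discovered rules and the value frequencies, as the announcement describes, would need a more careful model than this single confidence number.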
A working demo, limited to files having up to 1,000 records, can be
downloaded from
Date: Mon, 23 Jun 1997 09:55:55 +1000
From: Glenn Stone (Glenn.Stone@dms.csiro.au)
Subject: PostDoc, Sydney, Australia
Postdoctoral Fellowship
CSIRO Mathematical & Information Sciences
North Ryde NSW Australia
Postdoctoral Fellowship - Term 3 years
$AUS 41,000 - 47,000 + superannuation
We wish to appoint a Post-Doctoral fellow to join a research team
working on large and complex datasets. Your PhD in statistics,
computer science, or related discipline or equivalent must have been
awarded within the last three years.
The team consists of statisticians and computer scientists with
interests in techniques for handling and cleaning large datasets,
methods for modeling large datasets, wavelet methods for feature
extraction, statistical visualisation and modeling multiple time
series. The team is working on datasets coming from areas as diverse
as motor vehicle insurance, finance, marketing and astronomy.
The project would suit an applicant with experience analysing
real-world datasets. You will need excellent computing skills in C or
C++, or a statistical package such as S-Plus or SAS. You will also need the
ability to work in a team and a demonstrated ability to meet deadlines.
The position is for a term of three (3) years. Further information
about the position may be obtained from
Dr Glenn Stone, tel +61 2 9325 3216 email: glenn.stone@cmis.csiro.au
The job description and selection criteria may be obtained from
Lucinda Wells, tel +61 2 9325 3277 email: lucinda.wells@cmis.csiro.au
Applications for the position should address the selection criteria,
be marked 'Confidential' quoting reference number MS 97/1, and be
sent to: The Human Resources Manager, CSIRO, Division of Mathematical and
Information Sciences, Locked Bag 17, North Ryde NSW 2113 by
25th July, 1997.
--
Glenn Stone
Statistician, CSIRO
Locked Bag 17, North Ryde, NSW 2113
Phone:+61 2 9325 3216, Fax:+61 2 9325 3200
Glenn.Stone@cmis.csiro.au
Date: Fri, 27 Jun 1997 18:08:39 +0001
From: NADA LAVRAC (Nada.Lavrac@ijs.si)
Subject: ILP Week in Prague, 15-20 Sept. 1997
Please find at the URL below the information on the ILP Week in Prague,
Czech Republic:
* General information on the ILP Week in Prague, 15-20 September 1997
* International Summer School on ILP and KDD, 15-17 September 1997
* 7th International Workshop on ILP, ILP-97, 17-20 September 1997
* CompulogNet Area Meeting on representation issues in reasoning
  and learning, 20 September 1997
* Registration form and payment information
============================================================================
ILP Week in Prague
15-20 September 1997
----------------------------------------------------------------------------
Since the very start of machine learning research, logic has been very popular
as a representation language for inductive concept learning and the
possibilities for learning in a first order representation have been
investigated. Recently, this research has concentrated in the lively research
field of Inductive Logic Programming (ILP), which studies inductive machine
learning within the framework of logic programming.
The ILP Week in Prague will consist of the following three events:
15-17 September 1997 - The International Summer School on Inductive
                       Logic Programming and Knowledge Discovery in
                       Databases (ILP and KDD)
17-20 September 1997 - The Seventh International Workshop on Inductive
                       Logic Programming (ILP-97)
20 September 1997    - CompulogNet Meeting of the Area 'Computational
                       Logic and Machine Learning' (CL and ML)
Schedule:
---------
Monday,    15 September - ILP and KDD Summer School
Tuesday,   16 September - ILP and KDD Summer School
Wednesday, 17 September - ILP and KDD Summer School (morning)
                        - ILP-97 Workshop (afternoon)
                        - Welcome party (evening)
Thursday,  18 September - ILP-97 Workshop
                        - Guided tours (afternoon - optional)
Friday,    19 September - ILP-97 Workshop
                        - Farewell dinner (evening)
Saturday,  20 September - ILP-97 Workshop (morning)
                        - CL and ML Area Meeting (afternoon)
Program Organization:
---------------------
Nada Lavrac and Saso Dzeroski, J. Stefan Institute, Ljubljana, Slovenia
Email: Saso.Dzeroski@ijs.si,
Nada.Lavrac@ijs.si