KDD Nuggets 95:18, e-mailed 95-08-04
Contents:
* GPS, forthcoming BYTE story on Data Mining (Oct 95 ?)
* GPS, KD Mine usage Stats (accessed from 42 countries ...)
* GPS, ComputerWorld 95/07/10 on Data Mining at Bank of America
* U. Fayyad, KDD tutorial home page
http://www-aig.jpl.nasa.gov/kdd95/tutorials/IJCAI95-tutorial.html
* W. Buntine, Web sources on Bayesian/Probabilistic networks
* A. Gupta, IBM Almaden Data Mining site
http://www-i.almaden.ibm.com/cs/quest/index.html
KDD Nuggets is a moderated mailing list for news and information
relevant to Data Mining and Knowledge Discovery in Databases (KDD).
Please include a DESCRIPTIVE subject line in your submission.
Nuggets is published approximately every two weeks.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
references, FAQ, and other KDD-related information are available
at Knowledge Discovery Mine, URL http://info.gte.com/~kdd
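>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A BYTE magazine story on Data Mining is forthcoming, possibly in the
October 1995 issue. I have been interviewed for the story and am
waiting with trepidation for its appearance.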
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 1995 09:50:40 -0400
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: KD Mine usage stats for June 1995
I have compiled, for information purposes only, usage statistics
for the KD Mine site (http://info.gte.com/~kdd)
for the period June 5 to June 30, 1995.
A total of 12,443 files and 91,855,933 bytes were transmitted.
Requests came from over 40 countries. Top domains, sorted by
decreasing number of requests (requests | domain):
3220 | com US Commercial
1778 | edu US Educational
1712 | unresolved
748 | uk United Kingdom
585 | jp Japan
484 | au Australia
429 | net Network
400 | de Germany
355 | gov US Government
343 | nl Netherlands
308 | ca Canada
277 | fr France
170 | it Italy
143 | fi Finland
128 | ch Switzerland
120 | org Non-Profit Organization
104 | es Spain
100 | sg Singapore
98 | il Israel
97 | no Norway
68 | mil US Military
68 | pl Poland
62 | za South Africa
61 | at Austria
61 | se Sweden
60 | be Belgium
55 | kr Korea (South)
50 | ie Ireland
43 | nz New Zealand (Aotearoa)
33 | my Malaysia
31 | th Thailand
27 | hk Hong Kong
25 | us United States
22 | gr Greece
22 | cz Czech Republic
22 | tw Taiwan
20 | dk Denmark
18 | pt Portugal
16 | arpa Old style Arpanet
13 | br Brazil
12 | gb Great Britain (UK)
10 | su USSR (former)
10 | pe Peru
5 | sk Slovak Republic
4 | si Slovenia
3 | cl Chile
2 | ar Argentina
2 | id Indonesia
1 | ph Philippines
1 | hu Hungary
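As a quick cross-check of the table, the Python sketch below sums the per-domain counts and counts the two-letter country domains. Treating com, edu, net, gov, org, mil, arpa and the "unresolved" row as non-country entries is an assumption made for this illustration. Under that assumption the listed rows add up to 12,426 requests, slightly fewer than the 12,443 files reported above (a small gap the note does not explain), and 42 country domains appear (with uk and gb counted separately, and us included), matching the "42 countries" in the contents line.

```python
# Request counts per domain for June 5-30, 1995, copied from the table above.
DOMAIN_COUNTS = {
    "com": 3220, "edu": 1778, "unresolved": 1712, "uk": 748, "jp": 585,
    "au": 484, "net": 429, "de": 400, "gov": 355, "nl": 343,
    "ca": 308, "fr": 277, "it": 170, "fi": 143, "ch": 128,
    "org": 120, "es": 104, "sg": 100, "il": 98, "no": 97,
    "mil": 68, "pl": 68, "za": 62, "at": 61, "se": 61,
    "be": 60, "kr": 55, "ie": 50, "nz": 43, "my": 33,
    "th": 31, "hk": 27, "us": 25, "gr": 22, "cz": 22,
    "tw": 22, "dk": 20, "pt": 18, "arpa": 16, "br": 13,
    "gb": 12, "su": 10, "pe": 10, "sk": 5, "si": 4,
    "cl": 3, "ar": 2, "id": 2, "ph": 1, "hu": 1,
}

# Rows treated as generic domains or unresolved hosts rather than countries
# (this classification is an assumption, not part of the original note).
NON_COUNTRY = {"com", "edu", "net", "gov", "org", "mil", "arpa", "unresolved"}


def total_requests(counts: dict[str, int]) -> int:
    """Sum of requests over all listed domains."""
    return sum(counts.values())


def country_codes(counts: dict[str, int]) -> list[str]:
    """Two-letter country domains; 'uk' and 'gb' are counted separately."""
    return sorted(d for d in counts if d not in NON_COUNTRY)


def share(counts: dict[str, int], domain: str) -> float:
    """Fraction of all listed requests that came from one domain."""
    return counts[domain] / total_requests(counts)


if __name__ == "__main__":
    print("listed requests:", total_requests(DOMAIN_COUNTS))
    print("country domains:", len(country_codes(DOMAIN_COUNTS)))
    for d in ("com", "edu", "unresolved"):
        print(f"{d:10s} {share(DOMAIN_COUNTS, d):6.1%}")
```

On these numbers, US commercial sites (com) account for roughly a quarter of the listed requests.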
Top pages accessed (requests | page):
1305 | /~kdd/ top-level
958 | /~kdd/siftware.html
295 | /~kdd/what-is-new.html
268 | /~kdd/other-servers.html
193 | /~kdd/reference.html
147 | /~kdd/FAQ.txt
142 | /~kdd/kdd-publications.html
132 | /~kdd/kdd-93-report.tex
129 | /~kdd/ai4kdd.html
110 | /~kdd/nuggets/95/
98 | /~kdd/kdd95.html
43 | /~kdd/kdd-at-gte.html
40 | /~kdd/homepages.html
34 | /~kdd/nuggets/94/
27 | /~kdd/kdd-terms.html
18 | /~kdd/nuggets/93/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 1995 12:39:06 -0400
From: gps0 (Gregory Piatetsky-Shapiro)
Subject: ComputerWorld 95/07/10 on Data Mining at Bank of America
The page-1 story "Data Mining unearths customers" in ComputerWorld
(July 10, 1995) describes how Bank of America uses its own data
warehouse to analyze its customer data, with queries like "How many
Silicon Valley residents in a particular sales district own Acura
Legends and also have golf club memberships?"
In another example, the bank has recently started to mine for Hispanic
customers who are potential first-time home buyers.
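The Silicon Valley question quoted above is a conjunctive count over customer attributes. A minimal Python sketch of such a query follows; the Customer record, its field names, and the district code "D7" are hypothetical, since the story as summarized here does not describe BoA's schema or query tools. A real warehouse would run this as a database query rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical schema for illustration only; not BoA's actual tables.
    region: str
    sales_district: str
    car_model: str
    golf_club_member: bool


def count_matching(customers: list[Customer], region: str,
                   district: str, car_model: str) -> int:
    """Count customers that satisfy every condition of the query (logical AND)."""
    return sum(
        1
        for c in customers
        if c.region == region
        and c.sales_district == district
        and c.car_model == car_model
        and c.golf_club_member
    )


if __name__ == "__main__":
    sample = [
        Customer("Silicon Valley", "D7", "Acura Legend", True),
        Customer("Silicon Valley", "D7", "Acura Legend", False),
        Customer("Silicon Valley", "D3", "Acura Legend", True),
        Customer("Los Angeles", "D7", "Acura Legend", True),
    ]
    print(count_matching(sample, "Silicon Valley", "D7", "Acura Legend"))  # 1
```

Only the first sample record meets all four conditions; each of the others fails exactly one.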
The BoA data warehouse supports many ways of accessing the data interactively.
In 1986, BoA had 15 Gbytes of data and 30 MIPS of processing power, and
ran 5 queries per day at a cost of $2,430 per query.
In 1995, BoA has 800 Gbytes and 1,800 MIPS, and runs 2,000 queries per
day at a cost of $24 per query, with an average response time of around
30 seconds.
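The 1986 and 1995 figures above can be compared directly. This small Python sketch (the variable and function names are illustrative) computes the growth factor of each metric and a daily query spend obtained by naively multiplying queries per day by cost per query. The story's own accounting for per-query cost is not described here, so treat the spend figure as a rough derived number, not a reported one.

```python
# Figures as reported in the ComputerWorld story (1986 vs. 1995).
BOA_1986 = {"data_gbytes": 15, "mips": 30, "queries_per_day": 5, "cost_per_query": 2430}
BOA_1995 = {"data_gbytes": 800, "mips": 1800, "queries_per_day": 2000, "cost_per_query": 24}


def growth_factors(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Return after/before for every metric (values above 1 mean growth)."""
    return {key: after[key] / before[key] for key in before}


def daily_query_spend(year: dict[str, int]) -> int:
    """Naive daily cost: queries per day times cost per query, in dollars."""
    return year["queries_per_day"] * year["cost_per_query"]


if __name__ == "__main__":
    for metric, factor in growth_factors(BOA_1986, BOA_1995).items():
        print(f"{metric:15s} x{factor:8.2f}")
    print("daily spend 1986:", daily_query_spend(BOA_1986))
    print("daily spend 1995:", daily_query_spend(BOA_1995))
```

Data volume grew by about 53x, processing power by 60x and query volume by 400x, while the cost_per_query factor of about 0.01 corresponds to a roughly 100-fold drop (2,430 / 24 = 101.25). Multiplied out naively, daily query spend rises from $12,150 to $48,000.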
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 95 16:17:40 PDT
From: fayyad@aig.jpl.nasa.gov (Usama Fayyad)
Subject: KDD tutorial home page
The latest description of the forthcoming KDD tutorial by
Usama Fayyad and Evangelos Simoudis is at
http://www-aig.jpl.nasa.gov/kdd95/tutorials/IJCAI95-tutorial.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[from ML-list]
From: Wray Buntine
Subject: Web information sources on Bayesian/Probabilistic networks
Date: Mon, 17 Jul 95 12:05:38 PDT
Those who found David Heckerman's presentation and tutorial
on learning Bayesian networks (at the recent IMLC'95) interesting
should check out the following:
* An up-to-date review of the state of the art in learning
probabilistic networks can be found at:
http://www.Heuristicrat.COM/wray/graphbib.ps.Z
(It has been revised several times since a quick-and-dirty report
distributed a year ago, and is currently under review at a major
journal.) There is some very nice work in this area.
* Other tutorial articles on probabilistic networks by the UAI
community are listed at:
http://www.heuristicrat.com/wray/uaiconnections.html
This includes a pointer to David Heckerman's tutorial article
(a Microsoft report) that matches part of his talk.
* Some of the techniques described by David were first applied to
learning class probability trees (CART, C4.5, etc.) back in 1990.
These Bayesian tree classification methods are available in IND2.1
as Bayesian Smoothing and Option Trees, and independent studies
reported in Statlog (Michie, Spiegelhalter and Taylor, 1994)
showed that the methods are highly competitive with CART and C4.5.
Look for my trees paper in:
http://www.Heuristicrat.COM/wray/refs.html#papers
Jon Oliver presented an improved variant of smoothing at IMLC'95.
* Michael Jordan presented a paper showing technology transfer in the
reverse direction: a probabilistic network algorithm adapted to
do multivariate splits in trees (IMLC'93).
The parallel between learning decision trees and learning Bayesian networks
is remarkable. Techniques for learning class probability trees transfer
easily to Bayesian networks and back. For instance, I mention in the review
above how Usama Fayyad's discretization methods could well be adapted for
learning Bayesian networks.
I believe this is an excellent demonstration that the business of
constructing a learning algorithm for a particular knowledge representation is
something we now have well in hand, i.e., it's becoming an engineering
problem rather than a research one. In fact, several groups have already built
compilers that take a problem representation and generate a learning
algorithm. Remarkable but true! I gave some examples in my IMLC'95
tutorial, and the slides are available from:
http://www.Heuristicrat.COM/wray/refs.html#tutes
Of course, more realistically, we'd expect this kind of technology to create
pieces of a learning algorithm rather than the whole thing, but
nevertheless, expect to be able to prototype many learning algorithms
faster in the near future. The technology to do this already exists.
Wray Buntine +1 (510) 845-5810 [voice]
Heuristicrats Research, Inc. +1 (510) 845-4405 [fax]
1678 Shattuck Avenue, Suite 310 wray@Heuristicrat.COM
Berkeley, CA 94709-1631 http://WWW.Heuristicrat.COM/wray/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 3 Aug 1995 13:45:34 -0700
From: (Ashish Gupta)
To: kdd@gte.com
Subject: request
Dear Gregory,
I am sending this note on behalf of the Data Mining Project at
IBM Almaden Research Center. We have recently "built" a home page
on the WWW and would like a pointer to our site included on your
"other miners" page. Please let me know if that is possible.
Our URL is:
http://www-i.almaden.ibm.com/cs/quest/index.html
Thanks.
Regards.
Ashish Gupta
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~