KDD Nuggets 95:18, e-mailed 95-08-04

Contents:
* GPS, forthcoming BYTE story on Data Mining (Oct 95 ?)
* GPS, KD Mine usage Stats (accessed from 42 countries ...)
* GPS, ComputerWorld 95/07/10 on Data Mining at Bank of America
* U. Fayyad, KDD tutorial home page
  http://www-aig.jpl.nasa.gov/kdd95/tutorials/IJCAI95-tutorial.html
* W. Buntine, Web sources on Bayesian/Probabilistic networks
* A. Gupta, IBM Almaden Data Mining site
  http://www-i.almaden.ibm.com/cs/quest/index.html

The KDD Nuggets is a moderated mailing list for news and information
relevant to Data Mining and Knowledge Discovery in Databases (KDD).
Please include a DESCRIPTIVE subject line in your submission.
Nuggets frequency is approximately bi-weekly.

Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
references, FAQ, and other KDD-related information are available
at Knowledge Discovery Mine, URL http://info.gte.com/~kdd

I have been interviewed for the story and am waiting with trepidation
for its appearance.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 1995 09:50:40 -0400
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: KD Mine usage stats for June 1995

I have compiled (for information purposes only) usage stats for
KD Mine site (http://info.gte.com/~kdd) for the period of
Jun 5 1995 to Jun 30 1995.
There were a total of 12,443 files and 91,855,933 bytes transmitted.
There were requests from over 40 countries.
Top domains (sorted by decreasing # of requests)

 3220 | com   US Commercial
 1778 | edu   US Educational
 1712 | unresolved
  748 | uk    United Kingdom
  585 | jp    Japan
  484 | au    Australia
  429 | net   Network
  400 | de    Germany
  355 | gov   US Government
  343 | nl    Netherlands
  308 | ca    Canada
  277 | fr    France
  170 | it    Italy
  143 | fi    Finland
  128 | ch    Switzerland
  120 | org   Non-Profit Organization
  104 | es    Spain
  100 | sg    Singapore
   98 | il    Israel
   97 | no    Norway
   68 | mil   US Military
   68 | pl    Poland
   62 | za    South Africa
   61 | at    Austria
   61 | se    Sweden
   60 | be    Belgium
   55 | kr    Korea (South)
   50 | ie    Ireland
   43 | nz    New Zealand (Aotearoa)
   33 | my    Malaysia
   31 | th    Thailand
   27 | hk    Hong Kong
   25 | us    United States
   22 | gr    Greece
   22 | cz    Czech Republic
   22 | tw    Taiwan
   20 | dk    Denmark
   18 | pt    Portugal
   16 | arpa  Old style Arpanet
   12 | gb    Great Britain (UK)
   13 | br    Brazil
   10 | su    USSR (former)
   10 | pe    Peru
    5 | sk    Slovak Republic
    4 | si    Slovenia
    3 | cl    Chile
    2 | ar    Argentina
    2 | id    Indonesia
    1 | ph    Philippines
    1 | hu    Hungary

Top pages accessed:

 1305 | /~kdd/  top-level
  958 | /~kdd/siftware.html
  295 | /~kdd/what-is-new.html
  268 | /~kdd/other-servers.html
  193 | /~kdd/reference.html
  147 | /~kdd/FAQ.txt
  142 | /~kdd/kdd-publications.html
  132 | /~kdd/kdd-93-report.tex
  129 | /~kdd/ai4kdd.html
  110 | /~kdd/nuggets/95/
   98 | /~kdd/kdd95.html
   43 | /~kdd/kdd-at-gte.html
   40 | /~kdd/homepages.html
   34 | /~kdd/nuggets/94/
   27 | /~kdd/kdd-terms.html
   18 | /~kdd/nuggets/93/

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 1995 12:39:06 -0400
From: gps0 (Gregory Piatetsky-Shapiro)
Subject: ComputerWorld 95/07/10 on Data Mining at Bank of America

ComputerWorld July 10, 1995 page 1 story on
"Data Mining unearths customers"
describes how Bank of America uses its own data warehouse to analyze
its customer data, with queries like "How many of Silicon Valley
residents in a particular sales district own Acura Legends and also
golf club memberships."
In another example, the bank has recently started to mine for Hispanic
customers who are potential first-time home buyers.

The BoA Data Warehouse allows many interactive ways to access the data.
In 1986 BoA had 15 Gbytes of data, 30 MIPS of processing power,
did 5 queries per day at the cost of $2,430 per query.
In 1995, BoA has 800 Gbytes, 1800 MIPS, does 2000 queries per day
at the cost of $24 per query, with an average response around
30 seconds.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Jul 95 16:17:40 PDT
From: fayyad@aig.jpl.nasa.gov (Usama Fayyad)
Subject: KDD tutorial home page

The latest description of the forthcoming KDD tutorial by
Usama Fayyad and Evangelos Simoudis is at
http://www-aig.jpl.nasa.gov/kdd95/tutorials/IJCAI95-tutorial.html

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[from ML-list]
From: Wray Buntine
Subject: Web information sources on Bayesian/Probabilistic networks
Date: Mon, 17 Jul 95 12:05:38 PDT

Those who found David Heckerman's presentation and tutorial on
learning Bayesian networks (at the recent IMLC'95) interesting
should check out the following:

* An up-to-date review of the state of the art in learning
  probabilistic networks can be found at:
      http://www.Heuristicrat.COM/wray/graphbib.ps.Z
  (this has been revised several times from a quick and dirty
  report distributed a year ago). Currently under review at a
  major journal. Some very nice work exists in this area.

* Other tutorial articles on probabilistic networks by the UAI
  community are listed at:
      http://www.heuristicrat.com/wray/uaiconnections.html
  This includes a pointer to David Heckerman's tutorial article
  (a Microsoft report) that matches part of his talk.

* Some of the techniques described by David were first applied to
  learning class probability trees (CART/C4.5 etc) way back in 1990.
  These Bayesian tree classification methods are available in IND2.1
  as Bayesian Smoothing and Option Trees, and independent studies
  reported in Statlog (Spiegelhalter, Michie and Taylor, 1994) show
  the methods are highly competitive with CART and C4.5.
  Look for my trees paper in:
      http://www.Heuristicrat.COM/wray/refs.html#papers
  Jon Oliver presented a better variation of smoothing at IMLC'95.

* Michael Jordan presented a paper showing technology transfer in
  the reverse direction: a probabilistic network algorithm adapted
  to do multivariate splits in trees (IMLC'93).

The parallel between learning decision trees and learning Bayesian
networks is remarkable. Techniques for learning class probability
trees transfer easily to Bayesian networks and back. For instance,
I mention in the review above how Usama Fayyad's discretization
methods could well be adapted for learning Bayesian networks.

I believe this is an excellent demonstration that the business of
constructing a learning algorithm for a particular knowledge
representation is something we now have well in hand, i.e., it's
becoming an engineering problem rather than research. In fact,
several groups have already built compilers that take a problem
representation and generate a learning algorithm. Remarkable but true!
I gave some examples in my IMLC'95 tutorial, and the slides are
available from:
    http://www.Heuristicrat.COM/wray/refs.html#tutes
Of course, more realistically, we'd expect this kind of technology
to create pieces of a learning algorithm rather than the whole thing,
but nevertheless, expect in the near future to be able to prototype
many learning algorithms faster. The technology exists to do this
already.

Wray Buntine                      +1 (510) 845-5810 [voice]
Heuristicrats Research, Inc.      +1 (510) 845-4405 [fax]
1678 Shattuck Avenue, Suite 310   wray@Heuristicrat.COM
Berkeley, CA 94709-1631           http://WWW.Heuristicrat.COM/wray/

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 3 Aug 1995 13:45:34 -0700
From: (Ashish Gupta)
To: kdd@gte.com
Subject: request

Dear Gregory,

I am sending this note on behalf of the Data Mining Project at
IBM Almaden Research Center. We have recently "built" a homepage
on the WWW and would like to have a pointer to our site included
in your "other miners" page. Please let me know if that is possible.

Our URL is: http://www-i.almaden.ibm.com/cs/quest/index.html

Thanks.
Regards.
Ashish Gupta

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~