KDD Nugget 95:5, e-mailed 95-02-27

Contents:
 * E. Simoudis, home for Recon:
     http://hitchhiker.space.lockheed.com/~recon/
 * W. Kovach, WWW page for statistical software,
     http://www.compulink.co.uk/kovcomp
 * R. Musick, Ph. D. thesis abstract: Belief Network Induction
 * A. Sharma, Announcement of Post-Doctoral Position
 * GPS, what is new in Knowledge Discovery Mine
 * D. Jensen, ERRATA: Statistical Evaluation of Classifiers
--- CFPs ---
 * TDWI: Data Warehousing Institute Annual Conference (DW'95)
 * K. Ong, Workshop on the Integration of Knowledge Discovery with
   Deductive and Object-Oriented Databases (KDOOD)

The KDD Nuggets is a moderated mailing list for news and information relevant to Knowledge Discovery in Databases (KDD), also known as Data Mining, Knowledge Extraction, etc. Relevant items include tool announcements and reviews, summaries of publications, information requests, interesting ideas, clever opinions, etc. Please include a descriptive subject line in your submission. Nuggets frequency is approximately bi-weekly.

Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), references, FAQ, and other KDD-related information are available at Knowledge Discovery Mine, URL http://info.gte.com/~kdd/
or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail add/delete requests to kdd-request@gte.com
E-mail contributions to kdd@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your work is to discover your world and then with all your heart give yourself to it.
                                        BUDDHA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Fri, 17 Feb 95 12:28:40 PST
From: simoudis@aic.lockheed.com (Evangelos Simoudis)
To: gps@gte.com
Subject: information on Recon
Reply-To: simoudis@aic.lockheed.com

Gregory,

We have created a home page on Recon. Its URL is
http://hitchhiker.space.lockheed.com/~recon/

Please announce this in the KDD Nuggets and please put a pointer from the Recon entry of your Siftware catalog to our URL.

Thank you and regards,

Evangelos

--------------------------------------------

Date: Mon, 20 Feb 1995 08:01:37 +0000
From: "Warren L. Kovach"
Subject: WWW: Statistical and data analysis software
X-To: CLASS-L@ccvm.sunysb.edu
X-Cc: morphomet@cunyvm.cuny.edu, mingeol@csn.org
To: Multiple recipients of list CLASS-L

I am pleased to announce my new World Wide Web pages focusing on shareware and public domain statistical and data analysis software. The URL is:

    http://www.compulink.co.uk/kovcomp

These pages provide detailed information about, and shareware copies of, my programs MVSP and Oriana. MVSP is a multivariate statistical program for MS-DOS that calculates a variety of cluster analyses as well as PCA, PCO, and correspondence/detrended correspondence analysis. Oriana is my new circular statistics/orientation analysis package for Windows.

The pages also have a list of resources on the Internet related to statistical software. In particular, there are many links to WWW pages and FTP sites that have software. I hope to maintain a definitive list of sources of shareware and public domain software on the Internet. If you know of sites that are not yet on my list, I would appreciate hearing about them.

For a bit of fun, there is also a page with information about the Isle of Anglesey, in North Wales, the home of Kovach Computing Services, and links to other WWW pages about Wales. Come and learn how to pronounce one of the longest placenames in the world!

--
Dr. Warren L. Kovach                   Internet: WarrenK@kovcomp.demon.co.uk
Kovach Computing Services              tel./fax: +44-(0)1248-450414
85 Nant-y-Felin, Pentraeth, Anglesey   CompuServe: 100016,2265
Wales LL75 8UY U.K.                    WWW: http://www.compulink.co.uk/kovcomp

--------------------------------------------

Date: Mon, 20 Feb 1995 11:40:17 -0800
From: Ron Musick
To: kdd@gte.com
Subject: Ph. D. thesis abstract: Belief Network Induction

                  Belief Network Induction
                         Ron Musick
                   musick@cs.berkeley.edu
          Doctoral Thesis, accepted December 1994.
             University of California, Berkeley

Partial Abstract

This dissertation describes BNI (Belief Network Inductor), a tool that automatically induces a belief network from a database. The fundamental thrust of this research program has been to provide a theoretically sound method of inducing a model from data, and performing inference over that model. Along with a solid grounding in probability theory, BNI has proven to be a quick, practical method of inducing data models that are highly accurate.

The results include a belief network that stores beta distributions in the conditional probability tables, coupled with theorems demonstrating how to maintain these distributions through inference; techniques for applying neural network and other learning techniques to the task of conditional probability table learning; and a decision-theoretic sampling theory which addresses scalability issues by characterizing the size of the sample needed to produce high-quality inferences.

The setting for this work is in database mining.
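The first result above, a belief network whose conditional probability tables hold beta distributions instead of single probabilities, can be illustrated with the textbook Beta-Bernoulli conjugate update. This is a minimal sketch under stated assumptions, not BNI's implementation: it assumes a binary child with binary parents, a uniform Beta(1, 1) prior, and hypothetical names (`BetaCPT`, `observe`); BNI's theorems for maintaining these distributions through inference are not reproduced.

```python
from itertools import product


class BetaCPT:
    """CPT for a binary child node with binary parents (hypothetical sketch).

    Each row (one per parent configuration) stores Beta(alpha, beta)
    over P(child = 1 | parents) instead of a single point estimate.
    """

    def __init__(self, n_parents: int, prior_alpha: float = 1.0, prior_beta: float = 1.0):
        self.rows = {
            config: [prior_alpha, prior_beta]
            for config in product((0, 1), repeat=n_parents)
        }

    def observe(self, parents, child: int) -> None:
        """Conjugate Beta-Bernoulli update from one database record."""
        row = self.rows[tuple(parents)]
        row[0 if child else 1] += 1  # alpha counts child = 1, beta counts child = 0

    def mean(self, parents) -> float:
        a, b = self.rows[tuple(parents)]
        return a / (a + b)

    def variance(self, parents) -> float:
        a, b = self.rows[tuple(parents)]
        return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical records: (parent values, child value)
records = [((0,), 1), ((0,), 0), ((1,), 1), ((1,), 1), ((1,), 1)]
cpt = BetaCPT(n_parents=1)
for parents, child in records:
    cpt.observe(parents, child)

# Row (1,) is now Beta(4, 1), with mean 0.8
print(cpt.mean((1,)), cpt.variance((1,)))
```

Keeping the whole distribution means every row also carries a variance, which shrinks as records accumulate, so rows supported by only a few records remain visibly less certain than well-populated ones.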
The thesis can be found at URL http://http.cs.berkeley.edu/~musick/

--------------------------------------------

From: arun@cse.unsw.edu.au (Arun Sharma)
To: kdd@gte.com
Date: Tue, 21 Feb 1995 11:43:24 +1100 (EST)
Subject: Announcement of Post-Doctoral Position

Hi,

This may be of interest to some recipients of your mailing list. I would very much appreciate it if you could include the following advertisement in the next issue.

Regards,
Arun Sharma

----------------

                  RESEARCH ASSOCIATE (Fixed Term)
            SCHOOL OF COMPUTER SCIENCE AND ENGINEERING
         University of New South Wales, Sydney, Australia
                             REF. 131
---------------------------------------------------------------------
Salary: A$36,345 -- A$40,087 per annum (A$1 = US$0.75 approx).
---------------------------------------------------------------------

The School of Computer Science and Engineering is seeking to employ a Research Associate for a project funded by the Australian Research Council. This project, based at the School's Artificial Intelligence Laboratory, involves the development and evaluation of knowledge discovery tools (based on inductive logic programming) for databases.

Duties will involve all aspects of the project: design and analysis of knowledge discovery tools, planning and running experiments, analysing results, assisting in the preparation of papers, and taking part in discussions and seminars.

Applicants must have recently completed a PhD in Computer Science (preferably with specialization and publications in Machine Learning), and will be required to have excellent C and Prolog programming skills. Good communication skills, both written and oral, and a knowledge and understanding of EEO/AA principles are required. Familiarity with concepts and techniques of Logic Programming and Deductive Databases is desirable.
The appointment is initially for a period of 12 months, with further renewal up to three years, dependent on funding.

Enquiries may be directed to Dr Arun Sharma on telephone +61-2-385-3938, email: arun@cse.unsw.edu.au; Assoc Prof Claude Sammut on telephone +61-2-385-3933, email: claude@cse.unsw.edu.au; or Dr John Shepherd on telephone +61-2-385-3969, email: jas@cse.unsw.edu.au.

Applications close 8 March 1995.

Applications, mentioning the reference number, should be sent to:

    The Recruitment Officer
    Human Resources Department
    The University of New South Wales
    Sydney, NSW 2052, Australia
    Fax: +61-2-662 2832
----------------------------------------------------------------------

--------------------------------------------

Date: Mon, 20 Feb 95 12:02:43 EST
From: gps0 (Gregory Piatetsky-Shapiro)
To: kdd
Subject: What is new in KD Mine

Feb 19, 1995

  • In AC2, a decision tree classification tool developed in C++. AC2 allows the user to create and manipulate decision trees from data sets with symbolic, numeric, noisy, and unknown descriptions. The scientific grounds of AC2 rely on discriminatory methods and on the representation language of the data set.
  • In SE-Learn, an SE-tree-based induction and classification tool.
    Set Enumeration (SE) trees provide the basis for an induction and classification framework which generalizes decision trees. In this framework, called SE-Learn, rather than splitting according to a single attribute, one recursively branches on all (or most) relevant attributes. A single SE-tree economically embeds many decision trees, supporting a more expressive representation. SE-Learn benefits from many techniques developed for decision trees, e.g., attribute-selection and pruning measures. In particular, SE-Learn can be tailored to start off with anyone's favorite decision tree, and then improve upon it via further exploring the SE-tree. This hill-climbing algorithm allows trading time/space for added accuracy. Current studies show that SE-trees are particularly advantageous in domains where (relatively) few examples are available for training, and in noisy domains. Finally, SE-trees provide a unified framework for combining induced knowledge with knowledge available from other sources.
  • In Recon, added a pointer to the Recon home page, http://hitchhiker.space.lockheed.com/~recon/
  • In Other Servers section, added Warren Kovach's World Wide Web pages focusing on shareware and public domain statistical and data analysis software.
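To make the SE-tree idea in the SE-Learn entry concrete, the following sketch enumerates the nodes of a set-enumeration tree over attribute-value conjunctions. It illustrates only the general SE-tree expansion rule; the function name `se_tree` and the toy attributes are hypothetical, and SE-Learn's actual classification, attribute-ordering, and pruning machinery is not shown.

```python
from typing import Dict, Iterator, List, Tuple

# A node is a conjunction of (attribute, value) pairs, e.g. (("outlook", "sunny"),).
Node = Tuple[Tuple[str, str], ...]


def se_tree(attributes: List[str], domains: Dict[str, List[str]]) -> Iterator[Node]:
    """Yield every node of a set-enumeration (SE) tree, depth first.

    A node's children extend it with one (attribute, value) pair for every
    attribute that follows the node's last attribute in the fixed ordering,
    rather than splitting on a single chosen attribute as a decision tree does.
    The ordering guarantees each conjunction is generated exactly once.
    """

    def expand(node: Node, start: int) -> Iterator[Node]:
        yield node
        for i in range(start, len(attributes)):
            attr = attributes[i]
            for value in domains[attr]:
                yield from expand(node + ((attr, value),), i + 1)

    return expand((), 0)


attrs = ["outlook", "windy"]
doms = {"outlook": ["sunny", "rain"], "windy": ["yes", "no"]}
nodes = list(se_tree(attrs, doms))
for n in nodes:
    print(n)
print(len(nodes))  # (1 + 2) * (1 + 2) = 9 nodes
```

Because each node branches on all later attributes, the full tree has the product of (1 + domain size) over all attributes as its node count, which is why the attribute-selection and pruning measures mentioned above matter in practice.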

Feb 13, 1995

  • In DataMariner, DataMariner combines classical statistical techniques with inductive machine learning to discover multivariate relationships in numerical and discrete data. The product consists of a set of tools for KDD, including: clustering algorithms, automatic formation of new attributes, simplifying attributes, rule induction, incremental rule induction, rule pruning, cross-validation, rule evaluation and a graphical display of rules.
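Cross-validation appears in DataMariner's tool list. The product's own procedure is not described here, so the sketch below shows generic k-fold cross-validation under stated assumptions: the helper names `k_fold_indices` and `cv_accuracy` are hypothetical, and a majority-class placeholder stands in for an induced rule set.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split record indices 0..n-1 into k (train, test) pairs.

    Folds are contiguous and differ in size by at most one, so every
    record is used for testing exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return [
        ([i for g in range(k) if g != f for i in folds[g]], folds[f])
        for f in range(k)
    ]


def cv_accuracy(labels: Sequence[str], k: int) -> float:
    """Mean test-fold accuracy of a placeholder majority-class 'rule'."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        train_labels = [labels[i] for i in train]
        # Ties go to the alphabetically first label, for determinism.
        majority = max(sorted(set(train_labels)), key=train_labels.count)
        hits = sum(labels[i] == majority for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


splits = k_fold_indices(10, 3)
print([len(test) for _, test in splits])  # [4, 3, 3]
```

In practice the records would be shuffled before splitting; the contiguous folds here just keep the sketch deterministic.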

Feb 8, 1995

In Other Servers section,
  • Computing as Compression, a summary of the main ideas in a programme of research which seeks to develop a new theory of computing and a `new generation' computing system based on the SP theory. This research relates to several AI topics (learning, pattern recognition, deduction, abduction, data mining, and others) as well as `mainstream' computing (execution of functions, information retrieval, software design, and others).
  • Society for AI and Statistics home page.

December 29, 1994

  • Home page for KDD-95 conference
  • In MATLAB, a complete engineering environment for neural network research, design, and simulation. Offers over fifteen proven network architectures and learning rules.

December 23, 1994

  • In Essbase, a high-performance multi-dimensional analytical engine for OLAP (On-Line Analytical Processing).
  • Added info on Data Surveyor, a data mining tool for the discovery of strategically relevant information from large databases.

November 22, 1994

  • In SDISCOVER, a tool which discovers regular-expression-style motifs in each family among a set of families of sequences.
  • The WinViz system home page. ... is data visualisation software with a revolutionary concept for representing N-dimensional data. It allows data mining through visualisation and many other ML techniques. The visualisation component of WinViz has been completed. Enhancements such as supervised machine learning, a self-clustering module, statistical analysis and time series analysis are under construction.

--------------------------------------------

From: "Jensen, David"
To: 'KDD Nuggets', 'ML-List'
Cc: 'Rich Ambrosino', 'David Wolpert', "'A. Feelders'", "'O. Gascuel'", 'Larry Hunter', 'Ian Witten', "Raul Valdes-Perez (CMU)", "'W. Verkooijen'"
Subject: ERRATA: Statistical Evaluation of Classifiers
Date: Wed, 22 Feb 95 10:24:00 PST

David Wolpert (dhw@santafe.edu) identified an error in the first citation of my recent submission ("Statistical Evaluation of Classifiers", ML-List 7:3 and KDD Nuggets 95:2). The date in the citation, and on the first page of the article itself, is incorrect. The correct citation is:

   G. Piatetsky-Shapiro et al., "KDD-93: Progress and Challenges in Knowledge Discovery in Databases," AI Magazine, FALL 1994, 15(3): 77-82.

======================= CFPs ==================

--------------------------------------------

Date: Mon, 20 Feb 95 12:02:43 EST
From: gps0 (Gregory Piatetsky-Shapiro)
To: kdd

               ANNOUNCEMENT & CALL FOR ABSTRACTS

                            DW'95
      The 1995 Data Warehousing Institute Annual Conference
        Focusing On Practical Solutions To Tough Problems
                  With Two Concurrent Forums:
       DSS/EIS'95 and the ORACLE Data Warehousing Forum
                       July 24-28, 1995
                        Washington, DC

**********************************************
Dates For Refereed Paper Submission
   Abstracts due: March 1, 1995
   Notification to authors: March 15, 1995
   Camera-ready final papers due: June 1, 1995
***********************************************

For information on the conference: email tdwi@aol.com and include the word "conference" in the subject or body.
**************************************************************

Job prospects for data warehousing professionals are at an all-time high. HP, IBM, AT&T GIS, and Sun all report that at least 50 percent of their commercial server sales in 1995 will be used for data warehousing projects. Thousands of new people want to learn how to do data warehousing right, and a large number of them are coming to the Data Warehousing Institute's Annual Conference in Washington. If you are one of the people who have already learned the hard lessons of data warehousing, please consider submitting a paper.

It will be a fun conference, with lots of invited papers by industry experts, courses by the best teachers in the field, BOFs to help you network, as well as the peer-reviewed papers and user panels that will illuminate the problems that experienced data warehousing managers and staff have faced and solved.

Approximately 400 people will gather in Washington, DC, July 24-28, to share experiences, learn from the experts, identify practical solutions to current challenges, build support networks, and discuss alternative futures for the field.

DW'95 offers three days of in-depth, authoritative courses and two days of multiple tracks of invited and peer-reviewed technical papers. It is also the home of the Oracle Warehousing Forum: a conference-within-a-conference for DBAs and warehousing managers who are using or plan to use Oracle as the centerpiece of their data warehousing initiative.

DW'95 also provides a unique opportunity to review the implementation of the data warehouses developed by companies or government agencies chosen as having the "Best Data Warehousing Applications of 1995." (Nominations close on February 24. If you think you know of a great one, email tdwi@aol.com and include "nomination" in the subject line or the body.)
Conference Topics

The overall focus of DW'95 is finding practical solutions to common problems in data warehousing, data mining, decision support and executive information systems. Feel free to offer abstracts on any of the following topics, or a topic of your own:

Topics on Initiating and Justifying DW, DSS, or EIS Projects
   Gaining top management support
   Estimating costs
   Going back for more money
   Estimating the sizes and capacities of the systems (a paper we would like to see: "How and Why We Underestimated the Data Volumes and What We Had To Do To Correct the Error in System Sizing")

Survival Strategies for Data Warehousing Managers
   Determining the focus for the warehouse and the requirements it must meet
   Gaining early user involvement and buy-in
   Keeping user expectations in line
   Finding the right driver (the executive who bridges between the data warehouse manager and top executives)
   Getting agreement on common data definitions

Topics on Staffing Data Warehouses
   The right skills for DW managers, DSS managers, and EIS managers
   Can COBOL programmers play a role in data warehouses?
   Can EIS and DSS staff be reinvented as data warehousing managers?

Topics on Using System Integrators
   Why I love (hate) the system integrator we chose (success/horror stories)
   Tips on managing a data warehousing system integration project

Topics on Data Selection, Extraction and Cleaning
   Choosing the right data
   How to ask your users what they need (and how not to ask them)
   Experiences in using commercial data extraction and cleaning tools: how they paid off -- or didn't.
   Replication: push vs. pull

Topics on Information Packaging (DSS/EIS)
   How DSS/EIS managers migrated to data warehousing
   Why I love/hate (pick one) for information packaging
   Rolling your own delivery system with Excel (or PowerBuilder)

Topics on Multi-Dimensional Database Management
   How multi-dimensional systems are actually used
   How much better is multidimensional than relational with bit-wise indexing?

Topics on Selecting Platforms
   Experiences with parallel systems: how big was the actual gain?
   Case studies of co-existence of warehousing and production systems

Topics on Metadata
   Strategies for metadata development
   Online business directories: do they work?

Exciting Applications
   Alarm systems (we'd like to see something on "techniques for triggering and delivering alarms to data warehousing end users")
   Opening the data warehouse to your employer's clients
   Data mining success stories

Topics on Security and Systems Management
   Tools and techniques for administering the data warehouse
   Tools and techniques for improving security
   Management automation strategies

Lessons You Have Learned
   What you wish you had known when you started your data warehousing project, but had to learn the hard way.

You don't have to have made a major breakthrough to have your paper accepted. The delegates just want practical solutions for real problems.

Abstract Submission
===================

Abstracts must be 1 to 3 pages long and be received by March 1, 1995. The object of an abstract is to convince the reviewers that a good paper and presentation will result. Your abstract should include:

1. A cover page including the title of the paper, the principal author's name, title, organization, address, email, telephone, and FAX numbers, and the names and affiliations of the other authors.
2. A description of the problem and its importance. Optional: describe what happened to you or your users that made you aware of how important the problem was.
3. Your solution, including details of how it worked. If this is work on emerging technology, try to show what the expected impact will be. If your solution is based on commercial hardware or software tools, name them.
4. Data on how well the solution works: before/after comparisons, direct savings, trade-offs, etc.
5. Lessons learned.

Where To Submit
===============

Please send one copy of your abstract to the program committee using one of the following methods. All submissions will be acknowledged.

Preferred method: email (plain text) to tdwi@aol.com
Alternative method: fax: 719-599-4395
Alternative 2: postal delivery to DW'95 Abstracts, Data Warehousing Institute, 1815 H Street NW, Suite 1100, Washington, DC 20006.

Questions: email tdwi@aol.com or telephone 719-599-4303

************************************************************************
Peer-Review Committee Executives
Chair: Dr. Ramon Barquin, Executive Director, Data Warehousing Institute
Geane Schubert, Staff Director, Data Warehousing Institute
Herb Edelstein, Program Chair, Euclid Associates
plus practitioners and consultants
************************************************************************
For a DW'95 Registration package, email tdwi@aol.com and include the word "Conference" in the subject line or the body.
************************************************************************
For information about the Data Warehousing Institute, email tdwi@aol.com and include the word "Info" in the subject line or the body.
************************************************************************

--------------------------------------------

Date: Tue, 21 Feb 95 14:25:31 CST
From: ong@mcc.com (Kayliang Ong)
To: kdd@gte.com
Subject: Announcement: First International Workshop on the Integration of Knowledge Discovery with Deductive and Object-Oriented Databases (KDOOD)

--------------------------------------------------------------------------
This CALL FOR PAPERS can also be found on WWW:
http://www.mcc.com/projects/carnot/kdoodws-cfp.html
This WWW page will be updated as relevant information becomes available.
--------------------------------------------------------------------------

                         CALL FOR PAPERS

     First International Workshop on the Integration of Knowledge
   Discovery with Deductive and Object-Oriented Databases (KDOOD)

                   December 8, 1995, Singapore

OBJECTIVES

Knowledge discovery from databases and Deductive and Object-Oriented Databases (DOOD) are two promising research areas that have so far been growing rather independently of each other. However, considerable evidence suggests that the two areas can be mutually beneficial. For example, DOOD techniques may be used to facilitate knowledge discovery, whereas knowledge discovery techniques may help knowledge-base construction, and thus enhance and/or challenge deductive and object-oriented database techniques. The objective of this workshop is to explore how knowledge discovery and DOOD can be integrated.

FORMAT

The workshop will be held immediately following the DOOD'95 conference. The plan is to have either a half-day or a full-day session depending on the number of submissions and the level of participation. If the number of submissions is large, we may categorize the papers into presentation papers, poster papers and system/prototype descriptions/demonstrations. Proposals for panel discussions are welcome. The final program of the workshop will be decided only after all the papers have been reviewed.
The program will be formatted in a way that will encourage open discussions. The workshop also plans to invite one or two pioneering researchers in the area as keynote speakers.

TOPICS

The topics of interest include but are not limited to the following:

1. Knowledge-base construction by KDD.
2. Rule-guided induction.
3. Integration of induction and deduction techniques.
4. Extension of DOOD systems for KDD.
5. Knowledge-merging: merging deduction rules and discovered rules.
6. Induction from data and schema.
7. Applications of discovered knowledge in DOOD (including semantic query optimization, intelligent query answering, data purification, etc.).
8. Uncertainty handling in data mining and DOOD.
9. KDD tools for DOOD.

SUBMISSION AND REVIEWS OF THE PAPERS

Authors are invited to submit original research contributions. Each paper should not be longer than 10 pages. We encourage electronic submissions in the form of PostScript, LaTeX, etc., formatted for standard 8.5 x 11 inch paper. If hardcopies are submitted, four copies will be required. Each submitted paper will be reviewed by at least three program committee members.

PUBLICATIONS

To be decided based on the number of submissions. An arrangement will likely be made with the DOOD conference organizers.

ATTENDANCE

At least one author of each paper selected for paper and/or poster presentation must attend the workshop. Researchers who are willing to contribute to the discussions are encouraged to participate. Workshop participants are not required to register for and attend the DOOD'95 conference. However, a discount will be given to those who attend both the conference and the workshop.

PROGRAM COMMITTEE

Rakesh Agrawal, IBM, USA
Paolo Atzeni, Universita' La Sapienza, Italy
Ron Brachman, AT&T Bell Laboratories
JiaWei Han, Simon Fraser University, Canada
Willi Kloesgen, GMD, Germany
Laks V.S. Lakshmanan, Concordia University, Canada
Raymond Ng, U. British Columbia, Canada
Shojiro Nishio, Osaka University, Japan
KayLiang Ong, MCC, USA
Beng-Chin Ooi, NUS, Singapore
Wei-Min Shen, ISI/USC, USA
Evangelos Simoudis, Lockheed, USA
Shalom Tsur, Argonne National Research Lab, USA
Carlo Zaniolo, UCLA, USA

IMPORTANT DATES

 1 June 1995      - Final submission date for papers
24 July 1995      - Notification
 7 August 1995    - Workshop programs completed
 1 September 1995 - Camera ready
 8 December 1995  - Workshop

ORGANIZING COMMITTEE

Jiawei Han, Simon Fraser University, Canada.
Laks V.S. Lakshmanan, Concordia University, Canada.
Raymond Ng, University of British Columbia, Canada.
KayLiang Ong, Microelectronics and Computer Technology Corporation (MCC), USA.

For further information and paper submission, please contact:

KayLiang Ong
Microelectronics and Computer Technology Corporation (MCC)
3500 West Balcones Center Drive
Austin, Texas 78759-5398, USA
email: ong@mcc.com
phone: (512) 338-3354
fax: (512) 338-3890