KDD Nugget 94:21, e-mailed 94-11-30

Contents:
 * IDA-95, CFP: Int. Symp. of Intelligent Data Analysis (IDA-95)
 * J. Petit, Using Queries to Improve Database Reverse Engineering
 * S. Kloosterman, KDD Definitions question
 * J. P. Lee, Book: Database Issues for Data Visualization
 * A. Aamodt, CFP: 5th Scandinavian Conf. on AI

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD, also known as Data
Mining), e.g. application descriptions, conference announcements, tool
reviews, information requests, interesting ideas, clever opinions, etc.
It has been coming out about every two-three weeks, depending on the
quantity and urgency of submissions.

Back issues of nuggets, a catalog of data mining tools, useful references,
FAQ, and other KDD-related information are now available at Knowledge
Discovery Mine, URL http://info.gte.com/~kdd/
or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail contributions to kdd@gte.com
Add/delete requests to kdd-request@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is my belief that the answers lie within us, but sometimes
we need another to enable the discovery.
   -- Jeff Shepherd (jeff@netboss1.trg.saic.com)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Date: Mon, 21 Nov 94 21:49:13 GMT
To: IE-list@cs.ucl.ac.uk, ag-exp-l%ndsuvm1.bitnet@cunyvm.cuny.edu,
    agosta@sumex-aim.stanford.edu, ai-ed@sun.com, ai-medicine@med.stanford,
    ai-nat@adfa.oz.au, ai-stats@watstat.uwaterloo.ca,
    at-finance-board@invnext.worldbank.org, cbr-med@cs.uchicago.edu,
    class-l%sbccvm.bitnet@cunyvm.cuny.edu,
    cybsys-l%bingvmb.bitnet@cunyvm.cuny.edu,
    cybsys-l@bingvmb.cc.binghamton.edu, diagrams@cs.swarthmore.edu,
    idss@socs.uts.edu.au, kaw@swi.psy.uva.nl, kdd@gte.com,
    met-ai@comp.vuw.ac.nz, ml@ics.uci.edu, students.chi@xerox.com
Subject: IDA-95: International Symposium on Intelligent Data Analysis
From: IDA-95@dcs.bbk.ac.uk

                 1st CALL FOR PAPERS AND PANELS

   International Symposium on Intelligent Data Analysis (IDA-95)
                     Baden-Baden, Germany
                     17th-19th August 1995

Objective
---------
The gap between data generation and data comprehension is widening.
Efficient computational methods for analysing data effectively are
required to narrow this gap. There have been a variety of computationally
intelligent techniques developed, which are beginning to provide such
capability. However, many questions need to be properly addressed before
these techniques can be most effectively employed to perform various data
analysis tasks. It is the purpose of IDA-95 to provide an international
forum for the discussion of these questions, some of which are listed
below:

a) How important is it to understand the data characteristics and to
   pre-process data accordingly before using the data for tasks such as
   classification and forecasting?
   [Exploratory data analysis, incompleteness and uncertainty, noise
   filters, outliers]

b) With so many modern techniques, which technique should I use for my
   application?
   [Bayesian networks, fuzzy logic, decision trees, genetic algorithms,
   neural nets, statistical pattern recognition]

c) What is the impact of modern visualisation techniques on data analysis?
   [Computer graphics, computational geometry, image processing, user
   interface]

d) What is the role of domain knowledge in data analysis? Does it help
   analyse data more effectively or simply introduce "biases" into the
   analysis procedure?

e) How do we evaluate the performance of intelligent data analysis
   systems? What should we do when "golden standards" do not exist?

f) How can one integrate a variety of related techniques to develop the
   most effective system for a given application?

Submissions
-----------
Participants who wish to present a paper are requested to submit a 1000
word extended abstract as soon as possible, but not later than February 1,
1995. (E-mail submissions are preferred.) Notification of acceptance will
be sent to authors by March 15, 1995. Full camera-ready papers, not
exceeding 5 single-spaced pages, will be required by May 1, 1995 for
publication in the Symposium Proceedings.

In addition to paper presentations, panel sessions on one or more of the
above-mentioned topics are planned. If you would like to organise a panel
discussion in these or other related topics, please submit your proposals
with a one-page description of the subject matter and a list of proposed
panelists by April 1, 1995.

Program Committee
-----------------
Nirwan Ansari        New Jersey Inst. of Tech., USA
David Bell           Univ. of Ulster at Jordanstown, N. Ireland
Max Bramer           Univ. of Portsmouth, England
Paul Cohen           Univ. of Massachusetts at Amherst, USA
Doug Fisher          Vanderbilt University, USA
Alex Gammerman       Royal Holloway, London Univ., England
Se June Hong         IBM T.J. Watson Research Center, USA
Xiaohui Liu          Birkbeck College, London Univ., England (Chair)
Alan Payne           Kodak Research Division, England
Henri Prade          Univ.
                     of Paul Sabatier, France
Colin Shearer        Integral Solutions Limited, England
Paul Snow            Independent Consultant, Concord, USA
Lionel Tarassenko    Oxford University, England
Serdar Uckun         Rockwell International Science Center, USA
Vladimir Vapnik      AT&T Bell Laboratories, Holmdel, USA
Sholom Weiss         Rutgers Univ. at New Brunswick, USA
H-J Zimmermann       ELITE Foundation, Aachen Inst. of Tech., Germany

Location
--------
Baden-Baden is a beautiful spa-resort town and convention centre located
in the middle of the Black Forest in Germany. It can be reached in two
hours by train from Frankfurt or Stuttgart. Those travelling by car can
reach Baden-Baden by Autobahn A5 (Frankfurt - Basel) or Autobahn A8
(Stuttgart - Karlsruhe). The Conference will be held in the
Markgraf-Ludwig-Gymnasium.

Sponsor
-------
IDA-95 is sponsored by the International Institute for Advanced Studies in
System Research and Cybernetics and will be held as part of their annual
conference on "Systems Research, Informatics and Cybernetics". The aim of
this year's conference is to encourage and facilitate interdisciplinary
communication and co-operation amongst scientists, engineers, and
professionals working in different fields such as computer science,
cognitive science, engineering, linguistics, logic, management, medicine,
philosophy and psychocybernetics, and to identify and develop those areas
of research that will most benefit from such co-operation.

Those who want to submit papers to the general conference should contact:
Professor G E Lasker, School of Computer Science, University of Windsor,
Windsor, Ontario N9B 3P4, Canada. Fax: (+1) 519 974 8191

Correspondence
--------------
Submissions for IDA-95 should be addressed to:
Dr X Liu, Department of Computer Science, Birkbeck College,
Malet Street, London WC1E 7HX, UK.
E-mail: ida-95@dcs.bbk.ac.uk
Tel: (+44) 171 631 6711
Fax: (+44) 171 631 6727

Latest information regarding IDA-95 will be available on the World Wide
Web Server of the Department of Computer Science at Birkbeck College,
London: http://web.dcs.bbk.ac.uk/CS/Research/IDA/cfp.html

--------------------------------------------
Date: Tue, 22 Nov 94 10:32:18 +0100
From: jmarc@lisiecrin.insa-lyon.fr (Jean-Marc Petit)
To: kdd@gte.com
Subject: Using Queries to Improve Database Reverse Engineering
Cc: dream@lisiecrin.insa-lyon.fr

I believe that this announcement may be relevant to the KDD list.

I have written, with three colleagues, a paper on the use of SQL code
embedded in application programs to improve an RDB reverse engineering
process. A preliminary version will be available in the proceedings of
the 13th Int. Conf. on ER Approach, to be held in Manchester in December.
A revised version of this paper can be requested by e-mail from:
Jean-Marc.Petit@lisiecrin.insa-lyon.fr

The abstract follows:
---------------------------------------
This paper describes a technique that supports Extended
Entity-Relationship (EER) schema extraction from an operating relational
database. In this reverse engineering context, the two major decisions
that have to be taken are the assumptions on the initial schema and where
the data semantics are extracted from. Original aspects of our method are
manifold. First, it is based on realistic assumptions, e.g., there are no
constraints on the uniqueness of the attribute names. Second, the
dependencies between the attributes are not supposed to be known a priori.
The method starts from the database schema as stored in the DBMS
dictionary, i.e., the relation names, the attribute names and their basic
characteristics (uniqueness of value, not null values). Finally, semantics
extraction is supported by analysis of the available queries. It is shown
how specific kinds of query can help to build an EER schema including is-a
relationships and aggregates.
Keywords: Database reverse engineering, Semantics discovery, Conceptual
modelling, Relational model, Extended Entity-Relationship model.
---------------------------------------

I would appreciate any feedback and comments.
Thank you.

Jean-Marc Petit
Laboratoire d'Ingenierie des Systemes d'Information
INSA de Lyon, Batiment 501
20 av. Albert Einstein
69621 Villeurbanne cedex
FRANCE

--------------------------------------------
Date: Mon, 21 Nov 1994 11:11:45 +0100
From: S.H.Kloosterman@research.ptt.nl (Sytse Kloosterman)
Subject: Data mining terminology
To: kdd@gte.com

Hi,

I'm working on Knowledge Discovery in Databases (KDD), which is also known
as data mining. Based on known literature, we distinguish four types of
KDD tasks:
 * clustering,
 * class description (which can be divided into summary and
   discrimination),
 * dependency analysis,
 * deviation detection.

When one tries to interrelate these tasks or to formulate their
differences, it turns out that this is not easy to do. A specific task of
one category can sometimes also be seen as belonging to another category.

Well, here are my questions:
 * what are the precise definitions of the above-mentioned tasks,
 * how are they related, and
 * what are the differences?

To give a first attempt, I think that clustering is the process of
grouping objects. Objects are grouped because they are similar enough.
Now, based on these clusters, the other tasks can be viewed as functions
defined on one or more clusters returning a pattern which is a
representation of a summary of the cluster, a trend in a series of
clusters, etc.

Please shoot!

Thanks & regards,
Sytse.

(GPS: I basically agree with the above definitions. A very extensive
terminology of data mining terms was compiled by Willi Kloesgen and
Jan Zytkow in http://orgwis.gmd.de/explora/terms.html).
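
As one possible reading of the proposal above, here is a minimal Python
sketch that treats clustering as the primary step and the other tasks as
functions over clusters. The one-dimensional data, the `max_gap`
similarity threshold, the `min_size` cutoff and all function names are
illustrative assumptions, not definitions taken from the literature;
dependency analysis is left out because it needs more than one attribute.

```python
from statistics import mean


def cluster_1d(values, max_gap):
    """Clustering: group sorted values; start a new cluster whenever the
    gap to the previous value exceeds max_gap (assumed similarity test)."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def summarize(cluster):
    """Class description (summary): a compact pattern for one cluster."""
    return {
        "size": len(cluster),
        "min": min(cluster),
        "max": max(cluster),
        "mean": mean(cluster),
    }


def discriminate(cluster_a, cluster_b):
    """Class description (discrimination): a threshold that separates two
    clusters, or None if their value ranges overlap."""
    low, high = sorted((cluster_a, cluster_b), key=min)
    if max(low) < min(high):
        return (max(low) + min(high)) / 2
    return None


def deviations(clusters, min_size):
    """Deviation detection: clusters too small to be a regular group."""
    return [c for c in clusters if len(c) < min_size]


# Hypothetical sample data, chosen only to exercise the functions.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 5.1, 20.0]
clusters = cluster_1d(values, max_gap=1.0)

for c in clusters:
    print(summarize(c))
print("threshold:", discriminate(clusters[0], clusters[1]))
print("deviations:", deviations(clusters, min_size=2))
```

On the sample values this yields three groups, and the lone value 20.0 is
the only cluster flagged as a deviation under `min_size=2`. In this
reading the tasks differ mainly in what they return: a partition, a
description of one group, a boundary between two groups, or the groups
that do not fit.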

--------------------------------------------
From: John Peter Lee
Subject: Re: DB issues for Data Visualization
Date: Fri, 25 Nov 1994 14:05:32 -0500 (EST)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
JP Lee             Institute for Visualization and Perception Research
jlee@cs.uml.edu    University of Massachusetts at Lowell
(508) 934-3384     1 University Ave. Lowell, MA 01854
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Book Announcement:

"Database Issues for Data Visualization"
J.P. Lee and G.G. Grinstein, editors
Springer-Verlag Lecture Notes in Computer Science, vol 871
Available through your bookseller or Springer-Verlag

Description:

Scientific Data Visualization is a methodology of portraying complex data
in meaningful ways. The doctrine of "insight, not numbers" is manifested
in beautiful imagery that pushes the envelope of computational power and
graphic display techniques. Unfortunately, current visualization systems
provide primitive data management facilities, and are geared primarily
towards image production. "Data Mining" is visual, not algorithmic. Such
systems, and ultimately the end user, will be better served with
facilities to interrogate data for associations, relationships, and
structure. The integration of database management system (DBMS) and
knowledge discovery in databases (KDD) technologies with visualization is
a logical conclusion. Many obstacles stand in the way, however, from
logical models to performance issues.

The workshop on Database issues for Data Visualization held at the IEEE
Visualization'93 conference in San Jose, California was a first attempt at
bringing together a number of researchers in both fields to discuss the
issues and report on possible avenues requiring research activity.
The book contains enhanced submissions from most of the participants
describing current research projects, as well as the complete subgroup
reports on Data Models, Systems Integration, and Interaction, User
Interfaces, and Presentation.

Table of Contents:

Workshop Description
Workshop Participants

Workshop Subgroup Reports
   Developing a Data Model
   System Integration Issues
   Interaction, Interfaces and Presentation

Data Models for Scientific Data

"The VIS-AD Data Model: Integrating Metadata and Polymorphic Display
 with a Scientific Programming Language"
   Bill Hibbard, Charles Dyer and Brian Paul
   University of Wisconsin - Madison

"An Extended Schema Model for Scientific Data"
   R. Daniel Bergeron, David Kao, Ted Sparr
   University of New Hampshire - Durham

"Data Integration for Visualization Systems"
   Karen Ryan
   Cray Research, Inc. - Eagan MN

"Inherent Logical Structure of Computational Data: its Role in Storage
 and Retrieval Strategies to Support User Queries"
   Sandra Walther
   Rutgers University - Piscataway NJ

Systems Integration Issues

"Database Management for Data Visualization"
   Peter Kochevar   kochevar@sdsc.edu
   San Diego Supercomputer Center and DEC

"Data Exploration Interactions and the ExBase System"
   J.P. Lee
   Institute for Visualization and Perception Research
   University of Massachusetts at Lowell

"Database Requirements for Supporting End-User Visualizations"
   Venu Vasudevan
   Motorola Advanced Design Technology Laboratory - Tempe AZ

"A System Architecture for Data-Oriented Visualization"
   Andreas Wierse, Ulrich Lang, R. Ruhle
   University of Stuttgart, Germany

"A Hyperspectral Image Analysis Workbench for Environmental Science
 Applications"
   John Christiansen, M. Woyna, D. Zawada, K. Simunich
   Argonne National Laboratory - Argonne IL

User Interfaces / Interaction / Presentation Issues

"Design of a 3D User Interface to a Database"
   John Boyle, J.E. Fothergill, P.M.D. Gray
   University of Aberdeen, Scotland

"Visualizing Reference Database"
   Stephen G. Eick, Eric E. Sumner, Graham J.
   Wills
   AT&T Bell Laboratories - Naperville, IL

"A 3D Based User Interface for Information Retrieval Systems"
   Matthias Hemmje
   German National Center for Computer Science

"Visually Supporting Data Mining of Large Existing Databases"
   Daniel A. Keim, H.P. Kriegel
   University of Munich, Germany

--------------------------------------------
From: Agnar.Aamodt@ifi.unit.no
Subject: SCAI'95 - 2nd CfP

                            SCAI'95

      FIFTH SCANDINAVIAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

              Trondheim, Norway, May 29 - 31, 1995

           2nd CALL FOR PAPERS and PROGRAM HIGHLIGHTS

The biennial Scandinavian Conference on Artificial Intelligence is the
international, open Scandinavian forum for scientific exchange and
presentation of AI research and development. The conference language is
English.

The aim of the conference is to cover all aspects of AI, and to bring
together basic and applied research. The technical program will include
paper and poster presentations, invited talks and panels. An award will be
given to the best student paper.

The major theme for SCAI'95 will be "Theory meets Practice", with
facilitation of feedback from real world applications to the researchers
as a central goal. Industry is particularly encouraged to submit papers.

The fifth SCAI is hosted by the University of Trondheim and SINTEF DELAB,
in cooperation with the Norwegian AI Society, NAIS.

Submission of papers
--------------------
Authors are requested to submit 5 hard-copies of papers written in
English. Submitted papers should be unpublished and present original work.
Papers should be double-spaced and not exceed 6000 words. Each copy of the
paper should include a separate title page containing the title, full
names, postal addresses, phone numbers and e-mail addresses of all
authors, an abstract of 100-200 words and an indicator whether a paper or
poster presentation is preferred.

Papers should be sent to
------------------
SCAI'95
Agnar Aamodt
Dept.
of Informatics, College of Arts and Science
The University of Trondheim, N-7055 Dragvoll, NORWAY
email: agnar@ifi.unit.no, fax: +47-73591733, phone: +47-73591838 / -1840

or

SCAI'95
Jan Komorowski
Knowledge Systems Group
Dept. of Computer Systems and Telematics
O.S. Bragstads plass 2E
The Norwegian Institute of Technology
The University of Trondheim
N-7034 Trondheim, NORWAY
e-mail: Komorowski@idt.unit.no
fax: +47-73594466, phone: +47-73594567

Key Dates
---------
January 10, 1995  - Papers due.
February 25, 1995 - Notification of acceptance or rejection
March 25, 1995    - Camera ready paper due

Preliminary program
-------------------

Sunday 28. May
--------------
1800 - 2000  Registration
             Reception

Monday 29. May
--------------
0830 - 0930  Registration
0930 - 1230  Tutorial 1
             Evolutionary computation
             Prof. Zbigniew Michalewicz,
             University of North Carolina at Charlotte, USA
1230 - 1400  Lunch
1400 - 1700  Tutorial 2
             The successful application of modern artificial
             intelligence technologies
             Dr. Robert Milne, Intelligent Applications Ltd., England

Tuesday 30. May
---------------
0900 - 1000  Invited speaker
             Exploring design space and niche space
             Prof. Aaron Sloman, The University of Birmingham, England
1000 - 1030  Coffee
1030 - 1230  Paper presentations
1230 - 1400  Lunch
1400 - 1530  Paper presentations
1530 - 1600  Coffee
1600 - 1700  Paper presentations
1715 - 1830  Norwegian AI Society Annual meeting
2000         Conference dinner

Wednesday 31. May
-----------------
0900 - 1000  Invited speaker
             Synthesis of adaptive decision systems from experimental data
             Prof.
             Andrzej Skowron, Warsaw University, Poland
1000 - 1030  Coffee
1030 - 1230  Paper presentation
1230 - 1400  Lunch
1400 - 1530  Panel discussion

Rates
-----
Conference:                             NOK 2000
Not member of SAIS, DAIS, NAIS, FAIS:   NOK 2250
Tutorials:                              NOK  500

Late registration (after April 15, 1995)
Conference:                             NOK 2250
Not member of SAIS, DAIS, NAIS, FAIS:   NOK 2500
Tutorials:                              NOK  750

NOTE: The conference rate includes reception, 2 lunches, conference dinner
and coffee. The tutorial rate includes both tutorials, lunch and coffee.

Program committee
-----------------
Agnar Aamodt, University of Trondheim/AVH, co-chair
Jan Komorowski, University of Trondheim/NTH, co-chair
Tore Amble, University of Trondheim/NTH
Bernt Bremdal, Bremdal Technology Services, Asker
Roar Fjellheim, Computas Expert Systems, Sandvika
Steffen Leo Hansen, Dept. of Computational Linguistics, Frederiksberg
Johan Moller Holst, Norsk Hydro, Bergen
Sture Hagglund, Linkopings Universitet
Carl Gustaf Jansson, Stockholm Universitet
Andrew Jones, Institutt for Rettsinformatikk, Oslo
Mette Kloster, SINTEF Informatikk, Oslo
Aarno Lehtola, VTT/TTE, Laboratory of Information Processing
Morten Lind, Danmarks tekniske universitet
Mihhail Matskin, University of Trondheim/NTH
Brian Mayoh, Aarhus University
Jorgen Fischer Nilsson, Danmarks tekniske universitet
Erik Sandewall, Linkopings Universitet
Markku Syrjaenen, Tekniska Hogskolan i Helsingfors
Ingeborg Solvberg, University of Trondheim/AVH and SINTEF DELAB
Henry Tirri, Helsinki University
Enn Tyugu, KTH, Stockholm Universitet
Erling Woods, SINTEF Reguleringsteknikk, Trondheim

Conference organizing committee
-------------------------------
Inge Nordbo, SINTEF DELAB, Trondheim, co-chair
Arvid Holme, University of Trondheim/AVH, co-chair

Conference secretariat
----------------------
SCAI'95
Inge Nordbo
SINTEF DELAB, N-7034 Trondheim, Norway
fax: +47 73 53 25 86
e-mail: scai95@delab.sintef.no

Welcome to Trondheim !
----------------------
Trondheim is the third largest city and the technical capital of Norway.
Trondheim was already a center of power in Norway in 997, when the Viking
king Olav Tryggvason founded a trading town at the mouth of the Nid river
and built a castle there. Trondheim became Norway's first capital.

Modern Trondheim features culture, research and education, besides trade
and industry. The Norwegian Institute of Technology (abbreviated in
Norwegian: NTH) was established here in 1910 and is the only technical
university in the country. A major technological environment has grown up
around it, larger than any other technological center in Norway. The other
major part of the University is the College of Arts and Science (AVH),
which has grown from a rather small college 25 years ago to its present
size, equal to that of NTH. With its 16000 students, the University of
Trondheim is the second largest in Norway. Close to the University there
is Scandinavia's largest foundation for scientific and industrial
research - SINTEF - with 2200 employees performing contract research and
development for industry and the public sector.

Take the opportunity to participate in both technical and cultural events
during SCAI'95. We look forward to seeing you in Trondheim!

===============================