Knowledge Discovery Nuggets 97:19, e-mailed 97-06-07




News:
* I. Parsa, KDD-97 Knowledge Discovery and Data Mining Tools Competition
* GPS, NY Times: Mining the Cosmos data for Extraterrestrial signs,
  http://www.nytimes.com/library/cyber/surf/060497mind.html
* David Isherwood, AMEC Data Mining alliance,
  http://www.attar.com/pages/amec.htm

Publications:
* U. Fayyad, Data Mining and Knowledge Discovery journal, issue 2
  http://www.research.microsoft.com/datamine
* George Paliouras, PhD thesis on refinement of event recognition systems,
  http://www.cs.man.ac.uk/csonly/cstechrep/Theses/Paliouras/thesis.html
* A. Freitas, Ph.D. thesis on KDD available,
  http://cswww.essex.ac.uk/SystemsArchitecture/DataMining/alex/thesis.html

Meetings:
* Cristina Lopez, Data Warehousing Meeting, Aug 24-29, Boston
  http://www.dw-institute.com
* Shirley, NSF Workshop on Mathematical Techniques to Mine Massive Data Sets
  July 12-15, 1997, University of Illinois at Chicago
  http://www.lac.uic.edu/m3d-chicago.html
* Gordon, ICML-97 workshop: ML APPLICATION IN THE REAL WORLD,
  Nashville, TN, July 12th 1997,
  http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html

    --
    KD Nuggets is a newsletter for the Data Mining and Knowledge
    Discovery community, focusing on the latest research and applications.

    Submissions are most welcome and should be emailed, with a
    DESCRIPTIVE subject line (and a URL) to gps.
    Please keep CFP and meetings announcements short and provide
    a URL for details.

    To subscribe, see
    http://www.kdnuggets.com/subscribe.html

    KD Nuggets frequency is 3-4 times a month.
    Back issues of KD Nuggets, a catalog of data mining tools
    ('Siftware'), pointers to Data Mining Companies, Relevant Websites,
    Meetings, and more are available at the Knowledge Discovery Mine site
    at
    http://www.kdnuggets.com/

    -- Gregory Piatetsky-Shapiro (editor)
    gps

    ********************* Official disclaimer ***************************
    All opinions expressed herein are those of the contributors and not
    necessarily of their respective employers (or of KD Nuggets)
    *********************************************************************

    ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Though this be madness, yet there is method in 't.
    Shakespeare, Hamlet
    (not commenting on the typical process of knowledge discovery)

    Date: Sat, 7 Jun 1997 02:01:46 -0400
    From: iparsa@epsilon.com (Ismail Parsa)
    Subject: Final CfP: KDD-97 Knowledge Discovery and Data Mining Tools Competition
    ----------------------------------------------------------------------
    FINAL CALL FOR PARTICIPATION

    KNOWLEDGE DISCOVERY CUP (KDD-CUP-97):

    A Knowledge Discovery and Data Mining Tools Competition


    to be held in conjunction with

    THE THIRD INTERNATIONAL CONFERENCE ON
    KNOWLEDGE DISCOVERY AND DATA MINING (KDD-97)

    http://www-aig.jpl.nasa.gov/HyperNews/get/KDD97.html

    ----------------------------------------------------------------------

    This year, for the first time, the KDD-97 Organization is organizing a
    Knowledge Discovery and Data Mining (KDDM) tools competition
    (KDD-CUP-97) in conjunction with the 3rd International Conference on
    Knowledge Discovery and Data Mining (KDD-97).

    The Cup is open to all KDDM tool vendors, academics and corporations
    with significant applications. All products, applications, research
    prototypes and black-box solutions are welcome. If requested, the
    anonymity of the participants and their affiliated companies/
    institutions will be preserved. Our aim is not to rank the
    participants but to recognize the most innovative, efficient and
    methodologically advanced KDDM tools.

    Attendance at the KDD-97 conference is not required to participate in
    the CUP. Participants are required to demonstrate the performance of
    their KDDM tool in one or all of the following areas:

    1. Supervised Learning: Classification or Discrimination
    2. Unsupervised Learning: Clustering or Segmentation.

    In the interest of time, the regression or prediction category and
    other descriptive modeling techniques, such as association rules,
    are not included in the competition this year.

    The registration deadline for the Cup and the release date for the
    training and validation data set(s) is June 19, 1997. All participants
    must send back the results along with a scoring code[1] by July 17th,
    one month prior to the KDD-97 conference. The scoring code will be
    used by the KDD-CUP-97 committee to independently validate the results.
    Each participant will receive the committee's evaluation of his/her/
    their performance by August 11, 1997.

    The winners will be determined based on a weighted combination of
    classification accuracy (or predictive power), software novelty (or
    innovation), efficiency (people and CPU time) and the data mining
    methodology employed. The top three performing tools in each category
    will be awarded Gold Miner, Silver Miner and Bronze Miner awards and
    they will be listed on the KD Nuggets web site
    http://www.kdnuggets.com
    until the beginning of the KDD-98 conference, unless the participants
    and their affiliated companies/institutions wish to remain anonymous.

    [1] The scoring code is a stand-alone C or C++ callable program or
    hard code that carries out all the steps required to implement
    the learning algorithm outside the model building environment.
    In addition to the numeric values of the weights, it also
    includes preprocessing statements for treating missing values,
    transforming/normalizing/standardizing inputs, etc. It is
    ultimately used in computing the predicted value or output from
    raw data outside the modeling environment. For example, for the
    decision tree algorithms, the preprocessing code along with the
    'if-then-else' rules constitutes the scoring code.
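
To make footnote [1] concrete, here is a minimal sketch of what a scoring
code for a small decision tree might look like. It is written in Python for
brevity, whereas the competition asks for a C or C++ callable program. The
feature names (age, income), the fill-in values, the thresholds and the
returned scores are all invented for illustration and do not come from the
competition data.

```python
# Hypothetical scoring code for a tiny decision tree. Every constant,
# feature name and threshold below is an assumption made for illustration.

# Preprocessing constants that would be frozen when the model is built.
AGE_FILL = 40.0          # assumed replacement for a missing age
INCOME_MEAN = 30000.0    # assumed training mean used for standardizing
INCOME_STD = 10000.0     # assumed training standard deviation


def preprocess(record):
    """Treat missing values and standardize inputs for one raw record."""
    age = record.get("age")
    if age is None:
        age = AGE_FILL
    income = record.get("income")
    if income is None:
        income = INCOME_MEAN
    income_z = (income - INCOME_MEAN) / INCOME_STD
    return age, income_z


def score(record):
    """Apply the if-then-else rules of the made-up tree to one raw record."""
    age, income_z = preprocess(record)
    if income_z > 0.5:
        if age < 35:
            return 0.80
        else:
            return 0.55
    else:
        if age < 35:
            return 0.30
        else:
            return 0.10
```

For example, score({"age": 30, "income": 40000}) follows the first branch
and returns 0.80, while a record with both fields missing falls back to the
fill-in values and returns 0.10.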


    +-----------------+
    | Important Dates |
    +-----------------+

    - June 19, 1997: Registration deadline and data set release date
    - July 17, 1997: Participants turn in the results along with the
      scoring code
    - August 11, 1997: Individual performance evaluations sent to the
      participants
    - August 14, 1997: Public announcement during the KDD-97 conference
      of the top three performing tools in each category.

    +------------------------------+
    | KDD-CUP-97 Program Committee |
    +------------------------------+

    Vasant Dhar, New York University, New York, NY, USA
    Ronen Feldman, Bar-Ilan University, Ramat-Gan, ISRAEL
    Ismail Parsa, Epsilon Data Management, Burlington, MA, USA
    Gregory Piatetsky-Shapiro, Geneve Consulting Group, Cambridge, MA, USA

    +---------------------+
    | EVALUATION CRITERIA |
    +---------------------+

    A. CLASSIFICATION OR DISCRIMINATION CATEGORY

    Although the predictive power, i.e., the classification accuracy, of
    the resulting model measured in terms of lift (the term 'lift' implies
    improvement over random or no prediction) will be the primary
    evaluation criterion in the classification category, the winner will
    be selected based on a weighted combination of all of the following:

    1) Software Novelty/Innovation, e.g., unified approach to analyses
       through the implementation of analytic metadata, integration of
       data mining with data visualization, integration with other systems
       in novel ways, user interaction, built-in intelligence, etc.

    2) Efficiency, i.e., people and CPU time

    3) KDD Methodology, including but not limited to:

       - Data Archaeology, including but not limited to:

           Data Hygiene (quality-control and cleaning)
             Identify and eliminate noise

           Preprocessing
             Identify and eliminate constants
             Identify and treat missing values
             Identify (and treat) outliers
             Identify (and treat) non-linearity
             Identify (and treat) non-normality
             Create derived features based on string-to-numeric conversions
             Create derived features based on dates
             Create derived features based on time series smoothing
             Discretize or bin continuous features
             Discretize or bin nominal features based on a criterion
             Create derived features based on feature interactions
             Create derived features based on transformations
             Identify feature measurement scales: nominal, continuous, etc.

       - Exploratory Data Analysis (EDA), including but not limited to:

           Collinearity screening (elimination of redundant features)
           Feature dimensionality reduction
           Feature subset selection
           Data visualization

       - Model Development and Implementation, including but not limited to:

           Application of data mining algorithm(s)
           Evaluation of alternative algorithms, modeling technologies
           Validation of results (to avoid over-fitting)
           Interpretability of extracted patterns
           Data visualization
           Return on investment (ROI) or back-end analysis
           Application of learned knowledge to the universe, i.e., scoring.
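
The announcement names lift as the primary criterion for this category but
does not give a formula. One common reading, assumed here, is the response
rate among the top-scored fraction of records divided by the overall
response rate, so that 1.0 means no improvement over random selection. The
function name lift_at_fraction and its parameters are hypothetical.

```python
def lift_at_fraction(scores, labels, fraction=0.1):
    """Lift of the top-scored `fraction` of records over the overall rate.

    scores: model outputs, higher meaning more likely positive.
    labels: 1 for a positive (responding) record, 0 otherwise.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    n_top = max(1, int(round(len(scores) * fraction)))
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    return top_rate / base_rate
```

With ten records of which two are positive, a model that ranks both
positives first has a lift of 5.0 at a fraction of 0.2, because the top 20%
responds at 100% against an overall rate of 20%.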

    B. CLUSTERING OR SEGMENTATION CATEGORY:

    In the clustering or segmentation category, the validity of the final
    solution will be determined based on a combination of the relevant
    items listed above and one or more of the following:

    - External evaluation, i.e., using samples from known clusters

    - Internal evaluation, i.e., using statistical or other measures
      to characterize the goodness of fit of the clustering solution

    - Replicability, i.e., using cross-validation samples

    - Relative criteria, i.e., comparison of cluster solutions obtained
      from alternative clustering algorithms applied to the same data
      set.

    Visualization of the final clustering solution will also be important.
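
As one possible instance of external evaluation, assuming samples with known
cluster membership are available, cluster purity counts how many samples
share the majority known label of the cluster they were assigned to. The
choice of purity is an assumption; the committee does not say which measure
it will use, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(assigned, known):
    """Fraction of samples matching the majority known label of their cluster.

    assigned: cluster id produced by the tool for each sample.
    known: the known (reference) cluster label for each sample.
    """
    if len(assigned) != len(known) or not assigned:
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(list)
    for cluster_id, label in zip(assigned, known):
        members[cluster_id].append(label)
    # For each cluster, count the samples carrying its most frequent label.
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(assigned)
```

A purity of 1.0 means every cluster contains samples from a single known
group. Purity alone rewards solutions with many tiny clusters, so it would
normally be read alongside the other criteria listed above.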


    +-----------------------+
    | REGISTRATION BROCHURE |
    +-----------------------+

    To participate in the KDD-CUP-97, please complete the application form
    below and send it in plain ASCII format to (e-mail preferred):

    +-----------------------------+
    | Ismail Parsa                |
    | Epsilon Data Management     |
    | 50 Cambridge Street         |
    | Burlington MA 01803 USA     |
    |                             |
    | E-mail: iparsa@epsilon.com  |
    | Phone: (617) 273-0250*6734  |
    | Fax: (617) 272-8604         |
    +-----------------------------+


    Detailed information regarding the rules of the competition will be
    sent to the participants later.

    ---------------------------------- cut ---------------------------------

    KNOWLEDGE DISCOVERY CUP (KDD-CUP-97)

    Registration Brochure


    Competition category..........: (_) Classification or Discrimination
    (check all that apply)          (_) Clustering or Segmentation


    Will you attend the KDD-97
    conference..................: (_) Yes (_) No


    Would you like to sponsor this
    event? (terms/benefits to be
    determined).................: (_) Yes (_) No


    Name of software/product/tool/
    research prototype..........:


    Status of software/product/
    tool/research prototype.....: (_) Alpha (_) Beta (_) Production


    Release date of software/
    product/tool/research
    prototype (in YYMM format)..:


    Platform availability.........: (_) PC (_) Unix (_) Mainframe
    (check all that apply)          (_) Parallel environment (_) Other


    Built-in KDDM methodology/
    technology..................: (_) Graphical User Interface (GUI)
    (check all that apply)        (_) Data Access
                                  (_) Data Selection (sampling, etc.)
                                  (_) Data Preprocessing
                                  (_) Exploratory Data Analysis
                                  (_) Link Analysis (Associations,
                                      Sequences, etc.)
                                  (_) Clustering or Segmentation
                                  (_) Time Series Analysis
                                  (_) Classification or Discrimination
                                  (_) Prediction or Regression
                                  (_) Multiple Learned or Combined
                                      Models
                                  (_) Data Postprocessing
                                  (_) Data and Knowledge Visualization
                                  (_) Other, specify: _______
                                                      _______


    Data mining algorithms........: (_) Supervised Neural Networks (MLP,
    (check all that apply and           RBF, etc.)
    specify the algorithms)         (_) Statistical Methods (Logistic,
    ^^^^^^^^^^^^^^^^^^^^^^              OLS, MARS, PPR, GAM, Nearest
                                        Neighbors, etc.)
                                    (_) Decision Trees (ID3, C4.5, CHAID,
                                        CART, etc.)
                                    (_) Hybrid Systems (Neuro-fuzzy systems,
                                        GA optimized neural systems, etc.)
                                    (_) Unsupervised Algorithms (Kohonen
                                        networks, K-means clustering, etc.)
                                    (_) Case-Based Reasoning
                                    (_) Associations and Sequence Discovery
                                    (_) Other, specify: _______
                                                        _______


    Is your software/product/tool/
    research prototype:

    Freeware....................: (_) Yes (_) No
    Available for purchase......: (_) Yes (_) No
    if 'yes' then
    Price (optional, in US$)..:
    Number of sites installed.:


    Does your software/product/
    tool/research prototype
    have limitations, e.g.,
    number of variables and
    rows it can handle, etc.....: (_) No (_) Yes, please specify: _______


    Other relevant information....:


    PRIMARY CONTACT:

    Name..........................:
    E-mail Address................:
    Phone Number..................:
    Fax Number....................:
    Title.........................:
    Name of Company/Institution...:

    Mailing Address...............:


    SECONDARY CONTACT:

    Name..........................:
    E-mail Address................:
    Phone Number..................:
    Title.........................:
    Name of Company/Institution...:

    Mailing Address...............:

    ---------------------------------- cut ---------------------------------


    Date: Wed, 4 Jun 1997
    From: GPS (gps)
    Subject: Mining the Cosmos ?

    URL:
    http://www.nytimes.com/library/cyber/surf/060497mind.html

    The New York Times Cyberedition Mind & Machines section (June 4, 1997)
    published this interesting article.

    by Ashley Dunn

    Breaking Down the Search for Extraterrestrials With Distributed Computing.

    Searching for signs of extraterrestrial life has been one of those
    quixotic ventures on the fringes of science whose chances of success
    with current technology are somewhere around slim to none.

    Radio telescopes have been able to reach deep into the cosmos, but they can
    analyze only a thin sliver of information for telltale energy spikes
    that could be a million-year-old beacon signal from another
    civilization, or possibly an alien version of 'I Love Lucy' leaking into
    the cosmos.

    Every second, gigabytes of data are thrown away in these
    surveys (called SETI, for search for extra-terrestrial intelligence)
    because there just isn't enough computing power to take more than a
    rough cut at the whole universe.

    This waste of data and the allure of searching for extraterrestrial
    life was what attracted the attention of David Gedye, the director of
    online games for Starwave in Seattle. Gedye was scraping his mind for
    ways to involve the public in a mega-science project that would be fun,
    educational and significant. It was a hobby project unrelated to his
    work for Starwave, but one that had the potential to draw big numbers of
    Net users.

    Gedye, who had worked on distributed computing programs at
    Sun Microsystems, saw SETI (Search for Extraterrestrial
    Intelligence) as an appropriate project for global distributed
    computing. It might never pay off in our lifetimes, like the
    code-cracking efforts or prime number searches that have
    sprouted on the Internet, but it was interesting and educational.
    He named his project SETI@home.

    He found at the University of Washington a like-minded professor of
    astronomy, Woodruff Sullivan, who began to develop the idea of using
    tens of thousands of personal computers around the world to help re-sift
    the SETI data from the giant Arecibo radio telescope in Puerto Rico.

    ...
    (full text available at
    http://www.nytimes.com/library/cyber/surf/060497mind.html)

    (Thanks to Michael Beddows for pointing out this article. GPS)

    From: David Isherwood (disherwo@attar.co.uk)
    Date: Tue, 3 Jun 1997 12:41:27 +0000
    Subject: AMEC Data Mining alliance

    The full text is available at:
    http://www.attar.com/pages/amec.htm

    AMEC signs strategic Data Mining alliance with Attar Software and Hart
    Consultants.

    The application of data mining in the oil and gas industries is being
    pioneered by AMEC Process and Energy in association with Attar
    Software and Hart Consultants. The three companies today signed an
    alliance agreement at AMEC's offices in London. AMEC's Elliott
    Cairnes explained: 'Data mining is an innovative approach to identify
    and explain independent patterns between sets of variable data. The
    resultant information can be used to improve the performance of oil
    and gas processes. In addition, this technology is equally applicable
    to a wider range of operational areas and provides a new opportunity
    for AMEC to deliver significant client benefits.'

    AMEC, Attar and Hart are now able to provide proven expertise in the
    application of advanced data mining techniques to oil and gas
    processes, to improve efficiency, provide understanding of complex
    processes, analyse performance and help identify problems, potential
    problems and opportunities in plant operation.

    Data mining is especially suited to complex processes and issues, for
    example where the underlying theory is not well understood. Examples
    include the analysis of drilling data and subsequent well performance,
    the analysis of oil in water, and similar. It can also be used to
    learn how process experts, plant operators or other key personnel make
    decisions. For example, the 'secrets' of the best shift can be learnt
    from records of their actions in response to various scenarios.

    AMEC Process and Energy Limited is part of AMEC p.l.c., the
    international engineering, construction and development group. One of
    the largest and most experienced companies in the North Sea, it is an
    international market leader providing a service capability that spans
    the full life cycle of offshore production facilities ranging from
    conceptual design and asset maintenance through to life cycle cost
    optimisation and decommissioning services. AMEC also enjoys a
    significant international reputation for value adding engineering,
    construction and maintenance of downstream related oil and gas
    terminals, refinery, petrochemical and nuclear plants, process plant
    and, in the environmental area, incineration and pyrolysis plants.

    Attar is a provider of advanced software technology with over ten
    years' experience in data analysis. Its recent advances in the
    application of data mining using its Profiler software have been
    pivotal in the company being the software technology partner in the
    Pan European CRITIKAL project for large scale data mining, a project
    that is part funded by the European Community with a combined
    investment of $2 million.

    Hart Consultants are specialist process and energy consultants,
    providing innovative solutions to blue chip companies. They are
    frequent advisors to Government Agencies on energy, and in
    co-operation with Attar Software have pioneered the use of data mining
    in organisations such as BP, ICI, Carlsberg Tetley and Cleveland
    Potash.

    For further information please contact:

    Jeremy McTeague
    PR and communications executive
    AMEC Process and Energy Limited
    Tel: 44 (0)171 705 2561
    jeremy.mcteague@golden.amec.co.uk
    http://www.apel.co.uk

    David Isherwood
    Marketing Director
    Attar Software Ltd
    Tel: 44 (0)1942 608844
    disherwood@attar.co.uk
    http://www.attar.com




    From: Usama Fayyad (fayyad@MICROSOFT.com)
    Subject: Journal DMKD - issue 2
    Date: Wed, 28 May 1997 11:34:12 -0700

    Issue 2 of the new journal: Data Mining and
    Knowledge Discovery has been finalized.
    You can access the abstracts and full text of
    the editorial at the journal's home page:

    http://www.research.microsoft.com/datamine

    Also, issue 1 is now available free on line
    from Kluwer's web server. Links to Kluwer's
    server are accessible via the above homepage
    or directly at:
    http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE

    ===================================
    DATA MINING AND KNOWLEDGE DISCOVERY
    Volume 1, issue 2
    ===================================
    CONTENTS:
    --------

    Editorial
    Usama Fayyad, editor-in-chief

    ----------------------------------------------
    PAPERS
    ------
    BIRCH: A New Data Clustering Algorithm and Its
    Applications
    Tian Zhang, Raghu Ramakrishnan, Miron Livny

    Mathematical Programming in Data Mining
    O. L. Mangasarian

    A Simple Constraint-Based Algorithm for Efficiently
    Mining Observational Databases for Causal Relationships
    Gregory F. Cooper

    ----------------------------------------------
    BRIEF APPLICATION SUMMARY
    -------------------------
    Visual Data Mining: Recognizing Telephone Calling Fraud
    Kenneth C. Cox, Stephen G. Eick, Graham J. Wills,
    and Ronald J. Brachman


    ================================================

    Usama Fayyad
    datamine@microsoft.com
    for more information on the journal, CFP, and
    to submit a paper, please see:
    http://www.research.microsoft.com/datamine



    Date: Sat, 31 May 97 17:47:00 BST
    From: George Paliouras (paliourg@cs.man.ac.uk)
    Subject: PhD thesis on refinement of event recognition systems

    I have recently defended my PhD thesis, which is entitled:

    Refinement of Temporal Constraints in an Event Recognition System
    using Small Datasets

    I attach the abstract.

    If you would like to download the thesis, see:
    http://www.cs.man.ac.uk/csonly/cstechrep/Theses/Paliouras/thesis.html

    For more information about my work see:
    http://www.cs.man.ac.uk/ai/George/mypage.html

    George Paliouras


    =============================================================================

    Refinement of Temporal Constraints in an Event Recognition System
    using Small Datasets

    The central aim of this thesis is to develop novel approaches to the
    representation and the refinement of event recognition models. The
    event recognition system is viewed as a temporal expert system, which
    searches for interesting patterns in a stream of temporally indexed
    data. The format of the input stream is unusual in comparison to
    standard work on event recognition, such as speech and sound
    recognition. It consists of time-stamped events, rather than a set of
    signal properties measured at fixed time intervals. This format has
    only recently been studied in the area of temporal event recognition.

    This thesis proposes a new graphical representation which facilitates
    explicit modelling of time. The recognition model is a hierarchy of
    events, each defined as a sequence of subevents. A distinction is made
    between low-level events, used in the input data stream, and high-level
    events, defined by the model. Each event definition in the model
    constrains the duration and temporal association of subevents. This
    approach naturally handles overlapping events, which have been
    overlooked in event recognition systems that do not model time
    explicitly.

    Using this graphical representation, a novel method for refining the
    temporal constraints of a model is presented. The refinement of the
    model is based on a small training set, consisting of a sequence of
    low-level events and the high-level events which should be recognised.
    The small size of the data set does not allow the use of empirical
    learning methods. Instead, a knowledge refinement approach is adopted,
    which utilises the original model parameters to guide the refinement
    process. This approach differs from standard knowledge refinement
    methods, in that it can handle the temporal aspects of event
    recognition. Particular emphasis is given to the association of
    low-level to high-level events - information that is not provided in
    the data set.

    Two modes of refinement are examined in the thesis: full and partial
    supervision. The former requires the provision of training information
    for all of the high-level events in the model. This assumption is
    relaxed under partial supervision, where training information is
    provided only for the events at the highest level of the hierarchical
    model. The issue that arises under partial supervision is the correct
    distribution of the limited training information to all of the events
    in the model.

    The performance of the refinement method is evaluated on a real-world
    problem: the thematic analysis of the humpback whale song. The song of
    humpback whales has been extensively studied and analysed in the
    biological literature and data has been collected, in the form of tape
    recordings. An event recognition model is derived for the song and the
    refinement method is applied using a small set of songs. The results of
    the evaluation are very encouraging, showing that the system is able to
    improve significantly an initially inaccurate model, even with the use
    of very limited training data. This result suggests that the method is
    suitable for structured hierarchical models, such as that of the
    humpback whale song. Models of this type are used in a wide range of
    other event recognition tasks, such as fault diagnosis and image
    sequence analysis.
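
As a rough illustration of the kind of model the abstract describes, the
sketch below defines a high-level event as an ordered sequence of subevents
and recognises it in a stream of time-stamped low-level events, subject to
temporal constraints. It does not reproduce the thesis's graphical
representation or its refinement method. The function name and the two
constraints (max_gap between consecutive subevents, max_duration for the
whole event) are hypothetical simplifications.

```python
def recognise(stream, subevents, max_gap, max_duration):
    """Return (start, end) times where `subevents` occur in order.

    stream: list of (timestamp, name) pairs sorted by timestamp.
    subevents: names of the low-level events that make up the event.
    max_gap: largest allowed time between consecutive subevents.
    max_duration: largest allowed time from first to last subevent.
    """
    matches = []
    for i, (start, name) in enumerate(stream):
        if name != subevents[0]:
            continue
        last_time, next_index = start, 1
        # Scan forward for the remaining subevents, skipping unrelated events.
        for time, other in stream[i + 1:]:
            if next_index == len(subevents):
                break
            if time - start > max_duration:
                break
            if other == subevents[next_index]:
                if time - last_time > max_gap:
                    break
                last_time, next_index = time, next_index + 1
        if next_index == len(subevents):
            matches.append((start, last_time))
    return matches
```

Because every candidate start is examined independently, two recognised
events may overlap in time. Refining such a model would amount to adjusting
values like max_gap and max_duration from a small training set, which is the
problem the thesis addresses with its own method.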


    From: Freitas A A (freial@essex.ac.uk)
    Date: Mon, 2 Jun 97 20:28:06 BST
    Subject: Ph.D. thesis on KDD available in the web

    Dear Dr. Piatetsky-Shapiro,

    I would greatly appreciate it if you could announce in KD
    Nuggets that the Ph.D. thesis titled:
    'Generic, Set-Oriented Primitives to Support Data-Parallel
    Knowledge Discovery in Relational Database Systems'
    is now available at the URL:
    http://cswww.essex.ac.uk/SystemsArchitecture/DataMining/alex/thesis.html

    The abstract of the thesis is appended to this message.

    Thanks,
    Alex
    ===========================================================
    Alex A. Freitas

    University of Essex
    Dept. of Computer Science
    Wivenhoe Park, Colchester, CO4 3SQ,
    United Kingdom
    Tel.: (44) (1206) 87-3333 ext. 3803
    Fax: (44) (1206) 87-2788
    e-mail: freial@essex.ac.uk
    http://cswww.essex.ac.uk/projects/res/freial/web/alex.html

    ==========================================================

    -----------------------------------------------------------------
    'Generic, Set-Oriented Primitives to Support Data-Parallel
    Knowledge Discovery in Relational Database Systems.'

    Abstract

    Efficiency and scalability are crucial issues in Knowledge Discovery
    in Databases (KDD), or Data Mining. This thesis addresses these issues
    by proposing a set-oriented, primitive-based framework for KDD that
    integrates three areas, namely: (a) Machine Learning and/or Statistics
    - particularly the Rule Induction (RI) and the Instance-Based Learning
    (IBL) paradigms; (b) Relational Database Systems; and (c) Parallel
    Database Servers (PDS).
    This integration is achieved by devising primitives (rather than
    algorithms) that capture the core, time-consuming operations of KDD
    algorithms and by exploiting data parallelism in the execution of these
    primitives. This leads to a significant speed up in the execution of
    KDD algorithms supported by the primitives.
    Two major characteristics of the primitives proposed in this thesis
    are their generality and their set-oriented nature. The primitives are
    generic in the sense that they underpin the central activity of a number
    of KDD algorithms. This is important, because there is no single 'best'
    KDD algorithm for all application domains and databases. Moreover, the
    set-oriented nature of the primitives paves the way for the efficient
    exploitation of data parallelism on PDS.
    The main contributions of this thesis are that it: (1) proposes a
    set-oriented, primitive-based framework for KDD and identifies several
    benefits of this framework (not only improved efficiency and scalability,
    but also improved data re-use and software re-use, extensibility,
    data-privacy control, etc.); (2) proposes generic, set-oriented primitives
    for the RI and IBL paradigms; (3) shows how to use these primitives to
    achieve a roughly linear speed up when executing data parallel KDD
    algorithms on PDS; (4) identifies some kinds of algorithms and some
    input-parameter values of the proposed primitives that lead to an
    efficient exploitation of data parallelism; and (5) proposes extensions
    to the functionality of current PDS to improve the efficiency in the
    execution of KDD algorithms.
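
The abstract does not spell out the primitives themselves. As an assumed
illustration of the general idea, the sketch below shows a count-style,
set-oriented operation of the sort rule induction typically needs (how many
rows have a given attribute value and class), computed independently on each
data partition and then merged. The function names and the row format are
hypothetical and are not taken from the thesis.

```python
from collections import Counter


def count_by_group(rows, attribute, class_attribute):
    """Count rows per (attribute value, class value) pair in one partition."""
    return Counter((row[attribute], row[class_attribute]) for row in rows)


def parallel_count(partitions, attribute, class_attribute):
    """Merge per-partition counts into one result.

    The partitions are processed one after another here; since each partial
    count depends only on its own partition, each could run on a separate
    worker and only the small count tables would need to be combined.
    """
    total = Counter()
    for rows in partitions:
        total.update(count_by_group(rows, attribute, class_attribute))
    return total
```

The merged table is the same regardless of how the rows are split across
partitions, which is the property that makes this kind of operation a
natural candidate for data parallelism.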


    Date: Wed, 04 Jun 1997 10:24:06 -0700
    From: Cristina Lopez (cristina@airmail.net)
    Subject: Data Warehousing Meeting

    Third Annual Leadership Conference
    The Data Warehousing Institute (TDWI)
    August 24-29, 1997
    Hynes Convention Center, Boston, Massachusetts

    Business and technology professionals interested in data warehousing must
    attend TDWI's Third Annual Leadership Conference. Over 100 one-hour
    presentations on hot topics for today's data warehousing and data access
    professionals will be offered. For more information, log on to
    www.dw-institute.com or contact Cristina Lopez, (972) 480-9458, x125.


    From: shirley@math.uic.edu
    Date: 3 Jun 1997 21:10:08 -0000
    Subject: NSF Workshop on Mathematical Techniques to Mine Massive Data Sets

    CALL FOR PARTICIPATION

    NSF Sponsored Tutorial Workshop on
    Mathematical Techniques to Mine Massive Data Sets

    July 12-15, 1997

    University of Illinois at Chicago

    Chicago, Illinois


    We are pleased to announce an NSF sponsored four-day workshop entitled
    'Mathematical Techniques to Mine Massive Data Sets' to be held on the
    campus of the University of Illinois at Chicago on July 12-15, 1997.

    The goal of the workshop is to introduce an invited group of mathematical
    scientists to tutorial material related to the data mining of massive data
    sets. Data mining is the automatic extraction and discovery of patterns,
    associations, changes, anomalies, and significant structures in large data
    sets. Large data sets generated by scientific, engineering, medical and
    business applications are becoming increasingly common. Developing
    algorithms which can uncover patterns in large data sets is an important
    mathematical challenge.

    If you would like to participate, please contact one of the organizers.
    Some travel support is available. Graduate students are also
    encouraged to participate.

    ---------------------------------------------------------------------------

    Workshop

    The workshop will be limited to 15 speakers and 30 invited participants.
    The primary goals of the workshop are:

    * To provide background and survey talks in order to introduce
    mathematical scientists to some of the mathematical and statistical
    techniques used in data mining.

    * To provide mathematical scientists with a 'flavor' for data mining by
    providing several case studies and some exposure to 'hands-on' demos
    and systems.

    * To begin a process to create a web-based digital library containing
    material related to data mining for the mathematical community.


    ---------------------------------------------------------------------------

    Structure of Workshop

    The four-day workshop will include tutorial lectures on:

    * tree-based statistical techniques
    * graphical Markov models
    * neural nets
    * model selection and model averaging
    * combinatorial techniques
    * logic-based techniques
    * data mining applications.

    In addition, there will be a few advanced lectures, software
    demonstrations, and discussion sessions.

    The invited lecturers currently include:

    Michael Berry, University of Tennessee
    Ron Coifman, Yale University
    Herbert Edelsbrunner, University of Illinois at Urbana-Champaign
    Michael Jordan, MIT
    Heikki Mannila, University of Helsinki
    John McCarthy, Stanford University
    Vince Poor, Princeton University
    J. Ross Quinlan, University of Sydney
    Eric Ristad, Princeton University
    Stuart Russell, University of California at Berkeley

    Additional speakers are anticipated and will be included on our web site.
    ---------------------------------------------------------------------------

    Co-organizers:

    Robert Grossman, University of Illinois at Chicago and Magnify, Inc.
    Simon Kasif, University of Illinois at Chicago

    ---------------------------------------------------------------------------

    For more information, including registration and hotel
    information, please see:

  • http://www.lac.uic.edu/m3d-chicago.html


    or email:

    m3d@lac.uic.edu

    From: gordon@AIC.NRL.Navy.Mil
    Date: Fri, 30 May 97 13:16:57 EDT
    Subject: ICML-97 workshop: ML APPLICATION IN THE REAL WORLD:
    Call for Participation
    in the workshop on:

    ML APPLICATION IN THE REAL WORLD:
    METHODOLOGICAL ASPECTS AND IMPLICATIONS

    at the
    Fourteenth International Conference on Machine Learning (ICML-97)
    Nashville, TN, July 12th 1997

    WWW-page:
  • http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html



    INTRODUCTION

    The workshop 'ML Application in the Real World: Methodological Aspects and
    Implications' will be held on Saturday, July 12, 1997 during the
    Fourteenth International Conference on Machine Learning (ICML-97) which
    will be co-located with the Tenth Annual Conference on Computational
    Learning Theory (COLT-97) at Nashville, Tennessee from July 8 through July
    12, 1997. This mailing lists the workshop objectives and format, the
    program and registration guidelines. Further information is to be found on
    the WWW pages of the Workshop and ICML.


    WORKSHOP OBJECTIVES AND FORMAT

    Applications of Machine Learning techniques to solve real-world problems
    have gained much interest over the last decade. In recent years,
    interest has increasingly turned to the application process itself. In
    spite of this attention, the ML application process still lacks a
    generally accepted terminology, let alone commonly accepted approaches
    or solutions. Several initiatives, both conferences and workshops, have
    been held concerning this topic.

    The workshop will emphasise the processes underlying the application of
    ML in practice. Methodological issues, as well as issues concerning the
    kinds and roles of knowledge needed for applying ML will form a major
    focus of the workshop. It aims at building upon some of the results of
    discussions at the ICML-95 workshop on 'Application of ML techniques in
    practice' and at the same time tries to move forward to a consensus
    regarding a methodology on the application of learning algorithms in
    practice.

    The workshop is meant for scientists and practitioners who apply ML and
    related techniques to solve problems in the real world. The workshop will
    contain three invited lectures, given by speakers with industrial and
    academic backgrounds. Five submitted papers will complete the workshop's
    program. These papers cover several aspects of the application
    process. In the afternoon, the workshop participants, the authors of
    papers and the invited speakers will join two working sessions,
    with the aim of discussing and defining research goals from both the
    industrial and academic points of view. The results of these discussions will be a
    first step in the direction of a comprehensive methodology for the
    development and support of real world applications of ML techniques.


    Registration for this workshop is possible via the ICML registration
    (see the ICML WWW page):
  • http://cswww.vuse.vanderbilt.edu/~mlccolt/icml97/index.html

    After registering, participants are asked to fill in a short questionnaire.

    For further information you are referred to the workshop and ICML WWW
    pages. Additional questions can be sent to MLApplic.ICML@ato.dlo.nl

