Knowledge Discovery Nuggets 97:25, e-mailed 97-08-22

News:
* GPS, KDD-97 Conference Report and KDD-97 evaluation
* Ismail Parsa, KDD-CUP-97 Summary

Publications:
* GPS, Byte 7/97 on Data Mining at your Desk,
  http://www.byte.com/art/9707/sec17/sec17.htm
* Ronny Kohavi, Paper: Data Mining using MLC++,
  http://robotics.stanford.edu/users/ronnyk/

Positions:
* Laveen N. Kanal, Maryland: Intelligent Tutoring Systems Positions
* Tom Warden, Menlo Park, CA: Applied Research Position - Data Mining

Meetings:
* Yves Kodratoff, EMCSR-98 Symposium: Applications of Data Mining,
  April 14-17 1998, Vienna, Austria,
  http://www.ai.univie.ac.at/emcsr/
* Chiara Giammarco, DS-7 CONFERENCE ON DATABASE SEMANTICS and Data
  Mining, October 7-10, 1997, Leysin, Switzerland
  http://lbdwww.epfl.ch/conferences/cfpds7/participation.html
* John R. Koza, GP-98 PhD Student Workshop,
  Madison, Wisconsin, July 22 - 25, 1998,
  http://www.genetic-programming.org
* Russell Greiner, 5th AI and MATH Symposium,
  January 4-6, 1998, Fort Lauderdale, Florida
  http://rutcor.rutgers.edu/~amai

    --
    Data Mining and Knowledge Discovery community, focusing on the
    latest research and applications.

    Submissions are most welcome and should be emailed, with a
    DESCRIPTIVE subject line (and a URL) to gps.
    Please keep CFP and meetings announcements short and provide
    a URL for details.

    To subscribe, see
    http://www.kdnuggets.com/subscribe.html

    KD Nuggets frequency is about 3 times a month.
    Back issues of KD Nuggets, a catalog of data mining tools
    ('Siftware'), pointers to Data Mining Companies, Relevant Websites,
    Meetings, and more are available at the Knowledge Discovery Mine site
    at
    http://www.kdnuggets.com/

    -- Gregory Piatetsky-Shapiro (editor)
    gps

    ********************* Official disclaimer ***************************
    All opinions expressed herein are those of the contributors and not
    necessarily of their respective employers (or of KD Nuggets)
    *********************************************************************

    ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Surf Guru's estimates that it'll take 10,958 years to browse the
    roughly 80 million public pages online (for the typical user who
    visits 20 pages a day)
    Annette Hamilton, ZDNet AnchorDesk
    http://www4.zdnet.com/anchordesk/story/story_1050.html


    Date: Thu, 21 Aug 1997
    From: GPS (gps)
    Subject: KDD-97 Conference Report and Evaluation

    The Knowledge Discovery and Data Mining 1997 (KDD-97) conference, held
    Aug 14-17 in Newport Beach, CA, was a great success, attracting over 600
    people.
    The participants were treated to 7 free tutorials from leading experts
    on topics which included an introduction to data mining, data visualization,
    text mining, OLAP, and statistical approaches. There were also a number
    of invited talks and reports from related conferences.
    Among the technical highlights were papers that presented a framework
    for comparing different classifiers under different cost conditions,
    insights into how to best combine different classifiers and how to
    analyze popular approaches like bagging, and successful application
    descriptions in areas such as molecular biology and climate analysis.

    Another highlight of the conference was the KDD-CUP competition, ably organized
    by Ismail Parsa (see next item).

    A large number of companies exhibited their data mining systems and helped
    to generate good interaction between researchers and developers.

    This conference also attracted a large number of statisticians from the
    adjacent statistical meeting. KDD-98 will be held in New York, Aug 27-30,
    and will be co-located with VLDB-98. Full details on KDD-98 will be announced
    soon on KDnuggets and at the JPL and AAAI web sites.

    Proceedings of KDD-97 can be obtained from AAAI Press;
    see www.aaai.org/Press/Proceedings/KDD/1997

    If you attended KDD-97, please visit
    www-aig.jpl.nasa.gov/public/kdd97/evaluationform.html
    and email the evaluation form to gps@kstream.com
    (and please put 'KDD-97 evaluation' in the Subject).
    The comments will be kept anonymous and will help us
    to make the next conference even better!


    Date: Tue, 12 Aug 1997 15:37:27 -0400
    From: iparsa@epsilon.com (Ismail Parsa)
    Subject: KDD-CUP-97 Announcement

    On behalf of the Knowledge Discovery Cup committee, I am pleased to
    announce the winners of this year's KDD-CUP.

    The GOLD MINER award goes to two contestants this year:

    Charles Elkan from University of California, San Diego
    with his software BNB, Boosted Naive Bayesian Classifier;
    and
    Urban Science Applications, Inc.
    with their software gain, Direct Marketing Selection System.

    These two contestants jointly share 1st and 2nd place.

    The BRONZE MINER award goes to the runner-up:

    Silicon Graphics, Inc
    with their software MineSet.

    The awards will be presented at KDD-97 in Newport Beach, CA on August
    16 between 5pm and 6pm. The testing methodology will also be
    presented during the ceremony.

    Thank you for participating.

    Ismail Parsa.

    KDD-CUP-97 PROGRAM COMMITTEE
    Vasant Dhar, New York University, NY, USA
    Ronen Feldman, Bar-Ilan University, Ramat-Gan, Israel
    Ismail Parsa, Epsilon Data Management, Burlington, MA, USA
    Gregory Piatetsky-Shapiro, Knowledge Stream Partners, Cambridge, MA, USA

    Performance Evaluation Criteria and Summary of Results:
    -------------------------------------------------------

    The contestants were evaluated based on their performance on the
    validation data set. The following performance metrics were
    considered:

    a) Gains chart, i.e., lift table listing the cumulative percent of
    responders recovered in the top quantiles of the file;
    b) Receiver operating characteristics (ROC) curve analysis and the
    area under the ROC curve;
    c) Statistical tests, i.e., analysis of variance and various
    correlational measures between the actual dependent variable and the
    predicted probability estimate/score.
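
    As an illustration of metrics (a) and (b), the following is a minimal
    Python sketch, not the committee's actual evaluation code. The function
    names, the 0/1 coding of responders, and the toy data are all assumptions
    made for the example; only the two metric definitions come from the text
    above.

```python
from typing import List, Sequence, Tuple


def cumulative_gains(actual: Sequence[int], score: Sequence[float],
                     n_quantiles: int = 10) -> List[Tuple[int, float]]:
    """Return (percent of file, cumulative percent of responders) rows.

    `actual` holds 1 for a responder and 0 otherwise (assumed coding);
    `score` holds the predicted probability estimate for each record.
    """
    total = sum(actual)
    if total == 0:
        raise ValueError("no responders in the file")
    # Rank the file from the highest to the lowest predicted score.
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    ranked = [actual[i] for i in order]
    rows = []
    for q in range(1, n_quantiles + 1):
        cutoff = round(len(ranked) * q / n_quantiles)
        captured = sum(ranked[:cutoff])
        rows.append((round(100 * q / n_quantiles), 100.0 * captured / total))
    return rows


def roc_auc(actual: Sequence[int], score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for a, s in zip(actual, score) if a == 1]
    neg = [s for a, s in zip(actual, score) if a == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy file of 10 records, already listed from highest to lowest score.
    toy_actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    toy_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    for pct, gain in cumulative_gains(toy_actual, toy_score, n_quantiles=5):
        print(f"top {pct:3d}% of file: {gain:6.2f}% of responders")
    print("AUC:", roc_auc(toy_actual, toy_score))
```

    The pairwise AUC loop is quadratic in the file size, which is fine for a
    small illustration; a rank-based formula would be the usual choice for a
    large validation file.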

    The results almost always indicated a 'photo finish' between the
    BNB software and the Gain software. The MineSet software was the
    consistent runner-up, following the top two contestants
    with very close scores.

    Because the results were too close to call, we pursued additional
    analyses by repeatedly sampling at random from the validation data
    set and comparing the results. In terms of the performance metric,
    we settled on the gains charts, as the ROC curve analysis results
    closely mirrored these results. Final calls were made based on the
    combination of the performance in the top 10 and 40 percent of the
    file. The performance in the top 10 percent is looked at as a
    measure of precision, while the performance in the top 40 percent of
    the file is related to the stability and marketing coverage criteria.
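
    The resampling idea can be sketched as follows. This is a hedged
    illustration only: the announcement does not give the number of rounds,
    the sample size, or whether sampling was with replacement, so the values
    below (200 rounds, half the file, without replacement) and the function
    names are assumptions chosen for the example.

```python
import random
from statistics import mean
from typing import Optional, Sequence


def top_fraction_capture(actual: Sequence[int], score: Sequence[float],
                         fraction: float) -> float:
    """Percent of all responders found in the top `fraction` of the ranked file."""
    total = sum(actual)
    if total == 0:
        raise ValueError("sample contains no responders")
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    cutoff = round(len(order) * fraction)
    return 100.0 * sum(actual[i] for i in order[:cutoff]) / total


def resampled_capture(actual: Sequence[int], score: Sequence[float],
                      fraction: float, n_rounds: int = 200,
                      sample_size: Optional[int] = None, seed: int = 0) -> float:
    """Average top-`fraction` capture over repeated random samples.

    Assumptions (not stated in the source): sampling without replacement,
    a default sample of half the file, and samples that happen to contain
    no responders are skipped.
    """
    rng = random.Random(seed)
    n = len(actual)
    size = sample_size if sample_size is not None else n // 2
    results = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), size)
        sub_actual = [actual[i] for i in idx]
        if sum(sub_actual) == 0:
            continue
        sub_score = [score[i] for i in idx]
        results.append(top_fraction_capture(sub_actual, sub_score, fraction))
    return mean(results)


if __name__ == "__main__":
    # Synthetic file: 100 records, the 10 responders carry the highest scores.
    demo_actual = [1] * 10 + [0] * 90
    demo_score = [1.0 - i / 100 for i in range(100)]
    print("top 10%:", resampled_capture(demo_actual, demo_score, 0.10))
    print("top 40%:", resampled_capture(demo_actual, demo_score, 0.40))
```

    Using the same seed for both calls draws the same samples, so the top 10
    and top 40 percent figures are compared on identical subsets of the file.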

    An overall performance metric based on the average cumulative percent
    of responders recovered up to the 40th percentile of the validation
    data set as a whole is listed in Table 1. Tables 2 and 3 list the
    average performance in the top 10 and 40 percent of the files
    repeatedly sampled at random from the validation data set.


    ----------------------------
    Table 1: Average Overall
             Performance
    ----------------------------
                  Score*
    ----------------------------
    gain            99
    BNB             99
    MineSet         97
    ----------------------------
    *Rounded to the nearest digit.


    ----------------------------   ----------------------------
    Table 2: Average Performance   Table 3: Average Performance
    in TOP 10% of File             in TOP 40% of File
    ----------------------------   ----------------------------
                  Score*                         Score*
    ----------------------------   ----------------------------
    BNB            100             gain           100
    gain            97             BNB             98
    MineSet         95             MineSet         98
    ----------------------------   ----------------------------
    *Rounded to the nearest digit.


    Date: 7 Aug 1997 09:41:10 -0500 (EST)
    From: GPS (gps)
    Subject: Byte 7/97 on Data Mining at your Desk
    URL: http://www.byte.com/art/9707/sec17/art1.htm

    The July 1997 issue of Byte (international edition) has a nice article
    on data mining on desktop machines,
    http://www.byte.com/art/9707/sec17/art1.htm
    and a related article on visual data mining (see
    http://www.byte.com/art/9707/sec17/art2.htm).

    The article, by Peter Hofland and Jim Utsler,
    technology journalists at The Visual Consultancy Corporation in Amsterdam,
    reviews software from Isoft, SAS, IVEE, Cognos, Information Discovery,
    and WizWhy.


    Date: Sun, 10 Aug 1997 20:20:04 -0700
    From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
    To: KDD list (gps)
    Subject: Paper: Data Mining using MLC++, A Machine Learning Library in C++

    The journal version of the paper 'Data Mining using MLC++, A Machine
    Learning Library in C++,' which received the IEEE Ramamoorthy best paper
    award at Tools with AI '96, has been accepted by IJAIT, the International
    Journal on AI Tools.

    A copy of the expanded paper can be found at
    http://robotics.stanford.edu/users/ronnyk/
    under publications.

    --
    Ronny Kohavi (ronnyk@sgi.com,
    http://robotics.stanford.edu/~ronnyk)
    Engineering Manager, Analytical Data Mining.


    Date: Fri, 8 Aug 1997 10:34:23 -0400 (EDT)
    From: kanal@cs.umd.edu (Laveen N. Kanal)
    Subject: Maryland: Intelligent Tutoring Systems Positions

    Two positions open now for a project to develop/implement a

    GENERIC AUTHORING TOOL KIT AND SHELL

    for

    INTELLIGENT TUTORING SYSTEMS

    Position 1. M.S. or Ph.D. with strong
    background and experience in C++, Windows95,
    strong math background and skills, and
    experience or strong interest in Computer Aided Education and
    Training. Knowledge of UNIX, Artificial Intelligence,
    Neural Net and Fuzzy Logic techniques a plus.

    Position 2. M.S. or Ph.D. in Engineering, Computer Science or
    Operations Research, with strong background
    and experience in modeling non-linear phenomena with Artificial
    Neural Systems; knowledge of dynamic programming, UNIX and
    experience or strong interest in Computer Aided Education and
    Training. Knowledge of Case-Based Reasoning,
    Fuzzy Logic/Engineering, C++, and Windows95 a plus.

    Both positions require innovative individuals who are self-motivated, able to
    work well professionally in a team, able to write and speak well,
    and interact well with co-workers and customers.



    Send resume to L N K Corporation
    6811 Kenilworth Ave, Suite 306
    Riverdale, MD 20737-1333
    Fax: (301) 927-7193
    e-mail: nanak@lnk.com
    on resume use code: ITS

    L N K is an Equal Opportunity Employer

    Laveen Kanal, Ph.D
    Prof. Emeritus
    Univ. of Maryland
    President, L N K
    kanal@cs.umd.edu
    kanal@lnk.com



    Date: Fri, 08 Aug 1997 16:52:30 -0600
    From: Tom Warden (TWARD@allstate.com)
    Subject: Menlo Park, CA: Applied Research Position - Data Mining


    Following is a job description for an opening we currently have. No URL
    is available.

    Applied Research Position - Data Mining

    The Allstate Research and Planning Center (ARPC), a unit of the Allstate
    Insurance Companies, is forming a group to conduct data mining
    research. The group's major objective is to evaluate, with a variety of
    techniques, the company's large operational databases for significant
    new information and relationships that can be utilized to improve
    Allstate's profitability. Areas of interest to the company for this
    research include claims fraud detection, underwriting models, pricing,
    customer retention, and investment portfolio management.

    The group will be comprised of individuals with strong backgrounds in
    data analysis, as well as individuals from within Allstate who possess
    both insurance knowledge and quantitative skills. Allstate is an Industrial
    Partner of the NCSA (National Center for Supercomputing Applications),
    whose resources will be available for use by the group.

    Qualified candidates will possess an advanced degree (Ph.D. preferred)
    in computer science, mathematics, statistics, operations research or a
    related field with a concentration in one or more of the following areas:
    machine learning, artificial intelligence, data visualization, and/or
    computational methods with very large datasets. Candidates should be
    inquisitive, creative problem-solvers who are interested in formulating
    and implementing solutions to complex business problems. They also
    need to work well both independently and in a collaborative environment.

    ARPC is located in Menlo Park, California. For over thirty years it has
    served as the basic research facility, pioneering many information-driven
    innovations, for Allstate, a publicly-traded Fortune 50 company with over
    $20 billion of revenues and $70 billion of assets.

    Allstate is an Equal Opportunity Employer.

    Please send resumes to:

    Gary Kerr (gkerr@allstate.com) or Tom Warden (tward@allstate.com)
    Allstate Research & Planning Center
    321 Middlefield Road
    Menlo Park, CA 94025

    Resumes may be faxed to: (415) 324-9347


    Date: Tue, 19 Aug 1997 10:47:48 +0200 (MET DST)
    From: Yves.Kodratoff@lri.fr (Yves.Kodratoff@lri.lri.fr)
    Subject: EMCSR-98 Symposium: Applications of Data Mining

    URL: http://www.ai.univie.ac.at/emcsr/

    EMCSR 98 is a large international conference on Information Systems and
    Cybernetics held in Vienna (Austria). It comprises several symposia, one of
    which is devoted to 'Applications of Data Mining' (DM), chaired by Yves
    Kodratoff.

    The papers will be refereed by a panel of well-known specialists including
    specialists in Visualization, Data Bases, Statistics (and Data Analysis),
    Machine Learning, and Information Systems.

    All topics relevant to DM are welcome and will be carefully reported upon,
    but I would especially like to welcome papers that help fill the gaps
    between the different components of the KDD community. The following are
    examples of such gap-filling topics; they are by no means exhaustive.

    1 - We seek application papers describing results that have been obtained
    by a strong interaction between the miners and the application field
    specialist (describe how the interaction took place), applications that
    have been facilitated by the understandability of the software used
    (describe how understandability is achieved and why it has been so useful),
    and applications that succeeded because they use clever means of selecting
    interesting patterns (describe precisely how you defined and measured
    interestingness).
    It is well known that the academic value of application papers is sometimes
    debatable. In such a case, if at least one referee acknowledges the
    potential of the submission, and if the paper's author agrees, I will
    contact the author (preferably by email) and spend the time necessary
    to put the paper in an acceptable form.

    2 - Statistical packages and Neural Nets are great, but their results are
    difficult to interpret. We thus look for descriptions of work from
    the community of statisticians and neural nettists showing how they were
    able to improve the user friendliness of their techniques: how they
    help the field specialist better understand the results of the
    statistical packages, and how they help the user avoid wrong interpretations
    of the results, in a somewhat automated manner.

    3 - Data Base queries are usually well understood, but they are strictly
    deductive. Thus, we are looking for descriptions of work from the
    Data Base community that describe how some kind of inductive or
    uncertain reasoning was introduced inside the queries.

    4 - Visualization is always helpful, but it tends to take for granted that
    the user will find interesting patterns through visualization. Thus, we
    look for descriptions of work from the Visualization community
    that explain carefully the link between the principles upon which their
    software relies and the interestingness of the patterns visualized.

    5 - Symbolic Machine Learning techniques tend to produce understandable
    results, but they are hardly scalable to large applications. Thus, we would
    like to welcome symbolic ML papers explaining how they scaled their
    technique to a large application.

    6 - All existing techniques tend to represent field knowledge in an
    implicit way, hardly comprehensible to the field specialist. All attempts
    to use explicit representation (i.e., directly understandable to the field
    specialist) or to provide understandable translations of the encoded
    representation are also very welcome.

    Please submit 4 copies of a FULL PAPER (NOT a summary) of max. 6 pages
    (font 10, double column).
    The deadline for submission is: Oct. 17th 1997
    Send all your submissions to Vienna: EMCSR 98, Oesterreichische
    Studiengesellschaft fuer Kybernetik, A-1010 Wien 1, Schottengasse 3
    (Austria).
    In case you would like some discussion before submitting,
    email me: YK@LRI.FR
    Acceptance/rejection announced by: Dec. 5th 1997.
    Last version due by: Jan 30th 1998.
    For more information about the whole congress look at:
    http://www.ai.univie.ac.at/emcsr/

    For more information about what I consider important in KDD, consult
    my paper at KDD at my site:
    http://www.lri.fr/equipes/ia/membres/yk.html
    under 'miscellaneous; article Springer Verlag.'
    Conference dates & location: April 14-17 1998, Vienna.

    Date: Fri, 8 Aug 1997 10:17:53 +0100
    From: Chiara Giammarco (Chiara.Giammarco@di.epfl.ch)
    Subject: cfp for DS-7

    URL: http://lbdwww.epfl.ch/conferences/cfpds7/participation.html

    CALL FOR PARTICIPATION
    7th IFIP 2.6 WORKING CONFERENCE ON DATABASE SEMANTICS (DS-7)
    SEARCHING FOR SEMANTICS: DATA MINING, REVERSE ENGINEERING, ETC.

    October 7-10, 1997
    Leysin, Switzerland
    -----------------------------------------------------------------------

    The IFIP 2.6 Working Group has established a tradition
    of highly appreciated Data Semantics (DS) conferences, where quality
    is preferred over quantity. DS-7, the seventh in the series,
    follows the successful format of the previous conferences: it will be a
    four-day live-in working conference with limited attendance and
    extensive time for presentations and discussions.

    The topics for the 1997 DS Conference focus on those major problems that
    enterprises are currently facing: the reverse engineering of old legacy
    systems and applications, and the discovery of non-explicit knowledge
    hidden in existing data stores. Both issues need to be dealt with
    whenever database designers and application managers are committed to
    reusing existing data, for performance and economic reasons.

    Another major challenge for database/application designers is to be
    able to complement enterprise data with data from external sources,
    where the corresponding semantics is rarely fully available. Accessing
    data via the Web is just an example of input from external autonomous
    repositories.

    More hot topics are addressed in the contributions listed below.
    Demonstrations are also planned to illustrate some of the latest
    products in this domain.


    Conference Highlights:

    We are pleased to host an invited talk by one of the leading figures in
    data mining, Prof. JIAWEI HAN, from Simon Fraser University,
    British Columbia, Canada.

    An invited talk will also be given by Prof. LETIZIA TANCA, from the
    Politecnico di Milano and University of Verona, Italy.

    A tutorial on the contribution of natural language processing techniques
    for text mining will be given by Dr. MARTIN RAJMAN, of the Swiss Federal
    Institute of Technology.

    Full details on the conference at:
    http://lbdwww.epfl.ch/conferences/cfpds7/participation.html


    Date: Sat, 9 Aug 1997 12:21:42 -0700 (PDT)
    From: 'John R. Koza' (koza@CS.Stanford.EDU)
    Subject: GP-98 PhD Student Workshop

    FIRST CALL FOR GRADUATE STUDENT
    PARTICIPATION IN A GENETIC PROGRAMMING
    WORKSHOP AND PRESENTATION SESSIONS AT
    GENETIC PROGRAMMING 1998 CONFERENCE (GP-98)

    CHAIR: Una-May O'Reilly, MIT Artificial Intelligence Lab

    PANEL (to date): David B. Fogel, Natural Selection Inc
    David E. Goldberg, University of Illinois
    John R. Koza, Stanford University
    Una-May O'Reilly, MIT Artificial Intelligence Lab

    DATE OF STUDENT WORKSHOP: Tuesday July 21, 1998

    LOCATION: Memorial Union Building, 800 Langdon Street,
    Madison, Wisconsin, USA (Same as site of the Genetic
    Programming 1998 Conference)

    DATES OF PRESENTATION SESSIONS: July 22 - 25
    (Wednesday - Saturday), 1998 (During GP-98 conference)

    DATE FOR SUBMISSIONS: Wednesday, January 21, 1998
    (Same date as CFP of GP-98)

    MORE DETAILED INFORMATION at:
    http://www.genetic-programming.org



    Date: Wed, 13 Aug 1997 18:08:29 -0400 (EDT)
    From: Russell Greiner (greiner@scr.siemens.com)
    Subject: Re: 5th AI and MATH Symposium, 1st CFP
    URL: http://rutcor.rutgers.edu/~amai

    Fifth International Symposium on
    ARTIFICIAL INTELLIGENCE AND MATHEMATICS
    ---------------------------------------
    January 4-6, 1998,
    Fort Lauderdale, Florida


    APPROACH OF THE SYMPOSIUM
    -------------------------

    The International Symposium on Artificial Intelligence and Mathematics
    is the fifth of a biennial series. Our goal is to foster interactions
    among mathematics, theoretical computer science, and artificial
    intelligence.

    The meeting includes paper presentations, invited speakers, and special
    topic sessions. Topic sessions in the past have covered computational
    learning theory, nonmonotonic reasoning, and computational complexity
    issues in AI; this year, we also plan to include one on Data Mining.

    INVITED TALKS will be given by
    ------------------------------
    Robert Aumann (Hebrew University, Israel)
    Joe Halpern (Cornell University)
    Pat Hayes (University of West Florida)
    Scott Kirkpatrick (IBM, Yorktown Heights)
    William McCune (Argonne National Laboratory)

    SUBMISSIONS
    -----------
    Authors must e-mail a short abstract (up to 200 words) in plain
    text format to amai@rutcor.rutgers.edu by SEPTEMBER 23, 1997,
    and either e-mail postscript files or TeX/LaTeX source files
    (including all necessary macros) of their extended abstracts
    (up to 10 double-spaced pages) to

    amai@rutcor.rutgers.edu

    or send five copies to

    Endre Boros
    RUTCOR, Rutgers University
    P.O. Box 5062
    New Brunswick, NJ 08903 USA

    to be received by SEPTEMBER 30, 1997. Authors will be notified of
    acceptance or rejection by OCTOBER 31, 1997. The final versions
    of the accepted extended abstracts, for inclusion in the conference
    volume, are due by NOVEMBER 30, 1997.

    -----

    For more information (including sponsors, programme committee, etc.),
    please see
    http://rutcor.rutgers.edu/~amai
    or email to
    amai@rutcor.rutgers.edu


