Knowledge Discovery Nuggets Index


To KD Mine: main site for Data Mining and Knowledge Discovery.
Here is how to subscribe to KD Nuggets
Past Issues: 1997 Nuggets, 1996 Nuggets, 1995 Nuggets, 1994 Nuggets, 1993 Nuggets


Knowledge Discovery Nuggets 97:21, e-mailed 97-06-30

News:
* Othar Hansson, Great Moments in Data Mining (#1 in a series)
* Ron Webb, Data Mining Benchmarking Study By The APQC
  http://www.apqc.org
* Mike Blundin, Datasage (Cirrus Recognition) Press Release

Publications:
* G. McKiernan, Data Mining and Knowledge Discovery in MARC Databases
  http://www.public.iastate.edu/~CYBERSTACKS/4T9R.htm
* Sal Stolfo, Report: Towards the Digital Government of the 21st Century
  http://www.isi.edu/nsf

Siftware:
* Ronny Kohavi, MineSet Quicktime movies available
  http://www.sgi.com/Products/software/MineSet/movies.html
* Abraham Meidan, WizRule 3
  http://www.wizsoft.com

Positions:
* Glenn Stone, PostDoc, Sydney, Australia
  http://www.dms.CSIRO.AU/~gstone

Meetings:
* NADA LAVRAC, ILP Week in Prague, 15-20 Sept. 1997
  http://labe.felk.cvut.cz/ai_group/kazakov/ilp97/

    --
    KD Nuggets is an electronic newsletter for the Data Mining and
    Knowledge Discovery community, focusing on the latest research
    and applications.

    Submissions are most welcome and should be emailed, with a
    DESCRIPTIVE subject line (and a URL), to gps.
    Please keep CFP and meetings announcements short and provide
    a URL for details. Submissions may be edited for space.

    To subscribe, see
    http://www.kdnuggets.com/subscribe.html

    KD Nuggets is published 3-4 times a month.
    Back issues of KD Nuggets, a catalog of data mining tools
    ('Siftware'), pointers to Data Mining Companies, relevant Websites,
    Meetings, and more are available at the Knowledge Discovery Mine
    site at
    http://www.kdnuggets.com/

    -- Gregory Piatetsky-Shapiro (editor)
    gps

    ********************* Official disclaimer ***************************
    All opinions expressed herein are those of the contributors and not
    necessarily of their respective employers (or of KD Nuggets)
    *********************************************************************

    ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    'I can do 12 months work in 9 months, but not in 12 months'
    Brij Masand
    (commenting on the need to take a longer vacation).

    [Note: I will be taking a much shorter vacation, July 3-20, and
    will not be checking email or sending KD Nuggets during that time. GPS]


    Date: Sun, 15 Jun 97 13:47:18 PDT
    From: othar@Thinkbank.COM (Othar Hansson)
    Subject: Great Moments in Data Mining (#1 in a series)

    Amazon.com now has a clustering feature. On many of their book-blurb
    pages, they will point you to three books commonly purchased by readers
    who purchased that book. Here's one such cluster that made me laugh:

    > Only the Paranoid Survive : How to Exploit
    > the Crisis Points That Challenge Every
    > Company and Career
    > by Andrew S. Grove
    > ...
    > Check out these titles! Readers who bought Only the Paranoid Survive
    > also bought:
    >
    > The Dilbert Principle : A Cubicle's-Eye View of Bosses, Meetings,
    > Management Fads & Other Workplace Afflictions; Scott Adams
    > The Road Ahead; Bill Gates, et al
    > Dogbert's Top Secret Management Handbook; Scott Adams

    So there you have it: Dogbert, Dilbert, Bill and Andy Grove. $1 prize
    to whoever suggests the best name for that cluster. 'Well-known business
    cartoon characters'?

    Othar Hansson --------------------------------
    http://www.Thinkbank.COM/

    Thinkbank, Inc. [voice] +1 510.558.8800
    1678 Shattuck Avenue, Suite 320 [ fax ] +1 510.558.8700
    Berkeley, CA 94709-1631 othar@Thinkbank.COM

    [P.S. I checked Amazon.com on June 30, and the clustering has changed
    -- alas, the sparkling nugget of discovery may wash away in the torrent of
    change ... GPS]
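    The 'readers who bought X also bought' feature Othar Hansson describes can be approximated by counting how often other titles appear in the same customer's purchases as a given title, then returning the most frequent partners. The sketch below rests on a few assumptions. Purchase histories are taken to be available as one set of titles per customer. The function name `also_bought` and the sample baskets are made up for illustration. Nothing here describes how Amazon.com actually computes its lists.

```python
from collections import Counter


def also_bought(baskets, title, k=3):
    """Return up to k titles most often bought together with `title`.

    baskets: iterable of sets, one set of titles per customer (hypothetical data).
    Ties are broken alphabetically so the result is deterministic.
    """
    co_counts = Counter()
    for basket in baskets:
        if title in basket:
            # Count every other title found in a basket that contains `title`.
            for other in basket:
                if other != title:
                    co_counts[other] += 1
    ranked = sorted(co_counts.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:k]]


# Made-up baskets, loosely modelled on the cluster quoted above.
baskets = [
    {"Only the Paranoid Survive", "The Dilbert Principle", "The Road Ahead"},
    {"Only the Paranoid Survive", "The Dilbert Principle",
     "Dogbert's Top Secret Management Handbook"},
    {"Only the Paranoid Survive", "The Road Ahead"},
    {"The Dilbert Principle", "Dogbert's Top Secret Management Handbook"},
]
print(also_bought(baskets, "Only the Paranoid Survive"))
```

    Raw co-purchase counts favour bestsellers that appear in almost every basket. Production recommenders usually normalise for that, for example by dividing by each title's overall popularity.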

    From: Ron Webb (ronw@apqc.org)
    Subject: Data Mining Benchmarking Study By The APQC (www.apqc.org)
    Date: Fri, 20 Jun 1997 17:44:28 -0500

    KD Nuggets subscribers,

    The American Productivity & Quality Center (APQC) is conducting a benchmarking study on transforming customer data into information. The study, which includes a large data mining component, will examine how best-practice organizations manage customer data and information. For a copy of the study proposal, with its full scope, go to www.apqc.org, where you can view or download it.

    In a nutshell the scope is:

    Organizational enablers - what do organizations that manage customer data and information well have in place organizationally (culture, politics, 'soft' stuff, etc.) that enables them to reach best-practice status?

    Technological enablers - this is the biggest piece of this study. How do best-practice organizations:
    * gather and store customer data (warehousing)
    * transform this data into information (mining)
    * get this information to the right person within the organization?

    Leveraging customer information - how do best-practice organizations leverage all these practices to improve the bottom line, customer retention, marketing efforts, etc.?

    The study kicks off on August 22, 1997 and ends on December 16, 1997. Let me know if I can help you get more information.

    Ron Webb
    Project Manager
    ronw@apqc.org
    713-685-4634



    Date: Thu, 19 Jun 1997 16:11:32 -0400
    From: Mike Blundin (mblundin@cirrusrec.com)
    Subject: Datasage (Cirrus Recognition) Press Release

    BW1044 JUN 18, 1997 4:58 PACIFIC 07:58 EASTERN

    Datasage raises $2.8 million in equity financing
    from OneLiberty Ventures and Sigma Partners

    BOSTON--(BUSINESS WIRE)--June 18, 1997-- Datasage, Inc. (formerly
    Cirrus Recognition Systems), the leader in production data mining
    solutions, announced today that it has completed $2.8 million in
    equity financing to expand the marketing and sales of its flagship
    product, Datasage(TM).

    'We are very pleased to have OneLiberty Ventures and Sigma
    Partners aboard,' said Datasage President and CEO David Blundin.
    'Top tier venture financing will allow us to rapidly build our sales
    and marketing so we can expand our vision for production data mining
    in the marketplace.'

    'Data Mining technology is rewriting the book on how data is
    valued and leveraged at the corporate level,' said John Mandile of
    Sigma Partners.

    'We are very excited about the opportunity to invest in Datasage
    and their production data mining technology. The company has the
    potential to be the leading vendor in this explosive new market,'
    added Duncan McCallum of OneLiberty Ventures.

    Datasage provides critical software architecture and data mining
    tools that allow corporations to deploy data mining technology
    against production data sources. So-called desktop or micro-mining
    tools enable analysts to discover new information in subsets of
    corporate data. Datasage allows analysts and IT departments to take
    those data mining models and seamlessly deploy them against live
    production data sources. This allows corporations to move beyond
    short-term gains and realize the full strategic business benefit of
    data mining technology. According to Blundin, 'Production data
    mining is particularly valuable for data-intensive companies with
    many transactions or customers, such as large retailers and grocers.'

    'Finding patterns is only the beginning,' said John Lunny, Vice
    President of Engineering. 'An analyst can often build excellent
    models for customer and product behavior with only a few thousand
    examples using a desktop data mining tool and a PC. But when he or
    she goes back to rank a database of perhaps 25 million transactions,
    they hit the data mining gap. Tools don't scale, data connections
    are inadequate and the increase in computation makes it a major IS
    project. Datasage(TM) fills that gap.'

    'The greatest value inherent in a corporate data warehouse is
    realized by a production data mining solution,' said Blundin.
    'Unlike desktop data mining tools, which may yield information during
    a one-off ad hoc analysis, production data mining is a repeatable
    process that delivers continuous value to an organization.'

    According to the market research firm Meta Group, headquartered
    in Stamford, Connecticut, the data mining segment of the decision
    support market will grow from $120 million in 1996 to more than $800
    million by 1998, and $4 billion by 2000, a compound annual growth
    rate in excess of 250 percent.
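    As a quick check on the Meta Group projection, the standard compound annual growth rate between two values is (end / start)^(1 / years) - 1. The sketch below applies that formula to the release's own figures. Growth from $120 million (1996) to $4 billion (2000) works out to roughly 140 percent per year, meaning the market grows about 2.4-fold annually. The release's 'in excess of 250 percent' figure therefore appears to come from a different calculation than the usual CAGR definition.

```python
def cagr(start, end, years):
    """Compound annual growth rate: the constant yearly rate taking start to end."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the press release, in millions of US dollars.
print(f"1996-1998: {cagr(120, 800, 2):.1%}")   # about 158% per year
print(f"1996-2000: {cagr(120, 4000, 4):.1%}")  # about 140% per year
```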

    $2.8 Million in Equity Financing

    OneLiberty Ventures and Sigma Partners co-managed the investment
    in Datasage.

    OneLiberty Ventures is a Boston-based, privately held venture
    capital firm that focuses on start-up and early stage technology
    investments. Formed in 1982 as Morgan, Holland Ventures, OneLiberty
    has established three funds totaling more than $150 million in
    committed capital. Recent investments include Brooks Fiber
    Properties, Cerulean Technology, Cytyc, Corex Technologies,
    Extraprise Group, Indus River Networks, Linguistic Technology,
    Riverton Software, Satara Networks, and Vista Medical Systems.

    Sigma Partners, with offices in Menlo Park, CA and Boston, MA,
    is a privately held venture capital partnership organized in 1984,
    with $185 million under management in three funds. Sigma's
    investments include Cascade Communications, Cerulean Technology,
    Chipcom, Electronic Arts, FileNet, Global Village Communications
    and Wellfleet Communications.

    John Mandile of Sigma Partners and Duncan McCallum of OneLiberty
    Ventures will be joining the Datasage board of directors. John has
    15 years of experience in the high technology industry, most recently
    as the president and CEO of Vermeer Technologies, Inc., the developer
    of FrontPage, the leading Web authoring tool. Previously he was an
    early principal at SQL Solutions, and following its acquisition by
    Sybase, took responsibility for the new Systems Management Unit, which
    he grew to $55 million in 30 months. Currently he is a director of
    FutureTense, Inc., Novera Software, Inc. and OnDisplay.

    Before joining OneLiberty Ventures, Duncan was at Haemonetics
    where his roles included Assistant to the President, Director of
    Blood Bank Marketing, and Business Development Manager. Previously
    he was a management consultant at the Boston Consulting Group and a
    Senior Member of the Technical Staff and Program Manager at Draper
    Laboratory. He holds BS and MS degrees from MIT and an MBA from
    Harvard Business School. His current investments include Extraprise
    Group and Cerulean Technology.

    About Datasage, Inc.

    Datasage, Inc., headquartered near Boston, MA, provides
    comprehensive data mining software solutions that enable corporations
    to turn raw data into business opportunity. Today corporate
    databases and data warehouses are growing dramatically, to the point
    where the wealth of customer and transaction data far outstrips
    corporations' capability to use it effectively. Datasage data mining
    software allows corporations to put data mining technology into full
    scale production so they can quickly put their data to work and
    realize payback on their growing data assets.

    Datasage(TM) is the first software solution to meet the rigorous
    demands of corporate data mining in a production environment. It
    delivers the performance, scalability, reliability, integration and
    architecture demanded by corporate IS departments for their critical
    systems. Datasage(TM) is based on the innovative Database-Centric
    Architecture(TM), which maintains robust, high speed data throughput
    between data sources and data mining technology. In addition to
    direct connectivity to industry standard RDBMSs such as Oracle,
    Informix, DB2 and SQL Server, the architecture offers a
    comprehensive set of open APIs that allow integration of new data
    sources, incorporation of existing business logic, and best-of-breed
    data mining algorithm selection. Datasage also includes advanced
    data mining algorithms (neural networks, rule induction, genetic
    algorithms, etc.) that are coupled with the architecture to allow the
    algorithms to deliver peak performance. The high throughput enables
    corporations to move from summary level data analysis to atomic level
    data mining - mining the lowest level of transaction detail for
    extremely accurate forecasting, customer scoring and anomaly
    detection.


    CONTACT: Michael Blundin, Datasage, Inc.
    (617) 942-3600
    mblundin@datasage.com
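    The 'data mining gap' in Lunny's quote is largely about applying a model built on a few thousand examples to millions of database rows without loading them all into memory. One common way to do this is to stream rows from the database through a cursor in batches, score each row with the model, and keep only the best k in a bounded min-heap. The sketch below is a minimal version of that general technique, not anything Datasage ships. The table `transactions`, its columns, and the linear `score` function are hypothetical. An in-memory SQLite database stands in for a production RDBMS.

```python
import heapq
import sqlite3


def score(amount, visits):
    # Hypothetical linear model; in practice this would be the model
    # trained on a small sample with a desktop tool.
    return 0.7 * amount + 2.0 * visits


def top_k_customers(conn, k=5, batch_size=10_000):
    """Stream rows in batches and keep the k highest-scoring ones in a min-heap."""
    heap = []  # (score, customer_id); the smallest kept score sits at heap[0]
    cur = conn.execute("SELECT customer_id, amount, visits FROM transactions")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        for customer_id, amount, visits in rows:
            s = score(amount, visits)
            if len(heap) < k:
                heapq.heappush(heap, (s, customer_id))
            elif s > heap[0][0]:
                # Evict the current smallest score in favour of the new row.
                heapq.heapreplace(heap, (s, customer_id))
    return sorted(heap, reverse=True)  # highest score first


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (customer_id INTEGER, amount REAL, visits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(i, float(i % 97), i % 13) for i in range(50_000)],
    )
    for s, cid in top_k_customers(conn, k=3):
        print(cid, round(s, 1))
```

    Memory use stays proportional to k plus one batch, whatever the size of the table. This is the property that matters when ranking tens of millions of transactions.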




    Date: Sat, 14 Jun 97 14:29:23 CDT
    From: 'Gerry McKiernan' (JL.GJM@ISUMVS.IASTATE.EDU)
    Subject: _Data Mining and Knowledge Discovery in MARC Databases_

    _Data Mining and Knowledge Discovery in MARC Databases_

    I am in the process of preparing a review article on
    the application of data mining and knowledge discovery in
    databases (KDD) to MARC record databases. These techniques
    aim to identify 'hidden' information within large
    data sets. It is my belief that there exist important, yet
    overlooked, relationships within MARC records created through
    the descriptive and subject cataloging process that have not
    been as fully exploited as they might be. A good example would
    be to identify significant works on a subject based upon
    associations among the publisher, author(s), subject heading
    and call number within records.

    I am particularly interested in the application of Data
    Mining and KDD as a potential enhancement to future online
    public access catalogs (OPACs).

    For a description of an associated project, folks are invited
    to review my 4T9R(sm) page. It contains not only a project
    description but also links to an excellent review article from
    DBMS magazine and to the outstanding _KDNuggets Data Mining and
    Knowledge Discovery Resource Center_ at its new URL. The URL for
    4T9R(sm) is

    http://www.public.iastate.edu/~CYBERSTACKS/4T9R.htm

    As always, any and all suggestions, leads, critiques, opinions
    and/or positive (or negative) thoughts will be much appreciated.

    Regards,

    Gerry McKiernan
    Curator, CyberStacks(sm)
    Iowa State University
    Ames IA 50011

    gerrymck@iastate.edu

    http://www.public.iastate.edu/~CYBERSTACKS/

    'I Know It's In There Somewhere'

    P.S. MARC is a bibliographic format standard that has been in use for
    over a generation. It offers a means by which bibliographic entities
    (e.g., books) can be consistently described, and the associated data
    elements used in creating a variety of value-added databases (e.g.,
    the local library online catalog, or OPAC).

    For examples of MARC records, you may wish to search the Iowa
    State University OPAC, which is directly accessible at:

    http://www.lib.iastate.edu/scholar/db/icat.html
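    The associations McKiernan has in mind (publisher, author, subject heading, call number) can be explored by treating each record as a collection of field values and counting which values co-occur across records. The sketch below assumes records have already been parsed into dictionaries keyed by MARC tag: 100 for the main-entry personal name, 260 for publication information, and 650 for topical subject headings. Real MARC parsing (leader, indicators, subfields) is omitted. The `subject_publisher_counts` helper and the sample records are hypothetical.

```python
from collections import Counter


def subject_publisher_counts(records, min_count=2):
    """Count (subject heading, publisher) pairs that co-occur in at least
    min_count records. Each record maps a MARC tag to a list of values."""
    counts = Counter()
    for rec in records:
        # set() so a value repeated within one record is counted once.
        for subject in set(rec.get("650", [])):
            for publisher in set(rec.get("260", [])):
                counts[(subject, publisher)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


records = [  # hypothetical, heavily simplified records
    {"100": ["Smith, J."], "260": ["Publisher A"], "650": ["Data mining", "Databases"]},
    {"100": ["Lee, K."], "260": ["Publisher A"], "650": ["Data mining"]},
    {"100": ["Smith, J."], "260": ["Publisher B"], "650": ["Databases"]},
]
print(subject_publisher_counts(records))
```

    The same counting extends to author/subject or subject/call-number pairs by swapping tags. A support threshold like `min_count` keeps one-off coincidences out of the results.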



    Date: Thu, 26 Jun 97 13:51:32 EDT
    From: Sal Stolfo (sal@cs.columbia.edu)
    Subject: WORKSHOP report: Towards the Digital Government of the 21st Century

    Dear Colleague:

    I would like to bring to your attention a recent report I've had
    the pleasure of coauthoring with Herb Schorr of USC/ISI.

    Please point your favorite WEB browser to

    http://www.isi.edu/nsf

    to see the homepage for the Workshop on R&D Opportunities in Federal
    Information Services.

    The homepage now has a link to the issued Workshop Report (in multiple
    formats). A Press Release has also been issued.

    We have submitted this report to several key Federal Government agencies,
    and recently we presented the report to the Presidential Advisory Council
    on HPCC/IT/NGI.

    The report recommends that the Government fund a major new Applied Research
    Program to develop pilot projects with Federal agencies to invent
    the Digital Government for the citizens of the 21st Century.
    A number of applied research opportunities are involved including
    data mining over the huge collection of publicly available
    federally-held data.

    The effort has the support and encouragement of a broad range of
    Federal agencies, as well as the executive branch of government, and
    we are now seeking broad support and involvement from the research community.

    If you have any interest in this activity, set a bookmark and please
    browse the web site routinely for further announcements about this effort.

    Please also forward this message to any other person that you
    believe may have an interest in this exciting opportunity.

    This research community and others will be informed of the outcome of
    this effort when it becomes known.

    best regards

    sal stolfo



    Date: Thu, 19 Jun 1997 22:18:35 -0700
    From: Ronny Kohavi (ronnyk@starry.engr.sgi.com)
    Subject: MineSet Quicktime movies available

    Silicon Graphics' MineSet is well known for its data visualization
    capabilities, both for direct visualization of data and for
    visualization of the models built by the analytical engines using MLC++.

    Two Crows Corporation, in its book Data Mining: Products,
    Applications & Technologies (1997), wrote:
      'We really liked MineSet. The visualization tools, particularly,
      are without parallel.'

    In Data Management Strategies, Premier issue (1997), Curt Hall wrote:
      'MineSet's data visualization provides the best means I've seen to
      view and analyze generated rules, decision trees, and other models.
      Most impressive is its "fly-through navigational" paradigm that
      displays decision trees in a well-organized 3D landscape format, so
      you don't end up overwhelmed with a screen full of rules or decision
      trees when you analyze large data sets.'

    We now provide voice-annotated QuickTime movies so you can see examples
    of MineSet visualizations on any platform that has a QuickTime movie player:

    http://www.sgi.com/Products/software/MineSet/movies.html

    For more information about MineSet and for a free evaluation copy for
    Silicon Graphics machines, see
    http://www.sgi.com/Products/software/MineSet/
    under 'more information', or contact us at mineset@postofc.corp.sgi.com

    --

    Ronny Kohavi
    Engineering Manager, Analytical Data Mining.


    From: Abraham Meidan (Abraham@wizsoft.com)
    Subject: WizRule 3
    Organization: WizSoft Inc.

    WizSoft Inc. has released WizRule ver 3. WizRule is a data auditing
    application that reveals cases in the data that should be audited.
    WizRule reads the data, automatically reveals the rules that govern
    the data, and points out deviations from the set of all the
    discovered rules as suspected errors.

    WizRule contains 4 algorithms:
    (1) An algorithm that reveals ALL if-then rules with no limit on the
    number of conditions. (This algorithm is similar to IBM's association
    rules algorithm: the input and the output are the same, but WizRule's
    algorithm is faster.)
    (2) An algorithm that reveals mathematical formula rules, such as: Field A
    = Field B - Field C * Field D.
    (3) An algorithm that calculates the Level Of Unlikelihood of each case
    that deviates from the discovered rules.
    (4) An algorithm that reveals rules in the spelling of names, and points
    out strings that deviate from these rules.
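    Algorithm (1) is described as comparable to association-rule mining. The sketch below illustrates that general technique; it is not WizRule's own algorithm, which the announcement does not describe. It enumerates if-then rules whose conditions and conclusion are field=value items, and keeps those meeting a minimum support and confidence. Unlike WizRule's claimed 'no limit', this sketch caps the number of conditions at `max_conditions` for brevity. The function name `find_rules`, the thresholds, and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def find_rules(records, min_support=2, min_confidence=0.8, max_conditions=2):
    """Return (conditions, conclusion, support, confidence) tuples for rules
    'IF conditions THEN conclusion'; every item is a (field, value) pair."""
    # Count every itemset of up to max_conditions + 1 items across all records.
    # Items are sorted by field name, so every subset is counted in the same order.
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in range(1, max_conditions + 2):
            for itemset in combinations(items, size):
                counts[itemset] += 1

    rules = []
    for itemset, n in counts.items():
        if len(itemset) < 2 or n < min_support:
            continue
        # Try each item of the itemset as the conclusion.
        for conclusion in itemset:
            conditions = tuple(i for i in itemset if i != conclusion)
            confidence = n / counts[conditions]
            if confidence >= min_confidence:
                rules.append((conditions, conclusion, n, confidence))
    return rules


records = [  # hypothetical records
    {"city": "Boston", "state": "MA", "zip3": "021"},
    {"city": "Boston", "state": "MA", "zip3": "021"},
    {"city": "Boston", "state": "MA", "zip3": "022"},
    {"city": "Austin", "state": "TX", "zip3": "787"},
]
for conditions, conclusion, support, confidence in find_rules(records):
    print("IF", conditions, "THEN", conclusion, support, round(confidence, 2))
```

    Brute-force enumeration like this grows combinatorially with the number of fields. Apriori-style miners prune candidate itemsets whose subsets already fall below the support threshold.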

    In its previous version, WizRule pointed at each deviation from each rule
    as a suspected error. This method produced many false alarms, i.e.
    deviations from rules that were not in fact errors. The new version
    solves this problem by calculating the level of unlikelihood of each
    deviation. The level of unlikelihood signifies how unlikely a certain value
    of a certain field is, with regard to the set of ALL the discovered rules and
    the frequencies of the values. The higher the level of unlikelihood, the
    higher the probability that the case is indeed an error.
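    The announcement does not give the formula for the Level Of Unlikelihood. The sketch below is a purely hypothetical illustration of the idea: weigh a deviation against all discovered rules instead of flagging each violation separately. It scores a record by summing -log(1 - confidence) over every rule whose conditions hold but whose conclusion does not. A record that breaks one high-confidence rule therefore outranks one that breaks only a weak rule. The scoring formula, the `unlikelihood` function, and the rules are all assumptions, not WizRule's method.

```python
import math


def unlikelihood(record, rules, eps=1e-6):
    """Hypothetical deviation score: sum of -log(1 - confidence) over the rules
    whose conditions hold in `record` but whose conclusion does not.

    rules: list of (conditions, (field, value), confidence), where conditions
    is a tuple of (field, value) pairs.
    """
    score = 0.0
    for conditions, (field, value), confidence in rules:
        applies = all(record.get(f) == v for f, v in conditions)
        if applies and record.get(field) != value:
            # A violated 0.99-confidence rule adds about 4.6; a 0.6 rule adds about 0.9.
            # eps avoids log(0) for rules that held in every training record.
            score += -math.log(max(1.0 - confidence, eps))
    return score


rules = [  # hypothetical rules, e.g. as produced by an association-rule miner
    ((("city", "Boston"),), ("state", "MA"), 0.99),
    ((("state", "MA"),), ("area_code", "617"), 0.60),
]
ok = {"city": "Boston", "state": "MA", "area_code": "617"}
odd = {"city": "Boston", "state": "TX", "area_code": "617"}
print(unlikelihood(ok, rules), unlikelihood(odd, rules))
```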

    A working demo, limited to files having up to 1,000 records, can be
    downloaded from
    http://www.wizsoft.com

    Best regards,

    Abraham Meidan
    President


    Date: Mon, 23 Jun 1997 09:55:55 +1000
    From: Glenn Stone (Glenn.Stone@dms.csiro.au)
    Subject: PostDoc, Sydney, Australia


    Postdoctoral Fellowship
    CSIRO Mathematical & Information Sciences
    North Ryde NSW Australia


    Postdoctoral Fellowship - Term 3 years
    $AUS 41,000 - 47,000 + superannuation

    We wish to appoint a Post-Doctoral fellow to join a research team
    working on large and complex datasets. Your PhD in statistics,
    computer science, or a related discipline (or equivalent) must have
    been awarded within the last three years.

    The team consists of statisticians and computer scientists with
    interests in techniques for handling and cleaning large datasets,
    methods for modeling large datasets, wavelet methods for feature
    extraction, statistical visualisation and modeling multiple time
    series. The team is working on datasets coming from areas as diverse
    as motor vehicle insurance, finance, marketing and astronomy.

    The project would suit an applicant with experience analysing
    real-world datasets. You will need excellent computing skills in C or
    C++, or in a statistical package such as S-Plus or SAS. You will also
    need the ability to work in a team and a demonstrated ability to meet
    deadlines.

    The position is for a term of three (3) years. Further information
    about the position may be obtained from
    Dr Glenn Stone, tel +61 2 9325 3216 email: glenn.stone@cmis.csiro.au
    The job description and selection criteria may be obtained from
    Lucinda Wells, tel +61 2 9325 3277 email: lucinda.wells@cmis.csiro.au

    Applications for the position should address the selection criteria,
    be marked 'Confidential' quoting reference number MS 97/1, and be
    sent to: The Human Resources Manager, CSIRO, Division of Mathematical and
    Information Sciences, Locked Bag 17, North Ryde NSW 2113 by
    25th July, 1997.

    --

    Glenn Stone
    Statistician, CSIRO
    Locked Bag 17, North Ryde, NSW 2113
    Phone:+61 2 9325 3216, Fax:+61 2 9325 3200
    Glenn.Stone@cmis.csiro.au
    http://www.dms.CSIRO.AU/~gstone




    Date: Fri, 27 Jun 1997 18:08:39 +0001
    From: NADA LAVRAC (Nada.Lavrac@ijs.si)
    Subject: ILP Week in Prague, 15-20 Sept. 1997

    Please find at the URL below the information on the ILP Week in Prague,
    Czech Republic:
    * General information on the ILP Week in Prague, 15-20 September 1997
    * International Summer School on ILP and KDD, 15-17 September 1997
    * 7th International Workshop on ILP, ILP-97, 17-20 September 1997
    * CompulogNet Area Meeting on representation issues in reasoning
      and learning, 20 September 1997
    * Registration form and payment information

    ============================================================================
    ILP Week in Prague
    15-20 September 1997
    http://labe.felk.cvut.cz/ai_group/kazakov/ilp97/

    ----------------------------------------------------------------------------
    Since the very start of machine learning research, logic has been very popular
    as a representation language for inductive concept learning, and the
    possibilities for learning in a first-order representation have been
    investigated. Recently, this research has concentrated in the lively
    field of Inductive Logic Programming (ILP), which studies inductive machine
    learning within the framework of logic programming.

    The ILP Week in Prague will consist of the following three events:

    15-17 September 1997 - The International Summer School on Inductive
                           Logic Programming and Knowledge Discovery in
                           Databases (ILP and KDD)
    17-20 September 1997 - The Seventh International Workshop on Inductive
                           Logic Programming (ILP-97)
    20 September 1997    - CompulogNet Meeting of the Area 'Computational
                           Logic and Machine Learning' (CL and ML)

    Schedule:
    ---------
    Monday, 15 September    - ILP and KDD Summer School
    Tuesday, 16 September   - ILP and KDD Summer School
    Wednesday, 17 September - ILP and KDD Summer School (morning)
                            - ILP-97 Workshop (afternoon)
                            - Welcome party (evening)
    Thursday, 18 September  - ILP-97 Workshop
                            - Guided-tours (afternoon - optional)
    Friday, 19 September    - ILP-97 Workshop
                            - Farewell dinner (evening)
    Saturday, 20 September  - ILP-97 Workshop (morning)
                            - CL and ML Area Meeting (afternoon)

    Program Organization:
    ---------------------
    Nada Lavrac and Saso Dzeroski, J. Stefan Institute, Ljubljana, Slovenia
    Email: Saso.Dzeroski@ijs.si, Nada.Lavrac@ijs.si
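    To make the idea of learning in a first-order representation concrete, here is a toy generate-and-test search in the spirit of ILP. It is far simpler than the systems discussed at ILP-97. Given background facts for a `parent` relation, plus positive and negative examples of a target relation, it tries every clause of the form grandparent(X,Z) :- r1(X,Y), r2(Y,Z). It keeps the clauses that cover all positives and no negatives. The relation names, the family facts, and the restricted clause shape are assumptions made for illustration.

```python
from itertools import product

# Background knowledge: parent(X, Y) facts (hypothetical family).
parent = {("ann", "bob"), ("bob", "cal"), ("ann", "dee"), ("dee", "eve")}
# A derived background relation: child(X, Y) holds when parent(Y, X).
child = {(y, x) for x, y in parent}
background = {"parent": parent, "child": child}

positives = {("ann", "cal"), ("ann", "eve")}  # grandparent examples
negatives = {("ann", "bob"), ("bob", "cal"), ("cal", "ann")}


def covers(r1, r2, x, z):
    """True if the clause target(X,Z) :- r1(X,Y), r2(Y,Z) proves target(x, z)."""
    middles = {y for (a, y) in background[r1] if a == x}
    return any((y, z) in background[r2] for y in middles)


def learn():
    """Return every candidate clause consistent with the examples."""
    hypotheses = []
    for r1, r2 in product(background, repeat=2):
        complete = all(covers(r1, r2, x, z) for x, z in positives)
        consistent = not any(covers(r1, r2, x, z) for x, z in negatives)
        if complete and consistent:
            hypotheses.append(f"grandparent(X,Z) :- {r1}(X,Y), {r2}(Y,Z).")
    return hypotheses


print(learn())
```

    Real ILP systems search a much larger space of clauses, guided by refinement operators and by bias such as mode declarations. Still, accepting a clause only if it covers every positive example and no negative one is the same criterion at work here.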

