Rob Tibshirani, Modern Regression and Classification:
Chicago, IL: Nov. 23-24, 1998. http://stat.stanford.edu/~trevor/mrc.general.html
--
Knowledge Discovery Nuggets (tm) is an electronic newsletter focusing
on the latest news, publications, tools, meetings, and other relevant items
in the Data Mining and Knowledge Discovery field.
KD Nuggets is currently reaching over 4800 readers in 65+ countries
2-3 times a month.
Items relevant to data mining and knowledge discovery are welcome
and should be emailed to gps
in ASCII text or HTML format.
An item should have a subject line that clearly describes
what it is about to KDNuggets readers.
Please keep calls for papers and meeting announcements
short (no more than 50 lines of up to 80 characters each), and provide a web site for
details, such as paper submission guidelines.
All items may be edited for size.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to data mining companies, relevant websites,
meetings, etc., are available at the KDNuggets Directory at http://www.kdnuggets.com/
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets).
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you do not expect, you cannot find
the unexpected
Heraclitus (thanks to Alex Tuzhilin)
On behalf of the KDD-CUP-98 Committee, I am pleased to announce
the winners of this year's KDD-CUP.
The GOLD MINER award goes to:
Urban Science Applications, Inc.
with their software GainSmarts.
The SILVER MINER award goes to:
SAS Institute, Inc.
with their software Enterprise Miner.
And finally, the BRONZE MINER award goes to:
Quadstone Limited
with their software Decisionhouse.
Honorable mentions go to CARRL and Amdocs.
The awards were presented at KDD-98 in New York, NY on August
29, 1998.
KDD-CUP-98 PROGRAM COMMITTEE
Summary of Evaluation Results
-----------------------------
The KDD-CUP Committee evaluated the results based on the net revenue
generated on the hold-out (validation) sample.
The measure we used is the sum of (actual donation amount - $0.68) over
all records for which the expected revenue (the predicted donation
amount) is over $0.68.
This measure is simple, objective and a direct measure of profit.
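The measure above can be sketched in a few lines of Python. This is a minimal illustration, not the committee's evaluation script: the function name net_profit and the toy numbers are hypothetical, and the strict comparison (predicted > $0.68) is an assumption based on the wording "over $0.68".

```python
from typing import Sequence, Tuple

CUTOFF = 0.68  # the $0.68 threshold and per-record deduction named in the measure


def net_profit(actual: Sequence[float], predicted: Sequence[float],
               cutoff: float = CUTOFF) -> Tuple[int, float]:
    """Hypothetical helper: return (N, SUM) in the sense of Table 2.

    N   = number of records whose predicted donation exceeds the cutoff
    SUM = sum of (actual donation - cutoff) over exactly those records
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Keep only the records a model would select (prediction strictly over the cutoff).
    gains = [a - cutoff for a, p in zip(actual, predicted) if p > cutoff]
    return len(gains), sum(gains)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the KDD-CUP-98 files.
    actual = [0.0, 10.0, 0.0, 5.0, 0.0]
    predicted = [0.10, 1.20, 0.90, 0.50, 0.70]
    n, total = net_profit(actual, predicted)
    print(n, round(total, 2))
```

On the toy data, three records are selected and the script prints 3 7.96: one donor contributes $9.32, two non-donors cost $0.68 each, and the $5.00 donor is missed because its prediction falls below the cutoff. Selecting a non-donor costs only $0.68, while missing a donor forfeits the whole gift, which is why the measure rewards models that rank likely donors well.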
Table 1 summarizes key fields of the validation dataset. Table 2 shows
the participants' results, ranked by the last column (SUM).
===============================================================
TABLE 1: Summary Statistics for Key Fields in the
Validation Dataset
===============================================================
Field                     N    MIN   MEAN    STD     MAX     SUM
===============================================================
Response             96,367      0   5.1%  21.9%       1   4,873
Donation Amount      96,367     $0  $0.79  $4.73 $500.00 $76,090
Profit (Amt-$0.68)   96,367 -$0.68  $0.11  $4.73 $499.32 $10,560
===============================================================
TABLE 2: KDD-CUP-98 Summary of Evaluation Results: Total Profits
for Records with Predicted Donation > $0.68
===============================================================
Participant      N*    MIN   MEAN    STD     MAX   SUM**
===============================================================
GainSmarts   56,330 -$0.68  $0.26  $5.57 $499.32 $14,712
SAS          55,838 -$0.68  $0.26  $5.64 $499.32 $14,662
Quadstone    57,836 -$0.68  $0.24  $5.66 $499.32 $13,954
CARRL        55,650 -$0.68  $0.25  $5.61 $499.32 $13,825
Amdocs       51,906 -$0.68  $0.27  $5.69 $499.32 $13,794
#6           55,830 -$0.68  $0.24  $5.63 $499.32 $13,598
#7           60,901 -$0.68  $0.21  $5.43 $499.32 $13,040
#8           48,304 -$0.68  $0.25  $5.83 $499.32 $12,298
#9           56,144 -$0.68  $0.20  $5.32 $499.32 $11,423
#10          90,976 -$0.68  $0.12  $4.84 $499.32 $11,276
#11          62,432 -$0.68  $0.17  $5.13 $499.32 $10,720
#12          65,286 -$0.68  $0.16  $4.53 $224.32 $10,706
#13          64,044 -$0.68  $0.16  $4.99 $499.32 $10,112
#14          76,994 -$0.68  $0.13  $4.91 $499.32 $10,049
#15          54,195 -$0.68  $0.18  $5.29 $499.32  $9,741
#16          79,294 -$0.68  $0.12  $4.47 $249.32  $9,464
#17          51,477 -$0.68  $0.11  $4.00 $111.32  $5,683
#18          30,539 -$0.68  $0.18  $5.34 $499.32  $5,484
#19          50,475 -$0.68  $0.04  $3.44  $99.32  $1,925
#20          42,270 -$0.68  $0.04  $3.64  $99.32  $1,706
#21           1,551 -$0.68 -$0.03  $3.60  $53.32    -$54
===============================================================
* N is the number of records for which the predicted donation amount > $0.68
** SUM = sum of (actual donation - $0.68) over all records with
predicted donation > $0.68
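The Profit row of Table 1 can be checked against the Donation Amount row. Writing d_i for the actual donation of validation record i (notation introduced here for illustration only), deducting $0.68 from every one of the 96,367 records gives:

```latex
\sum_{i=1}^{96{,}367} (d_i - 0.68)
  = 76{,}090 - 0.68 \times 96{,}367
  = 76{,}090 - 65{,}529.56
  \approx 10{,}560
```

This is the SUM the evaluation measure would yield if every record were selected (N = 96,367), and dividing by N gives the $0.11 mean shown in Table 1. Because the donation total is itself rounded, the last digit is approximate. By this arithmetic, the first 12 entries in Table 2 exceed that figure, while entries #13 through #21 fall below it.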
A major objective of the National Science Foundation (NSF) is to
improve the nation's capacity for intellectual and economic growth.
It does this by supporting the discovery of new knowledge and the
enhancement of a skilled workforce. Industry can outline new
technical challenges and assist in the support of academic
institutions. By serving as a catalyst for industry-university
partnerships, NSF helps ensure that intellectual capital and emerging
technologies are brought together in ways that promote economic growth
and an improved quality of life.
The GOALI (Grant Opportunities for Academic Liaison with Industry) initiative aims to synergize university-industry
partnerships by making funds available to support an eclectic mix of
industry-university linkages. Special interest is focused on
affording the opportunity for: (1) faculty, postdoctoral fellows and
students to conduct research and gain experience with production
processes in an industrial setting, (2) industrial scientists and
engineers to bring industry's perspective and integrative skills to
academe, and (3) interdisciplinary university-industry teams to
conduct long-term projects. This initiative targets high-risk/high-
gain research with a focus on fundamental topics that would not have
been undertaken by industry, new approaches to solving generic
problems, development of innovative collaborative industry-university
educational programs, and direct transfer of new knowledge between
academe and industry.
PROPOSAL SUBMISSION:
Proposals should refer to this Announcement by number, NSF 98-142. The
proposal Cover Sheet (NSF Form 1207 in the GPG) should identify the
disciplinary program area in the top left box, 'NSF Organizational
Unit', and the initiative, 'GOALI, NSF 98-142', in the lower box labeled
'Program Announcement/Solicitation No./Closing Date'. When a specific
announcement applies to a division or a directorate, the respective
solicitation number must be listed first.
Ten (15) copies of the formal proposal should be sent to:
NSF 98-142/(NSF Program/Division)
Proposal Processing Unit, P60
National Science Foundation
4201 Wilson Blvd.
Arlington, VA 22230
SUBMISSION TARGET DATES:
Proposals to programs in the Information and Intelligent
Systems (IIS) Division, including the Information and Data Management
Program (IDM), should be submitted within two weeks of the annual target
dates:
September 15
February 15
GENERAL INQUIRIES:
CISE Representative: John Cherniavsky jchernia@nsf.gov
TECHNICAL/TOPIC/PROGRAM INQUIRIES:
Technical Program Directors (for IIS Program Directors, see IIS Staff at http://www.cise.nsf.gov/iis/)
I am including below a message from Dr. Maria Zemankova of NSF
that has important implications for the knowledge discovery
in databases and data mining community.
--------- -----------
I strongly recommend that you all look at the original Science
article, dated August 7, 1998, that she refers to.
- sam
-------------------------------------------------
From: Maria Zemankova mzemanko@nsf.gov
Subject: (IP issues in US): Database Protection, Access to Information
There is an insightful article in Science, August 7, 1998, pp. 786-787:
INTELLECTUAL PROPERTY: Database Protection and Access to Information
William Gardner and Joseph Rosenbaum
SUMMARY:
Gardner and Rosenbaum discuss the 'Collections of Information Antipiracy
Act,' which is currently before the U.S. Senate. If enacted into law, the
Act would significantly increase the property rights of database owners.
Proponents argue that pervasive networking of computers and weaknesses in
U.S. law mandate increased protection of the rights of information owners.
Opponents of the Act believe that the Antipiracy Act's expansion of
property rights in data could make information vital to science
prohibitively expensive or inaccessible.
W. Gardner is in the Departments of Medicine and Psychiatry of the
University of Pittsburgh School of Medicine and is the American Association
for the Advancement of Science (AAAS) Chair of the National Conference of
Lawyers and Scientists (NCLS), a joint committee of the AAAS and the
American Bar Association. E-mail: gardnerwp@msx.upmc.edu.
J. Rosenbaum is
an attorney with Pryor Cashman Sherman & Flynn, LLP, and is a member of the
NCLS.
---------
Access to the full article is free, at http://www.sciencemag.org/cgi/content/full/281/5378/786.
Needless to say, this Act could have a great impact on research in digital
libraries, information retrieval, databases, data mining, etc., etc., ...
--------------------------------------------------------------------------
Table of contents of the book 'DATA MINING Methods for KNOWLEDGE DISCOVERY'
by Cios/Pedrycz/Swiniarski. Kluwer Academic Publishers, 1998, ISBN
0-7923-8252-8. US$149.00 (20% discount for attendees of the Conference on
Knowledge Discovery and Data Mining, NYC, August 28-29, 1998, when
ordered before Oct. 15, 1998).
Contact kluwer@wkap.com
for more information.
IEEE Engineering in Medicine and Biology Magazine will publish a
special issue on 'Medical Data Mining and Knowledge Discovery.' The
articles should describe the knowledge discovery process in any medical
field, using any type of data. Of particular interest, however, are
papers describing results from databases of medical images.
The articles should be written very clearly, using a tutorial-like style, to
appeal to a broad audience that includes both medical professionals and
engineers.
About eight articles are expected to be accepted for the special issue,
to be published by the end of 1999. If you are interested in submitting a
paper, please contact Krzysztof Cios (kcios@eng.utoledo.edu),
guest editor for
the special issue, for more details. Full papers should be submitted to the
guest editor by October 1, 1998. Each paper will be reviewed by at least two
reviewers.
With best regards,
Krzysztof J. Cios
Professor of Bioengineering &
of Electrical Engineering and Computer Science
Department of Bioengineering
University of Toledo
Toledo, OH 43606-3390, U.S.A.
phone: (419)530-8167
fax: (419)530-8076
email: kcios@eng.utoledo.edu
Call for Book Proposals
for Kluwer Book Series on Genetic Programming
Kluwer Academic Publishers
Announces the
GENETIC PROGRAMMING
Book Series
Genetic programming is a technique for automatically
synthesizing computer programs to solve problems.
The Kluwer book series on genetic programming will cover
applications of genetic programming, theoretical foundations of
genetic programming, technique extensions, and implementation
issues. It will be the first collection of monographs, edited
collections, and advanced texts to cover this rapidly growing
field. In order to publish material that is timely and reflects the
state of the art, the series will focus on books of relatively
narrow scope and moderate length and will feature a rapid
publication schedule. The first book of the series, Langdon's
Genetic Programming and Data Structures: Genetic
Programming + Data Structures = Automatic Programming! has
already been published. Topics may include, but are not limited
to, design, control, classification, system identification, data
mining, pattern recognition and image analysis, data and image
compression, evolvable machine language, evolvable hardware,
and automatic programming of multi-agent and distributed
systems.
Prospective Authors:
If you have an idea for a book
that would fit in this series, we would welcome the
opportunity to review your proposal. Should you wish to discuss
any potential project further or receive specific information
regarding our book proposal requirements, please contact either
John Koza or Scott Delman. Please enclose a short biography
with your proposal.
John R. Koza
Consulting Editor
Section on Medical Informatics
Department of Medicine
School of Medicine
Medical School Office Building
Stanford University 94305-5479 USA
E-MAIL: koza@genetic-programming.org
PHONE: 650-941-0336
FAX: 650-941-9430
Scott Delman
Senior Publishing Editor
Kluwer Academic Publishers
101 Philip Drive
Assinippi Park
Norwell, MA 02146
Phone: 781-871-6311 ext. 299
Fax: 781-871-7507
Email: sdelman@wkap.com
Date: Thu, 20 Aug 1998 14:57:23 -0700
From: Shannon Pemberton spemberton@insweb.com
Subject: Job Posting, InsWeb Corporation, San Mateo, CA
Web:
Title: Manager of Site Reporting and Analysis
Department: New Product Development
Reports To: Senior Vice President of New Product Development
Established in March 1995, InsWeb Corporation (San Mateo, CA;
www.insweb.com)
is an exciting, fast-growing technology company.
InsWeb has become the most diverse and inclusive insurance site on the
World Wide Web. InsWeb has created a 'one stop shopping' forum for
insurance consumers to gather price quotations and product information
about various types of insurance products from multiple insurance
providers using the Internet. The successful candidate will be an
important member of a close-knit team of technology professionals.
The Manager of Site Reporting and Analysis will manage a small team of
developers that will be responsible for developing and producing
timely and accurate reports on web site activity, developing and
maintaining a data warehouse and automating report production,
reconciliation and distribution. The manager will also evaluate, recommend
and implement reporting and analysis tools; work with
groups throughout the company to establish reporting requirements;
respond to ongoing requests for special reports and analyses;
coordinate with product development teams to establish data collection
requirements; and coordinate with the systems team on server requirements.
The ideal candidate will have experience in technical project
management, business report development, and relational
databases and SQL. Strong verbal and written communication skills are
required, as is experience writing and reviewing formal
requirement specifications. A focus on quality and reliability, and strong
time management, are a must. Experience with MS SQL Server, MS Access,
Excel, and OLAP is desired. Background in Internet development and/or
statistical analysis is a strong plus. This position will be located
in our new Redwood City office.
Contact: Shannon Pemberton
Submit your resume:
By Mail: 3000 Executive Pkwy
Suite 530
San Ramon, CA 94583
By Fax: (925) 830-9081
By E-mail: resumes@insweb.com
Visit our site: http://www.insweb.com
at SmithKline Beecham Pharmaceuticals, King of Prussia PA, USA
(near Philadelphia)
In the pharmaceutical industry, new technologies for high-throughput
screening and high-throughput synthesis have been increasing the volume
of data relating molecular structure to biological activity by orders
of magnitude. This rapidly growing
body of data offers opportunities for the discovery of new
knowledge to advance drug discovery. To expand our capability in
processing and analyzing chemical and biological information in
direct support of drug discovery research programs, we seek an
individual who has
-- An advanced degree (Ph.D. preferred) in biological
sciences, chemistry, computer science or statistics, or
expertise and relevant experience equivalent to a Ph.D.
-- Experience in information analysis, pattern recognition,
machine learning or chemometrics.
-- Excellent written and spoken communication skills, and
sufficient depth of chemical and biological knowledge to
communicate effectively with scientists in drug discovery
programs.
-- The ability to think critically and drive his/her projects
to completion.
-- Experience with scientific software in a Unix environment.
The primary duties of a successful applicant will be
-- Application of methods for knowledge discovery in
databases to find relationships within and among large
chemical and biological databases.
-- Collaboration with SB scientists in other disciplines to
identify targets of inquiry, and with in-house computer and
statistics specialists where appropriate.
-- Development and maintenance of external collaborations to
take advantage of newly developed methodology in pattern
recognition and database mining.
SmithKline Beecham offers a competitive compensation, benefits, and
relocation package. For confidential consideration,
please send your resume and the names of three references to:
SmithKline Beecham Pharmaceuticals, Job Code: H8-0302, PO Box
40047, Philadelphia, PA 19106, or apply online from our website.
Date: Fri, 28 Aug 1998 1:45pm
From: 'Patrick Perrin' perrin@logos-usa.com
Subject: Mount Arlington, NJ: Job position in knowledge extraction
Web:
Logos Corporation
Logos has been a leader in machine translation software and services for
more than 25 years. We invite outstanding applications for several
positions as computational linguists and computer scientists.
Successful candidates will have demonstrated a strong interest in and ability to
develop applications in one or more of the following: AI or cognitive
science (esp. NLP and MT), machine learning (esp. probabilistic
models), statistical methods/models for NLP/MT, OOP, GUI, Java/C++, databases,
and knowledge of a foreign language (esp. any Romance language or
Japanese).
Applications consisting of a resume, a letter of motivation, and a list
of references are to be sent to: Dr. Patrick Perrin, Senior Project
Manager, Logos Corporation, 200 Valley Rd., Suite 400, Mount Arlington,
NJ 07856, email: perrin@logos-usa.com,
fax: (973) 398-6102, or (800)
564-6768. Open until all positions are filled. Logos is an AA/EOE
committed to nondiscrimination. M/F/D/V encouraged.
Thanks,
- Patrick Perrin.
Patrick Perrin, Ph.D. 973-398-8710/x116
Senior Project Manager/Research Scientist
ANNOUNCEMENT OF NSF-NATO POSTDOCTORAL FELLOWSHIPS IN SCIENCE AND
ENGINEERING INCLUDING SPECIAL FELLOWSHIP OPPORTUNITIES FOR VISITING
SCIENTISTS FROM NATO PARTNER COUNTRIES FOR 1999
On behalf of the North Atlantic Treaty Organization (NATO), the National
Science Foundation (NSF) invites applications for 12-month postdoctoral
fellowships from beginning scientists, mathematicians, and engineers.
Approximately 20 fellowships will be offered for research abroad and
approximately 20 awards will be made to U.S. institutions that would like
to host a Visiting Scientist from NATO Partner Countries. Eligible fields
of research are: mathematics, engineering, computer and information
science, geosciences, the physical, biological, social, behavioral, and
economic sciences, the history and philosophy of science, and
interdisciplinary areas composed of two or more of these fields. Research
in the teaching and learning of science, mathematics, technology, and
engineering is also eligible for support.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ +++
+++ Modern Regression and Classification: +++
+++ +++
+++ Widely applicable statistical methods +++
+++ for modeling and prediction +++
+++ +++
+++ Chicago, Illinois: Nov. 23-24, 1998. +++
+++ +++
+++ Trevor Hastie & Rob Tibshirani, Stanford University +++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
This two-day course will give a detailed overview of statistical models
for regression and classification. Known as machine learning in
computer science and artificial intelligence, and pattern recognition
in engineering, this is a hot field with powerful applications in
finance, science and industry.
A popular course given by two world experts in the field.