KDD Nuggets Index




Data Mining and Knowledge Discovery Nuggets 96:13, e-mailed 96-04-15

Contents:
News:
* GPS, ComputerWorld on Victoria's Secret Data 'Wearhouse',
http://www.computerworld.com/search/AT-html/open/960408SL15vic.html

* E. Colet, WSJ article on NBA's use of data mining and technology.
Publications:
* B. Macready, An Efficient Method To Estimate Bagging's Generalization
Error, http://www.santafe.edu/~wgm/papers.html
* R. Tibshirani, Bias, variance and prediction error for classification
rules, http://utstat.toronto.edu/reports/tibs/biasvar.ps
Positions:
* P. Riddle, Data Mining Job at Boeing
* S. Cornell, Senior Data Mining position for a start-up in the Bay Area

--
Nuggets is a newsletter for the Data Mining and Knowledge Discovery community,
focusing on the latest research and applications.

Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).

Nuggets is published approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'If I had six hours to chop down a tree,
I'd spend the first four sharpening the axe'
- Abraham Lincoln
(thanks to Brij Masand)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 8 Apr 1996
From: Gregory Piatetsky-Shapiro (gps@gte.com)
Subject: ComputerWorld on Victoria's Secret Data 'Wearhouse'
see
http://www.computerworld.com/search/AT-html/open/960408SL15vic.html

Data 'wearhouse' gains
Michael Goldberg and Jaikumar Vijayan
04/08/96

Victoria is revealing some new secrets with the help of a data warehouse.

In what observers said is a classic case of value gained by analyzing
data in new ways, Victoria's Secret Stores has embarked on a $5
million data warehousing project that its information systems and
business managers say will benefit the bottom line.

'We were spending way too much time trying to find information and
not enough time analyzing it,' said Rick Amari, vice president of IS
at Victoria's Secret. The company is known for its lacy lingerie and
associated catalogs that are chock-full of supermodels in sensual
poses.

Last May, the lingerie chain started working with Tandem
Computers, Inc. on a trial set of queries to see whether a data
warehouse could boost the $1.3 billion chain's fortunes.

Managers considered 25 items, mostly brassieres, from the chain's
1,000-item inventory, Amari said. They learned that Victoria's
Secret's system of allocating merchandise to its 678 shops, based
on a mathematical 'store average,' was wrong, he said. Last
summer, they found out the following:

An average store sells an equal number of black and ivory bras,
but Miami-area consumers buy the ivory designs more often by a
margin of 10-to-1.

The demand at New York shops for bras with a bust size of 32 in.
outstrips the company average inventory model by 20-to-1.

Although Victoria's Secret applied merchandise discounts across
the board at its stores, geographic demand patterns showed that
some outlets should be able to continue charging full price. Amari
said a more precise application of discounts could boost sales by
an estimated $3 million.
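To make the 'store average' problem concrete, here is a hypothetical Python sketch. It splits one store's shipment of black and ivory bras either by the chain-wide average mix (equal sales of each color) or by that store's own demand, using the 10-to-1 Miami preference for ivory reported above. The 110-unit shipment and the helper names allocate and mismatch are illustrative assumptions, not a description of Victoria's Secret's actual allocation system.

```python
# Hypothetical illustration: allocate one store's shipment by the chain-wide
# "store average" color mix versus by that store's own demand mix.

def allocate(total_units, mix):
    """Split total_units across items in proportion to the weights in mix."""
    weight_sum = sum(mix.values())
    return {item: total_units * w / weight_sum for item, w in mix.items()}

def mismatch(allocation, demand):
    """Per-item surplus (positive) or shortfall (negative) relative to demand."""
    return {item: allocation[item] - demand[item] for item in demand}

chain_average_mix = {"black": 1, "ivory": 1}  # average store: equal sales
miami_demand = {"black": 10, "ivory": 100}    # assumed units, 10-to-1 for ivory
total = sum(miami_demand.values())            # 110 units shipped

by_average = allocate(total, chain_average_mix)
by_local = allocate(total, miami_demand)

print(mismatch(by_average, miami_demand))  # {'black': 45.0, 'ivory': -45.0}
print(mismatch(by_local, miami_demand))    # {'black': 0.0, 'ivory': 0.0}
```

Under the average-store rule, the Miami shop ends up with 45 unsold black bras and 45 too few ivory ones, even though the total shipment is the right size; only the store-level mix removes the mismatch.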

These revelations about sales and inventory yielded a new way of
thinking at Victoria's Secret shops, Amari said. 'Our processes
and systems were built around an average-shop concept, when in
reality, our chain has few average shops. We found we were
missing opportunities. And from that, we recognized we needed
deeper [levels] of information at a lower level of detail and with
rapid access to it,' he said.

Retail industry consultant Mohsen Moazami said the lingerie
chain joins a growing army of merchants who are trying to
understand their customers' behavior with precision.

A data warehouse application is a competitive advantage now,
but it may not be in a year, said Moazami, national director of
the advanced technology group at Kurt Salmon Associates, Inc. in
Los Angeles. Food, clothing and other retailers that fail to invest
in such technology 'are going to be out of the game,' he said.

Sandy Taylor, an analyst at Standish Group International, Inc. in
Dennis, Mass., said the experience of the Victoria's Secret chain
highlights a common theme among companies that are having
success with data warehouses.

'Data warehousing is one of the few technologies where you can
get back some very visible payback,' Taylor said. 'We have seen
situations where the company discovers an early trend in product
inventory where they basically recoup the price of the system
with just one decision.'

The value of the information from its test last summer, along with
technical support from Tandem, prompted Victoria's Secret to
pick a 10-processor Himalaya server as the hardware platform
for its data warehouse, Amari said. The $5 million project will be
in full production by August.


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: WSJ article on NBA's use of data mining and technology.
Date: Fri, 12 Apr 1996 10:43:50 -0400
From: Ed Colet (xcolet@watson.ibm.com)


A Wall Street Journal article (3/22/96) on the NBA's use of technology
mentions data mining and knowledge discovery (through the use of
IBM's Advanced Scout software), and ponders the question of whether
technology can create a winning edge.


'Let's Go to the Digital Videotape! NBA Teams Byte Into Cyber-Coaching',
Wall Street Journal, March 22, 1996, page B7.

- The Chicago Bulls commissioned a secretive $2 million technology system in
1993. Joe Inzerillo (Chicago Bulls) claims it gives them an edge of
approximately 5%.

- The Bulls' system drove other teams to search for 'high-tech weapons'
to stay competitive with the Bulls.

- The NBA lagged behind other leagues in its use of technology: well behind
Major League Baseball in the use of statistics, and well behind the
National Football League in the use of video.

- The search for a high-tech weapon to counter the Chicago system led
Bob Salmi, assistant coach of the NY Knicks, and Tom Sterner,
assistant coach of the Orlando Magic, to work with Inderpal Bhandari
(IBM Research). Bhandari was working on attribute focusing (AF), data
mining software for lay users untrained in data analysis. Basketball
offered both suitable users and a suitable amount of data.

- '400 individual events (plays, shots, fouls, etc) a game; myriad
combinations thereof (fastbreak, screen, shot location, etc); no
automatic stops between events; so much going on so fast that tons
of data go unrecorded and lots that do get generated sit around
unanalyzed; lots of games won or lost by a point or two in the last
seconds (Since 1990, there has been a 3.47 point difference in the
average NBA game).'

- Compared to baseball or football, basketball does not have clear
start and stop sequences.
'Too many things happening too fast to record in numbers. Until now.
Computers and video systems are fast enough now to digest huge
amounts of data. . . Can the data create a winning edge?'

- AF data mining was applied to data from the Knicks vs. Rockets
1994 championship series.
'By definition, you don't know what you are trying to discover',
Inderpal Bhandari, on initially mining data.

- AF data mining discovered a somewhat known pattern when the Knicks lost
(John Starks taking the most 3-point shots), and a previously unknown
but intriguing pattern when the Knicks won (Charles Smith leading the
team in blocked shots).

- Salmi was impressed. Attribute focusing was then customized for
basketball and implemented as 'Advanced Scout'.

- An NBA technology summit introduced 'Advanced Scout' and digital
video-editing techniques to all the teams in the NBA. Sixteen teams wrote
to Bhandari after the summit and now have the software.

- 'This is like the first space-launch', Michael Goldberg (executive
director of the NBA Coaches' Association), characterizing the summit.

- 'This stuff points to situations in the stats to check out on the
[game] video. . . A coach can see, for example, that something
worked really well against us. The question is why. The software
suggests why. And the digital video allows you to call up all those
plays instantly to see why.', Tom Sterner (Orlando Magic).

- Tom Sterner data-mines up-and-coming opponents for weakness-revealing
patterns and is experimenting with analyzing lineups: 'If they have these
five guys on the floor in the fourth quarter, which of ours does best
against them?'

- The article concludes that while talent is required, technology can
be important.

- 'Sooner or later, you need talent', Jim Cleamons (Chicago Bulls).

- 'It's not clear to me yet the extent to which this system can overcome
talent.', Inderpal Bhandari.

- 'Technology gives a very talented team a better edge, because you have
players with the abilities to use the information you find.' Dave Wohl
(Miami Heat).


(Summarized by Ed Colet, ecolet@watson.ibm.com)
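The summary describes attribute focusing as software that points a lay user at patterns in a subset of interest, such as games the Knicks lost. The Python sketch below is a simplified stand-in for that idea, not IBM's Advanced Scout and not Bhandari's actual interestingness measures. It assumes each game is a dict of attributes, scores every attribute value by how much more often it appears in the focus subset than in all games, and uses invented toy records with placeholder players A to D.

```python
from collections import Counter

def focus_scores(records, attribute, in_focus):
    """For each value of attribute: P(value | focus subset) - P(value overall)."""
    focus = [r for r in records if in_focus(r)]
    if not focus:
        return {}
    overall = Counter(r[attribute] for r in records)
    focused = Counter(r[attribute] for r in focus)
    return {
        value: focused[value] / len(focus) - count / len(records)
        for value, count in overall.items()
    }

def over_represented(records, attributes, in_focus, top=3):
    """Rank (attribute, value) pairs that are more common in the focus subset."""
    scored = []
    for attribute in attributes:
        for value, delta in focus_scores(records, attribute, in_focus).items():
            if delta > 0:
                scored.append((delta, attribute, value))
    scored.sort(reverse=True)  # largest over-representation first
    return [(attribute, value, delta) for delta, attribute, value in scored[:top]]

# Invented toy records, one dict per game; "A".."D" are placeholder players.
games = [
    {"result": "loss", "top_3pt_shooter": "A", "top_shot_blocker": "C", "venue": "home"},
    {"result": "loss", "top_3pt_shooter": "A", "top_shot_blocker": "D", "venue": "away"},
    {"result": "loss", "top_3pt_shooter": "A", "top_shot_blocker": "C", "venue": "away"},
    {"result": "loss", "top_3pt_shooter": "A", "top_shot_blocker": "D", "venue": "home"},
    {"result": "win", "top_3pt_shooter": "B", "top_shot_blocker": "C", "venue": "home"},
    {"result": "win", "top_3pt_shooter": "B", "top_shot_blocker": "D", "venue": "away"},
    {"result": "win", "top_3pt_shooter": "A", "top_shot_blocker": "D", "venue": "home"},
    {"result": "win", "top_3pt_shooter": "B", "top_shot_blocker": "D", "venue": "away"},
]
attributes = ["top_3pt_shooter", "top_shot_blocker", "venue"]
print(over_represented(games, attributes, in_focus=lambda g: g["result"] == "loss"))
# [('top_3pt_shooter', 'A', 0.375), ('top_shot_blocker', 'C', 0.125)]
```

In this toy data, player A leads in 3-point attempts in every loss but in only 5 of 8 games overall, so that value tops the list; venue is split evenly and is not reported. A real system would also need a threshold for what counts as 'interesting' and a way to link each flagged pattern back to the game video.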


>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From Neuron-Digest

Subject: An Efficient Method To Estimate Bagging's Generalization Error
From: Bill Macready (wgm@santafe.edu)
Date: Wed, 10 Apr 1996 14:22:20 -0600

We would like to announce a paper entitled:

An Efficient Method To Estimate Bagging's Generalization Error

D.H. Wolpert, W.G. Macready


In bagging one uses bootstrap replicates of the training set to try to
improve a learning algorithm's performance. The computational requirements
for estimating the resultant generalization error on a test set by means
of cross-validation are often prohibitive; for leave-one-out
cross-validation one needs to train the underlying algorithm on the order
of $m^2$ times, where $m$ is the size of the training set. This paper
presents several ways to exploit the bias-variance decomposition to
estimate the generalization error of a bagged learning algorithm without
invoking yet more training of the underlying learning algorithm. In a set
of experiments, the accuracy of this estimator was compared to both the
accuracy of using cross-validation to estimate the generalization error of
the underlying learning algorithm, and the accuracy of using
cross-validation to estimate the generalization error of the bagged
algorithm. The estimator presented here was comparable in its accuracy to,
and sometimes even more accurate than, the alternative
cross-validation-based estimators.
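The abstract's starting point is that cross-validating a bagged learner multiplies the training cost. As a point of comparison only, here is a minimal Python sketch of bagging itself, paired with out-of-bag estimation, a different and widely used shortcut that also avoids extra training: each training point is scored only by the models whose bootstrap replicate happened to leave it out. This is not the bias-variance-based estimator of Wolpert and Macready; the decision-stump base learner, the toy data, and all function names are assumptions for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a 1-D decision stump: predict label_above when x > threshold."""
    values = sorted(set(xs))
    # Candidate thresholds: one below every point (a constant rule) plus midpoints.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for label_above in (0, 1):
            errors = sum(
                (label_above if x > t else 1 - label_above) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, label_above)
    return best[1], best[2]

def predict_stump(stump, x):
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above

def bag(xs, ys, n_models, rng):
    """Train one stump per bootstrap replicate and remember which indices it saw."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        models.append((stump, set(idx)))
    return models

def majority(votes):
    return 1 if 2 * sum(votes) > len(votes) else 0  # ties go to class 0

def bagged_predict(models, x):
    return majority([predict_stump(stump, x) for stump, _ in models])

def oob_error(xs, ys, models):
    """Out-of-bag error: each point is judged only by stumps that never saw it."""
    wrong = judged = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        votes = [predict_stump(stump, x) for stump, seen in models if i not in seen]
        if votes:
            judged += 1
            wrong += majority(votes) != y
    return wrong / judged if judged else float("nan")

xs = list(range(20))
ys = [int(x >= 10) for x in xs]  # toy data: class 1 from x = 10 upwards
models = bag(xs, ys, n_models=50, rng=random.Random(0))
print(f"out-of-bag error estimate: {oob_error(xs, ys, models):.3f}")
```

The estimate reuses the 50 stumps already trained for the bagged predictor, so it costs no additional training; whether it is as accurate as the estimators compared in the paper is not something this sketch addresses.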


This paper is available from the web site:
'http://www.santafe.edu/~wgm/papers.html'
or by ftp from
ftp://ftp.santafe.edu/pub/wgm/error.ps.gz


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From Neuron-Digest

Subject: New paper available
From: tibs@utstat.toronto.edu
Date: Wed, 03 Apr 1996 21:48:00 -0500


Bias, variance and prediction error
for classification rules

Robert Tibshirani
University of Toronto

We study the notions of bias and variance for classification rules.
Following Efron (1978) and Breiman (1996) we develop a decomposition of
prediction error into its natural components. Then we derive bootstrap
estimates of these components and illustrate how they can be used to
describe the error behaviour of a classifier in practice. In the process
we also obtain a bootstrap estimate of the error of a 'bagged'
classifier.
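A minimal Python sketch of the kinds of quantities the abstract discusses, assuming predictions on a test set are already available from classifiers trained on bootstrap replicates. It forms the aggregated ('bagged') majority-vote classifier and reports the average single-classifier error, the aggregated error, a variance measured as how often single classifiers disagree with the majority vote, and the gap between the two errors. These definitions are an assumption in the spirit of the abstract and may not match the paper's exact decomposition; the tiny prediction matrix is invented.

```python
from collections import Counter

def aggregate(predictions):
    """Majority vote per test point across classifiers (ties go to the smallest label)."""
    agg = []
    for j in range(len(predictions[0])):
        counts = Counter(p[j] for p in predictions)
        agg.append(min(counts, key=lambda label: (-counts[label], label)))
    return agg

def error_rate(pred, truth):
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

def decompose(predictions, truth):
    """Summarize single vs. aggregated classifier error on a test set."""
    agg = aggregate(predictions)
    individual = sum(error_rate(p, truth) for p in predictions) / len(predictions)
    aggregated = error_rate(agg, truth)
    # Assumed variance measure: average disagreement with the majority vote.
    variance = sum(error_rate(p, agg) for p in predictions) / len(predictions)
    return {
        "individual_error": individual,
        "aggregated_error": aggregated,
        "variance": variance,
        "aggregation_effect": individual - aggregated,
    }

truth = [0, 0, 1, 1]
predictions = [  # rows: classifiers from three bootstrap replicates (toy values)
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
print(decompose(predictions, truth))
```

Here the majority vote is wrong on one of the four test points, so its error (0.25) is below the average single-classifier error (1/3); the positive gap is the sense in which aggregation can help an unstable classifier.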

Available at:

http://utstat.toronto.edu/reports/tibs/biasvar.ps
ftp://utstat.toronto.edu/pub/tibs/biasvar.ps
Comments welcome!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rob Tibshirani, Dept of Preventive Med & Biostats, and Dept of Statistics
Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats). FAX: 416 978-8299
computer fax 416-978-1525 (please call or email me to inform)
tibs@utstat.toronto.edu. ftp://utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 8 Apr 1996 10:59:01 -0700
From: Patricia Riddle (riddle@redwood.rt.cs.boeing.com)
To: kdd@gte.com
Subject: Job Opening

* Outstanding Applied Researchers needed *

The Boeing Company, the world's largest aerospace company, is
actively working on research projects involving NASA, FAA, Air Traffic
Control, and Global Positioning as well as airplane and manufacturing
research. The Research and Technology organization located
in Bellevue, Washington, near Seattle, has positions open.
We are the primary computing research organization for Boeing.
We have contributed heavily to both short-term technology advances and
long-range planning and development.

- Machine Learning
BACKGROUND REQUIRED: Data Mining, Knowledge Discovery,
Statistics, Artificial Intelligence or related
field.

RESEARCH AREAS: We are developing techniques for mining a very diverse
set of data: Safety Data -- safety incident data, flight data
recorders; Reliability Data -- maintenance actions, airplane
maintenance warnings; Manufacturing Data -- rejected parts, quality
assurance data. These are not areas where most large R&D data mining
efforts are currently focused, and there are many innovative research
directions that are not being addressed elsewhere. At the same time,
we can achieve major practical impact in the short term, both at
Boeing and in the airline industry as a whole: airplane or factory
redesign and new pilot regulations or training, which may result in a
safer, more cost-effective air travel industry.

- Knowledge Representation
BACKGROUND REQUIRED: A strong background in Artificial
Intelligence plus some specialization in Knowledge
Representation and Reasoning, Ontology Development, Knowledge
Based Engineering, Knowledge Sharing and Reuse or related
field.

RESEARCH AREAS: Knowledge Based (KB) design methods are gaining
acceptance as a way to implement standard design methods used in
all parts of the design lifecycle. We are developing methods for
neutral representation of design rules, methods to capture and
refine specifications, and are looking for ways to capture and
manage design assumptions. The design of aircraft is one of the
most challenging tests of knowledge based methods: an airplane
consists of millions of parts, design is subject to rigorous
certification, and more designed tooling is used than for almost
any other application.

A PhD in Computer Science or equivalent experience is highly
desirable for both positions. We strongly encourage diversity
in backgrounds including academic and industrial experiences.

APPLICATION: If you meet the requirements and are interested, please
send your resume by email in plain ASCII format to
riddle@redwood.rt.cs.boeing.com (Pat Riddle).

The Boeing Company is an equal opportunity employer.


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 11 Apr 1996 09:56:46 -0700
From: Dynamic Synergy (dynasyn@earthlink.net)
Subject: Re: data mining project

I am an executive recruiter with Dynamic Synergy, doing a search for a
Director R&D/VP Engineering with data mining experience for a start-up
company in the Bay Area. The company has been exclusively a consulting
services organization but is moving into software development.
They are offering great compensation, including an equity position,
relocation and benefits.
For more information, contact Stacey Cornell at (310) 576-7600 or reply to
this email address, Attn: Stacey.

Thank you.


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~