Excerpt from: The Journey of Knowledge Discovery, by Gregory Piatetsky-Shapiro

Here is an excerpt from my chapter in the book: Journeys to Data Mining: Experiences from 15 Renowned Researchers, just published by Springer.

Excerpt from my chapter, in

Journeys to Data Mining: Experiences from 15 Renowned Researchers [Hardcover]
Mohamed Medhat Gaber (Editor)
Springer, 2012
ISBN-10: 3642280463

... 5. First Workshop on KDD - Knowledge Discovery in Databases
What should I call this workshop? The name "data mining", which was already used in the database community, seemed prosaic; besides, statisticians used "data mining" as a pejorative term to criticize the activity of unguided search for any correlations in data, which was likely to find something even in random data. Also, "data mining" gave no indication of what we were mining for. "Knowledge mining" and "knowledge extraction" did not seem much better. I came up with "Knowledge Discovery in Databases", which emphasized the "discovery" aspect and the focus of discovery on "knowledge".

With encouragement and help from Jaime Carbonell (CMU), William "Bud" Frawley (GTE Labs), Kamran Parsaye (IntelligenceWare), J. Ross Quinlan (U. of Sydney), Michael Siegel (BU), and Sam Uthurusamy (GM Research), I put together a Knowledge Discovery in Databases (KDD-89) workshop at IJCAI-89 in Detroit [6]. The KDD-89 workshop was very successful [7], receiving 69 submissions from 12 countries. It was the largest workshop at IJCAI-89, with standing-room-only attendance. KDD-89 had 9 papers presented in 3 sessions, on Data-Driven Discovery, Knowledge-Based Approaches, and Systems and Applications. While some topics discussed at KDD-89 have faded from the research agenda (e.g. Expert Database Systems), other topics, such as Using Domain Knowledge, Dealing with Text and Complex Data, and Privacy, remain just as relevant today.

6. First Knowledge Discovery Project

After the success of the KDD-89 workshop, I convinced GTE management that Knowledge Discovery was a good research idea with many applications, and I was put in charge of a new project, which I named "Knowledge Discovery in Data". I believe it was the first project with such a name. We worked on several small tasks dealing with fraud detection and GTE Yellow Pages until we came up with a really good application to health care.

6.1 KEFIR - Key Findings Reporter
Already in 1995, US healthcare costs consumed 12% of GDP (Gross Domestic Product) and were rising faster than GDP. (footnote: As of 2010, US healthcare costs were estimated at 15% of GDP [8]).

Some of the health-care costs are due to potentially fixable problems such as fraud or misuse. Understanding where the problems are is the first step to fixing them. Because GTE, a large telephone company, was self-insured for medical costs, it was very motivated to reduce them. GTE's healthcare costs in the mid-90s were in the hundreds of millions of dollars.

The task for our project in support of GTE Health Care Management was to analyze employee health care data and identify problem areas that could be addressed. With Chris Matheus and Dwight McNeil, we developed a system called Key Findings Reporter, or KEFIR [9]. The KEFIR approach was to analyze all possible deviations, then select the most actionable findings using a measure of interestingness [10] (see Figure 1). KEFIR also augmented key findings with explanations of plausible causes and recommendations of appropriate actions. Finally, KEFIR converted findings to a user-friendly report with text and graphics [11].

Fig. 1. KEFIR measure of Interestingness
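
The deviation-then-rank idea described above can be illustrated with a small Python sketch. This is not KEFIR's actual code or its interestingness formula, which this excerpt does not reproduce. The scoring rule (cost overrun times an assumed recoverable fraction), the names Finding and key_findings, and the sample numbers are all hypothetical, chosen only to show the general shape of the approach.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One measured deviation (all fields are illustrative assumptions)."""
    segment: str                # hypothetical population or area of care
    measure: str                # hypothetical cost measure
    observed: float             # observed value of the measure
    expected: float             # norm or expected value
    actionable_fraction: float  # assumed share of an overrun that action could recover (0..1)

    @property
    def deviation(self) -> float:
        # Positive when the observed value exceeds the norm.
        return self.observed - self.expected

    @property
    def interestingness(self) -> float:
        # Assumed score: only overruns count, weighted by how much is recoverable.
        return max(self.deviation, 0.0) * self.actionable_fraction


def key_findings(findings, top_n=3):
    """Rank all deviations and keep the top_n with a positive score."""
    ranked = sorted(findings, key=lambda f: f.interestingness, reverse=True)
    return [f for f in ranked[:top_n] if f.interestingness > 0]


# Made-up sample data, for illustration only.
SAMPLE_FINDINGS = [
    Finding("Region A", "inpatient cost", 120.0, 100.0, 0.5),
    Finding("Region B", "outpatient cost", 90.0, 100.0, 0.9),
    Finding("Region C", "pharmacy cost", 150.0, 140.0, 0.25),
]

if __name__ == "__main__":
    for f in key_findings(SAMPLE_FINDINGS, top_n=2):
        print(f.segment, f.measure, f.interestingness)
```

In this sketch a large deviation that nobody can act on scores low, which mirrors the emphasis on actionable findings rather than on the largest raw deviations.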

KEFIR was a very innovative project for its time, and received a top GTE technical award.

Currently we can see some of the same ideas in Google Analytics Intelligence, which automatically finds significant deviations from the norm across multiple hierarchies. ...

7. Many Names of Data Mining
The Data Mining and Knowledge Discovery field has been called by many names.

In the 1960s, statisticians used terms like "Data Fishing" or "Data Dredging" to refer to what they considered the bad practice of analyzing data without a prior hypothesis.

The term "Data Mining" appeared around the 1990s in the database community. Some started to use "database mining™", but found that this phrase was trademarked by HNC (now part of Fair, Isaac) and could not be used. Other terms used include Data Archaeology, Information Harvesting, Information Discovery, and Knowledge Extraction.

I coined the term "Knowledge Discovery in Databases" (KDD) for the first workshop on the same topic (1989), and this term became popular in the academic and research community. However, the term "data mining" became more popular in the business community and in the press.

In 2003, "data mining" acquired a bad image because of its association with a US government program called TIA (Total Information Awareness). Headlines like "Senate Kills Data Mining Program" (ComputerWorld, July 18, 2003), referring to a US Senate decision to close down TIA, helped reinforce the negative image of "data mining".

In 2006, the term "Analytics" jumped to great popularity, driven by the introduction of Google Analytics (Dec 2005) and later by the book "Competing on Analytics" [13]. According to Google Trends, "Analytics" became more popular than "Data Mining", as measured by Google searches, around 2006, and has continued to climb ever since (see Fig. 2).

Read the full chapter in

Journeys to Data Mining: Experiences from 15 Renowned Researchers [Hardcover].