KDnuggets : News : 2008 : n16 : item24

Publications

From: Bruce Ratner
Date: 17 Aug 2008
Subject: Data Mining: An Ill-defined Concept

The term "data mining" is an ill-defined concept in statistics and related disciplines. Long before the late 1970s/early 1980s, statisticians knew of data mining, albeit under various names such as data fishing, snooping, and dredging, and, most disparagingly, "ransacking" the data. Because any discovery process inherently exploits the data, producing spurious findings, statisticians did not view data mining in a positive light. A concept is well-defined when its definition specifies the concept (an idea that includes all that is characteristically associated with or suggested by the concept) in an unambiguous way, along with its unique value. I performed a World Wide Web search for "definition of data mining." Based on the search results, discussed below, I declare that data mining is not a well-defined concept. Data mining is in a state of helter-skelter: [1] entry-to-mid-level data miners are in a quandary as to which definition to use, and practiced data miners produce research for which there are no meta-analyses of data-mining-based studies. I conclude that either a single definition across disciplines or a set of discipline-specific definitions of data mining is long overdue.

Today's data mining is high-concept: it has elements of fast action in its development, glamour as it stirs the imagination for the unconventional and unexpected, and a mystique that appeals to a wide audience that knows curiosity feeds human thought. I googled "definition of data mining" and received a gross (vis-a-vis net) number of 236,000 definitions! (Curiously, one of the entries was "Data mining is derogatory ... ") To have a sound working assumption for the task at hand, I netted the gross Google number down to 2,360. (This netting, in and of itself, coincidentally reflects that Google's search engine optimization is also ill-defined.) Suffice it to say that data mining is an ill-defined concept, as 2,360 definitions are clearly not needed to explain the concept unambiguously. Unprecedentedly, the data mining concept did not have early on (circa 1970s/early 1980s), and currently does not have, the scholarly cause to take form. I conclude that data mining is an ill-defined concept. And I declare that the net number of definitions suggests there are discipline-specific data mining definitions; but how many are there: 18, 36, 54, ... ? [2] Regardless of the agreed number of disciplines, 2,360 divided by that "agreed number" presents data mining, whether proper or discipline-specific, as an ill-defined concept.

To the seemingly intractable problem of developing a well-defined definition of data mining, I would like to add entry #2,361: Statistics Definition of Data Mining:

Read more.


Copyright © 2008 KDnuggets.