KDnuggets Home » News » 2010 » Mar » Publications » Misconceptions About Statistics  ( < Prev | 10:n05 | Next > )

Misconceptions About Statistics


 
  
Traditional statistical tools ... are overly simplistic and, in many cases inappropriate for the task of modeling human behavior - true?


Steve Miller, Information Management Blogs, March 1, 2010

I participate in a data mining/predictive modeling discussion group on a major social networking web site. Recently, a topic entitled "Misconceptions about statistics" surfaced there in reaction to an article in the BI media. That post had leveled several criticisms at statistical methods for predictive modeling. Among them:

Traditional statistical analysis is often of limited value. It is not that these tools are somehow flawed. Rather, it is that they are overly simplistic and, in many cases inappropriate for the task of modeling human behavior.

The advanced modeling tools used in data mining are not "better" tools. They are simply better suited to modeling the realities of human behavior.

My take on the article was less literal. I think the author's point that traditional statistical models might not be up to the task of predicting the realities of human behavior is quite valid. In fact, one of the giants of the statistical world, the late UC Berkeley professor Leo Breiman, a co-originator of Classification and Regression Trees (CART) and the originator of Random Forests, said as much in a provocative 2001 article, "Statistical Modeling: The Two Cultures," in which he criticized the statistics status quo. The abstract for this paper is telling:

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.


