KDnuggets Home » Polls » Concise Laws In Large Datasets Poll (Mar 2010)


Do you think that there are concise, mathematical laws (patterns) in large datasets in business, social, and biology data? (123 votes)
Frequently (43) 35%
Occasionally (44) 36%
Rarely (27) 22%
Don't know (9) 7%


Ed R, Concise Laws in Large Data sets
If one were to apply data mining techniques to astronomical data, some very consistent patterns would likely be discovered. What isn't clear is whether those patterns would be expressed in a way that is conducive to deeper analysis (e.g., closed-form equations, integral and differential relations, etc.).
For data from noisy or chaotic systems, such as those involving people, you may be able to identify some simple, generalizable relationships, but the statistical nature of these results would make it even more difficult to identify the fundamental relations at work. Just because you can fit an equation to data does *not* mean that the equation has any deep connection to the underlying system!
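A minimal sketch of that last point, under assumed values: the exponential growth rate of 0.05, the sampling range of 0 to 10, and the names true_process and fit_line are all hypothetical choices made for illustration. A straight line fits the sampled data closely, yet it says nothing about the exponential process that produced the data, and it fails badly once you step outside the sampled range.

```python
import math


def true_process(x):
    """Hypothetical underlying system: exponential growth (assumed rate 0.05)."""
    return math.exp(0.05 * x)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observe the system only on a narrow range, x = 0..10.
xs = list(range(11))
ys = [true_process(x) for x in xs]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Prediction from the fitted line, which knows nothing about exp()."""
    return slope * x + intercept


# Largest miss inside the observed range versus the miss far outside it.
max_in_sample_error = max(abs(predict(x) - y) for x, y in zip(xs, ys))
extrapolation_error = abs(predict(100) - true_process(100))

print("max in-sample error:", max_in_sample_error)
print("error at x = 100:   ", extrapolation_error)
```

The in-sample error is small while the error at x = 100 is large, which is the sense in which a good fit alone does not reveal the relation at work.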

Tom Dietterich, simple laws
The whole power of data mining methods is that they scale with the complexity of the data and therefore they can detect very complex patterns. We don't really need them if we are looking for simple patterns.
Much of the human social and information world is characterized by "arbitrary complexity". Fred Brooks has defined computer science as the study of arbitrary complexity, and data mining is a key tool for doing this.

Ross Bettinger, Patterns in Large Datasets
I doubt that concise formulations of human behavior analogous to physics, e.g., F=ma, will be frequently found among living organisms and populations, because individual variation often contributes as much noise as signal to any detection algorithm (what is the "signal" in a Jackson Pollock painting?). Among humans, irrational behavior, e.g., anything that does not follow an investigator's presumptions of right and proper conduct susceptible to predetermined classification, muddies the observational waters and reduces the accuracy of prediction. Economists postulate "rational behavior" for the convenience of their mathematically based theories, but there are probably very few people who base their choices on the theory of diminishing marginal returns. Rather, like John L. Lewis, the UMW labor leader, if asked what they want, they would probably say "More." We need a fuzzy mathematics to deal with fuzzy behavior.
