KDnuggets : News : 2008 : n12 : item26

Publications

From: Bruce Ratner
Date: 16 Jun 2008
Subject: The Importance of Straight Data: For Simplicity, Desirable for Good Modeling

For DMers (namely, statisticians, data analysts, data miners, knowledge discoverers, and the like), exploratory data analysis, better known as EDA, places special importance on straight data, not least for the sake of simplicity itself. The paradigm of life is simplicity (at least for those of us who are older and wiser). In the physical world, Einstein uncovered one of life's ruling principles using only three letters: E=mc². In the visual world, however, simplicity is undervalued and overlooked. A smiley face is an unsophisticated, simple shape that nevertheless communicates effectively, clearly and efficiently. Why, then, should DMers accept anything less than simplicity in their life's work? Numbers, as well, should communicate clearly, effectively and immediately. In the DMer's world there are two features that reflect simplicity -- symmetry and straightness in the data. The DMer should insist that the numbers be symmetric and straight.

The straight-line relationship between two continuous variables, say X and Y, is as simple as it gets. As X increases (decreases) in its values, so does Y increase (decrease) in its values, in which case it is said that X and Y are positively correlated. Or, as X increases (decreases) in its values, Y decreases (increases) in its values, in which case it is said that X and Y are negatively correlated. As a further demonstration of simplicity, Einstein's E and m have a perfect, positively correlated straight-line relationship.
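
To make the two cases concrete, here is a minimal sketch in Python. It is not taken from the article: the helper name pearson_r and the sample masses are illustrative assumptions, and the Pearson coefficient is used only as one common way to measure how straight a relationship is. With E computed as m times c squared, the coefficient should come out at +1 (up to floating-point rounding), and flipping the sign of E should give -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


C = 299_792_458.0  # speed of light in metres per second

# Illustrative masses in kilograms (made-up values, not data from the article).
masses = [1.0, 2.0, 3.0, 4.0, 5.0]
energies = [m * C ** 2 for m in masses]  # E = m * c^2, a straight line in m

r_up = pearson_r(masses, energies)                   # Y rises as X rises
r_down = pearson_r(masses, [-e for e in energies])   # Y falls as X rises

print(f"positive case: r = {r_up:.6f}")
print(f"negative case: r = {r_down:.6f}")
```

A coefficient near +1 or -1 indicates that the points lie close to a straight line; values nearer 0 suggest that a straight line describes the data poorly.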



Copyright © 2008 KDnuggets.