9 Bizarre and Surprising Insights from Data Science
The petabytes of information currently available to analysts amount to a boundless playing field of possible truths.
Data is the world's most potent, flourishing unnatural resource. Accumulated in large part as the byproduct of routine tasks, it is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is inherently predictive. Thus begins a gold rush to dig up insightful gems.
Does crime increase after a sporting event? Do online daters more consistently rated as attractive receive less interest? Do vegetarians miss fewer flights? Does your e-mail address reveal your intentions?
Yes, yes, yes, and yes!
We’ve entered the golden age of predictive discoveries. A frenzy of number crunching churns out a bonanza of colorful, valuable, and sometimes surprising insights.
Predictive analytics' aim isn’t limited to assessing human hunches by testing relationships that seem to make sense. It goes further, exploring a boundless playing field of possible truths beyond the realms of intuition. And so it drops onto your desk connections that seem to defy logic. As strange, mystifying, or unexpected as they may seem, these discoveries help predict.
Welcome to the Ripley’s Believe It or Not! of data science - the Freakonomics of big data.
Below are nine colorful discoveries, each pertaining to a single predictor variable - from the likes of Walmart, Uber, Harvard, Shell, Microsoft, and Wikipedia. These examples are new in this year's Revised and Updated edition of my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, bringing the book's more extensive "Bizarre Insights" table up to 46 total. (For more information about the examples below, access the book's Notes PDF - provided at no charge at www.PredictiveNotes.com - and search by organization name.)
|Insight||Organization||Suggested Explanation|
|Pop-Tarts before a hurricane. Prehurricane, Strawberry Pop-Tart sales increased about sevenfold.||Walmart||In preparation before an act of nature, people stock up on comfort or nonperishable foods.|
|Higher crime, more Uber rides. In San Francisco, the areas with the most prostitution, alcohol, theft, and burglary are most positively correlated with Uber trips.||Uber||“We hypothesized that crime should be a proxy for nonresidential population. . . Uber riders are not causing more crime. Right, guys?”|
|Typing with proper capitalization indicates creditworthiness. Online loan applicants who complete the application form with the correct case are more dependable debtors. Those who complete the form with all lower-case letters are slightly less reliable payers; all capitals reveals even less reliability.||A financial services startup company||Adherence to grammatical rules reflects a general propensity to correctly comply.|
|Users of the Chrome and Firefox browsers make better employees. Among hourly employees engaged in front-line service and sales-based positions, those who use these two custom Web browsers perform better on employment assessment metrics and stay on longer.||A human resources professional services firm, over employee data from Xerox and other firms||“The fact that you took the time to install [another browser] shows . . . that you are an informed consumer . . . that you care about your productivity and made an active choice.”|
|Men who skip breakfast get more coronary heart disease. American men 45 to 82 who skip breakfast showed a 27 percent higher risk of coronary heart disease over a 16-year period.||Harvard University medical researchers||Besides direct health effects, if any, eating breakfast may be a proxy for lifestyle: People who skip breakfast may lead more stressful lives and “were more likely to be smokers, to work full time, to be unmarried, to be less physically active, and to drink more alcohol.”|
|More engaged employees have fewer accidents. Among oil refinery workers, a one percentage-point increase in team employee engagement is associated with a 4 percent decrease in the number of safety incidents per employee.||Shell||More engaged workers are more attentive and focused.|
|Smart people like curly fries. Liking “Curly Fries” on Facebook is predictive of high intelligence.||Researchers at the University of Cambridge and Microsoft Research||An intelligent person was the first to like this Facebook page, “and his friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them . . . ,” and so on.|
|Female-named hurricanes are more deadly. Based on a study of the most damaging hurricanes in the United States during six recent decades, the ones with “relatively feminine” names killed an average of 42 people, almost three times the 15 killed by hurricanes with “relatively male” names.||University researchers||This may result from “a hazardous form of implicit sexism.” Psychological experiments in a related study “suggested that this is because feminine- versus masculine-named hurricanes are perceived as less risky and thus motivate less preparedness. . . . Individuals systematically underestimate their vulnerability to hurricanes with more feminine names.”|
|Higher status, less polite. Editors on Wikipedia who exhibit politeness are more likely to be elected to “administrative” status that grants greater operational authority. However, once elected, an editor’s politeness decreases.||Researchers examining Wikipedia behavior||“Politeness theory predicts a negative correlation between politeness and the power of the requester.”|
And now a word of warning! In the table of examples above, do not give much credence to the “Suggested Explanation” column’s attempt to answer “why” for each insight. For each one, there are also other plausible explanations, and, in most cases, only intuition rather than scientific evidence behind the particular answer provided. The reasons behind each discovery in the left column are generally unknown. Every explanation put forth, each entry in the rightmost column, is pure conjecture with absolutely no hard facts to back it up.
The dilemma is, as it is often said, correlation does not imply causation. The discovery of a predictive relationship between A and B does not mean one causes the other, not even indirectly. No way, no how. My Quartz article on this topic explores it in detail.
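To make the point concrete, here is a minimal sketch of how a hidden third factor can produce a strong correlation between two variables that have no effect on each other. It loosely mirrors the Uber row in the table, where crime is hypothesized to be a proxy for nonresidential population. Everything in it is an assumption for illustration: the data is simulated, and the names `foot_traffic`, `crime`, and `rides` are hypothetical, not drawn from any real dataset.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000

# Hypothetical hidden factor: how busy each area is with non-residents.
foot_traffic = [rng.gauss(0, 1) for _ in range(n)]

# Simulated crime and rides both depend on foot traffic plus independent
# noise. Neither is computed from the other, so neither "causes" the other.
crime = [z + rng.gauss(0, 0.5) for z in foot_traffic]
rides = [z + rng.gauss(0, 0.5) for z in foot_traffic]

raw_corr = pearson(crime, rides)

# Correlation that remains once the hidden factor is accounted for.
partial_corr = pearson(residuals(crime, foot_traffic),
                       residuals(rides, foot_traffic))

print(f"crime vs. rides, raw correlation:       {raw_corr:.2f}")
print(f"crime vs. rides, foot traffic removed:  {partial_corr:.2f}")
```

In this simulation the raw correlation is strong, yet it all but vanishes once the hidden factor is removed. In real data the hidden factor is usually unmeasured, which is why a correlation alone cannot settle the question.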
But do not fret. When applying predictive analytics, even though we generally don’t have firm knowledge about causation, we often don’t necessarily care. For many projects, the value comes from prediction, with only an avocational interest in understanding the world and figuring out what makes it tick. The freak show of surprising discoveries delivers predictive value even when it does little to explain itself.
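As a rough illustration of prediction paying off without an explanation, the sketch below ranks loan applicants by a single predictor variable, capitalization style, and measures each group's lift over the overall default rate. The counts are invented purely for illustration. They are not figures from the financial services startup in the table, and they only follow the direction of that finding.

```python
# Hypothetical counts, invented for illustration only:
# capitalization style -> (number of applicants, number who defaulted)
counts = {
    "proper case": (600, 30),
    "all lowercase": (300, 24),
    "ALL CAPITALS": (100, 12),
}

total_applicants = sum(n for n, _ in counts.values())
total_defaults = sum(d for _, d in counts.values())
base_rate = total_defaults / total_applicants  # overall default rate

# Lift: how much more (or less) likely a group is to default than average.
lift = {}
for style, (n, d) in counts.items():
    rate = d / n
    lift[style] = rate / base_rate
    print(f"{style:14s} default rate {rate:.1%}   lift {lift[style]:.2f}")

# Rank the groups from riskiest to safest using only this one variable.
riskiest_first = sorted(lift, key=lift.get, reverse=True)
print("riskiest first:", riskiest_first)
```

The ranking is usable for deciding whom to review more closely, even though nothing in the calculation says why capitalization relates to repayment.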
Bio: Eric Siegel, Ph.D. is an author, most recently of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, and founder, Predictive Analytics World.
Original. Reposted with permission.