OpenText Data Driven Digest Aug 21: College Majors, Hacking Glory, Innovation Performance
By Fred Sandsmark (OpenText Analytics).
The second* data visualization we all learned in school was the Cartesian coordinate system. By plotting figures on a two-dimensional graph, we learned the relationship between numbers and space, unlocked patterns in those numbers, and established foundations for understanding algebra and geometry. The simple beauty of X-Y coordinates belies the power they hold; indeed, many of the best data visualizations created today rely on, and build upon, the Cartesian plane concept to show complex data sets. Here are three examples. (Note that none of these are textbook Cartesian visualizations, because the X and Y axes represent different units.)
Back to School: Our favorite “Data Tinkerer,” Randy Olson, published a blog post this week exploring correlations between earnings, gender, and college major. Using data from the American Community Survey (and building on a FiveThirtyEight article by Ben Casselman), Olson created the graph above to show his findings. Then he generated a variety of graphs (one of which is below) that fit a linear regression to the data and add bar charts along the graphs’ sides to show quantity along both axes. The result illuminates more aspects of the same data in a compact format.
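For readers who want to see what fitting a linear regression involves, here is a minimal sketch of an ordinary least-squares line fit in plain Python. It is not Olson's code, and the sample points are made up purely for illustration; they are not drawn from the American Community Survey. The function name `fit_line` is likewise just a label chosen for this sketch.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative points (an assumption, not survey data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted line is the one that minimizes the sum of squared vertical distances between the points and the line, which is the usual meaning of "linear regression" on a scatter plot like this one.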
Statistically Significant: Scientists are sometimes accused of adjusting their experiments to yield the answers they want. This practice is called p-hacking (the “p” comes from p-value) and is explained in a fine FiveThirtyEight article by Christie Aschwanden, “Science Isn’t Broken – It’s just a hell of a lot harder than we give it credit for.” The article is accompanied by the endlessly fun interactive shown above; click through to play with it. As you add or subtract parameters, the data on the Cartesian plane and the linear regression of that data change before your eyes. If you can find a connection that yields a p-value of 0.05 or less, Aschwanden says, you have data that’s suitable for publishing in an academic journal. Click here for a great explanation of p-values.
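To get a feel for why hunting for a p-value of 0.05 or less is risky, here is a rough sketch in plain Python. It does not reproduce the FiveThirtyEight interactive or its data. Instead it tests many purely random "predictors" against a purely random "outcome" and counts how many clear the 0.05 bar by chance alone. The p-value here is estimated with a permutation test, a technique swapped in for simplicity rather than the regression statistics the interactive presumably uses. All function names, sample sizes, and the random seed are assumptions made for this illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, rng, n_shuffles=500):
    """Estimate a two-sided p-value: the share of shuffled data sets whose
    absolute correlation is at least as large as the observed one."""
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(correlation(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_shuffles + 1)


def count_spurious_hits(n_candidates, n_points, rng, alpha=0.05):
    """Test many unrelated random predictors against one random outcome and
    count how many look 'significant' at the given alpha."""
    outcome = [rng.gauss(0, 1) for _ in range(n_points)]
    hits = 0
    for _ in range(n_candidates):
        predictor = [rng.gauss(0, 1) for _ in range(n_points)]
        if permutation_p_value(predictor, outcome, rng) <= alpha:
            hits += 1
    return hits


rng = random.Random(0)  # fixed seed so repeated runs match
found = count_spurious_hits(n_candidates=40, n_points=30, rng=rng)
print(f"{found} of 40 unrelated predictors reached p <= 0.05")
```

Because every predictor is pure noise, any "significant" result the script reports is a false alarm. With a threshold of 0.05, roughly one test in twenty is expected to pass by luck, which is why trying many combinations and reporting only the winners can mislead.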
Business Time: At the Harvard Business Review, Ronald Klingebiel and John Joseph delved into whether it’s better to be a pioneer or a follower by studying a very specific slice of data: German mobile-handset makers in the years 2004-2008. Their chart (above) plots many manufacturers along two axes: the number of features on the X axis, and the month of entry into the market on the Y axis. Klingebiel and Joseph then highlight two companies that succeeded (Samsung and Sagem) and two that didn’t (HP and Motorola). The authors’ hypothesis was that a handset manufacturer was more likely to succeed if it came to market early with lots of features, or if it arrived later with fewer, better-focused features. The chart, while very good, would benefit from interactivity; I’d like to hover over any dot to get the company name, and click any dot to get details of how that company performed. Without this context, I must rely on the authors’ definition of success.