
KDnuggets, May 2015, Tutorials, Overviews, How-Tos (15:n17)

How to Reduce Data Hoarding, Get Better Visualizations and Decisions

Creating a hodge-podge of pretty pictures of every data point is a guaranteed way to destroy the value of a visualization. We examine how to reduce such data hoarding and make better decisions.

By Alex Jones.

My name is Alex and I’m a data-hoarder.

I have a compulsion to collect, cleanse, and correlate. As you may have guessed, my love affair with data is inversely related to the quality of my social life.

I used to believe this dashboard was perfectly acceptable. After all, “If you can’t measure something, you can’t manage it”.

What I need to work through is this: the dashboard is lazy. It’s taxing to interpret and hard to act on.

It’s a classic example of data hoarding.

Not to mention, it uses gauges… If you have the time, here’s an amusing diatribe from Stephen Few, one of the “fathers” of data visualization: Visual Business Intelligence – A Preview of Tableau 9: Gauges?!

The key to visualization is not to create a hodge-podge of pretty pictures of every data point; that’s a guaranteed way to destroy the value of a visualization tool.

Instead, good visuals require an understanding of problems, processes, and the business. The goal is to drive action, perform analysis, and build predictive models on top of our data.

In modeling, this is conceptually similar to Occam’s razor, which can be distilled to “keep it simple”: use data, eliminate unnecessary assumptions, make predictions, and if you have two models with equivalent accuracy, choose the simpler one. Opinions are secondary to data.
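The “equivalent accuracy, pick the simpler model” rule can be sketched in a few lines of Python. This is a minimal illustration, not a method from this article: it assumes complexity is measured by parameter count, treats “equivalent” as “within a chosen tolerance of the best accuracy”, and uses made-up model names and scores.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    n_params: int      # assumed proxy for complexity
    accuracy: float    # e.g. held-out accuracy in [0, 1]

def occam_choice(models, tolerance=0.01):
    """Return the simplest model whose accuracy is within `tolerance` of the best."""
    best = max(m.accuracy for m in models)
    # Keep only models that are "equivalently accurate" to the best one
    contenders = [m for m in models if best - m.accuracy <= tolerance]
    # Among those, prefer the fewest parameters (the simplest model)
    return min(contenders, key=lambda m: m.n_params)

# Hypothetical candidates, made up for illustration only
candidates = [
    Model("linear, 3 inputs", n_params=4, accuracy=0.912),
    Model("linear, 12 inputs", n_params=13, accuracy=0.918),
    Model("deep net", n_params=50_000, accuracy=0.921),
]
print(occam_choice(candidates).name)
```

Lowering `tolerance` makes the rule stricter: with `tolerance=0.005` the three-input model no longer counts as equivalent, and the twelve-input model wins instead.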

That takes time, and it goes against our psychological wiring. In fact, a 2005 study found that the human mind can reliably process only 3 variables at a time; at 4, performance declines sharply, and at 5 it is no better than a coin toss.

In other words, data-hoarding dashboards are more likely to drive poor decisions! Oh, the irony.

With that, we know what we need to do: Eliminate human decision making.

Oh wait… that’s me! Disregard that. Better idea: let’s use computers, mathematics, and expertise to simplify decision making!

Wonderful. How do we simplify?

Well, there are a lot of options; most of the “big ones” are listed in Let’s Get Nerdy: Data Analytics Explained for Business Leaders. But statistics can be tricky, as we saw in Causation vs Correlation: Visualization, Statistics, and Intuition!

Let’s consider ways to “reduce dimensionality”. What is “dimensionality”? It’s the number of columns, variables, factors, Xs to our Ys, inputs to our… OK, I’ll stop.

How can eliminating some columns help us make decisions? Well, think of credit scores: that one number consolidates a multitude of factors into a comprehensible, universal metric. Psh, humans… am I right?
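A credit-score-style metric can be sketched as a weighted average of normalized factors, rescaled to a single number. The factor names, weights, and 0 to 100 range below are hypothetical, chosen only for illustration; they are not how any real credit bureau computes a score.

```python
# Hypothetical factors, each pre-scaled to [0, 1] where 1 is "better".
# The weights are illustrative assumptions, NOT a real credit-scoring formula.
WEIGHTS = {
    "payment_history": 0.4,
    "utilization": 0.3,
    "history_length": 0.2,
    "credit_mix": 0.1,
}

def composite_score(factors, lo=0, hi=100):
    """Collapse several normalized factors into one number on a [lo, hi] scale."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    # Weights sum to 1, so the weighted average stays within [0, 1]
    blended = sum(w * factors[name] for name, w in WEIGHTS.items())
    # Rescale to the chosen range and round to a whole-number score
    return round(lo + blended * (hi - lo))

print(composite_score({
    "payment_history": 0.9,
    "utilization": 0.5,
    "history_length": 0.7,
    "credit_mix": 0.6,
}))
```

Four inputs collapse into one score (71 here). The trade-off is that the reader no longer sees which factor pulled the score down, which is exactly the detail the single number is meant to hide.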

OK, now let’s get a little philosophical: what makes data “informative”?

Great question. Data is only informative for decision making if it has variability! Take a look at the dataset below (X, Y, Z); note how column Z is pretty much all 40.
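One simple way to act on the variability idea is to measure each column’s variance and drop the near-constant ones. The rows below are made up to mimic a dataset in which Z is almost always 40 (the article’s actual table is not reproduced here), and the `0.1` threshold is an arbitrary assumption.

```python
from statistics import pvariance

# Made-up rows standing in for an (X, Y, Z) table; Z hovers around 40
rows = [
    {"X": 1.0, "Y": 2.1, "Z": 40.0},
    {"X": 2.0, "Y": 3.9, "Z": 40.0},
    {"X": 3.0, "Y": 6.2, "Z": 40.1},
    {"X": 4.0, "Y": 7.8, "Z": 40.0},
    {"X": 5.0, "Y": 10.1, "Z": 39.9},
]

def informative_columns(rows, threshold=0.1):
    """Keep columns whose population variance exceeds `threshold` (assumed cutoff)."""
    columns = rows[0].keys()
    variances = {c: pvariance([r[c] for r in rows]) for c in columns}
    keep = [c for c, v in variances.items() if v > threshold]
    return keep, variances

keep, variances = informative_columns(rows)
print(keep)
```

Because variance depends on units, a fixed threshold only makes sense when columns share a scale; otherwise, compare a scale-free measure such as the coefficient of variation.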

Let’s visualize X, Y, and Z:

Hmm, let’s rotate:

Still not great… Let’s rotate again:
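Rotating a 3-D plot by hand until the points collapse onto a line or plane is, loosely, what principal component analysis (PCA) does automatically: it finds orthogonal directions ordered by how much variance they carry. PCA is a swapped-in illustration here, not a technique this section names, and the points are made up so that Z barely moves.

```python
import numpy as np

def principal_axes(data):
    """Return PCA directions (rows) and the variance along each, largest first."""
    centered = data - data.mean(axis=0)  # PCA operates on mean-centered data
    # SVD returns singular values in descending order; rows of vt are directions
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s**2 / (len(data) - 1)   # sample variance along each direction
    return vt, variances

# Made-up (X, Y, Z) points in which Z stays near 40
data = np.array([
    [1.0, 2.1, 40.0],
    [2.0, 3.9, 40.0],
    [3.0, 6.2, 40.1],
    [4.0, 7.8, 40.0],
    [5.0, 10.1, 39.9],
])
axes, variances = principal_axes(data)
share = variances / variances.sum()      # fraction of total variance per direction
print(np.round(share, 4))
```

Nearly all of the variance lands on the first direction, so this made-up cloud is essentially one-dimensional. PCA is scale-sensitive, so columns measured in different units are usually standardized first.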
