How to Reduce Data Hoarding and Get Better Visualizations and Decisions
Creating a hodgepodge of pretty pictures of every data point is a guaranteed way to destroy the value of a visualization. We examine how to reduce such data hoarding and improve decisions.
See how Variable Z is flat while the other variables are all spread out? That “spread” is variability. Our goal is to identify those “flat” planes, isolate them into “non-variable” columns, and throw them out to simplify our analysis!
With that in mind, if we had to eliminate a column, we’d eliminate Z. Why? Because almost every value is 40. There’s incredibly little variability in it.
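To make “very little variability” concrete, we can compute each column's variance and flag the columns that fall below a cutoff. This is a minimal plain-Python sketch; the X, Y, Z values and the cutoff of 1.0 are hypothetical, chosen only to mimic a Z that is almost always 40.

```python
from statistics import pvariance

# Hypothetical toy data: X and Y are spread out; Z is almost always 40.
data = {
    "X": [12, 45, 7, 33, 28, 50, 3, 19],
    "Y": [5, 22, 41, 9, 37, 14, 30, 26],
    "Z": [40, 40, 40, 41, 40, 40, 39, 40],
}

def low_variance_columns(columns, cutoff):
    """Return the names of columns whose population variance falls below cutoff."""
    return [name for name, values in columns.items() if pvariance(values) < cutoff]

for name, values in data.items():
    print(f"{name}: variance = {pvariance(values):.2f}")

# The cutoff is an arbitrary, hypothetical choice for this toy data.
print("Drop:", low_variance_columns(data, cutoff=1.0))
```

Variance depends on each column's units, so a fixed cutoff like this only makes sense when the columns are on comparable scales.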
I’m lost… What’s variability? A Home-Buying Example
Let’s say you’re looking to buy a house, and you’re a data hoarder like me, so you build a spreadsheet. After narrowing down our options, we’re left with the final contenders:
So which columns can we eliminate to simplify our decision? Well… since there’s no variability in the blue columns, Neighborhood and Beds, we can throw them out. WHY?! Because every contender has the same neighborhood and the same number of beds, so those columns can’t be the “difference” we use to decide between houses.
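The same idea works on a spreadsheet-like table: any column whose value is identical in every row cannot break a tie. This is a minimal sketch; the three houses and their values are hypothetical stand-ins for the contenders, and `constant_columns` is a helper name introduced only for illustration.

```python
# Hypothetical final contenders; the values are made up for illustration.
houses = [
    {"House": "A", "Neighborhood": "Elmwood", "Beds": 3, "Size Sq Ft": 1500, "Baths": 1},
    {"House": "B", "Neighborhood": "Elmwood", "Beds": 3, "Size Sq Ft": 2500, "Baths": 2},
    {"House": "C", "Neighborhood": "Elmwood", "Beds": 3, "Size Sq Ft": 3500, "Baths": 3},
]

def constant_columns(rows):
    """Return the columns whose value is identical in every row."""
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

to_drop = constant_columns(houses)
# Keep only the columns that actually differ between houses.
trimmed = [{col: val for col, val in row.items() if col not in to_drop} for row in houses]

print("Drop:", to_drop)
for row in trimmed:
    print(row)
```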
Notice how Size Sq Ft and Baths move together: for every additional bath, you get another 1,000 sq ft (humor me). Therefore, if we combine them into a single new column called Size, we don’t lose any information; we’re just asking the underlying question: “What size home do I want?”
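“Moving together” can be measured with a correlation coefficient, and two such columns can be merged into one score. This is a minimal sketch, assuming hypothetical values where each extra bath adds exactly 1,000 sq ft; averaging the standardized (z-score) columns is one simple way to build a combined Size score, not the only one.

```python
from statistics import mean, pstdev

# Hypothetical columns where each extra bath comes with exactly 1,000 more sq ft.
sq_ft = [1500, 2500, 3500]
baths = [1, 2, 3]

def zscores(values):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson(xs, ys):
    """Pearson correlation: the average product of paired z-scores."""
    return sum(a * b for a, b in zip(zscores(xs), zscores(ys))) / len(xs)

r = pearson(sq_ft, baths)
# Combine the two standardized columns into one "Size" score.
size = [(a + b) / 2 for a, b in zip(zscores(sq_ft), zscores(baths))]

print(f"correlation = {r:.3f}")
print("Size:", [round(s, 3) for s in size])
```

With a perfect correlation, either column alone would carry the same information; averaging z-scores still gives a sensible combined column when the relationship is strong but not exact.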
Finally, we can drop “Realtor is creepy” and “Smells like wet dog,” because those don’t stay with the house.
Now we can weight our options by the decision drivers. That’s the goal of the wonderful world of data and decision science!
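Weighting options by their decision drivers is often done with a simple weighted scoring table. This is a minimal sketch under stated assumptions: the “Price” column, the prices, the weights, and the min-max normalization are all hypothetical choices for illustration.

```python
# Hypothetical remaining decision drivers; "Price" is an assumed column.
houses = {
    "House A": {"Price": 300_000, "Size Sq Ft": 1500},
    "House B": {"Price": 400_000, "Size Sq Ft": 2500},
    "House C": {"Price": 550_000, "Size Sq Ft": 3500},
}
# Hypothetical weights: how much each driver matters to this buyer (they sum to 1).
weights = {"Price": 0.6, "Size Sq Ft": 0.4}
# +1 if more is better, -1 if less is better.
direction = {"Price": -1, "Size Sq Ft": +1}

def min_max(value, lo, hi):
    """Rescale value to 0..1 within [lo, hi]; a constant column scores 0.5."""
    return 0.5 if hi == lo else (value - lo) / (hi - lo)

def score(options, weights, direction):
    """Weighted sum of normalized criteria for each option."""
    scores = {name: 0.0 for name in options}
    for col, w in weights.items():
        values = [row[col] for row in options.values()]
        lo, hi = min(values), max(values)
        for name, row in options.items():
            norm = min_max(row[col], lo, hi)
            if direction[col] < 0:
                norm = 1 - norm  # flip so that cheaper scores higher
            scores[name] += w * norm
    return scores

scores = score(houses, weights, direction)
for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:.2f}")
```

Changing the weights changes the winner, which is the point: the remaining columns are the real decision drivers, and the weights encode how much each one matters to you.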
Now, let’s look at a dataset with variables U, V, and W. (Note: unlike Z, none of these columns is “nearly constant.”)
Well, darn, it all looks super “variable” to me…
Hang on! Let’s rotate our data (something we discussed in Causation vs Correlation: Visualization, Statistics, and Intuition!) to see if there’s a better perspective on U, V, and W.
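Rotating the data changes only our viewpoint, not the information in it: a rotation preserves every distance between points. This is a minimal plain-Python sketch; the U, V, W points and the two angles are hypothetical, picked only to show one “new perspective.”

```python
import math
from itertools import combinations

def rotate_x(p, theta):
    """Rotate point p = (u, v, w) about the first axis by theta radians."""
    u, v, w = p
    c, s = math.cos(theta), math.sin(theta)
    return (u, c * v - s * w, s * v + c * w)

def rotate_z(p, theta):
    """Rotate point p = (u, v, w) about the third axis by theta radians."""
    u, v, w = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * u - s * v, s * u + c * v, w)

# Hypothetical U, V, W points.
points = [(1.0, 2.0, 3.0), (-2.0, 0.5, 1.0), (4.0, -1.0, 0.0), (0.0, 3.0, -2.0)]

# One arbitrary "new perspective": tilt 30 degrees, then turn 45 degrees.
rotated = [rotate_z(rotate_x(p, math.radians(30)), math.radians(45)) for p in points]

# Every pairwise distance is the same before and after the rotation.
for i, j in combinations(range(len(points)), 2):
    before = math.dist(points[i], points[j])
    after = math.dist(rotated[i], rotated[j])
    print(f"{i}-{j}: {before:.6f} -> {after:.6f}")
```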
Still a blob… One more try!
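Got it! See how, from this perspective, U, V, and W appear to form a disc? That’s what we’re looking for! A disc spreads out in two directions but is flat in the third, so there is almost no variability perpendicular to it. Let’s isolate that flat plane into a single column.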
Sounds great. How do we do that?
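One standard technique for this is principal component analysis (PCA), which in effect finds the best rotation automatically: it builds new columns ordered by how much variance they carry. For a disc, the last new column (perpendicular to the disc) carries almost no variance, so it can be dropped just like Z. This is a minimal sketch, assuming NumPy and a synthetic, hypothetical disc-shaped dataset rather than the article's data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic disc: wide spread along two axes, almost none along the third.
n = 500
disc = np.column_stack([
    rng.normal(0.0, 10.0, n),
    rng.normal(0.0, 10.0, n),
    rng.normal(0.0, 0.2, n),
])

# Tilt the disc with a random orthogonal matrix so no single U, V, W column is flat.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
uvw = disc @ q.T  # each row is a point; the disc's true normal is q[:, 2]

def pca(data):
    """Return variances and principal directions, largest variance first."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

variances, directions = pca(uvw)
share = variances / variances.sum()
print("share of variance per new column:", np.round(share, 4))

# The last direction is perpendicular to the disc: the "flat" column we can drop.
normal = directions[:, -1]
reduced = (uvw - uvw.mean(axis=0)) @ directions[:, :2]  # keep two columns instead of three
print("reduced shape:", reduced.shape)
```

The two kept columns are combinations of U, V, and W, much like Size combined Size Sq Ft and Baths in the house example.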