Beware of Two Data Obfuscation Tactics
We examine two common tactics used by data "skeptics": demanding more precision and demanding unanimity. These techniques are especially effective against data scientists, who should be aware of them and able to counteract them.
By Kaiser Fung, Junkcharts.
I want to parse this statement by new EPA Chief Scott Pruitt, as quoted in this New York Times article (Mar 9, 2017):
I think that measuring with precision human activity on the climate is something very challenging to do and there’s tremendous disagreement about the degree of impact, so no, I would not agree that it’s a primary contributor to the global warming that we see.
I'm not going to talk about the politics of climate change in this post but rather point out that Pruitt used two popular obfuscation tactics adopted by people who don't like what the data are suggesting to them. I have been party to countless such dialogues in business meetings.
Tactic #1: Claiming that data should be ignored until it is made "precise"
Since nothing can be 100% precise, the request for more precise data is akin to a demand for eliminating "terrorism". Imprecision, like terrorism, is a feature of the world we live in. One can work to decrease imprecision, or to reduce terrorism, but one can't and won't eradicate either.
This tactic is specifically tailored to data analysts, who, being logical thinkers, will never vouch for anything 100%. It works particularly well against ethical data analysts. People using Tactic #1 are daring the analysts to stand up and make false claims of 100% precision. (The tactic might not work on, say, IBM marketers, who have made some stupendous claims.)
The demand for more "precision" always leads to a demand for more analysis. The cycle continues.
Tactic #1 stipulates a black-and-white world. The data is either 100% precise (good) or 100% imprecise (bad). Anything in between is lumped with 100% imprecise. If this criterion were applied to all business decisions, there would be no risk-taking, no investments of any kind, and capitalism would grind to a halt. If Pfizer refused to spend any money developing new drugs unless the data were 100% precise, it would never start any projects. If a real estate developer refused to take out a loan unless he were 100% sure all of the space could be rented out at desirable prices within the first year, he would never undertake any projects.
However, when it comes to such decisions, the same decision-makers who fear the scourge of imprecision suddenly re-make themselves as "betting men."
Tactic #2: Pointing to disagreement as a reason to refute the conclusion
An easy way to derail meetings is to have a few people blow smoke at the data. The questions raised are frequently trivial, sometimes irrelevant, but the questioning produces an air of doubt.
For example, an analyst might conclude that the overall customer satisfaction rating has been trending down. Even if the aggregate rating is in decline, there will almost certainly exist a few counties or neighborhoods in which the rating has sharply risen. The people who resist will insist on investigating those counties, even if those counties account for only 0.01% of the customer base.
Imagine you are a third party to the debate, and you have no knowledge of the subject matter (for example, you are the controller attending a meeting about customer loyalty). You listen to a prolonged discussion of cases that may or may not contradict the data analyst's conclusion. You only know what you are hearing at the meeting. It's not surprising that you think there is "tremendous disagreement" and that the conclusion may be dubious.
In reality, there are only a few loud dissenters, who are objecting because the conclusion does not confirm their preconceptions.
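To see why fixating on those outlier counties is a distraction, here is a minimal sketch with invented numbers (the segment sizes and ratings are purely hypothetical, chosen only to mirror the 0.01% scenario above). A segment holding 0.01% of the customers can rise sharply while barely nudging the weighted aggregate:

```python
# Hypothetical illustration: an aggregate satisfaction rating can decline
# even while a tiny segment improves sharply, because the segment's weight
# in the average is negligible. All numbers are invented for demonstration.

# (segment name, number of customers, last year's rating, this year's rating)
segments = [
    ("main customer base", 999_900, 8.0, 7.2),  # broad decline
    ("outlier county",         100, 6.0, 9.0),  # sharp rise, 0.01% of base
]

def weighted_avg(pairs):
    """Customer-weighted average rating over (count, rating) pairs."""
    total = sum(n for n, _ in pairs)
    return sum(n * r for n, r in pairs) / total

last_year = weighted_avg([(n, r0) for _, n, r0, _ in segments])
this_year = weighted_avg([(n, r1) for _, n, _, r1 in segments])

print(f"last year: {last_year:.4f}, this year: {this_year:.4f}")
# The outlier county jumps from 6.0 to 9.0, yet the aggregate still
# falls from about 8.00 to about 7.20 -- the decline stands.
```

The point is not that dissenting segments should never be examined, but that their weight determines whether they can overturn the aggregate conclusion.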
If you are a budding data analyst, you have to be prepared to handle these situations. Pruitt's quote is a perfect encapsulation of the common data-busting tactics. He is saying (a) the data is imprecise and (b) some people disagree, therefore (c) I reject your conclusion and (d) I am free to believe whatever I want.
See also The Problem With Facts by the great Tim Harford.
Kaiser Fung is a recognized expert, speaker, author, and teacher in business analytics and data visualization. He directs the Master of Science in Applied Analytics at Columbia University.
His most recent book is Numbersense: How to use Big Data to Your Advantage (McGraw-Hill, 2013).
Editor: this is a slightly revised version of this post. Reposted with permission.
- It’s Getting Hot In Here: Data Science vs Fake News
- What Happened Last Night in Sweden: Data Science vs Fake News
- Climate Change Denial and CO2 Emissions – What is the Connection?