Interview: Kaiser Fung, NYU on Why Ignoring Data Integrity is a Recipe for Disaster
We discuss different levels of Data Integrity, logical fallacies in Analytics, measures to boost accountability, the role of human intelligence in Analytics, and the relevance of the OCCAM framework.

Here is my interview with Kaiser Fung:
Anmol Rajpurohit: Q1. The Big Data revolution is partly based on the immense rise in our capability to measure and collect a wide variety of data. You argue that a lot of this data is pushed into Analytics projects without being evaluated for correctness. So, are you suggesting that there is a gap between what we think this data is and what it actually is? Can you share some examples to illustrate this gap?
Kaiser Fung: I like to say “data integrity” rather than “correctness”. There are multiple levels of integrity. One level is value integrity, which most people recognize. Are there invalid values? Do values get dropped accidentally? Another level is label integrity. For example, there is code to track clicks on the button on the left side of a webpage. If the designer moves the button to the right side, the developer copies and pastes the previous tracking code but, once in a while, forgets to edit the tag, so the analyst continues to interpret the data as left-button clicks.

And then there is analytical integrity. Say, the traffic to your home page plunged last Monday because the tracking tag was inadvertently removed. The traffic existed, and just wasn’t measured, so the analyst extrapolated the missing value. However, no Web analytics software I know of has a solution to fix such mishaps permanently, and so anybody who ever looks at traffic data for any period that includes that Monday must make the adjustment. Needless to say, most analysts won’t even know about the anomaly.
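To illustrate the kind of adjustment described above, here is a minimal sketch in Python. It assumes a hypothetical registry of known measurement gaps (`KNOWN_GAPS`) and hypothetical helpers `estimate_for` and `adjusted_total`; the dates and page-view counts are invented for illustration, and the same-weekday average is just one simple way to estimate a missing value, not a method attributed to the interviewee.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical registry of days on which measurement (not traffic) failed.
KNOWN_GAPS = {date(2015, 2, 9): "home-page tracking tag removed"}


def estimate_for(day, views, gaps):
    """Estimate a gap day from the same weekday on unaffected days."""
    peers = [v for d, v in views.items()
             if d.weekday() == day.weekday() and d not in gaps]
    return mean(peers)


def adjusted_total(views, start, end, gaps=KNOWN_GAPS):
    """Sum views over [start, end], substituting estimates for gap days.

    Returns the adjusted total and the list of days that were adjusted,
    so the reader of the number knows an adjustment was made.
    """
    total, adjusted = 0.0, []
    day = start
    while day <= end:
        if day in gaps:
            total += estimate_for(day, views, gaps)
            adjusted.append(day)
        else:
            total += views[day]
        day += timedelta(days=1)
    return total, adjusted


# Invented data: three weeks at 1000 views a day, with one Monday
# recorded as 40 because the tag was missing.
views = {date(2015, 2, 2) + timedelta(days=i): 1000 for i in range(21)}
views[date(2015, 2, 9)] = 40

total, adjusted = adjusted_total(views, date(2015, 2, 9), date(2015, 2, 15))
raw_total = sum(views[date(2015, 2, 9) + timedelta(days=i)] for i in range(7))
print(raw_total, total, adjusted)
```

Keeping the gap registry next to the data, rather than in one analyst's memory, means every query over a period that includes the affected Monday picks up the same correction.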
AR: Q2. What are some of the most common logical fallacies in Analytics? How can we make sure we avoid them?
KF: One fallacy I’d like to see less of is “story time.” This term describes a popular structure found in many data analyses: first, the author ropes readers in with details of the statistics and the data-driven models, and then comes a moment when the narrative becomes more elaborate, and drifts away from the data.
The Deflategate controversy during the recent Super Bowl provides a nice example. A data analyst made noise over a statistic showing that

AR: Q3. As a business manager, how can one bolster accountability in Analytics processes? What factors would be indispensable for a checklist to ensure credibility of Analytics results?
KF: As analytics managers, we ought to taste our own dog food. We should quantify the impact of analytical projects. It is crucial to tie the outcomes to corporate metrics so business managers see the value of our work. For instance, if I run an A/B test on a specific page, and the results show a 10-percent increase in sales for daytime visitors, I express that gain in terms of total sales across all visitors, which is what the CEO cares about.
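The arithmetic behind that translation can be sketched as follows. This is a minimal illustration, assuming that sales from all other visitors are unchanged and that daytime visitors account for a known share of baseline sales; the 40-percent share and the function name `overall_lift` are hypothetical, not figures from the interview.

```python
def overall_lift(segment_lift, segment_share):
    """Relative gain in total sales when only one segment's sales change.

    segment_lift:  relative gain within the segment (0.10 for 10 percent)
    segment_share: the segment's share of baseline total sales (assumed)
    """
    return segment_lift * segment_share


# A 10-percent gain among daytime visitors, who are assumed here to
# generate 40 percent of baseline sales.
lift = overall_lift(0.10, 0.40)
print(f"{lift:.1%}")  # prints 4.0%
```

Under these assumptions, the 10-percent segment result corresponds to a 4-percent gain in total sales, which is the figure to report against the corporate metric.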
AR: Q4. As the world we live in becomes more and more data-driven, do you see any role for gut instinct? What is the most important role of human intelligence in Analytics?
KF: When I talk about “numbersense,” my point is that intuition or gut feeling plays an indispensable role in data analysis. The conventional wisdom is that data and intuition are polar opposites. That is a myth. The best data analysts are the ones who excel at harnessing their intuition to guide the analytical work. In my book, I argue that data analysis is inherently subjective.

AR: Q5. What is the OCCAM framework? How is it relevant?
KF: OCCAM is the acronym for a set of characteristics that are becoming more and more prevalent in new “Big Data” datasets and make the analyses of such datasets challenging. In short, many new

Second part of the interview