Big Data Desperately Needs Transparency

If Big Data is to realize its potential, people need to understand what it is capable of, what information is out there and where every piece of data comes from. Without such transparency and understanding, it will be difficult to persuade people to rely on the findings.

People have an innate suspicion of numbers.

We understand that the answer to life, the universe, and everything is too complex to be boiled down to the number 42 (as The Hitchhiker's Guide to the Galaxy would have it), yet in our search to quantify our existence, we allow our lives to be ruled by numbers. We count the calories in our food, we count the minutes of our daily commute, and we definitely count the emails in our inboxes. We use experience to decide which numbers are good and which are bad. If we didn't manage our lives by numbers, we would be obese, late and overwhelmed.

However, our lives contain more and more data whose origin we cannot be sure of. The failure of opinion polling has already been widely debated, and if such a "fine art" can get it wrong, who is to say that Big Data is any different? Isn't polling just Big Data by another name? We haven't really got a clue how the pollsters arrived at their figures, and most people in business are equally unsure where their own statistics are conjured up from.

If Big Data is to have the impact that it could have, serious education is required. People have to understand what Big Data is capable of, what information is out there and where every piece of data comes from. Without this transparency and understanding, you will have difficulty persuading an intelligent group of people to rely on the findings.

Part of this transparency starts with a story. Numbers on their own don't mean much to most people, but when people hear the context behind them, they start to understand their significance. Such stories take time and imagination, and this is what sets the best Big Data professionals apart: they can translate the numbers into a language their colleagues understand. They can allay those colleagues' suspicions and draw them into the story that the numbers are weaving. When people engage with the numbers in this way, they take ownership of them and become able to explain them to others.

The second aspect of transparency is the requirement to show the numbers "warts and all." If the management team senses that the numbers have been doctored to support a particular cause, they will reject them out of hand. Every statistical data set has outliers and exceptions, and rather than presenting a sanitized set of numbers, analysts should include these anomalies. This gives decision makers a fuller picture, and the numbers will seem more natural.

Lastly, I suppose that transparency is about trust. The analytics team knows what the rest of the business needs to understand, and it doesn't manipulate the numbers to tell people what they want to hear. The numbers are what they are, and the analysts won't colour or amend them in any way. When the business learns to trust the numbers, it has a solid foundation for making the best decisions and for subsequent growth.

If data is transparent, people are inclined to believe it.

Original. Reposted with permission.