Data Science Governance - Why does it matter? Why now?

Everyone is talking about GDPR, data governance and data privacy these days. Here we discuss what data science governance is and why it matters.

By Martin Hack, Kensu.

If you needed any proof that Europeans are serious about enforcing regulations, you don’t have to look any further than the recent $2.7 billion antitrust fine against Google. It comes ahead of a new EU law, the GDPR (General Data Protection Regulation). GDPR is just around the corner (May 2018) and carries significant financial penalties for non-compliance. Without a doubt, the advent of ML, AI and data science has had a massive impact on our lives over the last couple of years and will continue to do so for the foreseeable future. In this post I’ll talk about the emergence of data science governance.

There are three main reasons why data science governance will become a critical requirement in the very near future:

  1. GDPR (the European privacy law taking effect on May 25, 2018)
  2. Performance & build vs. buy: data governance is highly unlikely to be built in-house
  3. Model interpretability will become a major obstacle for AI, with no apparent answer yet

Here is a more detailed look:

  1. GDPR is the perfect storm of urgency, need and technical complexity. As with Y2K and regulations like PCI and HIPAA, there’s an actual drop-dead date, with draconian fines for non-compliance (4% of global annual revenue or €20 million, whichever is higher). Not only does it affect every single corporation within the EU, it also affects most Fortune 500 companies (anyone doing business in Europe). While there’s currently little awareness in the US, EU companies with exposure are close to panic mode. Although the focus is heavily on information security and data privacy, the regulation also has a data science component that, until now, has been neither well understood nor easily addressed.
  2. Data science performance & build vs. buy. In today’s ecosystem of AI, machine learning and data science, in-house do-it-yourselfers are mostly leading the charge. However, data science, or “decision science” as some call it, is also moving from a “wild west” attitude to quickly becoming a crucial part of most Global 2000 enterprises. As the importance of data science is increasingly recognized, there is a need for software that helps manage the performance of data science efforts by discovering connections and reporting on KPIs, irrespective of the underlying data science technologies. Virtually anyone using machine learning or AI would want to measure and track those efforts. Solutions should not only provide a “what’s going on” view of data science projects but might also monitor past, present and future performance.

  3. Model interpretability is already an issue today. Non-parametric machine learning models, including by extension every neural network (and deep learning) approach, have become the de facto standard for “modern” AI/ML. However, these techniques are essentially “black boxes” and usually non-interpretable: humans cannot interpret, understand or follow the decision logic by which a particular answer or result was reached. This has led to outright bans in some industries[1] or to great hesitancy about deploying an otherwise very useful technology such as AI. What we really need is an “activity”-based approach rather than simply parsing log files. We need “explainer”-like functionality that anyone can use to get an inside view of what their model does.
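
The GDPR penalty ceiling from point 1 is simple arithmetic: the greater of 4% of annual revenue and a fixed €20 million. The sketch below is a minimal illustration, assuming revenue is given in euros as worldwide annual turnover (the basis the regulation uses for its upper fine tier). The function name `gdpr_max_fine` is hypothetical, and it computes only the upper bound; actual fines are set case by case by regulators.

```python
def gdpr_max_fine(annual_revenue_eur: float) -> float:
    """Return the upper bound of a GDPR fine for the most serious infringements.

    Hypothetical helper: the cap is the greater of 4% of worldwide annual
    revenue and a fixed EUR 20 million.
    """
    if annual_revenue_eur < 0:
        raise ValueError("revenue must be non-negative")
    return max(0.04 * annual_revenue_eur, 20_000_000.0)


# The fixed EUR 20 million dominates until revenue reaches EUR 500 million.
for revenue in (100_000_000, 500_000_000, 10_000_000_000):
    print(f"revenue EUR {revenue:,} -> max fine EUR {gdpr_max_fine(revenue):,.0f}")
```

The crossover point is €500 million in revenue: below it every company faces the same €20 million ceiling, above it the ceiling scales with the size of the business.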

[1] Fair lending laws in the US make the use of non-parametric methods in consumer lending and finance difficult or impossible, since credit decisions have to be human-reproducible, e.g. based on a specific reason code and coefficient.
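
To see why parametric models satisfy this requirement, consider a linear (logistic) scoring model. Each feature’s contribution to the score is just its coefficient times its deviation from a reference applicant, so the features that pulled the score down the most can be reported directly as reasons. The sketch below is a minimal illustration under that assumption; the feature names, coefficients, baseline applicant and reason-code strings are all hypothetical and are not taken from any real scorecard or product.

```python
import math

# Hypothetical logistic credit model: log-odds of approval =
# INTERCEPT + sum(coef * value). All numbers are illustrative only.
INTERCEPT = -1.0
COEFFICIENTS = {
    "income_k": 0.03,          # annual income in thousands
    "debt_to_income": -4.0,    # ratio between 0 and 1
    "late_payments": -0.8,     # count in the last 24 months
    "years_of_history": 0.1,   # length of credit history in years
}
# Hypothetical reference applicant used as the comparison point.
BASELINE = {"income_k": 60, "debt_to_income": 0.3, "late_payments": 0, "years_of_history": 8}
REASON_CODES = {
    "income_k": "R01: income too low",
    "debt_to_income": "R02: debt-to-income ratio too high",
    "late_payments": "R03: recent late payments",
    "years_of_history": "R04: insufficient credit history",
}


def approval_probability(applicant: dict) -> float:
    """Apply the logistic (sigmoid) function to the linear score."""
    z = INTERCEPT + sum(COEFFICIENTS[f] * applicant[f] for f in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(applicant: dict, top_n: int = 2) -> list:
    """Return codes for the features that lowered the score most vs. the baseline.

    Contribution of feature f = coef_f * (x_f - baseline_f); only negative
    contributions count as adverse reasons, most negative first.
    """
    contributions = {
        f: COEFFICIENTS[f] * (applicant[f] - BASELINE[f]) for f in COEFFICIENTS
    }
    adverse = sorted((c, f) for f, c in contributions.items() if c < 0)
    return [REASON_CODES[f] for _, f in adverse[:top_n]]


applicant = {"income_k": 45, "debt_to_income": 0.55, "late_payments": 2, "years_of_history": 9}
print(f"approval probability: {approval_probability(applicant):.2f}")
print(reason_codes(applicant))
```

Because every contribution is a coefficient times a difference, a human can reproduce both the decision and its stated reasons by hand, which is exactly what black-box models make difficult.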

Bio: Martin Hack is the Executive Chairman of Kensu, a company that has developed the first-of-its-kind GCP (Governance, Compliance and Performance) solution for data science. Previously, he was CEO and co-founder of Skytree, one of the first machine learning companies of the new era.
Follow him at @mhackster.