Data Science: 4 Reasons Why Most Are Failing to Deliver
Some organizations see billions in returns from data science, but most fail to deliver. This article explores some of the reasons why.
By Nick Elprin, CEO and Co-Founder, Domino Data Lab
One of today’s organizational dilemmas: it’s widely understood that data science is a key driver of innovation, but few organizations know how to consistently turn data science output into business value. According to a recent survey of more than 250 data science leaders and practitioners, 60 percent of companies plan to double the size of their data science teams in 2018, and 90 percent believe data science contributes to business innovation. Yet fewer than 9 percent can actually quantify the business impact of all their models, and only 11 percent can claim more than 50 predictive models working in production.
What’s at the root of the disconnect? There is a divide between organizations that view data science as a technical practice and those that go further and embrace data science as an organizational capability woven throughout the business fabric. Companies that have mastered the technical and management practices necessary to embed algorithm-driven decision-making at the core of the business are reaping the most returns. They can be considered “model-driven.” (Think of Amazon, Netflix, Stitch Fix and Tesla as examples.)
Of course, that’s easier said than done. Let’s zero in on four of the toughest challenges emerging in organizations that struggle to derive value from their investments in data science (and those organizations, by the way, are the majority):
- Silos of knowledge. Hiring data scientists does not guarantee that your organization will profit from data science. For most firms, each additional data scientist generates less output than the last: diminishing returns to scale, instead of the exponential returns that high-functioning data science teams achieve. A recurring theme is that data scientists working on individual laptops or other siloed environments often duplicate effort. They have no visibility into work done by others that they could benefit from. One major insurance company, for instance, had dozens of scientists working in uncoordinated ways on the same business problems, leading to lost investment and missed opportunities. There’s a difference, in other words, between having a collection of individuals who create models and having a dynamic team capable of leveraging its collective knowledge, skills and past work to collaboratively build better and better models with faster time to value.
- Friction in model deployment. Well-run data science teams operate within a continuous and iterative lifecycle, from research to production and back, and they measure the impact of models in production. Too often, unfortunately, the research process is completely separate from the deployment process, and when (if) a model is deployed, there is no link back to business impact. If this rings true to you, then your organization likely cannot iterate on deployed models to improve them and, worse yet, cannot measure the business impact of its models. One major financial services firm said it “built its new headquarters in less time than it took to get a model into production.”
- Tool and technology mismatch. IT departments have spent the last decade-plus building big data infrastructure to support data storage and processing, but that infrastructure doesn’t necessarily lend itself to data science. Data scientists can use as many as three to five different tools or packages in a given month, and they expect consistent access to the latest versions. There were over 365,000 updates to packages in the popular open-source Python ecosystem in 2017 alone! Furthermore, data science work demands access to elastic compute for specific experiments, such as deep learning workloads that require powerful machines with GPUs. Lacking access to elastic compute and the latest tools limits your team’s agility, constrains the pace of research, and delays development. Even worse, the result is sometimes shadow IT, as was the case at a large global bank where approving new Python packages took so long that data scientists ended up bringing their personal laptops to work and covertly using them.
- Model liability. Without proper management, a model in production may do more harm than good. If you are actively managing models in production, you have likely realized this already. A model that is not tightly monitored or actively controlled can cause serious harm to the business, whether through compliance failures, revenue losses, or damage to brand and reputation. Knight Capital Group, for instance, lost $440 million in 45 minutes after a mistake in updating a model. That is an extreme scenario, but it makes the case that organizations must continually validate and monitor their models for misuse and performance degradation.
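The monitoring called for in the last point can start as a simple scheduled statistical check. As a hedged sketch (the threshold, bin count, and helper name here are illustrative assumptions, not a standard prescribed by any vendor), a Population Stability Index (PSI) comparison flags when the live distribution of a model input has drifted away from what the model saw in training:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time)
    sample and a live production sample of the same feature.
    A common rule of thumb treats PSI > 0.2 as significant drift."""
    # Bin edges taken from deciles of the baseline distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log(0) when a bin is empty
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Simulated baseline vs. stable and drifted production samples
rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)
stable   = rng.normal(0, 1, 10_000)
drifted  = rng.normal(0.5, 1.3, 10_000)

print(psi(baseline, stable))   # small value: distribution unchanged
print(psi(baseline, drifted))  # large value: investigate or retrain
```

Run on a schedule against each model input, a check like this catches silent degradation long before it shows up as compliance or revenue damage.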
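Closing the research-to-production loop described in the deployment point above usually starts with logging every prediction alongside a model version, so outcomes can later be joined back to measure business impact. A minimal sketch (the record schema, field names, and file-based store are assumptions for illustration; a real system would write to a database or event stream):

```python
import json
import time
import uuid

def log_prediction(model_version, features, prediction,
                   log_file="predictions.jsonl"):
    """Append one prediction record to a JSON-lines log.
    The prediction_id lets downstream jobs join observed business
    outcomes back to the model and version that produced them."""
    record = {
        "prediction_id": str(uuid.uuid4()),
        "model_version": model_version,
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["prediction_id"]

# Example: record a (hypothetical) churn-model score
pid = log_prediction("churn-model-v3", {"tenure_months": 7}, 0.81)
```

With version tags in every record, comparing the measured impact of "churn-model-v3" against "churn-model-v4" becomes a query rather than a research project.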
How these four challenges are addressed will determine much of how the next five to ten years look for your organization. If you’d consider your company a laggard with respect to data science maturity, seek consolation in the fact that you’re not alone: some 46 percent of all survey respondents, in fact, fell into the laggard category. Forty percent are considered “aspiring,” and a mere 14 percent rose to the top and demonstrated the ability to manage data science as an organizational capability.
It’s not too late. To start delivering and measuring business value from data science, your organization must develop and implement a consistent and sensible framework around people, process, and technology. Companies that put the time and energy into this framework, and that successfully leverage data science as a core business competency, are reaping the rewards. Netflix, for example, integrates models into each part of their business; they estimate a billion dollars in incremental value from their personalization and recommendation models alone.
The message is clear: achieving success with data science is not easy. There are significant barriers to overcome. But ultimately, those that solve the hard problems of developing and deploying high-impact models at scale, and of truly instrumenting their businesses with data science, are better equipped to turn those models into long-term competitive advantage.
Bio: Nick Elprin is CEO and Co-Founder of Domino Data Lab. He received an MS in Computer Science from Harvard in 2005.