Data Scientists think data is their #1 problem. Here’s why they’re wrong.
We tend to think it's all about the data. However, for real data science projects at real organizations in real life, there are more fundamental aspects to consider to do data science right.
By James Taylor, CEO and leading authority on Digital Decisioning and delivering business impact from AI and machine learning.
I often see articles or posts that identify data integration or preparation as the key issues facing data science projects. This always puzzles me as this is not our lived experience - not what we see when we work with Fortune 500 companies adopting predictive analytics, machine learning, or AI. But I think I have figured it out. The problem is as follows:
What data scientists think counts as a "data science project" is not, in fact, a data science project.
Let me illustrate this with some data from a great study. Back in 2016, the Economist Intelligence Unit ran a survey titled "Broken links: Why analytics investments have yet to pay off." Below, you can see how its data appears to support the argument that data problems are #1.
Wow - pretty clear that data integration/preparation is the biggest problem, with nearly twice as many projects reporting it as the next-ranked problem.
In fact, though, this is a subset of the data from the survey. Here's the full data set:
Data integration and preparation only ranks #4. Problem definition/framing, Solution approach/design, and Action/change management all rank higher. This is our experience.
In large, established "grown-up" companies, data science projects fail for one or both of two reasons:
- They are solving the wrong problem. They are building an analytic that is not what the business needs, that will not solve a true business problem, or that is poorly designed to fit into the business context.
- They cannot action the model they build. They can't change how the business makes decisions, so the decisions made and actions taken never change to take advantage of the analytic.
And this illustrates the problem.
The problem is that data scientists THINK their project starts with data and ends with the communication of their analysis. If that's your focus, then data is your #1 problem.
But this is not where data science projects start nor where they end. They have to start and end with the business. That means starting with a business problem - a business decision that the business wants to improve - and ending with that problem being solved - the business behaves differently (better). If that's your focus, then your problem is not data but problem definition and operationalization - making the analytic work IRL.
Here's the difference, shown as project phases. On the left is what many data scientists think their projects involve, and on the right is what they really involve.
Bottom line: If your data science team is telling you that data is their #1 problem, then they're doing it wrong.
I've written about this before - check out this article on the study itself and this one on adopting decision modeling as a better way to define the problems your data science team is trying to solve. You might also like our recent white paper and videos on Building an Analytic Enterprise.
Original. Reposted with permission.