Data Science as a Product – Why Is It So Hard?

Developing machine learning models as products that deliver business value remains a new field with uncharted paths toward success. Applying well-established software development approaches, such as agile, is not straightforward, but may still offer a solid foundation to guide success.



By Tad Slaff, Product Lead - Data Science at Picnic Technologies.

At Picnic, we see ourselves as tech’s answer to groceries. From the app-only store and 20-minute delivery windows to the just-in-time supply chain, technology and the underlying data are critical to Picnic’s growth.

As the Data Science team within Picnic, it is our job to take data-driven decision making to the next level. We are charged with building automated systems that have the intelligence, context, and empowerment to make decisions with a business impact in the tens of millions of euros per year.

However, building these systems is hard. And getting them into production and used by the business is even harder. Let’s go through what it takes at Picnic to productize our data science projects, an approach we’re affectionately coining ‘Data Science as a Product’.

 

Why is this so hard?

 

A study from July 2019 found that 87% of data science projects don’t make it to production. Numerous reasons are cited, everything from lack of support from leadership to siloed data sources and poor collaboration. Beyond these issues, there are inherent aspects that make data science and machine-learning projects different from other software development.

First, data science, and especially machine learning, lives in the world of probabilities and uncertainties. A typical output of a machine learning-based payment fraud model would be along the lines of, ‘the probability of this order being fraudulent is 73% +/- 5% with a 95% confidence interval’. Our counterparts on the business side live in a deterministic world: ‘we want to block all fraudulent orders’. Translating between these worlds is not an easy task.
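To make the translation concrete, here is a minimal sketch of the step from a probabilistic score to the deterministic decision the business asks for. The model, features, and threshold are all hypothetical, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for historical orders and fraud labels (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))          # e.g. basket value, account age, ...
y_train = (X_train[:, 0] > 1.0).astype(int)  # pretend label: fraudulent or not

model = LogisticRegression().fit(X_train, y_train)

# The probabilistic world: a fraud score between 0 and 1 for a new order.
new_order = rng.normal(size=(1, 4))
fraud_probability = model.predict_proba(new_order)[0, 1]

# The deterministic world the business lives in: block or allow,
# based on a threshold both sides have agreed on (hypothetical value).
BLOCK_THRESHOLD = 0.73
decision = "block" if fraud_probability >= BLOCK_THRESHOLD else "allow"
print(f"P(fraud) = {fraud_probability:.2f} -> {decision}")
```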

Additionally, there is a non-linearity in data science projects that (usually) doesn’t occur in ‘traditional’ software development. We don’t know how well a model will perform before we start building it. It can take a week, three months, or it may not be possible to get an acceptable level of performance. This makes it very difficult to put together a nice project plan with timelines and deliverables that the business wants to see.

Finally, the importance of model trust is hard to overstate when releasing a model to production. When working with the business to productionalize a model, we are entering a domain in which they are the experts. In many cases, we are looking to automate a manual process or replace a set of carefully crafted business rules. While these rules aren’t perfect, they were built by those with a deep understanding of the business. Handing off a black-box machine learning algorithm and telling the business that it is going to replace their current way of working is a challenging task. In the end, it’s the business that owns the profit/loss from whatever process the model is looking to automate, and we as data scientists need to convince them to put their livelihood in the hands of our models.

From our experience, successfully productionalizing models across a wide range of domains comes down to three factors:

  1. Use case selection
  2. Business alignment
  3. Agile (data science) development

 

Use Case Selection

 

“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” — Abraham Maslow

The universe of problems that could be solved by machine learning is massive. You have countless use cases in customer success, supply chain, distribution, finance, and more. Given the ease of accessing high-quality data in Picnic’s beautifully maintained data warehouse, it’s difficult to know where to start. Choosing the correct use case is crucial to the success of a data science project.

So how are you going to decide what use case to pick?

  • The one with the most business value?
  • The ‘low hanging fruit’ to deliver a quick win?
  • The one that aligns with the company’s strategic objectives?

Here at Picnic, we take those into account, but the critical deciding factor comes down to one thing:

How confident are we that machine learning is the best approach to solve this problem?

(Remember when I said that we data scientists are used to probabilistic thinking?)

We want to make sure that our data scientists’ time is used most effectively. Let’s say there is a compelling problem that can generate huge value, but a few carefully crafted business rules can get us 80% of the value. Is it the best use of resources to have the data science team spend months trying to get an additional 10%? Probably not.

Using our Zen of Data Science principles to guide us, we can break down the use case selection criteria into several components:

  1. Do we have sufficient clean, high-quality data to model the problem?
  2. Is there a clear objective criterion (or loss function) that we are looking to optimize?
  3. Is the business ready to have this process automated?
  4. How will it fit into a production system? Does that product team have the bandwidth to implement it?
  5. Are there case studies, research papers, or other resources on successful machine learning implementations to solve this type of problem?
  6. Are there any biases or ethical concerns we need to address?

If there are concerns with any of those questions, we will reconsider if this is the optimal project for our team to pick up.

No matter what resources you have to throw at a problem, without the right use case, the chances of success are low.

 

Business Alignment

 

“The perfect project plan is possible if one first documents a list of all the unknowns.” — Bill Langley

Making sure there is alignment on the goal of the project seems both simple and obvious. The business wants more accurate forecasts. You are confident you can beat the existing system. What’s the issue?

The issue is that it’s not just about the performance of the model.

Let’s say you build a great model and set up a daily job to have it executed. Well, it might turn out that the business needs to be able to update their forecasts throughout the day. All of a sudden, you need a real-time service. Your model performs well on a majority of the articles/segments/regions, but a new product is launching this quarter. Now your model is making predictions with no historical data (the cold start problem says hello).

Machine learning projects require a certain level of understanding from the business on how the systems work. They need to know the inherent strengths and weaknesses of machine learning models, how edge cases are handled, and what features are used.

On top of that, you need to know how the model will be used. What is the expected output? How will the predictions be consumed? What happens if the model doesn’t run? Does there need to be a fallback mechanism? It saves many headaches, tense discussions, and late nights of rework when you know the answers to these questions before you start development.

The issue of model trust comes up again. What happens if the business doesn’t trust the output of your model?

You can present all the ROC curves, F1 scores, and test set performance you want, but if the first few predictions your model makes happen to be incorrect, will it be given a chance to recover? The basic business rules that were previously in place weren’t great, but the business knew in which cases they worked well and when they didn’t, and could intervene accordingly. Your models (hopefully) have an operational impact, and if the business doesn’t trust them, they won’t be used. Simple as that.

Discussions on model trust are uncomfortable but essential. You need to understand upfront what it will take for the business to be able to use your model in production. At the very least, an evaluation period with performance metrics needs to be decided and signed off on by both parties.
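What that sign-off can look like in practice: below is a rough sketch, with entirely hypothetical names and toy data, of an evaluation-period report comparing the model against the existing business rules over the same set of outcomes:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

def evaluation_report(y_true, model_scores, rule_decisions, threshold=0.5):
    """Summarize model vs. business-rule performance for the evaluation period."""
    model_decisions = (model_scores >= threshold).astype(int)
    return {
        "model_f1": f1_score(y_true, model_decisions),
        "model_auc": roc_auc_score(y_true, model_scores),
        "rules_f1": f1_score(y_true, rule_decisions),
    }

# Toy data standing in for one evaluation period of real outcomes.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)
model_scores = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=200), 0, 1)
rule_decisions = rng.integers(0, 2, size=200)

print(evaluation_report(y_true, model_scores, rule_decisions))
```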

Differences in expectations between data scientists and the business kill many data science projects. That dialogue needs to happen before months of work have been spent on development. Your model’s life may depend on it.

 

Agile Data Science Development or...

 

MVP over POC.

“When you’re fundraising, it’s AI. When you’re hiring, it’s ML. When you’re implementing, it’s linear regression. When you’re debugging, it’s printf().” — Baron Schwartz

 

Drake agrees, MVP over POC.

Agile development has become the de facto standard for software development but hasn’t made its way to the data science world (yet). Data science projects today tend to happen with a ‘build it and they will come’ mentality. A data scientist meets with the business to discuss the problem, decides what metric to optimize, and asks how they can get access to the data. Then they go off, spend a few months building a beautiful, robust model, and present the finished product. And then…

…it doesn’t get used. The same core reason why agile development works can be applied to data science: it needs to be customer-focused.

What is effective is skipping a proof of concept (POC), which tends to never leave the data scientist’s laptop, and focusing on creating a minimum viable product (MVP).

For an MVP, the goal is getting an end-to-end solution built as quickly as possible. You build the data pipeline, start with a basic baseline model (also known as linear or logistic regression), and expose the predictions to the end consumer. A real-life example can be seen in how we find the optimal drop times using machine learning.
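As a rough illustration of that end-to-end mindset, here is a minimal MVP sketch. The file names, columns, and delivery mechanism are assumptions made for the example, not Picnic’s actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def load_training_data(path: str):
    """Data pipeline stub: read a warehouse export with numeric features and a label column."""
    df = pd.read_csv(path)
    return df.drop(columns=["label"]), df["label"]

# 1. Data pipeline (hypothetical extract).
X, y = load_training_data("orders.csv")

# 2. Basic baseline model: scaling plus logistic regression, nothing fancy.
baseline = make_pipeline(StandardScaler(), LogisticRegression())
baseline.fit(X, y)

# 3. Expose the predictions in the simplest form the end consumer can use,
#    e.g. a daily file or table the downstream system already reads.
scores = pd.DataFrame({"order_id": X.index, "score": baseline.predict_proba(X)[:, 1]})
scores.to_csv("predictions.csv", index=False)
```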

The same reasons this approach became standard in software development apply to machine learning. We are doing what we can to follow the core principles of the agile manifesto:

Focus on working software

  • Don’t spend time fine-tuning a model that may never get used. Spend it on building a working, viable product.

Customer collaboration

  • Cut the time to market down significantly so your ‘customer’ sees the output as it would be from a more sophisticated system. You can iterate and improve from there.

Responding to change

  • It’s better to find out in the second week than in the second quarter what works and what doesn’t. Maybe the in-house system you were planning to integrate with doesn’t have a way to expose the data you need. Be flexible with the requirements and ship working code early and often.

The hard part of data science projects isn’t the modeling. It’s everything else. By focusing on an MVP, you can get a working system into production, outputting predictions much faster. You will discover issues sooner and give your customers a new, shiny model within weeks instead of months.

In the end, our goal isn’t to build a model for the sake of building a model. We are building a product where one component is a model. And we can take the learnings from decades of product development to get us there.

 

Conclusion

 

Building machine learning-based products is not easy. You have all the components of software development with the added complexity of machine learning at its core. It’s a new field without blueprints on how to do it well. By ensuring that you’ve selected the right use case, aligned with the business, and followed tried-and-true agile software development practices, you can set yourself up for success.

At Picnic, technology, data, and machine learning are at our core. We have an extremely talented group of data scientists, a scalable, self-service data science platform, and the full support of the business to get our models into production.

Original. Reposted with permission.

 
