How to Turn your Data Science Projects into a Success
This interview with Dr. Olav Laudy, Chief Data Scientist for IBM Analytics, summarizes a recent conference where he participated in a panel on Big Data and Analytics.
With a wealth of knowledge and extensive experience rolling up his sleeves in countless data science projects across industries worldwide, Dr. Olav Laudy shares a unique perspective on what companies are doing wrong and how best to get the most out of data science.
Q: Within IBM, you’ve been known to have built over a hundred predictive models for companies around the world. Can you share some examples of projects you have been involved with?
Dr. Olav: I helped a large retailer build up its analytic supply chain model; the predictions of the model indicated the likelihood of sales for every product in every store, and the predictions were used to determine the prioritization of products in the warehouse. The 125,000 products in 600 stores led to 75 million predictions on a monthly basis. The predictive model optimized the spare part assortment, ultimately increasing customer satisfaction by ensuring that the right products were always available in the right stores while keeping the in-stock assortment to a minimum. The project was very interesting because it was a mix of analytical modeling and deployment of the predictive models in the business. The latter is a common challenge for companies.
Q: I’ve heard it said before that once you have built a predictive model, scoring is the simple part. Can you explain the challenge?
Dr. Olav: A model as the end product of a data science project is not at all ready to be deployed. It typically requires rationalization and needs a lot of governance around it before it’s acceptable to the business. The rationalization revolves around understanding which predictions are used and which predictions are ignored. For example, if the retailer sold more than 10 items of a product in a store the year prior, but the model indicated that the demand for next year would be very low, the business decided to override the prediction and ensure the item was not taken out of the assortment. This is a very simple example, but it shows that you can’t just take every prediction and apply it in the business context without further thought.
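The override described above is easy to picture as a post-scoring business rule. A minimal sketch, with hypothetical column names and thresholds not taken from the interview:

```python
import pandas as pd

# Hypothetical scored output joined with last year's sales figures
df = pd.DataFrame({
    "product_id": [101, 102, 103],
    "predicted_demand": [0.5, 30.0, 0.3],  # model's demand forecast
    "units_sold_last_year": [12, 40, 3],
})

LOW_DEMAND_THRESHOLD = 1.0  # below this, the model says "drop the item"
MIN_HISTORICAL_SALES = 10   # the business override from the interview

# Business rule: if an item sold more than 10 units last year, keep it
# in the assortment even when the model predicts very low demand.
override = (df["units_sold_last_year"] > MIN_HISTORICAL_SALES) & (
    df["predicted_demand"] < LOW_DEMAND_THRESHOLD
)
df["keep_in_assortment"] = (df["predicted_demand"] >= LOW_DEMAND_THRESHOLD) | override
```

The point is not the rule itself but that the final decision is a documented combination of model output and business knowledge, which is exactly the rationalization step.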
The governance is concerned with setting up processes for model scoring on new data, auditing those scores, continuously validating the model against new observations, and setting up procedures to handle escalation if models do not work as expected. All those processes need to be carefully crafted for an analytic project to be successful. A pure data scientist might be proud of a high AUC or F1 score for a predictive model, but that’s only the start of the analytical deployment process. I always say: a model is as strong as its deployment.
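The continuous validation and escalation steps can be sketched as a periodic check of new scores against observed outcomes. The metric, floor, and escalation mechanism below are illustrative assumptions, not the interview's specifics:

```python
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.70  # hypothetical minimum performance agreed with the business

def validate_scores(y_true, y_score):
    """Audit one period's scores against observed outcomes and
    escalate when the model underperforms."""
    auc = roc_auc_score(y_true, y_score)
    if auc < AUC_FLOOR:
        # In production this would alert the model owner / open a ticket
        raise RuntimeError(f"Model AUC {auc:.2f} fell below floor {AUC_FLOOR}")
    return auc

# Last month's scores vs. what actually happened
auc = validate_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Running a check like this every scoring cycle turns "does the model still work?" from an opinion into a process.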
Q: I see. But isn’t that ‘just’ regular project management?
Dr. Olav: ‘Regular’ project management is not aware of the subtleties of working with analytical models. It takes a company quite some effort to become analytics-minded. Maybe there’s a data scientist who knows what to do with data, but it’s the business that needs to learn to appreciate working with analytical outcomes. For example, I once worked with a B2B company that was trying to create an analytical retention program. They had a ‘save desk’ that handled angry customers calling in because of service issues. The model that I built was able to identify customers who were likely to leave, with quite a high success rate. However, when the save desk started reaching out to those customers, they didn’t feel the model was working, because they expected angry customers on the phone. Instead, the model identified issues before the customer reached their boiling point. The save desk had to transform into a client relationship desk, and once it managed to do so, the analytics started working for them.
Q: Do you think the deployment issue is the largest hurdle for companies to make analytics successful?
Dr. Olav: For me it all starts with use case identification. This is not simply stating: “I want to predict X, please apply some deep learning method.” A use case is something that needs to be carefully crafted. I see many opportunities here for data scientists as well. The market will move towards auto-predictions. Today you already see multiple software offerings that evaluate a large number of predictive models, tune some hyperparameters, and choose the best model. Does this mean you don’t need data scientists anymore? I think the contrary is true: we need many more data scientists. People who have an analytics-savvy view of the business and the skills to carefully prepare data to answer exactly the right business question.
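The model-evaluation loop that such auto-prediction tools automate is itself only a few lines of code, which underlines why the scarce skill is framing the question, not fitting the model. A minimal sketch using scikit-learn and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a prepared business dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Evaluate several candidate models and keep the best by
# cross-validated accuracy, as auto-ML tools do under the hood
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

Nothing in this loop decides what X and y should be, when the data is available, or what the business will do with the winning model's scores; that remains the data scientist's job.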
I frequently explain data science as cooking: the better the cooking equipment you have, the more sophisticated your meals become. I’ve not seen a kitchen that cooks itself. It’s the data scientist who, rather than grilling over a bonfire, now needs to learn to operate a modern oven with temperature and time sensors. So, the question is: does the quality of the roast improve because of the oven, or because of the cook who knows how to operate the oven? The answer is the latter: although the oven is a joy to work with, it’s the artistry of the chef that makes the meal an unforgettable one.
Q: I like the analogy. Can you get more concrete? How do you identify a use case?
Dr. Olav: Asking the right questions is a good start. Ask yourself:
– What are you going to do with the predictions?
You need to have that really clear. It’s not enough to say, “we will do marketing with it”; you have to spell out exactly how the prediction will be acted upon. This also forces you to think through the timing of the data (when what data will be available) and the deployment options (automation, data pipelines, real-time or batch scoring). With this comes the question:
– In order for the model to be successful, what is the minimum accuracy required? The easy (business) answer is 90%, but that is seldom thought through. It’s better to understand the efficiency of the current process and how much better a model needs to be to yield a real improvement in the business process.
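One way to make "minimum accuracy" concrete is to work backwards from the economics of the current process. The numbers below are entirely hypothetical, chosen only to illustrate the reasoning:

```python
# Hypothetical economics of a retention campaign
CONTACT_COST = 5.0   # cost of one outreach call
SAVE_VALUE = 200.0   # value of retaining one at-risk customer

def campaign_profit(precision, n_contacts=1000):
    """Expected profit when `precision` of contacted customers
    are genuine churners we manage to save."""
    return precision * n_contacts * SAVE_VALUE - n_contacts * CONTACT_COST

# Break-even precision: the model must be right at least this often
# among the customers it flags for outreach.
breakeven = CONTACT_COST / SAVE_VALUE  # 0.025

current_process = campaign_profit(0.02)  # untargeted outreach loses money
with_model = campaign_profit(0.05)       # a modestly better model turns a profit
```

Note that the required "accuracy" here is 5%, not 90%: the bar is set by the economics of the process, not by a round number.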
– Do you think the data will support the accuracy required? If the data only contains broad characteristics (for example, professional status in 6 categories), that flatness will also be reflected in the predictions. Another simple question: how often does the data change for one person (provided you are predicting behavior)? If the answer is ‘rarely’, then you will find that the predictions rarely change (meaning: every month you score a customer, that customer is given the same probability; think about what this implies for your marketing approach).
Q: Thank you, those are great pointers. As a final question, do you have anything to say to all data scientists?
Dr. Olav: Mind the AI (Artificial Intelligence) hype. Many companies have barely implemented predictive analytics properly. I see companies, especially at the executive level, dreaming of AI as the golden solution that frees them from having to deal with data. You ‘just’ collect all internal data in one data lake, add external data such as Facebook, Twitter, and blog posts, and the new AI will automatically make sense of it all. The truth couldn’t be further from that. Although the most advanced models today do surprising things, they all have one thing in common: they rely on carefully selected and curated data, typically of only one or two types at a time (images, text, etc.). This means that for the majority of data science projects in companies, the AI hype is far away.
My best recommendation is: don’t dream that your models will solve your problems; instead, ensure that what you are doing makes commercial sense. Dig deep into the data and let your findings guide your next exploratory steps. The curious mind will suddenly hit the “Aha!” moment; then use your data science knowledge to capture your findings in an actionable structure. This is true data-based creativity, which for me is the ultimate data science.
Bio: Dr. Olav Laudy is Chief Data Scientist for IBM Analytics, Asia-Pacific. In his current role, he helps IBM clients identify and quantify analytic opportunities. He enjoys articulating complex analytical concepts in layman’s terms, contextualized to the business, and he is known for his ability to accelerate analytic deployments. In his prior position as Worldwide Predictive Analytics Solutions leader, he stood at the birth of a great many analytics projects across all geographies and in industries such as Telco, Banking, Automotive, Retail, and Insurance. He is passionate about analytics-based data monetization and deep learning.
- Deploying Production-grade Data Products – Special Report
- Top Reasons Why Big Data, Data Science, Analytics Initiatives Fail
- An ode to the analytics grease monkeys