Reasons Why Data Projects Fail
Many companies seem to go through a pattern of hiring a data science team only for the entire team to quit or be fired around 12 months later. Why is the failure rate so high?
By Martin Goodson, Data Science Strategist.
Data science continues to generate excitement and yet real-world results can often disappoint business stakeholders. How can we mitigate risk and ensure results match expectations? Working as a technical data scientist at the interface between R&D and commercial operations has given me an insight into the traps that lie in our path. I present a personal view on the most common failure modes of data science projects.
The slides are available in one PDF here. The long version, with slides and explanatory text, is below.
There is some discussion at Hacker News.
First, about me:
This talk is based on conversations I've had with many senior data scientists over the last few years. Many companies seem to go through a pattern of hiring a data science team only for the entire team to quit or be fired around 12 months later. Why is the failure rate so high?
A very wise data science consultant told me he always asks if the data has been used before in a project. If not, he adds 6-12 months onto the schedule for data cleansing.
Do a data audit before you begin. Check for missing or dirty data. For example, you might find that a database has different transactions stored in dollar and yen amounts, without indicating which was which. This actually happened.
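As a rough illustration of what a first-pass audit might look like, the sketch below counts missing values per field in a handful of records. The field names (`id`, `amount`, `currency`) and the sample rows are hypothetical, chosen to mirror the dollar/yen example; a real audit would run against the actual source tables and cover many more checks.

```python
from collections import Counter


def audit_rows(rows, required_fields):
    """Count, per field, how many rows have a missing or blank value."""
    missing = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat absent keys, None, and blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return {field: missing[field] for field in required_fields}


# Hypothetical transaction records; names and values are illustrative only.
transactions = [
    {"id": 1, "amount": 120.0, "currency": "USD"},
    {"id": 2, "amount": 15000.0, "currency": ""},   # currency left blank
    {"id": 3, "amount": None, "currency": "JPY"},   # amount missing
    {"id": 4, "amount": 99.5},                      # currency key absent
]

report = audit_rows(transactions, ["id", "amount", "currency"])
for field, count in report.items():
    print(f"{field}: {count} of {len(transactions)} rows missing")
```

A report like this will not clean the data, but it shows early on whether a field such as the currency can be trusted at all.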
No, it isn't. Data is not a commodity; it needs to be transformed into a product before it's valuable. Many respondents told me of projects that started without any idea of who their customer was or how that customer was going to use this "valuable data". The answers came too late: "nobody" and "they aren't".
Don't torture your data scientists by withholding access to the data and tools they need to do their job. This senior data scientist took six weeks to get permission to install Python and R. She was so happy!
Well, she was until she sent me this shortly afterwards:
Now, allow me to introduce this guy:
He was a product manager at an online auction site that you may have heard of. His story was about an A/B test of a new prototype algorithm for the main product search engine. The test was successful and the new algorithm was moved to production.
Unfortunately, after much time and expense, it was realised that there was a bug in the A/B test code: the prototype had not been used. They had accidentally tested the old algorithm against itself. The results were nonsense.
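One inexpensive guard against this kind of bug is to confirm, before the test starts, that the two arms actually produce different output on a sample of inputs. The sketch below is only an illustration under stated assumptions: `old_ranker` and `new_ranker` are hypothetical stand-ins for the two search algorithms, and `disagreement_rate` is an invented helper name, not anything from the team in the story.

```python
def disagreement_rate(control, treatment, sample_queries):
    """Return the fraction of sample queries on which the two arms differ."""
    if not sample_queries:
        raise ValueError("need at least one sample query")
    differing = sum(1 for q in sample_queries if control(q) != treatment(q))
    return differing / len(sample_queries)


# Hypothetical stand-ins for the production and prototype algorithms.
def old_ranker(query):
    return sorted(query.split())            # alphabetical order


def new_ranker(query):
    return sorted(query.split(), key=len)   # shortest terms first


queries = ["red vintage bicycle", "used camera lens", "oak dining table"]

# Arms wired up correctly: the outputs should differ on some queries.
correct_rate = disagreement_rate(old_ranker, new_ranker, queries)

# Arms wired up wrongly, with the old algorithm on both sides:
# a rate of zero is the warning sign.
broken_rate = disagreement_rate(old_ranker, old_ranker, queries)

print(f"correct wiring: {correct_rate:.2f}, broken wiring: {broken_rate:.2f}")
```

A disagreement rate of zero does not prove a bug, since two algorithms can agree on a small sample, but it is a strong hint that the test should be checked before any results are trusted.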
This was the problem:
Original. Reposted with permission.