Data Science is Overrated, Here’s Why
Think twice before jumping on the data science bandwagon.
People have been raving about data science for around 10 years now, ever since Harvard Business Review dubbed it the “sexiest job of the 21st century.” I got sucked into the hype myself, which is why I pursued a degree in the subject 4 years ago.
Over time, however, I realized that data science was too hyped up for its own good. The field is so glamorized that the term is thrown around by people who don’t really understand what it entails.
You have probably heard the phrase “85% of all productionized machine learning models will fail.” This was a prediction made by research firm Gartner, and honestly, I wouldn’t be surprised if the number were higher. I have worked on multiple failed data science projects myself.
And a failed data science investment can be very costly for an organization, especially if an incorrect business decision is made based on the output of a predictive algorithm.
In this article, I will highlight some of the major problems that stem from there being so much hype around data science and how they can be overcome.
People Tend to Put Machine Learning On a Pedestal
Machine learning isn’t a one-size-fits-all solution, and stakeholders and non-technical managers need to grasp this. Not every problem can be solved with machine learning, and not every problem should be.
I once worked with a senior professor (who didn’t specialize in data science) to complete my capstone project. Our faculty partnered with a company that wanted to put a cybersecurity solution in place to keep malicious traffic from entering its system.
The data was available in the form of log files, but the company didn’t send us enough of them.
We cleaned the data and tried to build a dashboard to analyze the data points. Our professor insisted we use a machine learning model. He said it would impress the client.
We said it wouldn’t work. There wasn’t enough data.
He didn’t listen.
We asked for more data.
He said it wasn’t possible, and that we had to work with what we had.
We built the model and presented it, and he was happy with the high accuracy score he saw during the presentation. What he didn’t know was that the dataset was heavily imbalanced and our model was predicting only one class, so the accuracy score said very little about how useful the model actually was. We tried to explain this, but it didn’t seem to matter to him.
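To see why a high accuracy score can be misleading here, consider this minimal sketch. The numbers are made up for illustration (95 benign log entries and 5 malicious ones) and are not the data from our project; the "model" simply predicts the same class every time.

```python
# Hypothetical labels, NOT the project's data:
# 95 benign log entries (0) and 5 malicious ones (1).
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class (benign).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.2f}")                  # accuracy: 0.95
print(f"recall on malicious class: {recall(y_true, y_pred):.2f}")   # recall on malicious class: 0.00
```

The model scores 95% accuracy without ever flagging a single malicious entry. On imbalanced data, per-class metrics such as recall usually tell a more honest story than accuracy alone.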
Anyone who has worked in the data industry for over a month has likely experienced a situation like the above, although probably less extreme.
Upper management, stakeholders, and business teams often don’t come from a technical background. When they hear terms like “data science” and “machine learning,” their expectations tend to rise. It is up to the data scientist to bring them back to reality and clearly explain the difference between what is possible and what isn’t.
Using Machine Learning When it Isn’t Required
I get it. It can be tempting to suggest building a machine learning model even when it isn’t really required. I do it too. And many people, especially non-technical ones, equate data science with machine learning.
However, as a data scientist, don’t start building a predictive algorithm until you are sure that machine learning modelling is your best option.
Building machine learning models, especially on large amounts of data, is computationally heavy and can cost the organization money it doesn’t need to spend.
And if the problem can be solved with hard-coded logic or simple calculations in a spreadsheet, why waste time building a machine learning model?
Here’s an example of a data scientist using machine learning to solve a non-ML problem:
Hiren was recently hired as a data scientist at one of India’s top banks. He built a KNN model to identify the most profitable industry segments that the bank should focus on. He emphasized that they should target 2 of the 33 industries in their portfolio.
The outcome underwhelmed the business team members, as they already knew this. They were able to come to the same conclusion with simple back-of-the-envelope calculations.
There was no need for him to spend time building a machine learning model that gave them the exact same result.
Hiren then decided to dig deeper and learn more about the business needs, so he could better contribute to the team using his data science expertise. After acquiring some domain knowledge, he realized that the company would benefit from recommendations at a customer level rather than an industry level.
Instead of just telling the company which industries to target, he could identify the most profitable clients within these industries. These were insights that the business team did not already possess, as it was a lot harder for them to derive meaning from a large client-level dataset.
The story above teaches us two crucial lessons when attempting to build a data science solution:
- Don’t use machine learning if there is a simpler solution available.
- Only start building predictive models once you understand the business requirement. Otherwise, you will end up spending time on a fancy algorithm that nobody else uses.
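To make the first lesson concrete, here is a rough sketch of the kind of back-of-the-envelope calculation the business team might have done. Everything in it is an assumption for illustration: the segment names, the revenue and cost figures, and the choice of profit margin as the measure of profitability are all hypothetical and do not come from the bank in the story.

```python
# Hypothetical (revenue, cost) per industry segment, in arbitrary units.
# These figures are invented for illustration only.
segments = {
    "Retail": (120.0, 100.0),
    "Manufacturing": (200.0, 150.0),
    "Healthcare": (90.0, 60.0),
    "Logistics": (150.0, 140.0),
}


def top_segments_by_margin(segments, n=2):
    """Return the n segment names with the highest profit margin."""
    # Profit margin = (revenue - cost) / revenue; an assumed definition of "profitable".
    margins = {name: (rev - cost) / rev for name, (rev, cost) in segments.items()}
    return sorted(margins, key=margins.get, reverse=True)[:n]


print(top_segments_by_margin(segments))  # ['Healthcare', 'Manufacturing']
```

A few lines of arithmetic and a sort are enough to rank a handful of segments. If the answer falls out of a calculation like this, a predictive model adds cost without adding insight.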
Hiring Data Scientists With No Data In Place
When data science skyrocketed in popularity in 2012, companies started hiring data scientists en masse. Most of these companies didn’t have a data pipeline in place, but they expected data scientists to come in and start adding value.
Unsurprisingly, this didn’t happen.
The majority of data scientists don’t know how to build data pipelines. They work with prepared data that can be easily pulled from an existing, pre-processed database.
This led to a lot of frustration on both sides.
The data scientist came in ready to start building machine learning models and was unable to do so. The company had invested a lot in its data science team and was seeing no business gain in return.
By deciding to jump on the data science bandwagon solely due to hype, these organizations lost a massive amount of time and money.
So…Is a Career in Data Science Still Worth Pursuing?
The problems above all stem from there being too much hype around data science. Students tend to rush into the field too quickly because they want to learn a skill that is highly in demand. Employers start mass hiring data scientists without completely understanding the role.
These misaligned expectations lead to a lot of frustration on both sides.
Data scientists also tend to put too much emphasis on predictive modelling when, in most cases, a simple data analysis would suffice.
And when the machine learning algorithm doesn’t live up to employer expectations, this again leads to a very uncomfortable working situation.
However, if expectations are aligned on all sides, pursuing a career in data science can still be extremely worthwhile and fulfilling. If you are someone with data science skills, here are some tips on finding a suitable company to join:
- Make sure the company has a data team in place. This means that the organizational data literacy is high, and the chances of you being asked to meet bizarre expectations are lower.
- Interview your employer. Ask your hiring manager what the role really entails: how much data the company has, how the data is being stored at the moment, and the value you’re expected to bring to the table.
Keep in mind that the advice above mostly applies to junior data scientists. Senior data scientists, consultants, or managers might be hired to help implement the company’s data science processes, even at a stage where there is no structured data pipeline in place.
Once you land a job at a company you’re comfortable with, always question whether machine learning really is the right approach to take. Understand the business problem to the best of your ability before coming up with a solution. Otherwise, you risk building something that is irrelevant to stakeholders.