4 Reasons Why You Shouldn’t Use Machine Learning
It's time to learn: machine learning is not a Swiss Army knife.
By Terence Shin, Data Scientist | MSc Analytics & MBA student
Photo by Nadine Shaabana on Unsplash
When machine learning initially emerged, many speculated that it would spark another industrial revolution. Fast forward to today and many would say that it’s nothing more than a buzzword.
Don’t get me wrong. Machine learning is a useful tool, but it’s nothing more than that. And it’s a stretch to say that it’s anything like a Swiss Army knife; I think of it more like a water jet (something rather niche).
In my experience, there are certainly applications where machine learning shines. For example, Amazon’s recommendation system reportedly increased sales by over 30%. However, there are many more applications where machine learning is a suboptimal solution.
In this article, we’re going to go over 4 reasons why you shouldn’t use machine learning.
With that said, let’s dive into it!
1. Data-related issues
As seen in the AI hierarchy of needs, machine learning relies on several other layers that serve as its foundation. This foundation encompasses everything from collecting and storing data to moving and transforming it. Without a robust process for these preliminary steps, you’re unlikely to end up with reliable data.
Why is this so important? You’ve heard the saying “garbage in, garbage out”: the performance of your machine learning models is limited by the quality of your data, which is why you need reliable data to start with.
Not only does your data need to be reliable, but you also need enough of it to leverage the power of machine learning. Unless both criteria are met, you won’t get the full benefit of ML.
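To make those two criteria a bit more concrete, here is a minimal sketch of a sanity check you might run before reaching for a model. Everything in it is an assumption for illustration: the function name `check_data_readiness`, the `min_rows` and `max_missing_ratio` thresholds, and the sample records are all made up, and real thresholds depend entirely on your problem.

```python
def check_data_readiness(rows, required_fields, min_rows=1000, max_missing_ratio=0.05):
    """Return a list of problems found; an empty list means the checks passed.

    The default thresholds are hypothetical placeholders, not recommendations.
    """
    problems = []

    # Criterion 1: is there enough data?
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, need at least {min_rows}")

    # Criterion 2: is the data reliable? Here that only means "not too many gaps".
    for field in required_fields:
        missing = sum(1 for row in rows if row.get(field) is None)
        ratio = missing / len(rows) if rows else 1.0
        if ratio > max_missing_ratio:
            problems.append(f"{field}: {ratio:.0%} missing")

    return problems


# Made-up sample records with one gap in each field.
sample = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 48000},
]

problems = check_data_readiness(
    sample, ["age", "income"], min_rows=3, max_missing_ratio=0.2
)
print(problems)
```

Missing values are only one facet of reliability, of course. The point is simply that checks like this are cheap, and they tell you whether the foundation is there before you invest in a model.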
2. Interpretability
There are two general categories of models, predictive and explanatory:
- Predictive models solely focus on the model’s ability to produce accurate predictions.
- Explanatory models focus more on understanding the relationships between the variables in the data.
Machine learning models, particularly ensemble models and neural networks, are predictive models: they are excellent at producing predictions and often exceed the predictive power of traditional models like linear or logistic regression.
That being said, when it comes to understanding the relationships between the predictor variables and the target variable, these models are a black box. While you may understand the underlying mechanics of these models, it’s often unclear how they arrive at their final results.
And while techniques like feature importance and correlation matrices exist, they offer only a limited view of the relationships in your data. Overall, ML and deep learning are great for prediction but lacking in explainability.
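For contrast, here is a small sketch of what an explanatory model gives you. It fits a one-variable linear regression by ordinary least squares, and the fitted slope can be read directly as "change in the target per unit change in the feature". The function name `fit_line` and the ad spend and sales numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of co-deviations of x and y) / (sum of squared deviations of x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows sales = 10 + 2 * ad_spend exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"Each extra unit of ad spend is associated with {slope:.1f} more units of sales")
print(f"Expected sales with zero ad spend: {intercept:.1f}")
```

A gradient-boosted ensemble or a neural network might predict sales more accurately on messy real data, but neither hands you a single number you can explain to a stakeholder this way.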
3. Technical Debt
Maintaining machine learning models over time is challenging and expensive. In particular, there are several types of “debt” to consider:
- Dependency debt: This refers to the debt incurred from unstable and underutilized data dependencies. In simpler terms, it is the cost of maintaining multiple versions of the same model, legacy features, and underutilized packages.
- Analysis debt: This refers to the idea that ML systems often end up influencing their own behavior if they update over time, resulting in direct and hidden feedback loops.
- Configuration debt: The configuration of a machine learning system also incurs debt, just as in any software system. It should be easy to make small configuration changes, hard to make manual errors, and easy to see the difference in configuration between two models.
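As a rough illustration of the last point, here is a minimal sketch of how you might make the difference between two model configurations easy to see. The helper `diff_configs` and both configuration dictionaries are hypothetical; in practice your configs might live in YAML files or an experiment tracker instead.

```python
def diff_configs(old, new):
    """Map each changed key to an (old_value, new_value) pair.

    A key missing from one side shows up as None on that side, so this
    simple version cannot tell "missing" apart from an explicit None.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


# Two made-up versions of the same model's configuration.
model_v1 = {"learning_rate": 0.1, "max_depth": 6, "features": ["age", "income"]}
model_v2 = {
    "learning_rate": 0.05,
    "max_depth": 6,
    "features": ["age", "income", "tenure"],
    "seed": 42,
}

changes = diff_configs(model_v1, model_v2)
for key, (before, after) in changes.items():
    print(f"{key}: {before} -> {after}")
```

Unchanged settings such as `max_depth` are left out, so a reviewer only sees what actually differs between the two models.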
If you want to read more, the full paper is “Hidden Technical Debt in Machine Learning Systems” (Sculley et al., 2015).
4. Better Alternatives
Lastly, machine learning shouldn’t be used when simpler alternatives exist that are equally effective. In my previous article, “Want to be a Data Scientist, Don’t Start with Machine Learning,” I emphasized the point that machine learning is not the answer to every problem.
A simple solution that takes 1 week to build and is 90% accurate will almost always be chosen over a machine learning model that takes 3 months to build and is 95% accurate.
Ideally, you should start with the simplest solution that you can implement and iteratively determine whether the marginal benefits of the next best alternative outweigh its marginal costs.
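Here is a minimal sketch of that iterative reasoning, using the 1-week and 3-month figures from the example above. The function `pick_solution`, the candidate names, the choice to treat 3 months as roughly 12 weeks, and especially `value_per_accuracy_point` (how many weeks of effort one extra point of accuracy is worth to you) are all assumptions for illustration.

```python
def pick_solution(candidates, value_per_accuracy_point):
    """Start with the simplest candidate and upgrade only while it pays off.

    candidates: list of (name, build_weeks, accuracy_pct) tuples, ordered
    from simplest to most complex.
    value_per_accuracy_point: hypothetical worth of one accuracy point,
    expressed in weeks of effort.
    """
    chosen = candidates[0]
    for name, build_weeks, accuracy_pct in candidates[1:]:
        marginal_benefit = (accuracy_pct - chosen[2]) * value_per_accuracy_point
        marginal_cost = build_weeks - chosen[1]
        if marginal_benefit > marginal_cost:
            chosen = (name, build_weeks, accuracy_pct)
        else:
            break  # the next step up is not worth it, so stop here
    return chosen


candidates = [
    ("simple solution", 1, 90),   # 1 week, 90% accurate
    ("ML model", 12, 95),         # about 3 months, 95% accurate
]

# If one accuracy point is worth 1 week of effort: 5 weeks of benefit vs. 11 weeks of cost.
print(pick_solution(candidates, value_per_accuracy_point=1)[0])

# If one accuracy point is worth 3 weeks of effort: 15 weeks of benefit vs. 11 weeks of cost.
print(pick_solution(candidates, value_per_accuracy_point=3)[0])
```

The answer flips depending on how much each accuracy point is worth, which is exactly why this is a judgment call you should make explicitly rather than defaulting to the fancier model.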
If you can solve your problem with a Python script or a SQL query, you should do that first. If you can solve your problem with a decision tree, you should do that first. If you can solve your problem with a linear regression model, you should do that first.
You get the point. Simpler = Better.
Thanks for Reading!
I hope that this provides some insight into the limitations of machine learning and why it’s not a one-size-fits-all solution. Keep in mind that this is an opinionated article backed by anecdotal experience, so take from it what you will. As always, I wish you the best in your learning endeavors!
Bio: Terence Shin is a data enthusiast with 3+ years of experience in SQL and 2+ years of experience in Python, and a blogger on Towards Data Science and KDnuggets.
Original. Reposted with permission.