My machine learning model does not learn. What should I do?

This article presents 7 hints on how to get out of the quicksand.

By Rosaria Silipo, KNIME & Diego Arenas, St Andrews University, UK

If you work with data in general, and machine learning algorithms in particular, you might be familiar with that feeling of frustration when a model really does not want to learn the task at hand. You have tried it all, but the accuracy metric just won’t rise. What next? Where is the problem? Is this an unsolvable task or is there a solution somewhere you’re not aware of?

First of all, do not worry. This feeling of being lost and not knowing how to proceed has happened - and still happens - to all of us, even the most experienced machine learning experts. Artificial Intelligence (AI), data science, or predictive analytics - whatever the name - is more complex than just a series of predefined steps to follow. The choice of a suitable model, a successful architecture, and the best hyperparameters depends strongly on the task, the data domain, and the available dataset. There is still a large amount of manual crafting involved in obtaining a viable AI model, which partially justifies the aversion some data scientists feel towards the many advertised automated applications claiming to train successful data science models.

Now that we have hopefully calmed you down, let’s see what you can do to get out of this impasse. Here we propose a few tricks we use when the model training seems to slip out of our grasp.

Before unleashing your creativity, please run some due diligence and make sure that there are no obvious mistakes in your training procedure. Here’s a checklist you might like to use to make sure that:

The purpose of the project is clear and well defined, and aligned with all stakeholders. This means that you have a clear question your project will answer.
The quality of your data is sufficient.
The training set has an adequate size for the model architecture you are trying to train.
Your training process is not too slow. If your training set is too large, you can extract a smaller sample for training.
For classification problems: All classes are described adequately within the training set. If you have a few classes with very low representation, you can group them together as ‘Others’. If you have too many classes, you could try to reduce the number by grouping them first, using a clustering algorithm.
There is no data leakage from the training set into the test set.
The dataset does not have noisy/empty attributes, too many missing values, or too many outliers.
Data have been normalized if the model requires normalization.
All hyperparameters have been set and optimized correctly, which implies that you understand the algorithm. Always learn the basics before using a technique: you will need to be able to explain the process and the results to the stakeholders.
The evaluation metric is a meaningful one for the problem at hand.
The model has been trained long enough to avoid underfitting.
The model does not overfit the data.
Performance is not too good. Be suspicious of models that are either 100% accurate or show 0% error: this is a red flag that something’s wrong with the model. Most likely, the dataset includes a variable that is effectively a synonym of the target variable you are trying to predict.
And finally, are you sure you are stuck? Sometimes, the model works nicely, or at least reasonably well, and what we miss is just the next step.
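As an illustration of the class representation item in the checklist, here is a minimal Python sketch of grouping rarely seen classes under ‘Others’. The function name `group_rare_classes`, the `min_count` threshold of 10, and the animal labels are our own assumptions for the example; the right threshold depends on your data and on the model you train.

```python
from collections import Counter


def group_rare_classes(labels, min_count, other_label="Others"):
    """Replace every class seen fewer than min_count times with other_label."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other_label
            for label in labels]


# Hypothetical labels: two frequent classes and two very rare ones.
labels = ["cat"] * 50 + ["dog"] * 45 + ["ferret"] * 3 + ["iguana"] * 2
grouped = group_rare_classes(labels, min_count=10)
print(Counter(grouped))
```

Counting the classes before and after the grouping is also a quick way to see whether any class is too thinly represented in the first place.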

If any of the previous mistakes occur, the problem should be quickly identified and the solution implemented to fix it. We provide here a few ideas to help you with your due diligence list and with creative inspiration when the current implementation seems hopeless.
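One checklist item that is easy to underestimate is the choice of the evaluation metric. The sketch below uses made-up labels for a heavily imbalanced two-class problem and a "model" that always predicts the majority class. The helper `balanced_accuracy` is our own illustration, defined here as the mean of the per-class recalls; it is one possible alternative to plain accuracy, not the only one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class weighs the same."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Made-up data: 95 negatives, 5 positives, and a "model" that always says 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

An accuracy of 95% looks respectable here, yet the model has not learned anything about the minority class, which the second number makes visible.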

Check the Alignment with Business Purpose

Everybody says it and we repeat it. Before you keep going, coding like a bulldozer, make sure you understand what the business project is. We speak different languages in different departments. Let’s make sure we understand what is required before starting.

Let’s make sure, for example, that we are not planning to build a traffic prediction application while the stakeholders expect a self-driving car. It should of course be clear that, with the allotted time and that much of a budget, a traffic prediction application is more likely to happen than a self-driving car. But, as we said, we speak different languages. Let’s make sure our goals align.

Once we have started, let’s periodically ask for feedback on the intermediate milestones. It is much less frustrating to incorporate changes in an unfinished application than to face disappointed stakeholders when presenting the final product. This happens more often than you’d think.

Iterate quickly on the first versions of the prototype and show the results early on for timely feedback.

Check the Data Quality

If you are sure that you are implementing the right application, with the stakeholders’ desired requirements, then maybe the problem is in your data. So, the next step is to check the data quality. For any data science task, you need adequate data.

First of all, the data has to be representative of the task. If it is a classification problem, all classes must be sufficiently represented in all their nuances. If it is a numerical prediction problem, the dataset must be general enough to cover all foreseeable situations. If the data quality is insufficient (the dataset is too small or not general enough) for the model you are trying to train, there are only two options: you revise the business specifications, or you collect more data and more general data.

Second, all dimensions of the data domain must be represented. All necessary information must be present in some of the input attributes. Make sure you import and preprocess all attributes you have.

Explore adding features using feature engineering techniques. Evaluate the usefulness of your features and focus on them. Can you create new features with more impact on your model?

Next, make sure that columns with too many errors, too much noise, or too many missing values are removed. They risk influencing the algorithm and producing unsatisfactory results. In the era of big data there is the idea that the more data the better, but this is not always true.
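A minimal sketch of this kind of column filtering, in plain Python, could look as follows. The records, the helper names, and the 50% threshold are assumptions made up for the example; what counts as "too many" missing values depends on the dataset and on the algorithm.

```python
def missing_ratio(rows, column):
    """Share of rows where the column is None or an empty string."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def drop_sparse_columns(rows, max_missing=0.5):
    """Return the rows without the columns whose missing ratio exceeds max_missing."""
    columns = {key for row in rows for key in row}
    keep = sorted(c for c in columns if missing_ratio(rows, c) <= max_missing)
    return [{c: row.get(c) for c in keep} for row in rows], keep


# Hypothetical records; "notes" is empty in three out of four rows.
rows = [
    {"age": 34, "income": 52000, "notes": None},
    {"age": 51, "income": None, "notes": ""},
    {"age": 29, "income": 48000, "notes": "call back"},
    {"age": 42, "income": 61000, "notes": None},
]
cleaned, kept = drop_sparse_columns(rows, max_missing=0.5)
print(kept)  # ['age', 'income']
```

Printing the missing ratio per column before dropping anything is usually worth the extra line: a column that is 75% empty might still be the only one carrying a crucial piece of information.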

Also make sure that there is no data leakage from the training set into the evaluation set. Hunt for duplicate records; enforce strict time-based partitioning for time series analysis; make sure that the logic of the data collection is respected. Some fields are often recorded after the target variable. Including such fields gives away the value of the target variable, leading to an unrealistic measure of performance and to an application that cannot be deployed. We remember a project on predicting fatalities in car accidents, where the field “race” was filled only for some races and only AFTER the fatality had happened. The algorithm started associating a high likelihood of death with only some of the possible races. Ehm …, make sure you act within the limitations of the data collection process.
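Two of these leakage checks, the hunt for duplicate records and the time-based partitioning, can be sketched in a few lines of Python. The field names (`id`, `day`, `value`), the cutoff date, and both helper functions are hypothetical and only serve the illustration.

```python
from datetime import date


def shared_records(train, test, key_fields):
    """Return the test records whose key fields also appear in the training set."""
    seen = {tuple(row[f] for f in key_fields) for row in train}
    return [row for row in test if tuple(row[f] for f in key_fields) in seen]


def time_based_split(rows, time_field, cutoff):
    """Train on everything strictly before the cutoff, test on the rest."""
    train = [row for row in rows if row[time_field] < cutoff]
    test = [row for row in rows if row[time_field] >= cutoff]
    return train, test


# Hypothetical records; the field names are for illustration only.
rows = [
    {"id": 1, "day": date(2021, 1, 5), "value": 10.0},
    {"id": 2, "day": date(2021, 2, 9), "value": 12.5},
    {"id": 3, "day": date(2021, 3, 14), "value": 11.0},
    {"id": 4, "day": date(2021, 4, 2), "value": 13.5},
]
train, test = time_based_split(rows, "day", cutoff=date(2021, 3, 1))
print(len(train), len(test))                # 2 2
print(shared_records(train, test, ["id"]))  # []
```

A random split of the same records would let the model train on March data and be tested on January data, which is exactly the kind of leakage a time series project has to avoid. Fields recorded after the target, like the one in the car accident project, cannot be caught by code like this; they require knowledge of how the data was collected.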

Explore the Model Size and Hyperparameters

Data is good. Business goals are clear. My algorithm is still not performing. What is wrong? The moment has come to get your hands around the algorithm and tune some of its hyperparameters.

The size of the machine learning model is an important hyperparameter. A model that is too small leads to underfitting, while a model that is too large leads to overfitting. How can we find the right middle point? To do that automatically, you can use regularization terms, pruning, dropout techniques, and/or just a classic validation set to monitor progress.
If you have adopted deep learning neural networks for your project, there are a number of different units you can use, every unit somewhat specialized in dealing with a particular type of problem. If you are dealing with time and need your algorithm to remember the past, then recurrent neural units, LSTM units, or GRUs might be more appropriate than classic feedforward units. If you are dealing with images, a cascade of convolutional layers might help the neural network to work on better extracted image features. If you are dealing with anomalies, then an autoencoder architecture could be an interesting alternative. And so on.
For deep learning neural networks, there are at least as many activation functions as there are neural units. Choose wisely. Some activation functions perform well in some situations, some do not. For example, we learned the hard way that ReLU activation functions - while popular in the deep learning community - do not perform well in an autoencoder. The autoencoder seems to be partial towards old-fashioned sigmoids.
Moving away from neural networks, the information/entropy measure in decision trees and random forests is an interesting hyperparameter to explore.
The number of clusters in a clustering algorithm is another hyperparameter worth exploring: different values can reveal different groupings in the data.
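The classic validation set mentioned above is often used for early stopping: training is interrupted once the validation loss stops improving. Below is a minimal sketch of that logic, independent of any specific library. The function name, the `patience` and `min_delta` parameters, and the loss values are assumptions chosen for the example.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: they improve, then rise as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
best, stopped = early_stopping_epoch(val_losses, patience=3)
print(best, stopped)  # 3 6
```

In this example the best model is the one from epoch 3, and training is stopped at epoch 6, after three epochs without improvement. A validation loss that never decreases at all points to underfitting or to a problem earlier in the pipeline rather than to overfitting.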

And so on, for many more machine learning algorithms. Of course, the capability to fine-tune the hyperparameters of a model assumes that we have full knowledge of the algorithm and of the parameters that control it. Which brings us to the next question: do we understand the algorithm we are using?
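A small step towards that understanding, for the decision tree case mentioned above: the two most common split quality measures, Shannon entropy and Gini impurity, are short enough to write out by hand. The sketch below is our own illustration with made-up labels; it is not taken from any particular library.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class in the list of labels."""
    counts = Counter(labels)
    return [n / len(labels) for n in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


# Made-up label sets: a perfectly mixed node and a strongly skewed one.
balanced = ["a"] * 5 + ["b"] * 5
skewed = ["a"] * 9 + ["b"]

print(entropy(balanced), gini(balanced))  # 1.0 0.5
print(round(entropy(skewed), 3), round(gini(skewed), 3))
```

Both measures are highest for the mixed node and drop as one class dominates, but they do not drop at the same rate, which is why switching between them can change the splits a tree chooses.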

Understand the Algorithm

With the spread of easy-to-use interfaces to data science tools, it has also become easier to abuse the algorithms. Do not be fooled though! The easy GUI does not take away the complexity of the algorithm behind it. We have seen plenty of random choices in setting the parameters of poorly understood algorithms. Sometimes, the best thing to do is actually to sit down and learn something more about the math behind the core algorithm or its variations.

The internet is full of learning material, in the form of courses, videos, and blog posts. Just use them! Every once in a while, stop what you are doing and dedicate some time to deepening your knowledge, on your own or as a group.

Right from the planning phase of the project, think of something new to test or try out (a piece of software, a technique, an algorithm, etc.). Use the space of the project to acquire some additional experience. In that way, internal knowledge within the organization can be expanded and drive innovation. This will be an investment for the future. Keep a backlog of ideas you think could work on your data, and write down what you would do if you had more time. This backlog can help you when the project turns out successfully and stakeholders would like to know more about the implementation, or when the project doesn’t turn out so well and you need a list of potential activities to carry on with right after.

Here is a hint. Work on the organizational culture at the same time as working on the project. The relevant factor in a project is often the people involved and it’s the people who create the internal culture of the company. So, make sure you create the right culture for your organization with the right people within your project.

Search for Existing Solutions

To train a machine learning model successfully, you need to understand the algorithm, sure. You also need to know which tips and tricks the chosen algorithm requires and which ones it does not tolerate: normalization, shuffling, encoding, and other similar operations. It is likely that other data scientists have implemented a similar solution before, so:

Do not reinvent the wheel! Search for existing solutions to similar tasks and adapt them to your project. Search not only in books and scientific articles, but also in blogs and code repositories, and talk with colleagues. Make sure you take and reuse successful ideas from previous projects. Colleagues especially are a great source for material, hints, and ideas. Be willing to reciprocate and share your work and take meaningful feedback. Even organize review sessions, where your colleagues can help with their experience and creativity.

Do not limit yourself to solutions to similar tasks in the same domain. There is a great advantage in the cross-pollination of fields. Indeed, anomaly detection in mechanical engines and fraud detection in credit card transactions often share the same techniques. Read up on what has been done in different spaces. Do not be afraid to attend a webinar in chemistry, if you are working on fraud detection. You never know the kind of inspiration you might get out of that.

What is next?

A data science project is not just training an algorithm. There is much more to it, to make sure that all pieces come together to compose a successful solution. Before training the algorithm, we need to prepare the data appropriately. Even after training and testing the algorithm, we need to trigger a few operations on the basis of the algorithm’s response. Projects often get stuck at the question of what to do now that we have the results.

If we build a churn predictor, we need to implement actions, like campaigns or answers by the call center, for those customers at high risk of churning. If we build an analysis of the influencers and their posts on social media, we need to implement a strategy to approach them. If we build a chatbot, we need to know where to integrate it within the final application. If we build an automatic translator, we need to insert it after the speech ends and before it reaches the other user in the conversation. If we build an anomaly detector, we need to schedule it to run regularly to discover issues with the mechanical chain in a timely manner.

We need to be prepared when we reach this part. We need to think of the actions and decisions we will make based on the insights from the project and be ready to take the next steps.

Just Stop & Breathe

If you have tried all of that, and your model still refuses to learn to an acceptable error level, then just stop, take a day off, get some distance from your work, and then circle back to it with a fresh mind. This has worked for us on many occasions.

Do not give up!

Machine learning models are great when they work, but it is not an easy task to make them work. Like the machine learning problem itself, figuring out what exactly went wrong and how to fix it is a multidimensional problem. We hope we have provided you with a list of ideas to get out of the quicksand of a non-improving model.

However, if you are sure that the model architecture has been implemented properly, the training set is sufficiently general, the evaluation metric is the right one, the model has been trained long enough and yet does not overfit the data, if you have run your due diligence on the application and have found nothing wrong with it, then the problem must be of a more general nature. Maybe the model is not the right one, maybe the preprocessing is not suitable for that kind of data, or maybe some other basic flaw is affecting your procedure. It’s time to do some more reading & exploring!

Our general advice for you is to keep trying things out and not to give up the first few times that your model doesn’t work. After implementing several models, with a few trials and errors, you will become an expert, and then it will be our turn to hear about your experiences.
