How Noisy Labels Impact Machine Learning Models


Not all training data labeling errors have the same impact on the performance of a Machine Learning system. The structure of the labeling errors makes a difference. Read iMerit’s latest blog to learn how to minimize the impact of labeling errors.



Sponsored Post.


Supervised Machine Learning requires labeled training data, and large ML systems need large amounts of it. Labeling training data is resource intensive, and while techniques such as crowdsourcing and web scraping can help, they can be error-prone, adding ‘label noise’ to training sets.

The team at iMerit, a leader in providing high-quality data, has reviewed existing studies on how ML systems trained with noisy labels can operate effectively. If you wish to learn more about creating the training data you need to succeed in your machine learning application, please contact us to talk to an expert.

Under certain conditions, ML systems trained with mislabeled data can function well. For example, a 2018 MIT/Cornell University study tested the accuracy of ML image classification systems trained with various levels of label noise. They found that the ML systems could maintain good performance with high levels of label noise under the following conditions:

  • The ML system must have a large enough parameter set to manage the complexity of the image classification task. For example, a four-layer convolutional neural network (CNN) was sufficient for a hand-written character benchmark, but an 18-layer residual network was needed to perform well on a general object recognition benchmark.
  • The training dataset must be very large – large enough to include many properly labeled examples, even if most of the training data is mislabeled. With enough good training samples, the ML system can learn accurate classification by finding the ‘signal’ buried in the label noise.

Another factor important to this study’s results was the nature of the labeling noise. The mislabeled samples were added to the training sets randomly enough that they did not create strong patterns capable of overriding the ‘signal’ carried by the properly labeled samples.
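
To make this concrete, here is a minimal sketch (not the study’s own code) of how random label noise is typically simulated: a fraction of training labels is replaced with uniformly random classes, and test accuracy is tracked as the noise rate grows. The dataset and model below – scikit-learn’s small hand-written digits set and a logistic-regression classifier – are illustrative choices only.

```python
# Illustrative sketch: inject uniform random label noise into a small
# image-classification training set and watch how test accuracy holds up.
# Assumes scikit-learn is available; dataset and model are our choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)  # 8x8 hand-written digit images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_classes = len(np.unique(y))

for noise_rate in [0.0, 0.2, 0.4, 0.6]:
    noisy = y_train.copy()
    flip = rng.random(len(noisy)) < noise_rate           # labels to corrupt
    noisy[flip] = rng.integers(0, n_classes, flip.sum())  # random replacements
    clf = LogisticRegression(max_iter=2000).fit(X_train, noisy)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"noise rate {noise_rate:.0%}: test accuracy {acc:.3f}")
```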

While this study demonstrates that ML systems have a basic ability to handle mislabeling, many practical applications of ML are faced with complications that make label noise more of a problem. These complications include:

  • Not being able to create very large training sets, and
  • Systematic labeling errors that confuse machine learning.

One example of this is a study that used remote sensing and ML to assess earthquake damage.

In 2017, researchers analyzed the effect of mislabeled training data on ML systems used to classify rubble from the 2011 New Zealand earthquake. They had noticed that in this type of remote sensing application, label noise does not follow the kind of random pattern that the ML systems in the MIT/Cornell study were able to tolerate. The labeling mistakes they observed were mainly due to inaccurate geospatial delineation, caused by lack of annotator training (e.g., misunderstanding what to include as rubble) or inadequate tools (e.g., a coarsely drawn polygon that included undamaged sidewalks as part of the rubble).

The researchers simulated training data sets with the sort of geospatial labeling noise they had observed, and also with random labeling noise. They compared the performance of ML classification on these two data sets and found that geospatial mislabeling degraded classification performance about five times more than random mislabeling.
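
As a rough illustration of why the structure of the noise matters, the sketch below corrupts the same number of training labels in two ways: uniformly at random, and with a fixed, systematic confusion in which every corrupted example of a given class receives the same wrong label. The dataset, model, and noise model here are our own simplifications, not the remote-sensing study’s setup.

```python
# Illustrative sketch: compare random vs. structured (systematic) label noise
# with the same error budget. Assumes scikit-learn; choices are ours.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)
n_flip = int(0.3 * len(y_tr))  # corrupt 30% of the training labels

def evaluate(noisy_labels):
    clf = LogisticRegression(max_iter=2000).fit(X_tr, noisy_labels)
    return accuracy_score(y_te, clf.predict(X_te))

# Random noise: corrupted samples get arbitrary labels, no consistent pattern.
random_noisy = y_tr.copy()
idx = rng.choice(len(y_tr), n_flip, replace=False)
random_noisy[idx] = rng.integers(0, 10, n_flip)

# Structured noise: the same error budget, but always the same confusion
# (every corrupted 3 becomes an 8, every corrupted 4 becomes a 9, and so on).
structured_noisy = y_tr.copy()
confusion = {3: 8, 4: 9, 1: 7, 5: 6}
candidates = np.flatnonzero(np.isin(y_tr, list(confusion)))
idx = rng.choice(candidates, min(n_flip, len(candidates)), replace=False)
structured_noisy[idx] = [confusion[c] for c in y_tr[idx]]

print("random noise accuracy:    ", round(evaluate(random_noisy), 3))
print("structured noise accuracy:", round(evaluate(structured_noisy), 3))
```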

What can we take away from these studies?

  • Not all training data labeling errors have the same impact on ML system performance. If your labeling errors are mostly random in nature, they will be less harmful to your ML system. The errors will not create a large enough ‘signal’ to send training in the wrong direction.
  • If your labeling errors are structured, for example because of repeated misapplication of labeling rules, they can be very harmful to your ML system. The system will learn to recognize the patterns created by this erroneous data as if it were correctly labeled.
  • To reduce the impact of labeling errors:
    • Make sure your training data presents a strong learning ‘signal’ to your ML system, with a high enough volume of accurately labeled samples.
    • Clearly define labeling requirements upfront. This is absolutely critical – training with labels that don’t adequately reflect what you are looking for in your application will sabotage your ML system.
    • Choose a highly skilled annotation partner. The expertise to deliver data that meets your requirements is as critical as the requirements themselves.
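
One practical QA step that follows from these takeaways (our suggestion, not a technique from the studies above) is to flag training samples whose given labels a model finds implausible and route them back to annotators for a second look. The sketch below does this with out-of-fold predicted probabilities; the dataset and model are again illustrative scikit-learn choices.

```python
# Minimal sketch of a label QA pass: use cross-validated predictions to flag
# training samples whose given labels the model assigns low probability.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
proba = cross_val_predict(
    LogisticRegression(max_iter=2000), X, y, cv=5, method="predict_proba")

# Probability the model assigns to each sample's *given* label.
given_label_proba = proba[np.arange(len(y)), y]

# Send the lowest-confidence labels back for human re-review.
suspect = np.argsort(given_label_proba)[:20]
print("indices worth re-reviewing:", suspect.tolist())
```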