Design patterns in machine learning

Can we abstract best practices into real design patterns yet?



By Ágoston Török, Director Data Science, AGT International

By definition, a design pattern is a reusable solution to a commonly occurring problem. In software engineering, the concept dates back to 1987, when Beck and Cunningham started to apply it to programming. By the 2000s, design patterns — especially the SOLID design principles for OOP — were considered common knowledge among programmers. Fast forward 15 years and we arrive at the era of Software 2.0: machine learning models start to replace classical functions in more and more parts of the codebase. Today, we look at software as a fusion of traditional code, machine learning models and the underlying data. This fusion requires a seamless integration of these components, which is often far from trivial given their disparate history and evolution.


Today, we look at software as a fusion of traditional code, machine learning models and the underlying data.


Design patterns, however, have not yet been extended to deal with the challenges of this new era. In Software 2.0, common challenges appear not only at the code level but also at the level of problem definition, data representation, training methods, scaling, and the ethics of designing AI-enabled systems. This creates fertile ground for machine learning antipatterns. Unfortunately, even blog posts and conference talks sometimes feature antipatterns: practices that are believed to improve things but in reality make them worse. Since antipatterns also require skill, they are often not recognized as such by their practitioners. Therefore, in the following, I will give two examples of common ML challenges but, instead of starting with the design pattern, I will first introduce the antipatterns that masquerade as solutions.

 

The model shows poor performance on the evaluation metrics

 
In the common scenario, after collecting, cleaning, and preparing the data, the engineer trains a first model and finds that it performs poorly on the test data. A common antipattern is to replace this first model with a more complex one (often gradient boosted trees) and improve the performance that way. A variation of this antipattern goes one step further and combines several models, e.g. by model averaging.
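To make the antipattern concrete, here is a minimal sketch (illustrative scikit-learn code on synthetic data, not a recommendation): every "fix" only raises model complexity, and at no point do we look at the data or the errors.

```python
# A sketch of the antipattern, for illustration only: each attempt just
# increases model complexity without examining the data or the errors.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *make_classification(n_samples=1000, random_state=0), random_state=0)

# Attempt 1: a simple baseline disappoints on the test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("baseline:", accuracy_score(y_test, model.predict(X_test)))

# Attempt 2 (antipattern): swap in a more complex model.
model = GradientBoostingClassifier().fit(X_train, y_train)

# Attempt 3 (antipattern variation): average several models together.
model = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("gb", GradientBoostingClassifier())],
    voting="soft",
).fit(X_train, y_train)
print("ensemble:", accuracy_score(y_test, model.predict(X_test)))
```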



Donald Knuth’s famous quote “premature optimization is the root of all evil” is almost 50 years old and still true. Image with permission from tddcomics.

 

The problem with these methods is that they look at only part of the problem, i.e. the model, and choose to resolve it by increasing model complexity. This step forces us to accept a high risk of overfitting and to trade explainability for additional predictive power. While there are effective practices to mitigate the side effects of this choice (e.g. LIME), we cannot fully eliminate them.
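As an aside, here is a minimal sketch of such a mitigation using the lime package; the dataset and model are stand-ins for illustration.

```python
# A minimal sketch of post-hoc explanation with LIME for a tabular
# classifier; the dataset and model are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# Explain one prediction: which features pushed the model's output?
explanation = explainer.explain_instance(
    X_test[0], model.predict_proba, num_features=5)
print(explanation.as_list())
```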

The design pattern is error analysis. In practice, this means looking at where our model made errors, either by assessing the model fit on different test sets or by examining individual cases where the model was wrong. Although we have all heard the saying “garbage in, garbage out”, very few people appreciate how true this is even for small inconsistencies in the data. Maybe the labels come from different raters, each with their own, slightly different interpretation of the labelling guidelines. Maybe the way the data was collected has changed over time. The effect of error analysis is especially strong for small data problems. However, we should also keep in mind that a significant proportion of big data situations also involve long tail events (e.g. identifying rare talents in an admission exam).
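Here is a minimal sketch of what error analysis can look like in practice, on synthetic data with a hypothetical `rater` metadata column; slicing the error rate by rater immediately exposes the inconsistent labels.

```python
# A minimal sketch of slice-based error analysis; the dataset, the
# `rater` metadata column, and the model are illustrative stand-ins.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
y = (X["f1"] + 0.5 * X["f2"] > 0).astype(int)

# Hypothetical metadata: which rater labelled each example.
rater = rng.choice(["A", "B", "C"], size=n)
# Simulate an inconsistent rater: rater "C" flips 20% of the labels.
noisy = (rater == "C") & (rng.random(n) < 0.2)
y = (y ^ noisy).astype(int)

X_train, X_test, y_train, y_test, r_train, r_test = train_test_split(
    X, y, rater, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

errors = pd.DataFrame({
    "is_error": model.predict(X_test) != y_test,
    "rater": r_test,
})
# Slicing the error rate by metadata reveals where errors concentrate.
print(errors.groupby("rater")["is_error"].mean())
```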

The true power of error analysis comes from the fact that, by applying it, we trade away neither explainability nor robustness to overfitting; in fact, applying it alone already yields critical knowledge about the distribution of the data. Furthermore, error analysis enables us to choose between model-centric (e.g. a more complex model) and data-centric (e.g. further cleaning steps) solutions.

 

Performance degradation over time in a deployed model

 
The model goes through extensive validation and is deployed to production. The users are happy and give positive feedback. Then, a month/quarter/year later, reports come in about flawed predictions. This is usually a manifestation of concept drift: the relationship your model learned between input and output has changed over time. There are domains where such concept drift is well known (word semantics, spam detection), but concept drift can happen in any field. For instance, masks and social distancing regulations challenged many previously deployed computer vision models too.
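A toy simulation, with made-up data, of what concept drift does to a frozen model: the mapping from input to label changes after training, and accuracy collapses.

```python
# A toy illustration of concept drift: the input-to-label relationship
# changes after deployment, and the frozen model's accuracy drops.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y_before = (X[:, 0] > 0).astype(int)      # concept at training time
model = LogisticRegression().fit(X, y_before)

X_new = rng.normal(size=(1000, 2))
y_stable = (X_new[:, 0] > 0).astype(int)  # same concept, new data
y_drifted = (X_new[:, 1] > 0).astype(int) # the concept has drifted

print("no drift:   ", accuracy_score(y_stable, model.predict(X_new)))
print("after drift:", accuracy_score(y_drifted, model.predict(X_new)))
```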



ML systems without retraining assume no change in the learned relationship between input and output. Image with permission from tddcomics.

 

A common antipattern is to attribute these reports to noise and expect the situation to stabilize with time. This means not only a lack of action but also a false attribution, which should generally be discouraged in a data-driven business. A slightly better antipattern is to react to the reports by quickly retraining and deploying a new model. This is an antipattern even when the team believes it is following agile software development principles by reacting quickly to change. The problem is that this solution addresses the symptom, not the flaw in the design of the system.

The design pattern is continuous evaluation of performance: you expect drift to happen and, hence, design the system to notice it as soon as possible. This is a completely different approach, because the focus is not on the speed of reaction but on the speed of detection. It puts the entire system on a much more controlled course, leaving more room to prioritize any reaction. Continuous evaluation means establishing processes and tools to continuously generate ground truth for a fraction of the new data. In most cases this involves manual labelling, often using crowdsourced services. In some instances, though, we can generate ground truth labels with more sophisticated models and devices that would not be feasible in the deployment setting. For example, in the development of self-driving cars, the input from one sensor (e.g. LiDAR) can be used to generate the ground truth for another sensor (e.g. camera).
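A minimal sketch of the detection side, assuming fresh ground-truth labels arrive for a small sample of production traffic; the baseline value, alert margin, and `alert` channel are all placeholders, not a specific tool's API.

```python
# A minimal sketch of continuous evaluation: score the deployed model
# on freshly labelled production samples and flag suspected drift.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.92   # accuracy measured at deployment time
ALERT_MARGIN = 0.05        # tolerated drop before raising a flag

def alert(message):
    # Placeholder for a real alerting channel (pager, Slack, dashboard).
    print("DRIFT ALERT:", message)

def evaluate_batch(model, X_sample, y_sample):
    """Score the deployed model on newly labelled production data."""
    live_accuracy = accuracy_score(y_sample, model.predict(X_sample))
    if live_accuracy < BASELINE_ACCURACY - ALERT_MARGIN:
        # Detection, not reaction: notify and let the team prioritize.
        alert(f"accuracy {live_accuracy:.3f} "
              f"vs baseline {BASELINE_ACCURACY:.3f}")
    return live_accuracy
```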

 

The SOLID design principles of machine learning

 
The reason I’m writing about design patterns is that this field has reached a level of maturity where we should not only share our best practices but also be able to abstract them into real design patterns. Luckily, multiple groups have already started this work. In fact, two books have been published recently on the topic [1], [2]. I enjoyed reading them, but I was still left with the feeling that although we are going in the right direction, we are still a few steps away from formulating the SOLID design principles for ML practitioners. I believe that while the underlying knowledge is already available and is used to build today’s AI-enabled products, work on design patterns and antipatterns is an important step towards the era of Software 2.0.



Design patterns are the foundation of the craftsmanship of machine learning. Image with permission from tddcomics.

 
Bio: Ágoston Török is Director of Data Science at AGT International.

Original. Reposted with permission.
