The Curse of Delayed Performance

Predict the performance of your model before the ground truth is available.

Sponsored Post

Over their careers, most data scientists develop the intuition that machine learning models are dynamic systems. You set performance metrics, train the models on historical data, and put them out into the real world to make decisions. But then the real world changes.

But how do we know when the real world changes? What happens to these models and their performance when it does? In many cases, we have no idea until it’s too late.

“The first tricky situation is the delayed ground truth. This is very often the case in the financial industry, in which banks or other institutions use ML models to predict which customers are likely to default on their loans. Naturally, they do not know if a certain customer defaulted or not until their loan is fully paid off,” says Eryk Lewinson in his article on the subject.

“Nobody is interested in how your model performs on old data. The only thing that matters is how the model will do on new data,” says Samuele Mazzanti in another article on the topic. When the real performance information is delayed, it leaves a gap filled with uncertainty about how the model is handling new data.

So how do we fill this gap and fight the curse of delayed performance? Samuele describes how NannyML, an open-source Python library, can help by estimating how well the model is performing before the ground truth data is available.

Confidence-Based Performance Estimation is their flagship algorithm. It reconstructs the expected confusion matrix and calculates the expected ROC AUC for a set of n predictions in the real world. All without access to ground truth.
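
To make the idea concrete, here is a minimal sketch of how an expected confusion matrix can be built from predicted probabilities alone. It is not NannyML's implementation: the function names are hypothetical, it assumes the model's scores are well calibrated (a score of 0.8 means the positive class occurs about 80% of the time), and it stops at expected accuracy rather than ROC AUC to stay short.

```python
def expected_confusion_matrix(probs, threshold=0.5):
    """Expected confusion matrix from calibrated positive-class probabilities.

    Hypothetical helper for illustration; assumes `probs` are calibrated.
    """
    tp = fp = fn = tn = 0.0
    for p in probs:
        if p >= threshold:
            # Predicted positive: correct with probability p.
            tp += p
            fp += 1.0 - p
        else:
            # Predicted negative: correct with probability 1 - p.
            fn += p
            tn += 1.0 - p
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def expected_accuracy(cm):
    """Share of predictions expected to be correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total


# Four made-up production scores, no labels needed.
scores = [0.9, 0.8, 0.3, 0.1]
cm = expected_confusion_matrix(scores)
est_acc = expected_accuracy(cm)
print(cm)
print(round(est_acc, 3))  # 0.825
```

The estimate is only as good as the calibration assumption: if the scores stop reflecting true probabilities, the expected counts drift away from the real ones.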

Once you see a change in model performance, NannyML helps you find the root cause by connecting data drift to performance issues.

You can read more about this in a third blog post, by Michał Oleszak, on predicting model performance.

If knowing the performance of your ML models is vital to you, check out NannyML on GitHub and star it to let the creators know how much it matters!

They also have a community Slack where you can discuss your problems, how they can be addressed, and what you would like to see included in the library!

To celebrate their launch, NannyML are giving away a brand new RTX 3090 Ti graphics card! They have searched across countries to find one. Find out how to enter by following them on LinkedIn.

  • Mazzanti, S. (2022, May 11). Predict your model’s performance (without waiting for the control group). Towards Data Science.
  • Lewinson, E. (2022, May 12). Estimating the performance of an ML model in the absence of ground truth. Towards Data Science.
  • Oleszak, M. (2022, May 13). Estimating model performance without ground truth. Medium.


More on this topic:


  • Open Source Spotlight at DataTalks.Club

  • Parreno, G. (2022, May 9). Intro to Post-deployment model performance. MLearning.Ai, Medium.