6 Key Concepts in Andrew Ng’s “Machine Learning Yearning”

If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts it covers to better understand how to use these tools, from one of the field's best practitioners and teachers.

By Niklas Donges, SAP.

Machine Learning Yearning is about structuring the development of machine learning projects. The book contains practical insights that are difficult to find elsewhere, in a format that is easy to share with teammates and collaborators. Most technical AI courses explain how the different ML algorithms work under the hood, but this book teaches you how to actually use them. If you aspire to be a technical leader in AI, this book will help you on your way. Historically, the only way to learn how to make strategic decisions about AI projects was to participate in a graduate program or to gain experience working at a company. Machine Learning Yearning is there to help you quickly acquire this skill, which enables you to become better at building sophisticated AI systems.

Table of Contents

  • About the Author
  • Introduction
  • Concept 1: Iterate, iterate, iterate…
  • Concept 2: Use a single evaluation metric
  • Concept 3: Error analysis is crucial
  • Concept 4: Define an optimal error rate
  • Concept 5: Work on problems that humans can do well
  • Concept 6: How to split your dataset
  • Summary

Andrew Ng: Photo taken by the NVIDIA Corporation, used under the “CC BY-NC-ND 2.0” license. No changes have been made. (source)


About the Author

Andrew Ng is a computer scientist, executive, investor, entrepreneur, and one of the leading experts in Artificial Intelligence. He is the former Vice President and Chief Scientist of Baidu, an adjunct professor at Stanford University, the creator of one of the most popular online courses for machine learning, the co-founder of Coursera, and a former head of Google Brain. At Baidu, he was significantly involved in expanding their AI team into several thousand people.



Introduction

The book starts with a little story. Imagine you want to build the leading cat detector system as a company. You have already built a prototype, but unfortunately, your system’s performance is not that great. Your team comes up with several ideas on how to improve the system, but you are unsure which direction to follow. You could build the world’s leading cat detector platform or waste months of your time following the wrong direction.

This book is there to tell you how you can decide and prioritize in a situation like this. According to Andrew Ng, most machine learning problems will leave clues about the most promising next steps and about what you should avoid doing. He goes on to explain that learning to “read” those clues is a crucial skill in our domain.

In a nutshell, ML Yearning is about giving you a deep understanding of setting the technical direction of machine learning projects.

Since your team members could react skeptically when you propose new ways of doing things, he made the chapters very short (1–2 pages), so that your team members can read them in a few minutes and understand the idea behind each concept. If you are interested in reading this book, note that it is not suited for complete beginners, since it requires basic familiarity with supervised learning and deep learning.

In this post, I will share six concepts of the book in my own words, as I understand them.


Concept 1: Iterate, iterate, iterate…

Ng emphasizes throughout the book that it is crucial to iterate quickly, since machine learning is an iterative process. Instead of thinking about how to build the perfect ML system for your problem, you should build a simple prototype as fast as you can. This is especially true if you are not an expert in the domain of the problem, since it is then hard to correctly guess the most promising direction.

You should build a first prototype in just a few days and then clues will pop up that show you the most promising direction to improve the performance of the prototype. In the next iteration, you will improve the system based on one of these clues and build the next version of it. You will do this again and again.

He goes on to explain that the faster you can iterate, the more progress you will make. Other concepts of the book build upon this principle. Note that this advice is meant for people who just want to build an AI-based application, not for those doing research in the field.


Concept 2: Use a single evaluation metric

This concept builds upon the previous one, and the explanation of why you should choose a single-number evaluation metric is very simple: it enables you to quickly evaluate your algorithm, and therefore you are able to iterate faster. Using multiple evaluation metrics simply makes it harder to compare algorithms.

Imagine you have two algorithms. The first has a precision of 94% and a recall of 89%. The second has a precision of 88% and a recall of 95%.

Here, neither classifier is obviously superior unless you have chosen a single evaluation metric, so you would probably have to spend some time figuring out which one to prefer. The problem is that you lose this time at every iteration, and it adds up over the long run. You will try a lot of ideas about architecture, parameters, features, etc. A single-number evaluation metric (such as accuracy or the F1 score) enables you to sort all your models according to their performance and quickly decide which one is working best. Another way of improving the evaluation process is to combine several metrics into a single one, for example, by averaging multiple error metrics.
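As a rough illustration of this idea, the following sketch combines precision and recall into the F1 score (their harmonic mean) and uses it to rank the two classifiers from the example above. The names "A" and "B" and the helper f1_score are made up for this illustration and do not come from the book.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The two classifiers from the example above ("A" and "B" are illustrative names).
classifiers = {
    "A": {"precision": 0.94, "recall": 0.89},
    "B": {"precision": 0.88, "recall": 0.95},
}

# Collapse the two metrics into one number per classifier.
f1_by_name = {
    name: f1_score(m["precision"], m["recall"]) for name, m in classifiers.items()
}

# With a single number, ranking the classifiers is a one-line sort.
ranking = sorted(f1_by_name, key=f1_by_name.get, reverse=True)

for name in ranking:
    print(f"{name}: F1 = {f1_by_name[name]:.4f}")
```

The two scores end up very close to each other, which is itself useful information: the single number gives you a ranking, and it also shows when the difference between two models is small.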

Nevertheless, there will be ML problems that need to satisfy more than one metric, for example when running time has to be taken into consideration. Ng explains that you should define an “acceptable” running time, which enables you to quickly sort out the algorithms that are too slow and compare the remaining ones with each other based on your single-number evaluation metric.
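A minimal sketch of this two-step selection could look as follows. The model names, accuracy values, running times, and the 100 ms threshold are all invented for illustration; only the procedure (filter by an acceptable running time, then sort by one metric) is taken from the text above.

```python
# Hypothetical candidates: accuracy is the single-number metric,
# running time (milliseconds per example) is the constraint.
candidates = [
    {"name": "model_a", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "model_b", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "model_c", "accuracy": 0.95, "runtime_ms": 1500},
]

MAX_RUNTIME_MS = 100  # assumed "acceptable" running time


def rank_acceptable(models, max_runtime_ms):
    """Drop models that are too slow, then sort the rest by accuracy (best first)."""
    fast_enough = [m for m in models if m["runtime_ms"] <= max_runtime_ms]
    return sorted(fast_enough, key=lambda m: m["accuracy"], reverse=True)


ranked = rank_acceptable(candidates, MAX_RUNTIME_MS)
print([m["name"] for m in ranked])
```

Here the most accurate model is excluded because it is far too slow, and the remaining two are compared on accuracy alone.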

In short, a single-number evaluation metric enables you to quickly evaluate algorithms, and therefore, to iterate faster.


Concept 3: Error analysis is crucial

Error analysis is the process of looking at examples where your algorithm’s output was incorrect. For example, imagine that your cat detector mistakes birds for cats, and you already have several ideas on how to solve that issue.

With a proper error analysis, you can estimate how much an idea for improvement would actually increase the system’s performance, without spending months implementing the idea only to realize that it wasn’t crucial to your system. This enables you to decide which idea is the best to spend your resources on. If you find out that only 9% of the misclassified images are birds, then it does not matter how much you improve your algorithm’s performance on bird images, because doing so cannot fix more than 9% of your errors.
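The arithmetic behind this ceiling can be sketched in a few lines. The overall error rate of 10% is an assumption made only for this example; the 9% share of bird images is the figure used above.

```python
def best_case_error(overall_error: float, share_of_errors: float) -> float:
    """Error rate that remains if every mistake in one category were fixed."""
    return overall_error * (1.0 - share_of_errors)


overall_error = 0.10  # assumed dev set error rate (illustrative, not from the book)
bird_share = 0.09     # 9% of the misclassified images are birds

remaining = best_case_error(overall_error, bird_share)
print(f"Best-case error after fixing all bird mistakes: {remaining:.3f}")
```

Under these assumptions, even a perfect fix for bird images would only move the error rate from 10% to about 9.1%.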

Furthermore, it enables you to quickly judge several ideas for improvement in parallel. You just need to create a spreadsheet and fill it out while examining, for example, 100 misclassified dev set images. On the spreadsheet, you create a row for every misclassified image and a column for every idea that you have for improvement. Then you go through every misclassified image and mark which ideas would have led to a correct classification of that image.

Afterward, you know that, for example, idea-1 would have led the system to classify 40% of the misclassified images correctly, idea-2 12%, and idea-3 only 9%. You then know that idea-1 is the most promising improvement for your team to work on.
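The spreadsheet can also be kept as a small data structure and tallied automatically. The sketch below is only one possible way to do this; the image file names, the idea labels, and the five example rows are hypothetical.

```python
from collections import Counter

# One row per misclassified dev set image; "fixed_by" lists the ideas
# (hypothetical labels) that would have classified that image correctly.
rows = [
    {"image": "img_001.jpg", "fixed_by": ["idea-1"]},
    {"image": "img_002.jpg", "fixed_by": ["idea-1", "idea-3"]},
    {"image": "img_003.jpg", "fixed_by": []},
    {"image": "img_004.jpg", "fixed_by": ["idea-2"]},
    {"image": "img_005.jpg", "fixed_by": ["idea-1"]},
]


def tally(rows):
    """Return the share of misclassified images each idea would fix, largest first."""
    if not rows:
        return {}
    counts = Counter(idea for row in rows for idea in row["fixed_by"])
    return {idea: n / len(rows) for idea, n in counts.most_common()}


shares = tally(rows)
for idea, share in shares.items():
    print(f"{idea}: {share:.0%}")
```

Because one image can be fixed by more than one idea, the shares do not have to add up to 100%.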

Also, once you start looking through these examples, you will probably find new ideas on how to improve your algorithm.


Concept 4: Define an optimal error rate

The optimal error rate is helpful to guide your next steps. In statistics, it is also often called the Bayes error rate.

Imagine that you are building a speech-to-text system and you find out that 19% of the audio files you expect users to submit have such dominant background noise that even humans can’t recognize what was said in them. If that’s the case, you know that even the best system would probably have an error of around 19%. In contrast, if you work on a problem with an optimal error rate of nearly 0%, you can hope that your system will do just as well.

It also helps you to detect whether your algorithm is suffering from high bias or high variance, which helps you to define the next steps to improve your algorithm.
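One way to make this diagnosis concrete, following the decomposition used in the book, is to treat the gap between training error and optimal error as avoidable bias and the gap between dev error and training error as variance. The sketch below applies this to the speech example; the training and dev error values are invented for illustration, and the function name diagnose is hypothetical.

```python
def diagnose(optimal_error: float, train_error: float, dev_error: float) -> dict:
    """Split the dev error into avoidable bias and variance, and name the larger one."""
    avoidable_bias = train_error - optimal_error  # how far training is from the optimum
    variance = dev_error - train_error            # how much worse dev is than training
    focus = "bias" if avoidable_bias >= variance else "variance"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


# Optimal error of about 19% from the speech example above;
# the training and dev errors are assumed values.
result = diagnose(optimal_error=0.19, train_error=0.20, dev_error=0.30)
print(result["focus"])
```

With these numbers, the training error is already close to the optimal error, so the larger gap is between training and dev error, which points to variance as the problem to address first.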

But how do we know what the optimal error rate is? For tasks that humans are good at, you can compare your system’s performance to that of humans, which gives you an estimate of the optimal error rate. In other cases, it is often hard to define an optimal rate, which is one reason why you should work on problems that humans can do well, as we will discuss in the next concept.


Concept 5: Work on problems that humans can do well

Throughout the book, he explains several times why it is recommended to work on machine learning problems that humans can do well themselves. Examples are speech recognition, image classification, object detection, and so on. There are several reasons for this.

First, it is easier to get or to create a labeled dataset, because it is straightforward for people to provide high-accuracy labels for your learning algorithm if they can solve the problem themselves.

Second, you can use human performance as the optimal error rate that you want to reach with your algorithm. Ng explains that having defined a reasonable and achievable optimal error helps to accelerate the team’s progress. It also helps you to detect whether your algorithm is suffering from high bias or high variance.

Third, it enables you to do error analysis based on your human intuition. If you are building, for example, a speech recognition system and your model misclassifies its input, you can try to understand what information a human would use to get the correct transcription, and use this to modify the learning algorithm accordingly. Although algorithms surpass humans at more and more tasks, you should try to avoid problems that humans can’t do well themselves.

To summarize, you should avoid these tasks because they make it harder to obtain labels for your data, you can’t count on human intuition anymore, and it is hard to know what the optimal error rate is.


Concept 6: How to split your dataset

Ng also proposes a way to split your dataset. He recommends the following:

Train Set: With it, you train your algorithm and nothing else.

Dev Set: This set is there to do hyperparameter tuning, to select and create proper features, and to do error analysis. It is basically there to make decisions about your algorithm.

Test Set: The test set is used to evaluate the performance of your system, but not to make decisions. It’s just there for evaluation, and nothing else.

The dev set and test set allow your team to quickly evaluate how well your algorithm is performing. Their purpose is to guide you to the most important changes that you should make to your system.

He recommends choosing the dev and test sets so that they reflect the data you want to do well on once your system is deployed. This is especially important if you expect that data to differ from the data you are training on right now. For example, you might train on normal camera images, while your deployed system will only receive pictures taken by phones because it is part of a mobile app. This can happen if you don’t have access to enough mobile phone photos to train your system. Therefore, you should pick dev and test set examples that reflect what you want to perform well on later in reality, rather than the data that you used for training.

Also, you should choose dev and test sets that come from the same distribution. Otherwise, there is a chance that your team will build something that does well on the dev set, only to find that it performs extremely poorly on the test data, which is what you care about the most.
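The following sketch shows one simple way to follow both recommendations for the mobile app example: the dev and test sets are drawn from the same shuffled pool of phone pictures, while the camera images are used for training. The function name, the file names, the dataset sizes, and the 50/50 dev/test split are assumptions made for this illustration.

```python
import random


def split_dataset(train_source, target_source, seed=0):
    """Return (train, dev, test).

    Dev and test are both drawn from target_source after one shuffle,
    so they come from the same distribution. Train is train_source as given.
    """
    rng = random.Random(seed)
    target = list(target_source)
    rng.shuffle(target)
    half = len(target) // 2
    dev, test = target[:half], target[half:]
    return list(train_source), dev, test


# Hypothetical data: many camera images, only a few phone pictures.
camera_images = [f"camera_{i}.jpg" for i in range(1000)]
phone_images = [f"phone_{i}.jpg" for i in range(200)]

train, dev, test = split_dataset(camera_images, phone_images)
print(len(train), len(dev), len(test))
```

If more phone pictures become available, some of them could also be added to the training set, as long as the dev and test sets stay disjoint from it and from each other.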



Summary

In this post, you’ve learned about six concepts of Machine Learning Yearning. You now know why it is important to iterate quickly, why you should use a single-number evaluation metric, and what error analysis is about and why it is crucial. You’ve also learned about the optimal error rate, why you should work on problems that humans can do well, and how you should split your data. Furthermore, you learned that you should pick the dev and test set data so that they reflect the data you want to do well on in the future, and that dev and test sets should come from the same distribution. I hope that this post gave you an introduction to some concepts of the book, and I can definitely say that it is worth reading.


Bio: Niklas Donges is a Machine Learning and Data Science enthusiast, studying Software Engineering at CODE University of Applied Sciences in Berlin, with a deep focus on Machine Learning, and working part-time for the Machine Learning Foundation of SAP.

Original. Reposted with permission.