Bayes Theorem for Computer Scientists, Explained

Data science is in vain without a solid understanding of probability and statistics. Learn the basic concepts of probability, including the law of total probability and Bayes’ theorem, along with their computer science applications.

By Clay McLeod.

Few topics have given me as much trouble as Bayes’ theorem over the past couple of years.

I graduated with an undergraduate degree in EE (where calculus reigns supreme) and was thrown into probability theory late in my MS coursework. Usually, if I stare at a formula long enough, I can understand what’s going on. But even though it was much lower-level math than what I did in EE, I just couldn’t seem to get my head around probability theory. This was especially true of Bayes’ theorem. I tried many times and could never really get the idea.

I’m a visual learner, and most concepts in probability theory are expressed in a multitude of different notations and forms. Not only that, but you have to keep track of many different variables that are sometimes so similar that they are hard to tell apart. For instance, is the numerator the probability of A given B or the probability of A and B? What’s the difference between those? Sure, if I sat down and thought about it for a while it would become clear; then I would sleep for a night and the concept would become opaque again.

This article aims to clear up some foundational concepts in probability (and, briefly, how they apply to computer science) as quickly as possible.

Probability Theory

  • What? Probability theory is a branch of mathematics concerned with random processes (also known as stochastic processes).
  • Why? Most phenomena that have yet to happen in the real world can be expressed as probability distributions. Therefore, probability theory can be useful in almost any scenario where we would like to predict something.
  • How? One place where probability theory and computer science meet is a field called probabilistic programming. Through probabilistic programming techniques, we can estimate, with a reasonable degree of confidence, the probability that something happens.

Relevant Theorems

Disclaimer: Statistics junkies would declare this article amiss if I didn’t mention this: the theorems listed here, as used in this article, assume that the outcomes of the event we split on are mutually exclusive and collectively exhaustive. All this means is that every case falls into exactly one outcome, with no overlap and nothing left out (the vast majority of interesting problems can be framed this way).

I’ll introduce a problem to help me illustrate my points better.

Assume you have a room full of men and women. 70% of the people are women and 30% are men. Additionally, we know from polling every person that 40% of the women’s favorite color is green and 75% of the men’s favorite color is green.
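These numbers are also a handy way to pin down the “A given B” versus “A and B” confusion from earlier. The sketch below is only an illustration (the variable names are my own, not part of the problem): the conditional probability looks at the women only, while the joint probability looks at the whole room.

```python
# Numbers come from the room example; the variable names are illustrative.
p_woman = 0.70              # P(woman): share of the room that is women
p_green_given_woman = 0.40  # P(green | woman): share of the WOMEN who chose green

# P(woman and green): share of the WHOLE ROOM that is both a woman and chose green.
p_woman_and_green = p_woman * p_green_given_woman

print(f"P(green | woman)   = {p_green_given_woman:.2f}")
print(f"P(woman and green) = {p_woman_and_green:.2f}")
```

The two numbers answer different questions: 40% of the women chose green, but women who chose green make up 28% of everyone in the room.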

Law of total probability

With the law of total probability, we can answer the question “What percentage of the people in the room said that their favorite color is green?” Let’s draw this problem in the form of a picture.


Let’s forget about probability theorems for a second. From this picture, how would you figure out how many people said that green was their favorite color? Simple: we can pick an arbitrary number of people in the room, find out how many men and women there are (based on the percentages given), find out how many of each sex chose green as their favorite color (based on the percentages given), and add those numbers together.

Assume there are 100 people in the room (so 70 women and 30 men). The number of women that chose green as their favorite color is 70*0.4=28 people. Similarly, the number of men that chose green is 30*0.75=22.5 people (a fractional person is fine here, since the head count of 100 is arbitrary). Adding these together, we get 28+22.5=50.5, so 50.5% of the people in the room chose green as their favorite color.
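The same head count can be sketched in a few lines of Python. This is just the arithmetic above written out; the function name `count_green` and the dictionary layout are my own choices for illustration.

```python
def count_green(total_people, share_by_group, green_rate_by_group):
    """Return, per group, how many people chose green as their favorite color."""
    counts = {}
    for group, share in share_by_group.items():
        group_size = total_people * share                      # e.g. 100 * 0.70 women
        counts[group] = group_size * green_rate_by_group[group]
    return counts

total_people = 100  # arbitrary; the final percentage does not depend on it
share_by_group = {"women": 0.70, "men": 0.30}
green_rate_by_group = {"women": 0.40, "men": 0.75}

green_counts = count_green(total_people, share_by_group, green_rate_by_group)
green_total = sum(green_counts.values())
green_share = green_total / total_people

for group, count in green_counts.items():
    print(f"{group}: {count:.1f} chose green")
print(f"total: {green_total:.1f} of {total_people} ({green_share:.1%})")
```

Changing `total_people` to 1,000 or 10 scales the counts but leaves the final share untouched, which is why picking an arbitrary room size is safe.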

This, in essence, is the law of total probability. Its usefulness is now clear: originally, we didn’t know the overall probability that a person has green as a favorite color (event B). But we did know the probability that a person is male or female (event A), and we also knew the probability of B for each outcome of A (the favorite color percentages for men and women). Thus, we can compute the probability of B for anyone in the room: multiply the probability of B given each outcome of A by the probability of that outcome, then add the results together. This works as long as we know both numbers for every possible outcome of A.
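One way to sanity-check this reasoning is to simulate the room: draw a large number of imaginary people, decide each person’s sex and then their favorite color using the percentages from the problem, and see what fraction end up choosing green. The sketch below uses only Python’s standard `random` module; the function name, the sample size, and the fixed seed are assumptions made for the sake of a repeatable illustration.

```python
import random

def simulate_green_share(n_people, p_woman=0.70, p_green_woman=0.40,
                         p_green_man=0.75, seed=0):
    """Estimate P(green) by sampling n_people imaginary people."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    green = 0
    for _ in range(n_people):
        is_woman = rng.random() < p_woman                    # outcome of event A
        p_green = p_green_woman if is_woman else p_green_man
        if rng.random() < p_green:                           # event B given that outcome
            green += 1
    return green / n_people

estimate = simulate_green_share(200_000)
print(f"simulated share choosing green: {estimate:.3f}")
```

With a couple hundred thousand samples, the estimate should land close to the 50.5% worked out by hand, without the simulation ever being told that number.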

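Formally, the equation for the law of total probability is:

$$P(B) = \sum_{i} P(B \mid A_i) \, P(A_i)$$

where the A_i are the possible outcomes of A. In this case:

$$P(\text{green}) = P(\text{green} \mid \text{woman}) \, P(\text{woman}) + P(\text{green} \mid \text{man}) \, P(\text{man}) = 0.4 \times 0.7 + 0.75 \times 0.3 = 0.505$$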
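The weighted sum in the law of total probability (the probability of B given each outcome of A, times the probability of that outcome, added up over all outcomes) translates fairly directly into code. The following is a minimal sketch, not a library function: `total_probability` and its dictionary arguments are names I made up, and the checks simply enforce that the outcomes of A cover everything and sum to 1.

```python
import math

def total_probability(p_a, p_b_given_a):
    """Return P(B) as the sum over outcomes a of P(B | a) * P(a).

    p_a          maps each outcome of A to its probability P(a)
    p_b_given_a  maps each outcome of A to P(B | a)
    """
    if set(p_a) != set(p_b_given_a):
        raise ValueError("both arguments must list the same outcomes of A")
    if not math.isclose(sum(p_a.values()), 1.0):
        raise ValueError("the probabilities of the outcomes of A must sum to 1")
    return sum(p_b_given_a[outcome] * p_a[outcome] for outcome in p_a)

p_green = total_probability(
    p_a={"woman": 0.70, "man": 0.30},          # P(A): who is in the room
    p_b_given_a={"woman": 0.40, "man": 0.75},  # P(B | A): who chose green
)
print(f"P(green) = {p_green:.3f}")
```

Nothing here is specific to two outcomes; the same function works if A has three or thirty outcomes, as long as they do not overlap and together cover every case.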