The Foundations of Algorithmic Bias

We might hope that algorithmic decision making would be free of biases. But increasingly, the public is starting to realize that machine learning systems can exhibit these same biases and more. In this post, we look at precisely how that happens.

Already, you might see a problem. Who gets to decide which emails are spam and not?  What biases may factor into these decisions? If the labelers think all emails from Nigeria constitute spam, how can we justify using a system that will treat millions of people unfairly?

Once we’ve got a dataset, we can specify a flexible family of statistical models for mapping between an email and a probability that it is spam. A simple model might be to assign a score (weight) to every word in the vocabulary. If that weight is positive, then it increases the probability that the email is spam. If negative, it decreases the probability.

To calculate the final score, we might multiply the count of each word by the corresponding weight and sum the results. This describes the family of linear models, a classical technique from statistics. Machine learning practitioners also have many more complicated models at their disposal, including tremendously popular neural networks (a family of techniques now referred to as deep learning). In our present discussion, the exact form of the model doesn’t matter much.
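As a rough illustration, the word-weight model described above can be sketched in a few lines of Python. The weights, the bias term, and the function name `spam_probability` are all hypothetical, chosen only for this example. The logistic (sigmoid) function used to turn the score into a probability is one common choice, assumed here rather than taken from any particular spam filter.

```python
import math
from collections import Counter

# Hypothetical weights for illustration only; a real model would learn these.
WEIGHTS = {"western": 1.2, "union": 1.5, "meeting": -0.8, "lunch": -0.5}
BIAS = -1.0  # assumed baseline offset, also hypothetical


def spam_probability(email_text, weights=WEIGHTS, bias=BIAS):
    """Score an email as bias + sum(count * weight), squashed into (0, 1)."""
    counts = Counter(email_text.lower().split())
    # Words missing from the vocabulary contribute nothing to the score.
    score = bias + sum(n * weights.get(word, 0.0) for word, n in counts.items())
    return 1.0 / (1.0 + math.exp(-score))  # logistic (sigmoid) function


p_spam = spam_probability("Send money via Western Union")
p_ham = spam_probability("Lunch meeting tomorrow")
```

With these made-up weights, the first email scores above 0.5 and the second below it. Nothing about the arithmetic decides which words deserve positive weights; that comes entirely from the data and labels.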

When we say the machine learns, we simply mean that as it sees more and more data, it updates its belief about which model in the family is best, for example by tuning the weights. So rather than building a spam filter with a bank of rules such as “IF contains(“Western Union”) THEN SPAM”, we’d curate a large list of thousands or millions of emails, indicating for each whether or not it’s spam. These labels are often collected actively, for example by crowdsourcing to low-wage workers through services like Amazon’s Mechanical Turk. Labels can also be collected passively, for example by harvesting information when users explicitly mark emails as spam or move emails from their spam folders to their inboxes.
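To make "tuning the weights" concrete, here is a minimal sketch that learns one weight per word from a tiny labeled dataset using plain stochastic gradient descent on the logistic loss. The four emails, their labels, the learning rate, and the helper names are all invented for illustration; real systems use far more data and more careful optimization.

```python
import math

# Tiny hand-made dataset (hypothetical): (email text, label), 1 = spam, 0 = not spam.
EMAILS = [
    ("western union transfer now", 1),
    ("claim prize western union", 1),
    ("lunch meeting at noon", 0),
    ("project meeting notes", 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(emails, epochs=200, lr=0.5):
    """Fit one weight per word with stochastic gradient descent."""
    weights = {}
    for _ in range(epochs):
        for text, label in emails:
            words = text.split()
            score = sum(weights.get(w, 0.0) for w in words)
            # Gradient of the logistic loss with respect to the score.
            error = sigmoid(score) - label
            for w in words:
                weights[w] = weights.get(w, 0.0) - lr * error
    return weights


learned = train(EMAILS)
```

The learned weight for "western" ends up positive and the one for "meeting" negative, purely because of how the examples were labeled. If the labelers had marked different emails as spam, the same code would learn different weights.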

For all supervised machine learning models, the big picture remains the same. We have a collection of examples of (hopefully representative) data. We also have a collection of corresponding labels collected either actively or passively (typically from human annotators). These reflect an (often subjective) choice over what constitutes the ground truth.  Stepping back, we’ve also made a subjective choice regarding what’s worth predicting in the first place. For example, do we ask our annotators to label spam, or offensive content or uninteresting content?

And sometimes, machine learning practitioners formulate problems in such a way that the very notion of ground truth seems questionable. In many applications, researchers classify sentences or documents according to one of several sentiments. Other papers break down emotional classification into two dimensions: an arousal score and a valence score. Whether these simplistic scores can capture anything reasonably related to the ideas indicated by emotion or sentiment seems debatable.


We can now begin to demystify the processes by which undesirable biases can infiltrate machine learning models.


Perhaps the most obvious way that a machine learning algorithm can become compromised is if the underlying data itself reflects biases.

Consider, for example, a model predicting risk of recidivism. The training examples here would consist of past prisoners’ records. The corresponding labels would be binary values (1 if they were convicted of another crime, 0 if not). However, these labels themselves can reflect profound biases. For example, an individual is only convicted of a crime if they are first caught and arrested. But arrest rates reflect well-documented racial biases. Thus, black men, in addition to facing a higher probability of incarceration in the first place, could see their misfortune compounded through use of the recidivism predictor.

You might hope that we could get around this problem by withholding sensitive demographic information from the machine learning algorithm. If the model didn’t know who was black and who was white, how could it learn to discriminate between the two?

Unfortunately, it’s not so simple. Given a rich enough set of features and a rich enough family of models, the machine learning algorithm can deduce race implicitly and use this information to predict recidivism. For example, zip code, occupation, and even the previous crime committed could each leak clues as to the race of the inmate.
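A small synthetic example shows how a withheld attribute can leak through a proxy. The records below are entirely made up: two zip codes and two groups labeled "A" and "B", with group membership unevenly distributed across zip codes. The sketch guesses each person's group from zip code alone, using a simple majority vote per zip code.

```python
from collections import Counter, defaultdict

# Synthetic, hypothetical records: (zip_code, group). The "group" column is the
# sensitive attribute we intend to withhold from the model.
RECORDS = (
    [("10001", "A")] * 45 + [("10001", "B")] * 5
    + [("20002", "A")] * 8 + [("20002", "B")] * 42
)


def majority_group_by_zip(records):
    """For each zip code, find the group that appears most often."""
    by_zip = defaultdict(Counter)
    for zip_code, group in records:
        by_zip[zip_code][group] += 1
    return {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}


guess = majority_group_by_zip(RECORDS)
correct = sum(1 for zip_code, group in RECORDS if guess[zip_code] == group)
proxy_accuracy = correct / len(RECORDS)
```

In this toy data, zip code alone recovers the withheld group for 87 of 100 records. A flexible model given zip code as a feature has access to the same signal, whether or not the sensitive column is present.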

Acting upon the biased model’s predictions to make parole decisions could in turn perpetuate the cycle of incarceration. The lesson here is that if the underlying data is intrinsically biased, we should expect that the machine learning algorithm will produce commensurately biased models.

Another example of machine learning absorbing the biases in training data recently came to attention as researchers at Boston University and Microsoft Research led by Tolga Bolukbasi examined a technique called word embedding [5].

Word embedding is a technique in which each word in a vocabulary is assigned a vector. The main idea is that the meaning of each word can be captured by the direction of its vector. These vectors can be used to represent the word when used as input to a machine learning algorithm.

Researchers made waves in 2013 by showing a technique for learning these vectors by choosing the vectors which best predict the neighboring words in a large corpus of data. Serendipitously, the researchers discovered that these representations admitted some remarkable properties. Among them, the vectors could be used in straightforward ways to execute analogical reasoning. One now-famous example showed that in this vector space <China> – <Beijing> roughly equals <Russia> – <Moscow>.

Similar examples showed that <king> – <queen> roughly equalled <prince> – <princess>. And some preliminary work showed that these embeddings were sufficiently useful to perform human-level analogical reasoning on standardized tests like the SAT.
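The vector arithmetic behind these analogies can be sketched with a handful of toy vectors. The three-dimensional vectors below are hand-picked for illustration, not learned from any corpus, and the `analogy` helper is a hypothetical name. It answers "a is to b as c is to ?" by finding the word whose vector points most nearly in the direction of b - a + c.

```python
import math

# Hand-picked toy vectors (hypothetical, not learned from any corpus).
# Axis 0 loosely encodes "royalty", axis 1 "gender", axis 2 "seniority".
VECTORS = {
    "king":     [1.0,  1.0, 1.0],
    "queen":    [1.0, -1.0, 1.0],
    "prince":   [1.0,  1.0, 0.0],
    "princess": [1.0, -1.0, 0.0],
    "peasant":  [0.0,  0.2, 0.5],
}


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("king", "queen", "prince")
```

Here `result` is "princess", because the offset from king to queen matches the offset from prince to princess. In learned embeddings the match is only approximate, which is why a nearest-neighbor search by cosine similarity is used rather than an exact lookup.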

In the last three years word embeddings have become a nearly ubiquitous tool at machine learning and natural language processing labs throughout academia and industry. But Tolga Bolukbasi and colleagues showed that in addition to picking up on meaningful semantic relationships, the word embeddings also absorbed common biases reflected in the underlying text corpora.

In one example, they showed that learned embeddings also coded for man − woman ≈ computer programmer – homemaker. Similarly, in the learned embedding space, the occupations closest to “she” were 1. homemaker 2. nurse 3. receptionist 4. librarian 5. socialite 6. hairdresser.

In contrast, the occupations closest to “he” included 1. maestro 2. skipper 3. protege 4. philosopher 5. captain 6. architect.

Bolukbasi and colleagues proposed a method for identifying the subspace of learned embeddings corresponding to gender and correcting for it. However, we should note that this doesn’t correct for any of the myriad other potential biases that might lurk within the word embeddings.
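The core geometric idea, removing the component of a vector that lies along a chosen direction, can be sketched as follows. This is a deliberate simplification and not the authors' full procedure: it estimates the gender direction from a single word pair and only projects that one direction out. The embedding values are hypothetical.

```python
import math

# Toy 3-d embeddings (hypothetical values chosen for illustration).
EMB = {
    "he":         [1.0, 0.0, 0.2],
    "she":        [-1.0, 0.0, 0.2],
    "programmer": [0.4, 0.9, 0.1],
    "homemaker":  [-0.5, 0.8, 0.1],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def remove_direction(v, direction):
    """Subtract from v its projection onto a unit-length direction."""
    scale = dot(v, direction)
    return [a - scale * d for a, d in zip(v, direction)]


# Estimate a gender direction from a single word pair (a simplification).
gender_dir = unit([a - b for a, b in zip(EMB["he"], EMB["she"])])

neutral = {w: remove_direction(EMB[w], gender_dir) for w in ("programmer", "homemaker")}
```

After the projection, both occupation vectors have zero component along the estimated direction. Any bias that lives along other directions, or that was never tested for, is untouched.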

The same might be said of humans. We call attention to certain biases, emphasizing them, testing for their existence, and correcting them as best we can. But only by identifying the problem and proposing a test for it can we address it. It’s hard to guess what prejudices might influence human decision-making that we’ve never thought to examine.


Even without absorbing explicit biases from datasets, machine learning can produce biased classifications and decisions because the data is implicitly biased by virtue of who is represented and who is omitted.

As a glaring example, Google last year added a state-of-the-art object detection algorithm to its photo app. The algorithm annotated photos with descriptions of the objects they contained, such as “skyscrapers”, “airplanes”, and “cars”.

However, things went horribly wrong when the app tagged a picture of a black couple as “gorillas”.

There are a few things to keep in mind here. First, the classifier was likely trained on the academic 1-million image benchmark dataset ImageNet [6], for which the 2014 state-of-the-art misclassification rate is 7.4%. That means, for any large population uploading photos, a considerable number will be misclassified.

That said, it’s not hard to imagine that being black had something to do with it. To see why, consider the construction of the ImageNet dataset. An academic benchmark, ImageNet was built to provide a testbed for advancing computer vision. The dataset contains 1 million images consisting of 1,000 images each from 1,000 object classes. Roughly half of these images depict organisms like humans, birds, and gorillas, while the other half depict artificial objects like airplanes and skyscrapers.

Out of curiosity, I thumbed through the ImageNet explorer, selecting for images of humans and passing over the first 500 by eye.  Out of 500 randomly selected images of humans, only 2 depicted black people. These two consisted of one image of Gary Coleman, and another of a black man dressed in drag.

A machine trained on these images might never have seen a typical black man and thus wouldn’t know upon seeing one whether to categorize based on color or physiology. Now ImageNet was built by well-meaning academics. It seems exceedingly unlikely that the creators of the dataset intended for models trained on it to misbehave in this fashion.

Released in 2009 by Dr. Fei-Fei Li and colleagues, the dataset was inspired by the observation that humans see many images per second while forming their ability to recognize objects, and that a computer might need access to a similarly rich dataset.

To my knowledge, the dataset doesn’t encode any explicit human bias. There are no images of black men and women mislabeled as gorillas. However, it might alarm us that the absence of blacks from the ImageNet dataset parallels their lack of representation in computer science generally.

While the Google app incident might be isolated and somewhat benign, it’s not hard to imagine how this problem could metastasize. Consider, for example, a security system based on face recognition that only allowed employees to enter a building when it was at least 99% sure that they were correctly ID’d and called security otherwise. If a minority group were missing from the datasets used to train the face recognition algorithm, it might throw alarms disproportionately when these citizens went to work, detaining them with greater frequency.
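The disparity in this scenario is easy to quantify once per-group scores are available. The sketch below uses invented match confidences for two groups of legitimate employees, with the under-represented group assumed to receive lower scores. It computes how often each group would fall below the 99% cutoff and trigger a security call.

```python
THRESHOLD = 0.99  # the 99% confidence cutoff from the scenario above

# Hypothetical match confidences for legitimate employees, by group.
# The under-represented group is assumed to score lower because the model
# saw few similar faces during training.
CONFIDENCES = {
    "well_represented": [0.999, 0.998, 0.995, 0.997, 0.985,
                         0.999, 0.996, 0.994, 0.998, 0.999],
    "under_represented": [0.991, 0.970, 0.995, 0.962, 0.988,
                          0.993, 0.975, 0.981, 0.996, 0.968],
}


def false_alarm_rate(scores, threshold=THRESHOLD):
    """Fraction of legitimate employees who would trigger a security call."""
    return sum(1 for s in scores if s < threshold) / len(scores)


rates = {group: false_alarm_rate(scores) for group, scores in CONFIDENCES.items()}
```

With these made-up numbers, one in ten employees from the well-represented group is stopped, against six in ten from the under-represented group, even though the threshold is identical for everyone.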

This point is important: even absent racist researchers, corporations, or customers, and with algorithms that do not express any intrinsic preferences, we might, absent critical thought, accidentally birth a system that systematically racially profiles.

On the other hand, we might find this realization empowering because it provides straightforward prescriptions for how to detect and avoid some kinds of unintentional bias.

Contrary to the Guardian’s John Naughton’s suggestion that our desire to scrutinize algorithms is stymied by their impenetrable, black-box nature, this particular kind of error can be found simply by examining the training data, a task that surely doesn’t require a PhD in machine learning.
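An audit of this kind can be as simple as counting. The sketch below assumes someone has manually annotated a sample of images, and it mirrors the informal tally described earlier (2 of 500). The annotation labels, the `representation` helper, and the 5% flagging threshold are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical audit sample: one manually assigned annotation per sampled image.
audit_sample = ["black"] * 2 + ["other"] * 498


def representation(annotations):
    """Return each annotation's share of the sample."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: n / total for label, n in counts.items()}


shares = representation(audit_sample)

# Flag any group below an arbitrary, assumed threshold of 5% of the sample.
underrepresented = [label for label, share in shares.items() if share < 0.05]
```

The appropriate threshold depends on the population the system will serve, so the 5% figure here is only a placeholder. The point is that the check requires no knowledge of the model at all.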