Demystifying AI: The prejudices of Artificial Intelligence (and human beings)

AI models are necessarily trained on historical data from the real world, data generated by the daily goings-on of society. If social biases are inherent in the training data, will the AI's predictions reflect those same biases? If so, what should we do (or not do) to make AI fair?

By Manjesh Gupta, Associate Manager - AI/Machine Learning at Virtusa.

Our present human society is a product of millions of years of biological evolution and thousands of years of social evolution. Everything has a history. We form beliefs about people and things based on our accumulated knowledge. In such a scenario, it is quite natural that some of our beliefs are prejudiced because, at times, we do not have enough information. Gordon Allport defines “prejudice” as a “feeling, favorable or unfavorable, toward a person or thing, prior to, or not based on, actual experience.” It is often said that prejudices exist and will continue to exist. The real question is whether we as individuals or as a society are willing to change our prejudiced beliefs when presented with counter-evidence. In 1953, Albert Einstein wrote in an essay, “Few people are capable of expressing with equanimity opinions which differ from the prejudices of their social environment. Most people are even incapable of forming such opinions.”

In a social setting, these prejudiced beliefs manifest themselves as attitudes or behaviors, favorable or unfavorable, toward an individual or a group, based on sex, gender, social class, race, ethnicity, language, political affiliation, sexuality, religion, or other personal characteristics. In such cases, the group identity of an individual or sub-group generally takes precedence over individual identity. We know that we behave in prejudiced ways (which may not always even be wrong).

Do AI algorithms reproduce this human behavior?

Let us examine a few cases.

If you ask some natural language processing algorithms, “Man is to computer programmer as woman is to ___________?”, they may answer “Homemaker.” The word embeddings used in such algorithms have been known to reflect gender (and other) biases for quite some time now. This paper examined the “word2vec” embeddings to show the presence of gender stereotypes, and it also suggests a method to neutralize the bias.
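The analogy trick above comes from simple vector arithmetic on word embeddings: the vector for “programmer,” minus “man,” plus “woman,” lands near whatever word the embedding space associates with women and work. A minimal sketch of the idea, using tiny hand-invented 3-dimensional vectors (real word2vec embeddings have hundreds of dimensions and are learned from text; these toy values are purely illustrative):

```python
import math

# Toy "embeddings" invented for illustration only; the coordinates are
# hypothetical and merely encode a stereotyped association.
vectors = {
    "man":        [0.9, 0.1, 0.3],
    "woman":      [0.1, 0.9, 0.3],
    "programmer": [0.8, 0.2, 0.9],
    "engineer":   [0.85, 0.15, 0.9],
    "homemaker":  [0.2, 0.8, 0.9],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "man" is to "programmer" as "woman" is to ... ?
# Analogy via vector arithmetic: programmer - man + woman
query = add(sub(vectors["programmer"], vectors["man"]), vectors["woman"])

# Pick the nearest word to the query vector (excluding the query words).
answer = max(
    (w for w in vectors if w not in ("man", "woman", "programmer")),
    key=lambda w: cosine(vectors[w], query),
)
print(answer)  # -> homemaker
```

With real pretrained embeddings, the same arithmetic (e.g., via gensim's `most_similar`) is how the biased completions in the paper were surfaced.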

The Gender Shades project showed that facial recognition systems from IBM, Microsoft, and Face++ are biased against women and “darker” subjects in terms of recognition accuracy. These algorithms were, on average, around 15% less accurate for female and “darker” subjects. Recently, an algorithm designed to generate “high-definition” faces from pixelated images produced a “white” person as output when a pixelated face of Barack Obama was used as input.

In an extremely alarming use case of artificial intelligence, an algorithm that estimated the risk that a person would commit another crime was found to be biased against “black” people. This algorithm was being used by judicial systems in the United States. Recently, researchers from Harrisburg University claimed to have built a deep learning algorithm that can predict whether a person is a criminal based solely on a picture of their face. The paper was to be published by Springer Nature. Researchers and experts from various fields signed a petition to stop its publication, arguing that predicting “criminality” is not an exact science. Springer later clarified that it had already rejected the paper before the petition started.

So, what causes AI algorithms to exhibit this behavior?

In a way, the results of these algorithms hold a mirror to human society. They reflect and perhaps even amplify the issues already present. We know that these algorithms need data to learn. Their predictions are only as good as the data they are trained on and the goal they are set to achieve.

The data needed to train these algorithms is huge (think millions of examples and above). Suppose we are trying to develop an algorithm to identify cats and dogs from pictures. Not only do we need thousands of pictures of cats and dogs, but they must also be labeled (say, cat is class 0 and dog is class 1) so that the algorithm can learn from them. We can download these images off the internet (the ethics of which is questionable), but they still need to be labeled manually. Now consider the complexity and effort required to correctly label a million images across one thousand classes. Often this labeling task is done by “cheap labor” who may lack the motivation to do it correctly, or who simply make mistakes.
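The effect of annotator mistakes can be made concrete with a quick simulation: flip a small fraction of “ground-truth” labels and count how much of the training signal is now wrong. A minimal sketch, with hypothetical labels and an assumed 5% error rate:

```python
import random

# Hypothetical ground-truth labels for 1,000 images (0 = cat, 1 = dog).
random.seed(42)
true_labels = [random.randint(0, 1) for _ in range(1000)]

def add_label_noise(labels, error_rate):
    """Simulate annotator mistakes by flipping a fraction of the labels."""
    noisy = labels[:]
    n_errors = int(len(labels) * error_rate)
    for i in random.sample(range(len(labels)), n_errors):
        noisy[i] = 1 - noisy[i]  # flip 0 <-> 1
    return noisy

# Assume annotators mislabel 5% of images.
noisy_labels = add_label_noise(true_labels, error_rate=0.05)
n_wrong = sum(t != n for t, n in zip(true_labels, noisy_labels))
print(f"{n_wrong} of {len(true_labels)} training labels are wrong")  # 50 of 1000
```

Even this modest error rate puts a hard ceiling on what the model can learn; at a million images, 5% means fifty thousand systematically misleading examples.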

Another problem in the dataset is class imbalance. Let's say we used ten thousand images of dogs but only one thousand images of cats to train the algorithm above. This may not reflect the actual proportions of cats and dogs in the real world, so the cat class is under-represented. These problems are generally referred to as “dataset bias.”
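One common way to mitigate this kind of imbalance is to weight each class inversely to its frequency, so under-represented classes contribute more to the training loss. A minimal sketch using the article's hypothetical 10,000-dog / 1,000-cat split (the formula mirrors scikit-learn's “balanced” class-weight heuristic):

```python
# Hypothetical imbalanced dataset from the example above.
counts = {"cat": 1_000, "dog": 10_000}
total = sum(counts.values())

# Weight each class as total / (n_classes * class_count), so that rare
# classes get proportionally larger weights in the training loss.
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}
print(weights)  # {'cat': 5.5, 'dog': 0.55}
```

The cat class ends up weighted ten times more heavily than the dog class, exactly offsetting its ten-fold under-representation; reweighting is only one option, alongside oversampling the minority class or collecting more of it.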

The second issue in AI algorithms is feature selection. They classify examples based on different features. Let us say we are building an algorithm to predict credit risk for a bank's incoming loan customers. We try to make an unbiased algorithm, so we exclude features like race, gender, caste, and ethnicity. However, we include ZIP code as a feature, which seems rather harmless. Many studies have shown that ZIP code can serve as a proxy for socioeconomic status. If we look carefully around us, we may find that certain ZIP codes are home mostly to people of a particular class, color, ethnicity, or caste. In this case, even though we tried to develop an unbiased algorithm, we unintentionally introduced social bias by using ZIP code as a feature. At times, it may be difficult to identify which feature introduces unintentional social bias. There may also be a trade-off between an algorithm's accuracy and the social bias introduced by its features.
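A quick way to check whether a “harmless” feature acts as a proxy is to ask how well it predicts the protected attribute on its own. A minimal sketch on a tiny synthetic dataset (the ZIP codes and group labels below are entirely hypothetical): predict the majority group within each ZIP and measure the resulting accuracy.

```python
from collections import Counter, defaultdict

# Hypothetical (ZIP code, demographic group) records for illustration.
records = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]

# Group the records by ZIP and find each ZIP's majority group.
by_zip = defaultdict(list)
for zip_code, group in records:
    by_zip[zip_code].append(group)
majority = {z: Counter(groups).most_common(1)[0][0]
            for z, groups in by_zip.items()}

# How often does "predict the ZIP's majority group" get it right?
correct = sum(majority[z] == g for z, g in records)
accuracy = correct / len(records)
print(f"ZIP alone predicts group membership with {accuracy:.0%} accuracy")  # 75%
```

If this proxy accuracy is far above chance, dropping the protected attribute from the model was not enough: the ZIP code feature smuggles much of the same information back in.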

Third, human beings live in a time dimension; they do not necessarily stay the same their entire lives. A person born poor can become rich, while a rich person may squander all their wealth. A convicted criminal may change and become a better person, while another “role model” may commit horrific crimes in the future. AI algorithms (perhaps just like human beings) try to make sense of the future on the presumption that what happened in the past will somehow repeat. This may be true for some (or maybe many) phenomena and people, but it is certainly not true for all. Sure, it takes time for people and cultures to change (for better or worse), but they never remain static.

These AI algorithms are not aware of context, of their predictions, or of the consequences of those predictions. At present, they mostly depend on human beings for training data and for deciding what exactly to derive from that data. Here is a brief TED talk that explains how AI algorithms learn.

This social bias introduced into AI algorithms leads, in many cases, to loss of social and economic opportunity and of dignity. Especially given the widespread use of such algorithms, it becomes critical to examine their problems. The debate on this topic often leads to much deeper questions. How do we define and create a “fair” and “balanced” dataset? How do we ensure that everyone developing AI algorithms uses fair and balanced datasets? When a human being makes an error in judgment or action, they can be held accountable for it (or maybe not, or maybe it depends on how rich they are). Can we hold an AI algorithm accountable for its errors, and how? Which problems should not be solved using AI and are better left to human judgment for now? These questions, among others, are tackled in the field of study called the Ethics of Artificial Intelligence.

Personally, I believe that it is impossible to completely eliminate our prejudices, but as human beings, we should reflect deeply on how we can minimize them in ourselves and our creations.

[* The words "bias," "social bias," and "prejudice" in this article are used in a social sense and should not be confused with the mathematical "bias" of a machine learning algorithm.]

Original. Reposted with permission.


Bio: Manjesh Gupta has nine years of experience across Artificial Intelligence, Machine Learning, Data Analytics, Research, IT Project Management, Stakeholder Management, Change Management, and Capacity Building, with technical ML/DL skills in NLP, Risk Analytics, Computer Vision, and Time Series Forecasting.