What If the Data Tells You to Be Racist? When Algorithms Explicitly Penalize

Without the right precautions, machine learning — the technology that drives risk-assessment in law enforcement, as well as hiring and loan decisions — explicitly penalizes underprivileged groups.


Originally published in The San Francisco Chronicle (the cover article of Sunday's "Insight" section)

What if the data tells you to be racist? Without the right precautions, machine learning — the technology that drives risk-assessment in law enforcement, as well as hiring and loan decisions — explicitly penalizes underprivileged groups. Left to its own devices, the algorithm will count a black defendant’s race as a strike against them. Yet some data scientists are calling to turn off the safeguards and unleash computerized prejudice, signaling an emerging threat that supersedes the well-known concerns about inadvertent machine bias.

Imagine sitting across from a person being evaluated for a job, a loan, or even parole. When asked how the decision process works, you inform them, “For one thing, our algorithm penalized your score by seven points because you’re black.” 

We’re already heading in that discriminatory direction — and this is all the more foreboding since Gov. Jerry Brown signed Senate Bill 10 into law last month. This new law mandates a heavier reliance on algorithmic decisions for criminal defendants. In the meantime, distinguished experts are now campaigning for discriminatory algorithms in law enforcement and beyond. They argue that computers should be authorized to make life-altering decisions based directly on race and other protected classes. This would mean that computers could explicitly penalize black defendants for being black.

In most cases, data scientists intentionally design algorithms to be blind to protected classes such as race, religion and gender. They implement this safeguard by prohibiting predictive models — which are the formulas that render momentous decisions such as pretrial release determinations — from considering such factors. But discriminatory practices threaten to infiltrate algorithmic decision-making.

I use “discriminatory” for decisions about individuals that are based in part on a protected class. For example, profiling by race or religion in order to determine police searches or extra airport security screening would be discriminatory. An exception would be decisions intended to benefit a protected group, such as affirmative action or determining whether someone qualifies for a grant reserved for members of a minority group.

Law enforcement is using predictive models more widely. Senate Bill 10 completely eliminates cash bail and mandates that pretrial release decisions instead rest more heavily on predictive models generated automatically by machine learning. Several other states have also made moves in this direction.

Will such crime-risk models steer clear of discrimination? Although they usually avert discriminatory decisions by excluding protected classes from their inputs, there’s no guarantee they’ll stay that way.

Without due precautions, machine learning’s decisions meet the very definition of inequality. For example, to inform pretrial release, parole and sentencing decisions, the model calculates the probability (risk) of future criminal convictions. If the data links race to convictions, showing that black defendants have more convictions than white defendants, then the resulting model will penalize the score of each black defendant, just for being black, unless race has been intentionally excluded from the model. There couldn’t be a more blatant case of criminalizing blackness.
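To make the mechanism concrete, here is a minimal Python sketch. It does not depict any deployed risk model. It assumes made-up toy records for two unnamed groups (A and B) and the simplest possible model: a least-squares line with a single yes/no input, group membership. With one 0/1 input, least squares has a closed form, so the fitted weight is exactly the gap between the two groups' conviction rates. Every member of the higher-rate group gets that amount added to their score for membership alone. Excluding the attribute, the safeguard of keeping protected classes out of the inputs, leaves everyone with the same baseline rate. The function names and data are hypothetical illustrations.

```python
# Hypothetical toy data for illustration only; these are NOT real conviction records.
# Each record is (in_group_b, reconvicted), where 1 means yes and 0 means no.
RECORDS = [
    (0, 0), (0, 0), (0, 1), (0, 0),  # group A: 1 of 4 reconvicted
    (1, 1), (1, 0), (1, 1), (1, 0),  # group B: 2 of 4 reconvicted
]


def mean(values):
    return sum(values) / len(values)


def fit_with_protected_attribute(records):
    """Least-squares fit of: score = intercept + weight * in_group_b.

    With a single 0/1 input, the least-squares solution is closed form:
    intercept = group A's rate, weight = group B's rate minus group A's rate.
    """
    rate_a = mean([y for g, y in records if g == 0])
    rate_b = mean([y for g, y in records if g == 1])
    return rate_a, rate_b - rate_a


def fit_blind(records):
    """Protected attribute excluded: the only fit left is a constant, the overall rate."""
    return mean([y for _, y in records])


intercept, weight = fit_with_protected_attribute(RECORDS)
blind_score = fit_blind(RECORDS)

print("score, group A:", intercept)           # 0.25
print("score, group B:", intercept + weight)  # 0.5, which includes the 0.25 group weight
print("score, anyone (blind):", blind_score)  # 0.375
```

With more inputs or a logistic model the numbers change, but the point carries over: when the protected attribute is an input and the data link it to the outcome, fitting can assign it a nonzero weight that raises the score of every member of that group.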


Civil rights has much further to go

Discriminatory decision-making by humans is pervasive, paving the way for discriminatory machine learning. Take, for example:

Screening Muslims. While the Trump administration has not attempted to implement a ban based explicitly on religion, many U.S. citizens voted for a president who ran on a campaign pledge to ban Muslims. 

Transgender individuals banned from the military. 

The lack of female players in certain big-league sports indicates an intentional decision based on gender.

Hiring decisions. Résumés with “white sounding names” receive 50 percent more responses than those with “African American sounding names.” 

Housing decisions. Airbnb applications from guests with “distinctively African American names are 16 percent less likely to be accepted relative to identical guests with distinctively white names,” according to Harvard University researchers.

Racial profiling by law enforcement. Until the 1970s, the risk of future crime was estimated based largely on an individual’s race and national heritage. Although this has lessened, profiling by race and religion remains in fashion. “20 states do not explicitly prohibit racial profiling,” according to the NAACP, and U.S. Department of Justice policy allows federal agents to racially profile within the vicinity of the U.S. border.

Polls show that 75 percent of Americans support increased airport security checks based in part on ethnicity, and 25 percent support the use of racial profiling by police.

Discriminatory practices also threaten to infiltrate algorithms. A recent paper co-written by Stanford University Assistant Professor Sharad Goel — who holds appointments in two engineering departments as well as the sociology department — criticizes the standard that predictive models not be discriminatory. The paper recommends discriminatory decision-making “when… protected traits add predictive value.”

In a related lecture, the Stanford professor said, “We can pretend like we don’t have the information, but it’s there. … It’s actually good to include race in your algorithm.”

University of Pennsylvania criminology Professor Richard Berk — who has been commissioned by parole departments in Pennsylvania to build predictive models — also calls for discriminatory models. In a 2009 paper describing the application of machine learning to predict which convicts will kill or be killed while on probation or parole, he advocates for race-based prediction. “One can employ the best model, which for these data happens to include race as a predictor. This is the most technically defensible position.”


If the data tells you to be racist? 

Data is power. It fuels machine learning and, generally, the more you have, the better its predictions. Data scientists see it time and time again: Introducing any new demographic or behavioral data will potentially improve your predictive model. In this way, some data sets may compel discrimination. It’s the ultimate rationale for prejudice. The data seems to tell you, “Be racist.”

But “obeying” the data and making discriminatory decisions violates the most essential notions of fairness and civil rights. Even if it is true that my group commits more crime, it would violate my rights to be held accountable for the others, to have my classification count against me. We must not penalize people for their identity.

Discriminatory computers wreak more havoc than humans manually implementing discriminatory policies. Once it is computerized — that is, once it’s crystallized as an algorithm — a discriminatory decision process executes automatically, coldly and on a more significant scale, affecting greater numbers of people. Formalized and deployed mechanically, it takes on a concrete, accepted status. It becomes the system. More than any human, the computer is “the Man.”

So get more data. Just as we human decision-makers would strive to see as much beyond race as we can about a job candidate or criminal suspect, making an analogous effort — on a larger scale — to widen the database will enable our computer to transcend discrimination as well. Resistance to investing in this effort would reveal a willingness to compromise this nation’s freedoms, the very freedoms we were trying to protect with policies and law enforcement in the first place.

Eric Siegel, Ph.D., founder of the Predictive Analytics World and Deep Learning World conference series and executive editor of The Predictive Analytics Times, makes the how and why of predictive analytics (aka machine learning) understandable and captivating. He is the author of the award-winning Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, a former Columbia University professor, and a renowned speaker, educator, and leader in the field. Read his other articles on data and social justice and follow him at @predictanalytic.