Six Ethical Quandaries of Predictive Policing

When predictive machine learning models are applied to real-life scenarios, especially those that directly impact humans, such as cancer detection and other medical-related applications, the risks involved with incorrect predictions carry very high stakes. These risks are also prominent in how machine learning is applied in law enforcement, and serious ethical questions must be considered.

By Eric Siegel, Machine Learning Week on November 6, 2020 in Bias, Crime, Ethics, Police, Predictive Analytics


Nowhere could the application of machine learning prove more important -- nor more risky -- than in law enforcement and national security. In this article, I'll review this area and then cover six perplexing and pressing ethical quandaries that arise.

Predictive policing introduces a scientific element to law enforcement decisions, such as whether to investigate or detain, how long to sentence, and whether to parole. In making such decisions, judges and officers take into consideration the probability a suspect or defendant will be convicted for a crime in the future -- which is commonly the dependent variable for a predictive policing model.

Such predictive models input the defendant's demographic and behavioral factors. These independent variables may include prior convictions, income level, employment status, family background, neighborhood, education level, and the behavior of family and friends.

Law enforcement, by its nature, will always face the impossible challenge of optimizing the deployment of limited resources such as patrolling officers and investigating agents. Predictive scores prioritize -- they essentially serve to "shrink the haystack" into much smaller regions that are still likely to contain most of the needles being searched for.

But with this in place, a convict's or suspect's fate now rests in non-human hands. Machine learning determines how long people spend in jail. Granted, policing models serve only for decision support rather than decision automation, but that still means the scores will, on many occasions, be a deciding factor. Certain cases will sway based on the predictive model, no question about it.

So computers can now commit more than just prediction errors -- they can commit an error previously reserved for humans: injustice. When a criminal who would not re-offend is kept in prison because of an incorrect prediction, we will never even have the luxury of knowing. You can prove innocent a legitimate transaction wrongly flagged as fraudulent, but we don't get to see whether an incarcerated person would have walked the straight and narrow outside of prison.

The same value proposition and the same risks pertain to national security and intelligence applications. Agencies in the U.S. such as the NSA and FBI assign terrorism risk scores to possible suspects. After all, the actual "intelligence" of an intelligence organization hinges largely on its predictive models -- on the patterns its systems search for, scanning millions of cases to find suspects. For example, a system might spot, among many phone calls, one that's potentially nefarious and trigger an alert.

The serious, impactful downside of being wrongly accused (that is, of false positives) is only one of the perplexing, pressing quandaries that arise when machine learning is applied to law enforcement. Let's go through six of them.

1) How do you assign costs to false negatives and false positives?

For sentencing, a false positive can mean a convict stays in prison, even though they will not offend again. And a false negative means someone is allowed to go free, even though they will commit a crime again. How do these two costs compare?

For identifying suspects, a false positive means someone is investigated or even detained undeservedly, and a false negative means a perpetrator gets away with it. If the first is not as bad, and if the cost we assign to what may be "harassment" from the legal system should be lower than the cost we assign to failing to catch a criminal, then exactly how much lower should it be?

In the end, this impossible question of assigning specific, relative costs has to be answered. It defines what the system will optimize for. You can't get around answering it -- the math will ultimately default to some cost balance if you don't specify it explicitly.
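To make that concrete, here is a minimal sketch in Python. It assumes the model outputs a calibrated probability of re-offense and that a single relative cost can be put on each kind of error. The function names and cost values are hypothetical, chosen only for illustration. Under those assumptions, flagging has the lower expected cost whenever the probability exceeds cost_fp / (cost_fp + cost_fn).

```python
def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which flagging has lower expected cost than not flagging.

    Expected cost of flagging:     (1 - p) * cost_fp
    Expected cost of not flagging: p * cost_fn
    Flagging wins when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def should_flag(p_reoffend: float, cost_fp: float, cost_fn: float) -> bool:
    """Flag a case only if its predicted probability clears the cost-based cutoff."""
    return p_reoffend > decision_threshold(cost_fp, cost_fn)


# Hypothetical cost ratios, for illustration only.
print(decision_threshold(1.0, 1.0))  # equal costs -> 0.5
print(decision_threshold(1.0, 4.0))  # false negative judged 4x as costly -> 0.2
print(should_flag(0.3, 1.0, 4.0))    # True: 0.3 clears the 0.2 cutoff
```

Note that a conventional 0.5 cutoff is not neutral. Under these assumptions, it is equivalent to declaring the two kinds of error equally costly.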

This question is fundamental to the whole operation. Errors, be they false positives or false negatives, will prevail, so in the end, we can only influence the balance between the two -- we can't eliminate them. Predictive policing is not one of those unusual application areas where predictive models achieve high performance or even "accuracy," like when deep learning classifies photos of cats and dogs. There's no amazing premonition like in that Spielberg sci-fi movie, "Minority Report."

For example, a crime prediction model launched by the U.S. state of Oregon, consulted by judges when sentencing convicted felons, identifies the most risky 20%, those most likely to be convicted again upon release. But within that segment, only half actually will be. That rate is higher than average, but it means that if we flag that group, half will be false positives.
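The arithmetic is simple enough to spell out. The sketch below applies the two figures above (the top 20% flagged, about half of them convicted again) to a hypothetical cohort of 1,000 people. The cohort size and the function name are assumptions for illustration only.

```python
def flagged_breakdown(cohort_size: int, flagged_fraction: float, precision: float):
    """Split a flagged group into correct flags and false positives.

    precision is the share of flagged people who are in fact convicted again.
    """
    flagged = round(cohort_size * flagged_fraction)
    true_positives = round(flagged * precision)
    false_positives = flagged - true_positives
    return flagged, true_positives, false_positives


# Hypothetical cohort of 1,000; the 20% and 50% figures come from the text.
flagged, true_positives, false_positives = flagged_breakdown(1000, 0.20, 0.50)
print(flagged, true_positives, false_positives)  # 200 100 100
```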

2) Is it okay to judge people based on only several factors?

A crime risk model pares a prior offender down to an extremely limited view captured by a small number of independent variables. Is this not the definition of dehumanization? And does it defy the standard of judging a person as an individual, since we're essentially judging the person based on the past behavior of others who are a relatively close match based on those few variables? Or, on the other hand, isn't the prediction by a human judge of one's future crimes also intrinsically based only on prior observations of others, since humans learn from experience as well? Is this all within the acceptable realm of the compromises to civil liberties that convicts endure above and beyond incarceration, especially if supporting decisions with predictive models promises to lower the overall crime rate -- as well as lowering the incidents of unnecessary incarceration?

3) Will machine predictions command undue authority and detract from human insight?

Might judges view machine learning as intrinsically authoritative science and defer to predictive scores as a kind of crutch -- basically, trusting them too much and, as a result, applying less of their own keen skill and expertise, with a reduction in earnest observation and expert consideration?

4) Shouldn't this power help rehabilitate rather than only punish?

With these efforts underway, should not at least as much effort go into leveraging machine learning to improve offender rehabilitation, for example, by targeting those predicted as most likely to benefit from special programs? In one rare, groundbreaking initiative, the Florida Department of Juvenile Justice has actually done just that.

5) Does improved suspect discovery justify the bulk collection of civilian data, aka mass surveillance?

Intelligence agencies need data about great portions of the general public to provide negative examples in the training data -- that is, to serve as a baseline for normal, innocent behavior.

But the American Civil Liberties Union calls bulk data collection "mass, suspicionless surveillance," and the whistleblower Edward Snowden, who leaked classified information that revealed new details about the NSA's bulk data collection, is considered by many a national hero.

You can explore this question in greater detail with the article “The real reason the NSA wants your data: automatic suspect discovery,” in which I cover how the NSA uses machine learning for automatic suspect discovery, the kinds of insights they may be finding in data, and this controversy and how better to debate it.

6) Does predictive policing perpetuate human biases and magnify existing inequality?

In the U.S., the data analyzed to develop crime risk models include proportionately more prosecutions of certain minority group members, such as black individuals. Minority group members discriminated against by law enforcement, such as by way of profiling, are proportionately more likely to show a prior criminal record since they may be screened more often, which artificially inflates the minority group’s incidence of criminal records. This means more errors in the training data, such as proportionately more white individuals who've committed undetected crimes for which there is no record. The dependent variable is wrong. This is called a lack of ground truth. As a result, a crime risk model doesn’t predict crime per se; it predicts convictions. You don’t know what you don’t know.
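A small worked example shows how this distortion arises. All numbers below are hypothetical: two groups are assumed to offend at the same underlying rate, but offenses in one group are assumed to be twice as likely to end up as a recorded conviction. The function and variable names are likewise illustrative.

```python
def recorded_rate(true_offense_rate: float, detection_rate: float) -> float:
    """Share of a group that ends up with a recorded conviction.

    Only detected and prosecuted offenses reach the training data,
    so the label reflects offending AND screening intensity.
    """
    return true_offense_rate * detection_rate


# Hypothetical values: identical underlying behavior, different screening.
TRUE_OFFENSE_RATE = 0.10
less_screened = recorded_rate(TRUE_OFFENSE_RATE, detection_rate=0.30)
more_screened = recorded_rate(TRUE_OFFENSE_RATE, detection_rate=0.60)

# The more heavily screened group shows twice the recorded rate.
print(f"ratio of recorded rates: {more_screened / less_screened:.1f}")
```

A model trained on these records would learn that the more heavily screened group is twice as risky, even though, by construction, the two groups behave identically.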

Columbia law and political science professor Bernard Harcourt, author of "Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age," points out that by factoring in prior offenses to predict future crimes, "you just inscribe the racial discrimination you have today into the future." It’s a cyclic magnification of prejudice’s already self-fulfilling prophecy.

And finally, there's also another source of bias. Even if you had a dataset that achieved ground truth, and even if you ensured your model was not discriminatory -- as we defined and explored in other course videos (or see the article version), by not allowing a protected class as an independent variable -- you'd still see an inequitable difference in false positive rates between groups. This form of bias is the most commonly discussed criticism of predictive policing, which I covered in these three videos (or see the article version).
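A toy calculation can illustrate how that difference arises. It is a sketch under stated assumptions, not a reproduction of any real model: a score with only two levels, each carrying the same re-offense probability regardless of group, and two groups that differ only in what share of their members land in the high-score bin. All numbers and names are hypothetical.

```python
def false_positive_rate(share_high: float, p_high: float, p_low: float) -> float:
    """FPR when everyone in the high-score bin is flagged.

    share_high: fraction of the group that falls in the high-score bin
    p_high, p_low: re-offense probability in each bin (same for every group)
    """
    share_low = 1.0 - share_high
    flagged_non_reoffenders = share_high * (1.0 - p_high)
    unflagged_non_reoffenders = share_low * (1.0 - p_low)
    return flagged_non_reoffenders / (
        flagged_non_reoffenders + unflagged_non_reoffenders
    )


# Hypothetical values: the score means the same thing in both groups.
P_HIGH, P_LOW = 0.6, 0.2
fpr_group_a = false_positive_rate(share_high=0.3, p_high=P_HIGH, p_low=P_LOW)
fpr_group_b = false_positive_rate(share_high=0.6, p_high=P_HIGH, p_low=P_LOW)

print(f"group A: {fpr_group_a:.2f}, group B: {fpr_group_b:.2f}")
```

With these made-up inputs, the false positive rate comes out to roughly 0.18 for group A and 0.43 for group B. The protected class never enters the model, and the labels are assumed to be correct, yet people in group B who would not re-offend are flagged more than twice as often.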
