Weapons of Math Destruction, Ethical Matrix, Nate Silver and more Highlights from the Data Science Leaders Summit
Domino Data Lab hosted its first ever Data Science Leaders Summit at the lovely Yerba Buena Center for the Arts in San Francisco on May 30-31, 2018. Cathy O'Neil, Nate Silver, Cassie Kozyrkov and Eric Colson were some of the speakers at this event.
It was a well-attended event that attracted a very interesting mix of speakers and audience. Here are some selected highlights from the conference.
Keynote: Cathy O’Neil
Cathy O’Neil, the author of the best-selling book Weapons of Math Destruction, presented the keynote, which drew on her joint work with Hanna Gunn, a philosopher at the University of Connecticut. It is based on the Ethical Matrix, introduced by philosopher Ben Mepham for the purposes of bioethics.
Cathy has generalized this construction, not to address ethical questions per se, but to help answer a broader question: Does your algorithm work? It is not enough that it works for the owner or the designer of the algorithm. Digging deeper surfaces the question: what does it mean for an algorithm to work? The matrix is a way of making a ranked list of things to worry about.
The problems with algorithms are often individual ones: the algorithms silently fail people who do not even know about them. She started off with Weapons of Math Destruction (WMDs).
WMDs: What they are
WMDs are algorithms that make important decisions and, in doing so, reject certain categories of people. They are secret: there is no appeal system, and those rejected are not even aware that these all-powerful algorithms exist. WMDs are unfair, make mistakes and are destructive to society. They encode “human prejudice, misunderstanding, and bias into software systems that increasingly manage our lives.”
WMDs: Where they are
WMDs are pretty much everywhere a human being interacts with bureaucracy. They seem perfect and sanitary, innocuously deciding who gets into high school or college, who gets a job or keeps one, who gets a loan and who is allowed to buy insurance. They are used in policing and the court system, and in health and human services. These algorithms are not well thought out and are used as a substitute for answering the difficult questions.
WMDs: Why they matter
WMDs matter because of the frightfully vital role they play, and because they don’t work. They undermine their own goals: for example, a teacher assessment score that was almost as bad as a random number generator was weeding out good teachers, so instead of narrowing the achievement gap it was widening it, the opposite of its original goal. They undermine science and erode people’s trust in science. They also decrease accountability: nobody in particular is in charge when they make a mistake. Perhaps, mulls Cathy, taking away that accountability is the primary goal. These algorithms are everywhere; they constantly chip away at equality and threaten democracy. In politics, the information we hear is specifically tailored to us, and we are no longer the informed citizens of a well-functioning democracy.
WMDs: What to do
We have to put the science into data science, with rigorous mathematical oversight. Cathy calls this the era of plausible deniability in Big Data: a lot of people don’t want to acknowledge the flaws in their algorithms and hide behind the guise of proprietary code. As they hide the problems, other people suffer. We need to focus on causality, because we are essentially re-creating the past when we train algorithms on past data loaded with historical biases. For instance, a hiring algorithm trained on twenty-two years of Fox News data would predict that a qualified woman is going to fail, because women were systemically harassed and treated badly at Fox News. No company can really claim to be perfect. Big Data does not remove the need for causality.
Cathy then dived into three case studies on how the ethical matrix works. An ethical matrix is an artifact of conversation about ethics – or, about stakeholders and concerns.
Case Study 1: Credit
How do we make credit decisions fair? You have data about people who want to borrow money – you just know their FICO scores and race. The data quality, to begin with, is bad for non-white borrowers.
The graphs suggest that default behavior varies with race. From the left-side graph, Asian borrowers default less at lower FICO scores. From the right-side graph, a larger percentage of Black borrowers have lower FICO scores.
Question: How does one make a fair lending decision based on this data? It’s not clear what the answer should be. Regulators, who ought to be making these rules, are busy with other stuff.
The authors of this paper (which Cathy referenced) propose five different choices. In the first graph, the focus is on profit maximization and this makes it a much higher bar for Blacks to get a loan. Thus, it would seem to be unfair if a lending company sets higher standards for Blacks, especially when the historical data could be wrong and biased against them. In the second graph, if the threshold is the same for everyone, then very few Blacks would qualify for a loan.
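To make the difference between threshold choices concrete, here is a minimal Python sketch. The scores, the group labels "A" and "B", the cutoffs and the helper approval_rates are all made up for illustration; none of them come from the paper or from Cathy's slides. The sketch only shows how a single cutoff can produce different approval rates when the score distributions differ between groups.

```python
# Toy, invented scores: NOT data from the talk or the referenced paper.
applicants = [
    ("A", 720), ("A", 680), ("A", 650), ("A", 610), ("A", 590),
    ("B", 700), ("B", 640), ("B", 600), ("B", 560), ("B", 520),
]


def approval_rates(applicants, thresholds):
    """Return the fraction of each group whose score meets its group's cutoff."""
    approved, total = {}, {}
    for group, score in applicants:
        total[group] = total.get(group, 0) + 1
        if score >= thresholds[group]:
            approved[group] = approved.get(group, 0) + 1
    return {group: approved.get(group, 0) / total[group] for group in total}


# Policy 1: the same (hypothetical) cutoff for everyone.
same_cutoff = approval_rates(applicants, {"A": 640, "B": 640})

# Policy 2: hypothetical group-specific cutoffs chosen to equalize approvals.
group_cutoffs = approval_rates(applicants, {"A": 640, "B": 600})

print("same cutoff:   ", same_cutoff)
print("group cutoffs: ", group_cutoffs)
```

In this toy data the group-specific cutoffs equalize approval rates, but whether that counts as fair, and at what cost to the lender, is exactly the kind of question the ethical matrix is meant to put on the table.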
Finally, the graph above shows profitability as a function of the false positive/false negative trade-off. The stricter the fairness criterion (based on demography), the lower the profits. Imagine what businesses would do in a competitive scene.
An ethical matrix for this case would list the stakeholders (Company, Customers, Minority Customers, White Customers) along the Y-axis and the concerns (Profit, Fairness, False Positives, False Negatives and Data Quality) along the X-axis. Although this matrix is derived from the paper, it may not faithfully match the paper and its findings.
Red marks something to worry about, e.g. the Data Quality for Customers and Minority Customers is not working. Concerns (in red) may be positive too: “we want this to be fair”. Yellow marks something bad that may happen. The False Negatives are higher for Minority Customers, which means they may not get the loan.
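One way to picture the matrix is as a small table keyed by stakeholder and concern. The sketch below is only an illustration under stated assumptions: the names Level, matrix and ranked_worries are hypothetical, the red Data Quality cells follow the description above, the yellow False Negatives cell is one reading of that description, and every other cell is left unrated rather than guessed.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical colour scale; higher means more worrying."""
    UNRATED = 0
    YELLOW = 1
    RED = 2


STAKEHOLDERS = ["Company", "Customers", "Minority Customers", "White Customers"]
CONCERNS = ["Profit", "Fairness", "False Positives", "False Negatives", "Data Quality"]

# Start with every stakeholder/concern cell unrated.
matrix = {s: {c: Level.UNRATED for c in CONCERNS} for s in STAKEHOLDERS}

# Cells described in the text as red.
matrix["Customers"]["Data Quality"] = Level.RED
matrix["Minority Customers"]["Data Quality"] = Level.RED
# Assumption: the false-negative concern for minority customers is rated yellow.
matrix["Minority Customers"]["False Negatives"] = Level.YELLOW


def ranked_worries(matrix):
    """Return rated cells as (level, stakeholder, concern), most worrying first."""
    flagged = [
        (level, stakeholder, concern)
        for stakeholder, row in matrix.items()
        for concern, level in row.items()
        if level is not Level.UNRATED
    ]
    return sorted(flagged, key=lambda cell: cell[0], reverse=True)


worries = ranked_worries(matrix)
for level, stakeholder, concern in worries:
    print(f"{level.name:6} {stakeholder}: {concern}")
```

Sorting the rated cells gives the "ranked list of things to worry about" that the matrix is meant to produce; the ratings themselves still have to come out of the conversation among stakeholders.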
As a ‘map of conversations’, Cathy’s own ethical matrix would go deeper: What if the FICO scores were unfair to begin with? Does accuracy imply fairness? When can we be held responsible for our data? Why are there no alternate credit scores, no appeals system?
Note that the Ethical Matrix is not a solution. It helps to frame the problems better and highlights the limitations of the conversations. It makes an infinite problem space finite and the associated technical problems solvable.
Case Study 2: Recidivism
Recidivism risk algorithms are being used by judges in sentencing and in setting bail; a higher risk score means the person is considered more likely to be arrested within two years of being released from prison. ProPublica audited one such model, COMPAS, in Florida and found it to be racist. ProPublica’s contention is that a false positive is a civil rights violation. Cathy has covered this class of models in her book and in several blog posts.
The Ethical Matrix for this use case would be as above. But from the point of view of Northpointe, the builder of the algorithm, the Ethical Matrix would show that everything is fair and fine (see below).
Cathy’s expanded ethical matrix would show the maximum concern for Data Quality, Fairness and False Positives. It is important to note that False Positives are not symmetrical to False Negatives. A False Positive means being imprisoned longer for something you are not going to do. A False Negative means getting out earlier even though you are somewhat more likely to be arrested. The target variable is arrest within two years, and the arrest need not be for a violent crime. The model may be what it is because it was trained on people with addictions or on people with mental-health issues. So, think about that: it is not even about crime!
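The asymmetry between the two kinds of error is easier to see when the rates are computed separately for each group. The following is a rough sketch only: the records are invented, the group labels "X" and "Y" and the function error_rates_by_group are hypothetical, and nothing here reproduces COMPAS or ProPublica's actual analysis.

```python
from collections import defaultdict

# Invented toy records: (group, flagged_high_risk, rearrested_within_two_years)
records = [
    ("X", True, False), ("X", True, False), ("X", True, True),
    ("X", False, False), ("X", False, True),
    ("Y", True, False), ("Y", False, False), ("Y", False, False),
    ("Y", True, True), ("Y", False, True),
]


def error_rates_by_group(records):
    """Compute false positive and false negative rates for each group."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "tp": 0, "tn": 0})
    for group, flagged, rearrested in records:
        if flagged and not rearrested:
            key = "fp"  # flagged high risk, but not re-arrested
        elif not flagged and rearrested:
            key = "fn"  # flagged low risk, but re-arrested
        elif flagged and rearrested:
            key = "tp"
        else:
            key = "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who were not re-arrested
        positives = c["fn"] + c["tp"]  # people who were re-arrested
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates


rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(f"{group}: FPR={r['fpr']:.2f}  FNR={r['fnr']:.2f}")
```

In this made-up data both groups have the same false negative rate while one group has twice the false positive rate of the other, which is the kind of gap a single overall accuracy figure would hide.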
Cathy went through a third case study about Child Abuse.
Thus, the Ethical Matrix would allow the conversation around the moral question to occur and then the data science follows. Ideally, the conversation should occur before the algorithm is even built. Cathy then concluded by taking a couple of audience questions.
Jacob Grotta, Moody’s Analytics
Data Science in the Banking World
Jacob Grotta works with the Risk and Finance Analytics team at Moody’s Analytics. The team helps customers with credit questions – around risk and credit-worthiness. Moody’s has about 500 quants working for it.
Jacob walked through some of the historical context around credit rating in the United States and settled on a graph about banks.
Two broad conclusions from these graphs: (a) banks are highly leveraged, and (b) they are not very profitable.
Banks tend to have a large inventory of algorithmic models, perhaps in the thousands, which degrade continuously over time; such a model is at its peak just when it is released. Banks deal with data in large volumes, so the statistical significance of the variations is high. Banks are extremely risk-averse.
Most institutions have handled this through what is called the risk management appetite. Throughout such an organization, there is now an expectation that everyone understands what a model does, what the risks are, which risks can be ignored and so on.
The triple line of defense in the figure (on the right) is like the one that the US has. The first line of defense is the modeler, who is encouraged to be honest about what can actually be predicted, what cannot, and what question is actually being asked. In the second line of defense, verification and validation rigor is enforced. Internal audit forms the third line of defense; proper information flow and the right governance process around it are critical.
Federal guidelines on model risk management (from the OCC and FDIC) provide a framework for how validation is done. Do we know why a model is designed the way it is designed? Jacob’s team manages about fifty models, which it regularly calibrates and tunes. Typical cycles in finance, though, are long.
What the Moody’s team has learned is to begin model validation as early as possible. In 1998, when it was early in the game, Moody’s had unique intellectual property in its models. In 2008, Moody’s Analytics had unique intellectual property in the data. Today, in 2018, Moody’s has unique intellectual property in its process and its experienced resources, which the team has gained as part of its domain expertise.
Jacob concluded that modeling is an art – auditable, repeatable and transparent. A good model scores high in all three.
Next, we report on keynotes by Nate Silver, Nick Elprin - CEO of Domino Data Lab, and more.