Machine Learning Meets Humans – Insights from HUML 2016

Report from an important IEEE workshop on Human Use of Machine Learning, covering trust, responsibility, the value of explanation, safety of machine learning, discrimination in human vs. machine decision making, and more.

Reposted with permission from

Last Friday, the University of Ca’ Foscari in Venice organized an IEEE workshop on the Human Use of Machine Learning (HUML 2016). The workshop, held at the European Centre for Living Technology, hosted roughly 30 participants and broadly addressed the social impacts and ethical problems stemming from the widespread use of machine learning.

HUML joins a growing number of workshops for critical voices in the ML community. These include Fairness, Accountability and Transparency in Machine Learning (FAT-ML); #Data4Good at ICML 2016; Human Interpretability of Machine Learning (WHI), held this year at ICML; and Interpretable ML for Complex Systems, held this year at NIPS. Among this company, HUML was especially notable for its diversity of perspectives. While FAT-ML, #Data4Good and WHI featured presentations primarily by members of the machine learning community, HUML brought together scholars from philosophy of science, law, predictive policing, and machine learning.

The event consisted of one day of talks with occasional breaks for discussion over coffee, lunch, and dinner. Thanks to an invitation from the organizers facilitated by Professor Fabio Roli, I had the opportunity to attend and speak. In my talk, I presented The Mythos of Model Interpretability, a position piece tackling the epistemological problems that frustrate both research and public discourse on interpretable models.

Since this blog exists precisely to address the intersection of technical and social perspectives on machine learning, I was happy to learn of this summit of like-minded researchers. In light of the tight overlap between the workshop’s objectives and this blog’s mission, I’ve created this post to share my notes on the encounter.

I’ll step through each of the 40-minute invited talks, sharing the high-level points and my personal take on each. Several themes repeat throughout. For example, one recurring issue was the tension between complex ethical issues and the simple formalisms we propose to address them. Theoretical definitions of privacy, fairness and discrimination can poorly approximate the real-world meaning given to these words. These shortcomings can be hard (sometimes impossible?) to capture by looking at the mathematics alone but are obvious when we consider real-world scenarios.

In the opposite direction, several discussions demonstrated that a lack of formal definitions can be equally problematic. This vagueness is most prominent on the matter of interpretability / explainability of machine learning algorithms.

Of course, my reflections are subjective and describe the presentations incompletely. Fortunately, the event was live-streamed on YouTube and (presumably) archived. When the archived versions and slides become available I’ll add them here.

First Talk: Judith Simon – Reflections on trust, trustworthiness & responsibility

The workshop began with a talk by Judith Simon, a Professor of Philosophy of Science at the University of Copenhagen. Her talk addressed issues of privacy, trust, and responsibility.

To begin, Simon led with a motivating anecdote: a father complains to an e-merchant after his daughter is bombarded with advertisements for pregnancy-related products. Subsequently the father calls back and apologizes, having discovered that his daughter actually is pregnant.

Throughout the anecdote, the daughter’s life appears to be adversely impacted by algorithmic decisions in various ways. At first, when we think she is not pregnant, she appears to be the victim of a false rumor. Subsequently, when we discover she is pregnant, it appears that the algorithm inferred and divulged a secret she might have preferred to keep in confidence.

With this context set, Simon asks: what precisely constitutes an invasion of privacy? Is it:

  1. The collection of personal data?
  2. The inferences made upon the data?
  3. The divulgence of these inferences, potentially to 3rd parties?

These points warrant serious consideration. Moreover, we might observe that answering each question calls upon a range of expertise spanning traditionally siloed disciplines. What data companies can harvest, and who owns that data, strikes me as foremost a legal question. What inferences we should draw from that data is a philosophical question, but one inextricably tied to the domain of machine learning. Finally, we ask how software should behave given the inferences it can access. This appears to call upon legal, machine learning, and HCI perspectives.

Later in her talk, Simon raised the important distinction between functional and epistemic transparency in machine learning. Functional transparency concerns access: a system is functionally opaque when outsiders cannot inspect it. For example, criminal recidivism models might be functionally opaque because the public lacks access to their data, their algorithms, and the learned parameters of their models.

Epistemic transparency, on the other hand, refers to the intrinsic ability (or lack thereof) to understand a model even given full functional transparency. Simon notes we might view functional transparency as a necessary but insufficient step to understanding machine learning.

In the machine learning community, we focus almost exclusively on epistemic opacity. This makes sense. It’s the problem machine learning academics are best equipped to tackle.

We might also note that it’s typically technical people who rise to positions of power within technology companies. Ultimately, the ethical responsibility to provide functional transparency falls on people with mostly technical training. We might ask: how can we count on these stakeholders to do the right thing if these issues are only tackled by legal scholars and philosophers?

Second Talk: Katherine Strandburg – Decision-making, machine learning and the value of explanation

A second talk came from Professor Katherine Strandburg of NYU. In it, she articulated the value of explanation from a lawyer’s perspective. While the talk lacked a technical discussion, I think any researcher interested in interpretable models should check it out.

To begin, she articulated why a right to explanation is a core aspect of due process under the law. Explanations are necessary because citizens are required only to comply with the letter of the law. In order to subject a citizen to a judgment, one must articulate precisely what someone did and why it is illegal. This explanation must accord with the letter of the law. The requirement that one produce an explanation (and not simply a judgment) is in part intended as a guard against unlawful or discriminatory judgments. [This assumes, of course, that the law is not itself discriminatory.]

Moreover, the necessity of providing an explanation is thought to guard against subconscious biases. For example, suppose an adjudicator were predisposed subconsciously to pass judgment against an individual on account of race. In order to actually pass judgment against the individual, the adjudicator would have to produce an explanation that didn’t include race. Ultimately, such an explanation might be difficult to produce absent a real violation of the law. Further, this process of introspection might conceivably help an unconsciously biased adjudicator to uncover the subconscious bias and account for it.

Of course, despite the right to explanation, the legal system is well-known to suffer from systemic biases. Black defendants are more likely to be convicted, and more likely to be sentenced to death. Nevertheless, it seems plausible that absent the right to explanation, the situation could be far worse.

We should ask: do the explanations generated by today’s efforts at interpretable ML confer these desired properties? Does the task of producing the explanation improve the models? When we ask for interpretable models, what are we asking for? Are we sometimes wrongly anthropomorphizing models in the hope that the task of producing an explanation will make them smarter or less biased?

We should consider: when stakeholders ask for explanations, what precisely do they want? If what they want is assurance that the decision conforms with a valid chain of legal or causal reasoning, then hardly any of the existing work on model interpretability applies at all. Certainly post-hoc explanations like saliency maps or LIME offer nothing towards accountability.
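To make concrete what a post-hoc explanation of this kind actually computes, here is a minimal sketch of a gradient-based saliency score for a toy logistic model. Everything in it is an assumption made for illustration: the weights, the bias, the input, and the function names `predict` and `saliency` are hypothetical and not drawn from any talk or library. It is one simple variant of the idea, not the method used by any particular saliency-map paper or by LIME.

```python
import math

def predict(weights, bias, x):
    """Toy logistic model: probability of the positive class."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(weights, bias, x):
    """Absolute gradient of the predicted probability w.r.t. each input.

    For a logistic model, d p / d x_i = p * (1 - p) * w_i.
    """
    p = predict(weights, bias, x)
    return [abs(p * (1.0 - p) * w) for w in weights]

# Hypothetical model parameters and input, for illustration only.
weights = [2.0, -0.5, 0.0]
bias = 0.0
x = [1.0, 2.0, 3.0]

scores = saliency(weights, bias, x)
# Feature indices ordered from most to least influential at this input.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

The output is a ranking of inputs by local sensitivity. Nothing in it states a rule that was applied or checks the decision against a chain of legal or causal reasoning, which is the gap described above.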