Evidence Counterfactuals for explaining predictive models on Big Data

Big Data generated by people -- such as social media posts, mobile phone GPS locations, and browsing history -- provide enormous predictive value for AI systems. However, explaining how these models predict with the data remains challenging. The explanation approach described here considers how a model's prediction would change if part of the original data were no longer available.

By Yanou Ramon, Applied Data Mining Research Group, U. of Antwerp.


Predictive models on Big Data: Mining a pool of evidence


Why did the model predict you'd be interested in this post, based on the hundreds of KDnuggets posts you read? Because you read the post about "explainable AI" and the post about "cracking open the black box": if you had not read these posts, you would not have been predicted to be interested.

The above example is an imaginary "Evidence Counterfactual" for a model that would predict interest in this post, based on your browsing data on KDnuggets (much like targeted online advertising works these days). In this post, you'll learn more about the Evidence Counterfactual, an approach to explaining the decisions of any predictive system that uses Big Data.

More companies are tapping into a rich pool of human-generated data (also referred to as "behavioral big data"). Think of a person liking Instagram posts, visiting different locations captured by their mobile GPS, browsing web pages, searching Google, making online payments, connecting to other people on LinkedIn, writing reviews on Reddit or Goodreads, and so on. Mining these massive behavioral traces leads to artificial intelligence (AI) systems with very high predictive performance in a variety of application areas,1 ranging from finance to risk to marketing.

The goal of these AI systems is to predict a variable of interest from these data, such as creditworthiness, fraudulent behavior, personality traits, or product interest. The input data are characterized by a large number of small pieces of evidence that the model uses to predict the output variable. Let's refer to this as the "evidence pool." These pieces of evidence are either "present" for an instance (e.g., a person in the data set) or "missing," and each instance only has a relatively small portion of evidence present. As Foster Provost explains in this talk,2 a predictive model can be thought of as an evidence-combining system, where all pieces of evidence that are present for an instance can be used by the model to make predictions.

To illustrate more clearly how to see behavioral big data as a "pool of evidence," think of a model that uses location data of persons in New York City to classify someone as a tourist or a NY citizen. Out of all possible places to go to (the "evidence pool"), a person would only visit a small number of places each month (the "evidence of that person"). In a numerical data representation, each place is represented by a binary feature (see the columns in Figure 1), and the places someone visited get a corresponding nonzero value for that person. All places that are not visited by that person are "missing" pieces of evidence and get a corresponding zero value. In Figure 1, for example, Anna visited 85 places out of the 50,000 possible places used by the predictive model. She visited Times Square and Dumbo but did not visit Columbia University, which makes that piece of evidence missing.

Figure 1: Location data
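The binary representation described above can be sketched in a few lines of Python. This is only an illustration: the six place names, the two people, and the helper names `to_binary_row` and `to_sparse_row` are made up for this example, and the real evidence pool in the text has 50,000 places rather than six.

```python
# Hypothetical evidence pool: every place the model knows about
# (50,000 in the running example; 6 here to keep the sketch readable).
evidence_pool = ["Times Square", "Dumbo", "Columbia University",
                 "Central Park", "Wall Street", "Coney Island"]

# Hypothetical evidence per person: only the places they actually visited.
visits = {
    "Anna": {"Times Square", "Dumbo", "Central Park"},
    "Ben": {"Columbia University", "Wall Street"},
}


def to_binary_row(places_visited, pool):
    """Dense binary feature vector: 1 = evidence present, 0 = evidence missing."""
    return [1 if place in places_visited else 0 for place in pool]


def to_sparse_row(places_visited, pool):
    """Sparse form: keep only the column indices of the evidence that is present."""
    return sorted(i for i, place in enumerate(pool) if place in places_visited)


anna_dense = to_binary_row(visits["Anna"], evidence_pool)
anna_sparse = to_sparse_row(visits["Anna"], evidence_pool)
print(anna_dense)   # one value per place in the pool
print(anna_sparse)  # only the indices of visited places
```

Because each person visits only a tiny fraction of the pool, the sparse form is usually the practical choice for data of this kind.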

Intuition behind the Evidence Counterfactual


It is not straightforward to interpret how predictive systems trained from behavioral footprint data make their decisions, either because of the modelling technique (it can be highly nonlinear such as Deep Learning models) or the data (very high-dimensional and sparse), or both.

To understand the reasons behind individual model predictions, Evidence Counterfactuals (or simply "counterfactuals") have been proposed. This explanation approach (to the best of our knowledge first proposed for predictive modeling in this paper3 to explain document classifications) is mainly inspired by causal reasoning, where the goal is to identify a causal relationship between two events: an event A causes another event B if we observe a difference in B's value after changing A while keeping everything else constant.4

The Evidence Counterfactual shows a subset of evidence of the instance (event A) that causally drives the model's decision (event B). For any subset of evidence of an instance, we can imagine two worlds, identical in every way up until the point where the evidence set is present in one world, but not in the other. The first world is the "factual" world, whereas the unobserved world is the "counterfactual" world. The counterfactual outcome of the model is defined as the hypothetical value of the output under an event that did not happen (e.g., a set of pieces of evidence is no longer present for that instance). The counterfactual explanation can be defined as an irreducible set of evidence pieces such that, if it were no longer present for that instance, the model's decision would be different. (We can also talk about "removing evidence" when making pieces of evidence "missing.") Irreducibility means that removing only a proper subset of the features in the counterfactual explanation does not change the model's decision.

To clarify this definition, consider the following Evidence Counterfactual as an explanation for why Anna was predicted as a tourist in our running location data example:

IF Anna did not visit Times Square and Dumbo, THEN the model's prediction changes from tourist to NY citizen.

The pieces of evidence {Times Square, Dumbo} are a subset of the evidence of Anna (all the places she visited). Removing only Times Square or only Dumbo from her visited locations would not be sufficient to change the predicted class (this refers to the irreducibility of the Evidence Counterfactual). The "factual world" is the one that's observed and includes all the places Anna visited. The "counterfactual world" that results in a predicted class change is identical to the factual world in every way except for the two locations Times Square and Dumbo, which are no longer present.
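To make the definition concrete, here is a minimal sketch that checks both conditions (the class changes, and no proper subset already changes it) for the Anna example. The linear model, its integer weights, and the function names are all hypothetical and chosen only so that the example behaves like the one in the text; they are not taken from any real model.

```python
from itertools import combinations

# Hypothetical linear scoring model: score > 0 means "tourist".
WEIGHTS = {"Times Square": 9, "Dumbo": 7, "Central Park": 3, "Local gym": -1}
BIAS = -5


def predict(evidence):
    """Predicted class for a set of present evidence (visited places)."""
    score = BIAS + sum(WEIGHTS[place] for place in evidence)
    return "tourist" if score > 0 else "NY citizen"


def changes_prediction(evidence, removed):
    """True if making `removed` missing changes the predicted class."""
    return predict(evidence - removed) != predict(evidence)


def is_evidence_counterfactual(evidence, candidate):
    """Check the definition: removal flips the class, and the set is irreducible."""
    if not candidate or not candidate <= evidence:
        return False
    if not changes_prediction(evidence, candidate):
        return False
    # Irreducible: no non-empty proper subset may flip the class on its own.
    for size in range(1, len(candidate)):
        for subset in combinations(sorted(candidate), size):
            if changes_prediction(evidence, set(subset)):
                return False
    return True


anna = {"Times Square", "Dumbo", "Central Park", "Local gym"}
print(predict(anna))
print(is_evidence_counterfactual(anna, {"Times Square", "Dumbo"}))
print(is_evidence_counterfactual(anna, {"Times Square"}))
```

Note that the set {Times Square, Dumbo, Central Park} also flips the prediction in this toy model, but it is not an Evidence Counterfactual because it can be reduced to {Times Square, Dumbo}.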

An important advantage of counterfactuals that immediately catches the eye in the above example is that they do not require all features that are used in the model (the "evidence pool") or all the evidence of the instance (e.g., all places Anna visited) to be part of the explanation. This is especially interesting in the context of human-generated big data. How useful would an explanation be that shows the marginal contribution of each visited location to the prediction of being a tourist? Such an explanation encompasses hundreds of locations. The Evidence Counterfactual bypasses this issue by only showing those pieces of evidence that have led to the decision (they are causal with respect to the decision) and that are relevant for that particular person (only locations visited by that person or, more generally, evidence of that particular instance can be part of the explanation).

To illustrate how counterfactual explanations can be used to explain models on big data, consider the well-known 20 Newsgroups data5 where we want to predict whether a document is about a "Medical" topic. Figure 2a shows all the words being used in the predictive model and the evidence (i.e., words) of each document. The counterfactual explanation that explains why document 01's predicted topic is Medical is shown in Figure 2b. There are 17 words that need to be removed from the document so that the predicted topic would no longer be "Medical," meaning that a considerable amount of evidence supports the model's decision.

Figure 2a/b: Medical topic

Consider another model trained on the 20 Newsgroups data to predict documents with the topic "Atheism," where we do not remove header data as a textual preprocessing step. Figure 3a/b shows how the Evidence Counterfactual can help to identify problems with the trained model. Even though document 01 was correctly classified, the header information is being used to differentiate documents with the topic "Atheism" from documents with other topics. This leads to predictions being made for arbitrary reasons that have no clear connection with the predicted topic (e.g., "psilink," "p00261"). It is unlikely that this arbitrary information is useful when predicting topics of new documents. This example illustrates how Evidence Counterfactuals can be used for identifying issues with a predictive system (such as predictions being "right for the wrong reasons") and how such explanations can be a starting point for improving the model and the data preprocessing.

Figure 3a: Atheism topic

Figure 3b: Atheism topic

For more illustrations of counterfactuals for explaining models on behavioral big data, visit this GitHub repository. It contains tutorials on explaining gender predictions from movie viewing data with a Logistic Regression and a Multilayer Perceptron model, and on explaining topic predictions from news documents with a Support Vector Machine with a linear kernel function.


Computing counterfactuals for binary classifiers


The huge dimensionality of the behavioral data makes it infeasible to compute counterfactual explanations using a complete search algorithm (this search strategy would check all subsets of evidence of an instance up until an explanation is found).
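A small sketch shows why the complete search does not scale. The function below checks subsets in order of increasing size, which is one straightforward way to implement such a search; the toy model and its weights are hypothetical. The last lines count how many subsets exist for an instance with 85 pieces of evidence, the number of places Anna visited in the running example.

```python
from itertools import combinations
from math import comb

# Hypothetical linear scoring model: score > 0 means "tourist".
WEIGHTS = {"Times Square": 9, "Dumbo": 7, "Central Park": 3, "Local gym": -1}
BIAS = -5


def predict(evidence):
    score = BIAS + sum(WEIGHTS[place] for place in evidence)
    return "tourist" if score > 0 else "NY citizen"


def complete_search(evidence, predict_fn):
    """Try every subset, smallest first; return the first one whose removal flips the class."""
    original = predict_fn(evidence)
    ordered = sorted(evidence)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            if predict_fn(evidence - set(subset)) != original:
                return set(subset)
    return None  # no subset changes the prediction


anna = {"Times Square", "Dumbo", "Central Park", "Local gym"}
explanation = complete_search(anna, predict)
print(explanation)

# With 85 pieces of evidence, the number of candidate subsets explodes.
subsets_up_to_size_5 = sum(comb(85, k) for k in range(1, 6))
all_subsets = 2 ** 85 - 1
print(subsets_up_to_size_5, all_subsets)
```

Because subsets are tried smallest first, the first hit is also irreducible: every smaller subset has already been checked and rejected. The price is the number of model evaluations, which is already in the tens of millions for explanations of at most five features.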

Alternatively, a heuristic search algorithm can be used to efficiently find counterfactuals. In the original paper, a best-first search was proposed that is based on the scoring function of the model (the open-source Python code is available on GitHub). This scoring function is used to first consider subsets of evidence (features) that, when removed (feature value set to zero), reduce the predicted score the most in the direction of the opposite predicted class. These are the best-first feature combinations. There are at least two weaknesses of this strategy. First, for some nonlinear models, removing one feature does not result in a predicted score change, which results in the search algorithm picking a random feature in the first iteration. This can result in counterfactuals that have too many features in the explanation set or a search time that becomes exponentially large because of the growing number of search iterations. Second, the search time is very sensitive to the size of the counterfactual explanation: the more evidence that needs to be removed, the longer it takes the algorithm to find the explanation.
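The idea of a score-guided best-first search can be sketched as follows. This is a simplified illustration written for this post, not the authors' open-source implementation: the function name, the `threshold` and `max_iterations` parameters, and the toy scoring model are all assumptions, and the sketch assumes the instance is predicted positive when its score is above the threshold.

```python
import heapq

# Hypothetical linear scoring model: score > 0 means "tourist".
WEIGHTS = {"Times Square": 9, "Dumbo": 7, "Central Park": 3, "Local gym": -1}
BIAS = -5


def score(evidence):
    return BIAS + sum(WEIGHTS[place] for place in evidence)


def best_first_counterfactual(evidence, score_fn, threshold=0, max_iterations=1000):
    """Repeatedly expand the removal set with the lowest score so far.

    Returns a set of features whose removal brings the score to or below
    the threshold, or None if nothing is found within the budget.
    """
    # Priority queue of (score after removal, removed features as a sorted tuple).
    frontier = [(score_fn(evidence), ())]
    seen = set()
    for _ in range(max_iterations):
        if not frontier:
            return None
        _, removed = heapq.heappop(frontier)  # most promising set so far
        for feature in sorted(evidence - set(removed)):
            candidate = tuple(sorted(removed + (feature,)))
            if candidate in seen:
                continue
            seen.add(candidate)
            new_score = score_fn(evidence - set(candidate))
            if new_score <= threshold:  # predicted class has changed
                return set(candidate)
            heapq.heappush(frontier, (new_score, candidate))
    return None


anna = {"Times Square", "Dumbo", "Central Park", "Local gym"}
print(best_first_counterfactual(anna, score))
```

In this toy run, the search first evaluates every single-feature removal, then expands the most promising one (Times Square) and finds that also removing Dumbo changes the class. The sketch does not verify irreducibility of the returned set, and it makes the second weakness visible: every additional feature in the explanation adds another round of expansions.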

As an alternative to the best-first search, we proposed in this paper6 a search strategy that chooses features to consider in the explanation according to their overall importance for the predicted score. The importance weights can be computed by an additive feature attribution technique, such as the popular explanation technique LIME. The idea is that the more accurate the importance rankings are, the more likely it is that removing features in ranked order, starting from the top-ranked feature, leads to a counterfactual explanation. The hybrid algorithm LIME-Counterfactual (LIME-C) seems to be a favorable alternative to the best-first search because of its overall good effectiveness (high percentage of small-sized counterfactuals found) and efficiency. Another interesting upshot of this paper is that it solves an important issue related to importance-ranking methods (like LIME) for high-dimensional data, namely, how many features should be shown to the user. For counterfactuals, the answer is the number of features that results in a predicted class change.
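The ranking-based strategy can be sketched without calling LIME itself. In the illustration below, the `importance` dictionary is a hypothetical stand-in for the weights an additive attribution method would return (here simply the toy model's own coefficients), and the function name is made up for this example. The sketch assumes that a positive importance means the feature pushes the score toward the predicted class.

```python
# Hypothetical linear scoring model: score > 0 means "tourist".
WEIGHTS = {"Times Square": 9, "Dumbo": 7, "Central Park": 3, "Local gym": -1}
BIAS = -5


def predict(evidence):
    score = BIAS + sum(WEIGHTS[place] for place in evidence)
    return "tourist" if score > 0 else "NY citizen"


def ranking_based_counterfactual(evidence, predict_fn, importance):
    """Remove features in order of decreasing importance until the class changes."""
    original = predict_fn(evidence)
    # Only features that support the predicted class are candidates for removal.
    ranked = sorted(
        (f for f in evidence if importance.get(f, 0) > 0),
        key=lambda f: importance[f],
        reverse=True,
    )
    removed = set()
    for feature in ranked:
        removed.add(feature)
        if predict_fn(evidence - removed) != original:
            return removed  # its size is the number of features to show
    return None


anna = {"Times Square", "Dumbo", "Central Park", "Local gym"}
importance = dict(WEIGHTS)  # stand-in for attribution weights, not real LIME output
explanation = ranking_based_counterfactual(anna, predict, importance)
print(explanation, len(explanation))
```

The search needs one model evaluation per removed feature, so its cost grows linearly with the size of the explanation once the ranking is available. How good the result is depends on how well the importance ranking reflects the model.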


Other data and models


Evidence Counterfactuals can address various data types, from tabular data to textual data to image data. The focal issue is to define what it means for evidence to be "present" or "missing." To compute counterfactuals, we thus need to define the notion of "removing evidence" or setting evidence to "missing."

In this post, we focused on behavioral big data. For these data, which are very sparse (a lot of zero values in the data matrix), it makes sense to let evidence that is present correspond to features (e.g., a word or a behavior) with a nonzero value. The absence of a piece of evidence is represented by a zero value for that feature.

For image data, the Evidence Counterfactual shows which parts of the image need to be "removed" to change the predicted class. Removing parts of the image can correspond to setting the pixels to black or blurring that part.7 For tabular data (think of data that can be shown in a standard Excel file), which have both numerical and categorical variables, the "missingness" of features can correspond to replacing the feature value with the mean for numerical features and with the mode for categorical features.8
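For tabular data, "removing evidence" by mean or mode replacement could look like the sketch below. The column names, the three training rows, and the helper names `reference_values` and `remove_evidence` are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean, mode

# Hypothetical training data with numerical and categorical columns.
training_rows = [
    {"age": 25, "income": 30000, "housing": "rent"},
    {"age": 40, "income": 52000, "housing": "own"},
    {"age": 55, "income": 68000, "housing": "own"},
]


def reference_values(rows):
    """Mean for numerical columns, mode for categorical columns."""
    refs = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            refs[column] = mean(values)
        else:
            refs[column] = mode(values)
    return refs


def remove_evidence(instance, features, refs):
    """Copy of the instance with the given features set to their 'missing' value."""
    return {
        column: (refs[column] if column in features else value)
        for column, value in instance.items()
    }


refs = reference_values(training_rows)
person = {"age": 25, "income": 30000, "housing": "rent"}
modified = remove_evidence(person, {"income", "housing"}, refs)
print(modified)
```

Any of the search strategies above can then be reused unchanged, as long as the prediction function is evaluated on the instance returned by `remove_evidence` instead of on a vector with zeroed features.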


Key takeaways


  • Predictive systems that are trained from human-generated Big Data have high predictive performance. However, explaining them is challenging because of the modeling technique (e.g., Deep Learning), the dimensionality of the data, or both.
  • Explaining data-driven decisions is important for a variety of reasons (increase trust and acceptance, improve models, inspect misclassifications, aid in model use, gain insights, etc.), and for many different stakeholders (data scientists, managers, decision subjects, etc.).
  • The Evidence Counterfactual is an explanation approach that can be applied across many relevant applications and highlights a key subset of evidence of an instance that led to a particular model decision. It shows a set of evidence such that, when removing this evidence, the model's decision would be different.

GitHub resource


  1. Junqué de Fortuny, E., Martens, D., Provost, F., Predictive Modeling with Big Data: Is Bigger Really Better?, Big Data, 1(4), pp215-226 (2013)
  2. Provost, F., Understanding decisions driven by big data: from analytics management to privacy-friendly cloaking devices, Keynote Lecture, Strata Europe, https://learning.oreilly.com/library/view/stratahadoop/9781491917381/video203329.html (2014)
  3. Martens, D., Provost, F., Explaining data-driven document classifications, MIS Quarterly, 38(1), pp73-99 (2014)
  4. https://causalinference.gitlab.io/causal-reasoning-book-chapter1/
  5. 20 Newsgroups data set: http://qwone.com/~jason/20Newsgroups/
  6. Ramon, Y., Martens, D., Provost, F., Evgeniou, T., Counterfactual Explanation Algorithms for Behavioral and Textual Data, arXiv:1912.01819 (2019)
  7. Vermeire, T., Martens, D., Explainable Image Classification with Evidence Counterfactual, arXiv:2004.07511
  8. Fernandez, C., Provost, F., Han, X., Explaining data-driven decisions made by AI systems: the counterfactual approach, arXiv:2001.07417 (2019)

Bios: Yanou Ramon graduated in 2018 as a business engineer from the University of Antwerp (Faculty of Business and Economics). She now works as a PhD student at the University of Antwerp under Professor David Martens (Applied Data Mining group). The topic of her dissertation is on making it easier for humans to understand and interact with predictive models on Big Data by using (post-hoc) techniques to explain model decisions, both on the instance and global level.

David Martens is a Professor at the University of Antwerp, where he heads the Applied Data Mining group. His work focuses on the development and application of data mining techniques for very high-dimensional (behavior) data and the use thereof in business domains such as risk, marketing, and finance. A key topic in his research relates to the ethical aspects of data science and the explainability of prediction models.