Introduction to the White-Box AI: the Concept of Interpretability

ML model interpretability can be seen as “the ability to explain or to present in understandable terms to a human.” Read this article and learn to go beyond the black box of AI, where algorithms make predictions but the underlying explanation remains unknown and untraceable.

By Sciforce, software solutions based on science-driven information technologies

The common story: you start analyzing how a certain ML model works and at some point you see the familiar phrase “and then some magic happens”. This is called the black box of AI, where algorithms make predictions, but the underlying explanation remains unknown and untraceable. The root of the evil lies in the inherent complexity that bestows the extraordinary predictive abilities on machine learning algorithms and also makes the answers hard to understand, and maybe even to trust.

While understanding and trusting models and results may be a general requirement of any good science, in a number of highly regulated industries, such as healthcare, banking, or insurance, interpretability becomes a serious legal mandate.

This is a very old subject in machine learning. Dr Andy, who first came across it in the ’80s, created a more transparent Fuzzy Logic Rule Induction algorithm, which has been improved and refined ever since.

One way data scientists tried to tackle this issue was to enforce monotonicity constraints, i.e. to require a relationship between the independent variables and the machine-learned response function that only changes in one direction. In reality, however, algorithms continued to create nonlinear, non-monotonic, non-polynomial, and even non-continuous functions that approximated the relationship between independent and dependent variables in a data set. Another approach was to use linear models, even though it usually meant giving up a couple of points on the accuracy scale.
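To make the idea of a monotonicity constraint concrete, here is a minimal sketch in Python. It uses isotonic regression fitted with the pool-adjacent-violators algorithm, which is one simple way to force a fitted response to be non-decreasing. It is our illustration only, not the specific method used in the work described above, and the function name isotonic_fit and the sample numbers are assumptions made for the example.

```python
def isotonic_fit(ys):
    """Least-squares non-decreasing fit to ys (pool adjacent violators).

    ys is assumed to be ordered by the input variable.
    """
    blocks = []  # each block holds [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge neighbouring blocks while their means violate the ordering.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical noisy responses: the dip from 3 to 2 breaks monotonicity.
print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The constrained fit replaces the offending pair with its average, so the response can only move in one direction as the input grows. That is easier to explain, but it cannot follow a genuine dip in the data.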

However, it soon became clear that such measures could not ensure acceptable levels of reliability and transparency of predictions, and a new approach was required to change the whole concept of model interpretability. The new approach was named White Box AI and focuses on interpretable models. White-box models, also called interpretable models, are models for which one can explain how they behave, how they produce predictions, and what the influencing variables are. There are two key elements that make a model white-box: features have to be understandable, and the ML process has to be transparent.

ML model interpretability can be seen as “the ability to explain or to present in understandable terms to a human” (Doshi-Velez et al., 2017). Despite the simple definition, technical challenges and the needs of different user communities have made interpretability a subjective and complicated subject. To make it more objective, a taxonomy (Hall et al., 2017) was adopted that describes models in terms of their complexity, and categorizes interpretability techniques by the global or local scope of explanations they generate, the family of algorithms to which they can be applied, and their ability to promote trust and understanding.


Model complexity and scale of interpretability

Generally, the more complex the model, the more difficult it is to interpret and explain. The number of weights or rules in a model, or its Vapnik–Chervonenkis dimension (a more formal measure), is a good way to quantify a model’s complexity. Here we show how the relative complexity of different types of models affects their interpretability, as classified by Patrick Hall and Navdeep Gill (Hall & Gill, 2018):

  • High interpretability: linear, monotonic functions. Models created by traditional regression algorithms are probably the most interpretable class of models. They are called “linear and monotonic” because a change in any given input variable induces a change of the output of the response function at a defined rate, in only one direction, and at a magnitude represented by a preset coefficient. Besides being highly interpretable, linear and monotonic functions are also used in explanatory techniques, including the popular LIME approach.
  • Medium interpretability: nonlinear, monotonic functions. In reality, most machine-learning functions are nonlinear, but some of them can be constrained to be monotonic with respect to any given independent variable. They have no single coefficient to represent the change in the response function output induced by a change in a single input variable, but they still change in one direction as a single input variable changes. Such functions usually allow for the generation of reason codes and relative variable importance measures, so they can be interpreted and used in regulated applications.
  • Low interpretability: nonlinear, nonmonotonic functions. The truth is that the majority of machine learning algorithms create nonlinear, nonmonotonic response functions. Such models are the most difficult to interpret, as the output can change in a positive or negative direction and at a varying rate for any change in an input variable. The only standard interpretability measures these functions provide are relative variable importance measures, so to understand how they arrive at their predictions data scientists have to combine several special techniques.
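The three classes above can be told apart numerically for a single input variable. The sketch below is a rough illustration under stated assumptions: the helper name classify_response, the grid size, and the tolerance are ours, and the function only looks at one input while everything else is held fixed.

```python
import math


def classify_response(f, lo, hi, steps=200, tol=1e-9):
    """Classify f on [lo, hi] from finite differences on an even grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    # Linear: every step changes the output by the same amount.
    if max(diffs) - min(diffs) <= tol:
        return "linear, monotonic"
    # Monotonic: the output never reverses direction.
    if all(d >= -tol for d in diffs) or all(d <= tol for d in diffs):
        return "nonlinear, monotonic"
    return "nonlinear, nonmonotonic"


print(classify_response(lambda x: 2 * x + 1, 0, 10))                 # linear, monotonic
print(classify_response(lambda x: 1 / (1 + math.exp(-x)), -5, 5))    # nonlinear, monotonic
print(classify_response(math.sin, 0, 2 * math.pi))                   # nonlinear, nonmonotonic
```

In the linear case the constant step size plays the role of the single coefficient. In the logistic case there is no such coefficient, but the direction of change is still fixed. For the sine curve neither property holds.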


Scope of Interpretability

If we analyze an algorithm training a model to perform certain predictions, we’ll see that each step can be evaluated in terms of transparency or interpretability. As it is often important to understand the entire model trained on a global scale, and be able to zoom into local regions of your data or your predictions, data scientists talk about global and local interpretability. Starting from the algorithm level, we can name the following interpretability levels (Molnar, 2019):

  • Algorithm transparency. The highest level of interpretability shows how the algorithm learns a model from the data and what kind of relationships it can learn. Algorithm transparency requires only knowledge of the algorithm and how it works and not of the data, the learned model or how individual predictions are made. For example, if you use convolutional neural networks to classify images, you know that the algorithm learns edge detectors and filters on the lowest layers. Algorithms such as the least squares method for linear models are well studied and understood, while deep learning approaches are less well understood and, therefore, less transparent.
  • Global, or Holistic Model Interpretability. As Lipton put it, you could describe a model as interpretable if you can comprehend the entire model at once (Lipton, 2016). The global level of interpretability is about understanding how the model makes decisions, based on a holistic view of its features and each of the learned components such as weights, other parameters, and structures. It helps to understand the distribution of your target outcome based on the features. However, global model interpretability is very difficult to achieve in practice, since any feature space with more than 3 dimensions is simply inconceivable for humans. Usually, when people try to comprehend a model, they consider only parts of it, such as the weights in linear models, which brings us to the next, lower level of interpretability.
  • Global Model Interpretability on a Modular Level. Most probably, we won’t be able to comprehend the whole model, to grasp all weights and features and use them to make predictions. However, we can understand a single weight or feature, which makes models interpretable on a modular level. What is critical to understand is that the interpretable parts are different for different models, such as weights for linear models and splits and leaf node predictors for trees. Besides, the interpretation of a single weight always implies that the other input features remain at the same value, which is not the case in reality. Nevertheless, the weights in a linear model can still be interpreted better than the weights of a deep neural network.
  • Local Interpretability for a Single Prediction. It is possible to zoom in on a single instance and try to understand what the model predicts for this input, and explain why. Local interpretations ensure understanding of small regions of the machine-learned relationship between the prediction target and the input variables, such as clusters of input records and their corresponding predictions, or deciles of predictions and their corresponding input rows, or even single rows of data. At the local level, the prediction might only depend linearly or monotonically on some features, rather than having a complex dependence on them. Local explanations can therefore be more accurate than global interpretations.
  • Local Interpretability for a Group of Predictions. Similarly, model predictions for multiple instances can be explained not only with global model interpretation methods (on a modular level), but also locally with explanations of individual instances. In a process analogous to understanding a single prediction, the individual explanation methods can be applied to each instance in the group and then listed or aggregated for the entire group.
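For a linear model, the modular, single-prediction, and group levels listed above can be shown in a few lines. This is a minimal sketch with a hypothetical scoring model: the feature names, weights, and rows are invented for the example, and the helpers explain_row and explain_group are our own names.

```python
# Hypothetical linear model; on the modular level, each weight is read as
# the effect of one feature with all other features held fixed.
WEIGHTS = {"income": 0.5, "debt": -0.75, "age": 0.25}
INTERCEPT = 1.0


def predict(row):
    return INTERCEPT + sum(WEIGHTS[k] * row[k] for k in WEIGHTS)


def explain_row(row):
    """Local explanation: what each feature adds to this one prediction."""
    return {k: WEIGHTS[k] * row[k] for k in WEIGHTS}


def explain_group(rows):
    """Group explanation: per-row contributions averaged over the group."""
    return {k: sum(explain_row(r)[k] for r in rows) / len(rows) for k in WEIGHTS}


rows = [
    {"income": 4.0, "debt": 1.0, "age": 3.0},
    {"income": 2.0, "debt": 3.0, "age": 5.0},
]

print(predict(rows[0]))       # 3.0
print(explain_row(rows[0]))   # {'income': 2.0, 'debt': -0.75, 'age': 0.75}
print(explain_group(rows))    # {'income': 1.5, 'debt': -1.5, 'age': 1.0}
```

The same weight for debt produces a small penalty in the first row and a large one in the second, which is why a local explanation can say more about an individual prediction than the global weights alone.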


Model Awareness

The dependence of interpretability techniques on the model is another important dimension to add to our understanding of interpretability.

  • Model-agnostic techniques can be applied to different types of machine learning algorithms. For example, the LIME technique is model-agnostic and can be used to interpret nearly any set of machine learning inputs and machine learning predictions. Model-agnostic interpretability techniques are convenient in the development process, but they often rely on surrogate models or other approximations that can degrade the accuracy of the explanations they provide.
  • Model-specific techniques, on the other hand, are applicable only for a single type or class of algorithm. For instance, the technique known as tree interpreter is model specific and can be applied only to decision tree models. Model-specific interpretation techniques tend to use the model to be interpreted directly, leading to potentially more accurate explanations (Hall & Gill, 2018).
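The difference between the two families can be sketched with a toy model. The example below does not reproduce LIME or tree interpreter. Instead it uses permutation feature importance as a stand-in for a model-agnostic technique, because it needs nothing but a prediction function, and it reads the coefficients directly as a stand-in for a model-specific technique. The model, the data, and all names are assumptions made for illustration.

```python
import random


def permutation_importance(predict, X, y, column, seed=0):
    """Model-agnostic: rise in mean squared error after shuffling one column.

    Uses only the predict(row) callable, never the model internals.
    """
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    shuffled = [r[column] for r in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [r[:column] + [v] + r[column + 1:] for r, v in zip(X, shuffled)]
    return mse(X_perm) - mse(X)


# Hypothetical linear model: strong use of column 0, weak use of column 1,
# no use of column 2.
coefs = [3.0, 0.5, 0.0]


def model(row):
    return sum(c * v for c, v in zip(coefs, row))


rng = random.Random(42)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [model(r) for r in X]

agnostic = [permutation_importance(model, X, y, j) for j in range(3)]
specific = [abs(c) for c in coefs]  # model-specific: inspect the coefficients

print(agnostic)
print(specific)
```

Both rankings agree here, but only the first would still work if model were swapped for a black box. The second depends on the model exposing coefficients, which is what makes it both more direct and less portable.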

In this overview we wanted to explore the multidimensional notion of ML model interpretability. In the next post we’ll go specifically into interpretability techniques, which are appearing at an increasing pace as more and more organizations and individuals embrace machine learning algorithms for predictive modeling.


Further reading


  1. Doshi-Velez, Finale and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint, 2017.
  2. Hall, Patrick and Navdeep Gill. An Introduction to Machine Learning Interpretability. O’Reilly Media, 2018.
  3. Hall, Patrick, Wen Phan, and Sri Satish Ambati. “Ideas on interpreting machine learning.” O’Reilly Ideas, 2017. https://
  4. Lipton, Zachary C. “The mythos of model interpretability.” arXiv preprint, 2016.
  5. Molnar, Christoph. “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.” 2019.

SciForce is a Ukraine-based IT company specializing in the development of software solutions based on science-driven information technologies. We have wide-ranging expertise in many key AI technologies, including Data Mining, Digital Signal Processing, Natural Language Processing, Machine Learning, Image Processing, and Computer Vision.

Original. Reposted with permission.