The amazing predictive power of conditional probability in Bayes Nets
This article explains how Bayes Nets gain remarkable predictive power through their use of conditional probability. This adds to their other salient strengths, making them a preeminent method for prediction and for understanding variables’ effects.
By Steven M. Struhl, Converge Analytic.
Using conditional probability gives Bayes Nets strong analytical advantages over traditional regression-based models. This adds to several strengths we discussed in an earlier article. These include:
- All variables interconnecting. Any change in one variable takes into account how that variable relates to all others. This is a basic property of networks. In regressions, any change assumes that all other variables remain constant. That works perfectly in a controlled experiment, but rarely in real life, where we find many subtle connections.
- A whole-distribution view of data. The entire distribution of variables’ values enters the analysis. All regression-based models rely on correlations. A correlation is a one-number summary of how closely two variables follow a straight line.
- Ability to handle many variables. Working Bayes Nets with over 2,000 variables have been reported in scientific articles. These provide accurate predictions and readings of effects. This differs from regression, where the apparent effects of variables shrink as more are added to the model. In fact, in regression, the coefficient or effect of any given variable may be influenced more by the presence of other variables in the model than by any underlying relationship with the target or dependent variable.
But what is conditional probability, and what makes it different? In short, conditional probability means that the effects of one variable depend on, or flow from, the distribution of another variable (or several others). The complete state of one variable determines how another acts. This likely sounds opaque, so let’s see how it works.
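Before turning to the example, here is a minimal sketch of the definition itself, in plain Python. The variable names and numbers are purely illustrative and not from the article; the point is only that a conditional probability is a joint probability divided by the probability of the condition.

```python
# Hypothetical joint distribution over two variables A and B,
# stored as P(A = a, B = b). Numbers are illustrative only.
joint = {
    ("a1", "b1"): 0.30, ("a1", "b2"): 0.10,
    ("a2", "b1"): 0.20, ("a2", "b2"): 0.40,
}

def conditional(joint, a, b):
    """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    p_b = sum(p for (_, b_val), p in joint.items() if b_val == b)
    return joint[(a, b)] / p_b

print(conditional(joint, "a1", "b1"))  # 0.30 / 0.50 = 0.6
```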
Here is a small example of conditional probability, which you can see in more detail in Practical Text Analytics (Struhl, 2015). The initial problem comes from the work of Kahneman and Tversky (1982).
The taxicab problem
Suppose there is a city with just yellow and white cabs. Some 85% of the cabs are yellow, and the rest white. An accident occurs, and a witness says the cab was white. He proves to be 80% correct at identifying cabs of each color. If he says the cab is white, what are the odds that it truly is white?
Most people guess either 12% or 80%. Some try a cleverer-seeming approach, and say 80% times 80%, or 64%. But all these answers are quite wrong.
Bayes Nets immediately see the correct answer. It is—hold on—41.4%.
How can nearly all of us be so wrong—and how can Bayes Nets solve this easily? To show how this works, we will need to make our own tiny network.
While networks can self-assemble from patterns in the data, we also can make them by hand to solve specific problems. This network is as small as possible, linking two variables: the color of the cab, and what the witness reports as the color. Each variable is called a node.
Making our own network
We understand that the witness’ report of the cab’s color depends on its actual color, so we will draw a small network with the color of the cab leading to what the witness says, as you see in Figure 1. You must always have a direction between variables in a network, with the arrow pointing toward the variable that depends on the other or others.
This depiction has little meaning until we can see what is inside each node. Each node holds a table numerically representing the situation we described. First, we set up the node showing the odds of a taxicab being each color, which we have called actual cab color. We see this in Figure 2a.
Then we set up the second node showing the odds of the witness being right about each type of cab. We see that in Figure 2b. Whether the color is yellow or white, the witness says the right color 80% of the time, and says the other (wrong) color 20% of the time.
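For readers who would like to reproduce this setup in code, here is a minimal sketch using the open-source pgmpy library. This is my own illustration, not the software used in the article; the node and state names are invented, and API details may differ slightly between pgmpy versions. Any Bayes Net package that accepts conditional probability tables would serve equally well.

```python
from pgmpy.models import BayesianNetwork  # named DiscreteBayesianNetwork in newer pgmpy releases
from pgmpy.factors.discrete import TabularCPD

# One arrow: actual cab color -> what the witness says (Figure 1).
model = BayesianNetwork([("actual_color", "witness_says")])

# Figure 2a: 85% of cabs are yellow, 15% are white.
cpd_color = TabularCPD(
    variable="actual_color", variable_card=2,
    values=[[0.85], [0.15]],
    state_names={"actual_color": ["yellow", "white"]})

# Figure 2b: the witness names the right color 80% of the time,
# whichever color the cab actually is.
cpd_witness = TabularCPD(
    variable="witness_says", variable_card=2,
    values=[[0.80, 0.20],   # says "yellow" given actual yellow, actual white
            [0.20, 0.80]],  # says "white"  given actual yellow, actual white
    evidence=["actual_color"], evidence_card=[2],
    state_names={"witness_says": ["yellow", "white"],
                 "actual_color": ["yellow", "white"]})

model.add_cpds(cpd_color, cpd_witness)
assert model.check_model()
```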
Now what happens when the witness says he saw a white cab? We can manipulate the network diagram in the Bayes Net software. We first change its display so that it shows bar charts representing the probabilities we just defined. We then can tweak the diagram, and move the value of white in the witness node to 100%. This corresponds to the witness saying “white.” We see what happens in Figure 3.
The actual cab color and what the witness says are linked (as we saw in Figure 1). Having linkages is a basic quality of a network. If we change the values in one linked node, the other node will change along with it. This happens regardless of which way an arrow points.
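Continuing the pgmpy sketch above, setting the evidence that the witness said “white” and querying the actual-color node reproduces the update shown in Figure 3. Again, this is my illustration under the assumptions stated earlier, not the article’s own software.

```python
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)

# The witness says "white": clamp that node to 100% white and ask
# what the network now believes about the actual cab color.
posterior = infer.query(["actual_color"], evidence={"witness_says": "white"})
print(posterior)
# P(actual_color = white  | witness_says = white) ≈ 0.414
# P(actual_color = yellow | witness_says = white) ≈ 0.586
```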
The network easily solves what nearly all of us cannot intuit.
Here is the underlying math that the network uses. Of 100 cabs, 15 would be white and 85 yellow. The witness would identify 12 of the 15 white cabs correctly (12 = 15 × 0.80, the level of correct identification). However, he would also misidentify 17 of the 85 yellow cabs as white (17 = 85 × 0.20, the level of misidentification). That is, out of 100 cabs, the witness would call 29 white, but only 12 of those would truly be white.
The odds therefore would be 12/29—or 41.4%. This answer appears in the changed network diagram to the right in Figure 3.
These odds are conditional upon the percentage of white and yellow cabs (in the “actual cab color” node). If 90% of the cabs were yellow (and our hapless witness still 80% correct about each color), the odds he was right when calling a cab white would go still lower. He would then correctly identify 8 out of the 26 cabs he calls white, or about 30.8%.
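The same arithmetic can be checked in a few lines of plain Python, with no network software at all. This is simply Bayes’ rule applied to the counts described in the text; the function name is my own.

```python
def p_white_given_says_white(pct_white, accuracy):
    """P(cab is white | witness says white), reasoning per 100 cabs."""
    white, yellow = 100 * pct_white, 100 * (1 - pct_white)
    true_hits = white * accuracy           # white cabs correctly called white
    false_hits = yellow * (1 - accuracy)   # yellow cabs wrongly called white
    return true_hits / (true_hits + false_hits)

print(p_white_given_says_white(0.15, 0.80))  # 12 / 29 ≈ 0.414
print(p_white_given_says_white(0.10, 0.80))  # 8 / 26  ≈ 0.308
```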
Conditional probability is what makes Bayesian networks Bayesian. That is, what happens in one node is conditional upon, or dependent upon, conditions in another node. This way of approaching problems has tremendous analytical power.
Still, it is not intuitive, and may even take a couple of readings when laid out step by step. Such is conditional probability. To paraphrase one expert, these networks lead to remarkable results, but they are hard to understand, for the novice and the experienced user alike (Yudkowsky, 2006). Now, let’s all take a deep breath.
The payoffs
This ability would not mean anything if predictive results were not strong. Fortunately, they almost invariably are. Results are still more impressive with larger Bayes Nets, and particularly with ones that assemble themselves based on patterns in data. Looking at larger networks, it is hard to avoid anthropomorphic terminology, such as saying the network has “insight” into the data, or that it has “seen into” the problem.
In dozens of studies, your author has seen networks cut through problems that stopped other methods cold, and many more examples are reported in the literature. Just one recent example: a Bayesian network predicted market share from 74 questionnaire questions with 85% accuracy, and 76% accuracy on validation. (Validation involves holding part of the data aside, building the model on the rest, and then testing it on the held-out data. In weak models, validated accuracy drops precipitously, because weak models tend to seize on patterns peculiar to the data being used.) The best regression-based model, a monstrous partial least squares regression, reached only 11% correct prediction.
Overall, despite the ways they work outside the bounds of our intuition, Bayes Nets are well worth exploring. They can solve problems and produce excellent results where other methods cannot.
References
Gill, R. (2010), “Monty Hall problem,” International Encyclopedia of Statistical Science, pp 858–863, Springer-Verlag, Berlin
Kahneman, D., Slovic, P., Tversky, A. (eds.) (1982), Judgment under Uncertainty: Heuristics and Biases, pp 156–158, Cambridge University Press, Cambridge, UK
Struhl, S. (2015), Practical Text Analytics, Kogan Page, London
Struhl, S. (2017), Artificial Intelligence Marketing and Predicting Consumer Choice, Kogan Page, London
Witten, I., Frank, E. (2005), Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.), Morgan Kaufmann, San Francisco
Yudkowsky, E. (2006), “An Intuitive Explanation of Bayes’ Theorem,” http://yudkowsky.net/rational/bayes (last accessed 9/5/2017)
Bio: Dr. Steven Struhl is the author of Artificial Intelligence Marketing and Predicting Consumer Choice (2017, Kogan Page), Practical Text Analytics (2015, Kogan Page) and Market Segmentation (1992, American Marketing Association; revised 2013). He is founder and principal at Converge Analytic, specializing in advanced analytics and marketing sciences. He has over 30 years’ experience with a wide range of industries, as well as with governmental and non-profit agencies.