Introduction to Bayesian Inference

Bayesian inference is a powerful toolbox for modeling uncertainty: it combines a researcher's understanding of a problem with data, and returns a quantitative measure of how plausible various facts are. This overview from DataScience.com introduces Bayesian probability and inference in an intuitive way, and provides examples in Python to help you get started.



By DataScience.com. Sponsored Post.

Prerequisites

This post is an introduction to Bayesian probability and inference. We will discuss the intuition behind these concepts, and provide some examples written in Python to help you get started. To get the most out of this introduction, the reader should have a basic understanding of statistics and probability, as well as some experience with Python. The examples use the Python package pymc3.
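As a quick sanity check before diving in, you can verify that pymc3 imports correctly. This is a minimal sketch assuming a standard pip-based environment; the version shown is illustrative:

```python
# Verify that pymc3 is installed and importable.
# If it is missing, it can typically be installed with: pip install pymc3
import pymc3 as pm

print(pm.__version__)  # e.g., 3.11.x
```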

Bayesian Inference

Introduction to Bayesian Thinking

Bayesian inference is an extremely powerful set of tools for modeling any random variable, such as the value of a regression parameter, a demographic statistic, a business KPI, or the part of speech of a word. We provide our understanding of a problem and some data, and in return get a quantitative measure of how certain we are of a particular fact. This approach to modeling uncertainty is particularly useful when:

  • Data is limited
  • We're worried about overfitting
  • We have reason to believe that some facts are more likely than others, but that information is not contained in the data we're modeling
  • We're interested in knowing precisely how likely certain facts are, as opposed to just picking the most likely fact

The original tutorial includes a table enumerating applied tasks that exhibit these challenges, along with descriptions of how Bayesian inference can be used to solve them. Don't worry if the Bayesian solutions are foreign to you; they will make more sense as you read this post.

The term Bayesian inference is typically used as a counterpart to frequentist inference. This can be confusing, as the line drawn between the two approaches is blurry; the true distinction is philosophical, concerning how each camp interprets what probability is. We'll focus on Bayesian concepts that are foreign to traditional frequentist approaches and are actually used in applied work, specifically the prior and posterior distributions.

Consider Bayes' theorem:

$p(A|B) = \frac{p(A)p(B|A)}{p(B)}$

Think of A as some proposition about the world, and B as some data or evidence. For example, A represents the proposition that it rained today, and B represents the evidence that the sidewalk outside is wet:

$p(\text{rain}|\text{wet}) = \frac{p(\text{rain})\,p(\text{wet}|\text{rain})}{p(\text{wet})} = \frac{p(\text{rain})\,p(\text{wet}|\text{rain})}{p(\text{rain})\,p(\text{wet}|\text{rain}) + p(\text{no rain})\,p(\text{wet}|\text{no rain})}$

p(rain | wet) asks, "What is the probability that it rained, given that it is wet outside?" To evaluate this question, let's walk through the right side of the equation. Before looking at the ground, what is the probability that it rained, p(rain)? Think of this as the plausibility of an assumption about the world. We then ask how likely the observation that it is wet outside is under that assumption, p(wet | rain). Multiplying the prior by the likelihood effectively updates our initial beliefs about a proposition with the observation, yielding a final measure of the plausibility of rain given the evidence.

This procedure is the basis for Bayesian inference, where our initial beliefs are represented by the prior distribution p(rain), and our final beliefs are represented by the posterior distribution p(rain | wet). The denominator simply asks, "What is the total plausibility of the evidence?", for which we must consider every assumption (here, rain and no rain) to ensure that the posterior is a proper probability distribution.
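To make this concrete, here is a minimal sketch of the computation in plain Python. All of the probability values below are assumed for illustration; they are not from the original tutorial:

```python
# Assumed values for the rain/wet example (chosen for illustration).
p_rain = 0.20               # prior: p(rain)
p_wet_given_rain = 0.90     # likelihood: p(wet | rain)
p_wet_given_no_rain = 0.15  # p(wet | no rain), e.g., sprinklers ran

# Denominator: total plausibility of the evidence, p(wet),
# summed over both assumptions (rain and no rain).
p_wet = p_rain * p_wet_given_rain + (1 - p_rain) * p_wet_given_no_rain

# Posterior: p(rain | wet), via Bayes' theorem.
p_rain_given_wet = p_rain * p_wet_given_rain / p_wet
print(round(p_rain_given_wet, 3))  # 0.6
```

With these assumed numbers, observing the wet sidewalk moves our belief in rain from a prior of 0.20 to a posterior of 0.60.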

Bayesians are uncertain about what is true (the value of a KPI, a regression coefficient, etc.), and use data as evidence that certain facts are more likely than others. Prior distributions reflect our beliefs before seeing any data, and posterior distributions reflect our beliefs after we have considered all the evidence. To unpack what that means and how to leverage these concepts for actual analysis, let's consider the example of evaluating new marketing campaigns.
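As a small preview of what that analysis looks like in code, here is a minimal pymc3 sketch of a conversion-rate model. The simulated data, prior parameters, and variable names are illustrative assumptions, not the original tutorial's model:

```python
import numpy as np
import pymc3 as pm

# Simulated campaign data: 1 = visitor converted, 0 = did not.
# The true rate (0.07) and sample size are assumptions for this sketch.
np.random.seed(42)
conversions = np.random.binomial(n=1, p=0.07, size=500)

with pm.Model() as model:
    # Prior: our beliefs about the conversion rate before seeing any data.
    theta = pm.Beta("theta", alpha=2, beta=20)

    # Likelihood: how plausible the observed conversions are for a given rate.
    obs = pm.Bernoulli("obs", p=theta, observed=conversions)

    # Sample from the posterior: our beliefs after considering the evidence.
    trace = pm.sample(2000, tune=1000, random_seed=42)

print(trace["theta"].mean())  # posterior mean of the conversion rate
```

The Beta(2, 20) prior encodes a belief that conversion rates are usually small; with 500 observations, the posterior is pulled strongly toward the rate suggested by the data.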

See Example: Evaluating New Marketing Campaigns Using Bayesian Inference and the rest of the tutorial on the DataScience.com site.