# How I Learned to Stop Worrying and Love Uncertainty

This is a written version of Data Scientist Adolfo Martínez’s talk at Software Guru’s DataDay 2017. There is a link to the original slides (in Spanish) at the top of this post.

Pages: 1 2

### Bayesian Statistics

Bayesianism is rooted in the idea that probability is a measure of uncertainty and, as such, it is dependent on the information available to the individual making the measurement. As a measure, it can be applied to anything you can think of, including unique events, unknown quantities, or the truth about a statement.

The term refers to Thomas Bayes, an 18th-century reverend who proved a special case of the theorem that bears his name. This theorem provides a way to compute the “inverse probability”, that is, the probability of an event *A*given an event *B* when we know the probability of *B* given *A.*

**Bayes Theorem in a neon sign**

*prior*

*distribution*of the parameter. This

*prior distribution*encodes the information possessed before any data is observed.

Through the use of this theorem and its definition of probability, Bayesian Statistics can combine information possessed about a phenomenon with observed data about it, and produce updated, more accurate information. And though the inference made in this way is subjective, the theory of Bayesian Statistics states that, as we collect more and more data, the subjective part (the prior information) becomes less and less relevant; the subjective approximates the objective.

Like in frequentism, simple Bayesian models have a straightforward interpretation, for example, the posterior distributions of linear coefficients measure the uncertainty

around the effect of an independent variable in the dependent.

**Posterior distributions for a Bayesian linear regression. The peak of the distribution represents the most likely value for the parameter, while the spread represents the uncertainty about it.**

**Bayes Theorem applied to hypothesis evaluation. (Credit: Fast Forward Labs)**

And, unlike supervised learning methods, statistics provide the full distribution of the response variable given the features, allowing us to ask any number of questions related to it. This conditional distribution also encodes the uncertainty about our predictions, allowing us, for example, to compute prediction intervals as opposed to single values for each input combination.

**The predictive distribution of y given variables x and data D. The shadowed area represents the probability that y is between 0 and 2.5**

### Some Limitations

Of course, there is a reason mainstream science uses frequentist methods instead of bayesian ones, and it comes down to practicality; for the past centuries, the applicability of Bayesianism was limited by the hard, sometimes impossible, integrals that must be solved or approximated to make it work. One is needed to compute the “posterior” distribution, that is, the measure of uncertainty after observing data, and another for the predictive distribution, which will tell us what is the likely value of a “new” data point, possibly given some other variables.

**The predictive distribution. This integral is often impossible to solve analytically**

Even more advanced methods, such as Automatic Differentiation Variational Inference (ADVI) further reduce the time and tuning needed to arrive at the posterior distributions.

There are further philosophical questions and practical considerations that have prevented the mainstream use of these methods, although the latter have been somewhat lessened by recent developments in *probabilistic programming.*

### Probabilistic Programming

Probabilistic programming is the name given to frameworks capable of fully specifying a Bayesian model and make inference with only a couple of lines.

The following snippets are taken from an example of a mean change detection model, taken from the excellent book by Cameron Davidson-Pilon, Bayesian Methods for Hackers, where you can find it in full.

Here is the model specification in PyMC3.

**A mean change model using PyMC3**

**Inference using PyMC3, using the Metropolis MCMC algorithm**

### Love Uncertainty

In conclusion, **Bayesian Statistics** provide a framework for data analysis which can overcome many limitations prevalent in different techniques such as Supervised Learning and Frequentist Statistics.

In particular, they provide a way to deal with the problem of **overcertainty**, allowing us to ask questions about probability, and enabling the analyst to have a healthier relationship with uncertainty, by measuring and presenting it instead of blindly reducing it.

### Further Reading

*Bayesian Methods for Hackers**,*by Cameron Davidson-Pilon.*Probabilistic Programming*, a report by Fast Forward Labs.*Bayesian Reasoning and Machine Learning*, by David Barber.

Original. Reposted with permission.

**Bio**: Adolfo Martínez is a data scientist at datank.ai and a self-confessed professional procrastinator.

**Related:**

- The Intuitions Behind Bayesian Optimization with Gaussian Processes
- Unfolding Naive Bayes From Scratch
- Frequentists Fight Back

Pages: 1 2