How I Learned to Stop Worrying and Love Uncertainty
This is a written version of Data Scientist Adolfo Martínez’s talk at Software Guru’s DataDay 2017. There is a link to the original slides (in Spanish) at the top of this post.
Bayesianism is rooted in the idea that probability is a measure of uncertainty and, as such, it is dependent on the information available to the individual making the measurement. As a measure, it can be applied to anything you can think of, including unique events, unknown quantities, or the truth about a statement.
The term refers to Thomas Bayes, an 18th-century reverend who proved a special case of the theorem that bears his name. This theorem provides a way to compute the “inverse probability”, that is, the probability of an event Agiven an event B when we know the probability of B given A.
Through the use of this theorem and its definition of probability, Bayesian Statistics can combine information possessed about a phenomenon with observed data about it, and produce updated, more accurate information. And though the inference made in this way is subjective, the theory of Bayesian Statistics states that, as we collect more and more data, the subjective part (the prior information) becomes less and less relevant; the subjective approximates the objective.
Like in frequentism, simple Bayesian models have a straightforward interpretation, for example, the posterior distributions of linear coefficients measure the uncertainty
around the effect of an independent variable in the dependent.
Bayes Theorem applied to hypothesis evaluation. (Credit: Fast Forward Labs)
And, unlike supervised learning methods, statistics provide the full distribution of the response variable given the features, allowing us to ask any number of questions related to it. This conditional distribution also encodes the uncertainty about our predictions, allowing us, for example, to compute prediction intervals as opposed to single values for each input combination.
Of course, there is a reason mainstream science uses frequentist methods instead of bayesian ones, and it comes down to practicality; for the past centuries, the applicability of Bayesianism was limited by the hard, sometimes impossible, integrals that must be solved or approximated to make it work. One is needed to compute the “posterior” distribution, that is, the measure of uncertainty after observing data, and another for the predictive distribution, which will tell us what is the likely value of a “new” data point, possibly given some other variables.
Even more advanced methods, such as Automatic Differentiation Variational Inference (ADVI) further reduce the time and tuning needed to arrive at the posterior distributions.
There are further philosophical questions and practical considerations that have prevented the mainstream use of these methods, although the latter have been somewhat lessened by recent developments in probabilistic programming.
Probabilistic programming is the name given to frameworks capable of fully specifying a Bayesian model and make inference with only a couple of lines.
The following snippets are taken from an example of a mean change detection model, taken from the excellent book by Cameron Davidson-Pilon, Bayesian Methods for Hackers, where you can find it in full.
Here is the model specification in PyMC3.
In conclusion, Bayesian Statistics provide a framework for data analysis which can overcome many limitations prevalent in different techniques such as Supervised Learning and Frequentist Statistics.
In particular, they provide a way to deal with the problem of overcertainty, allowing us to ask questions about probability, and enabling the analyst to have a healthier relationship with uncertainty, by measuring and presenting it instead of blindly reducing it.
- Bayesian Methods for Hackers, by Cameron Davidson-Pilon.
- Probabilistic Programming, a report by Fast Forward Labs.
- Bayesian Reasoning and Machine Learning, by David Barber.
Original. Reposted with permission.
- The Intuitions Behind Bayesian Optimization with Gaussian Processes
- Unfolding Naive Bayes From Scratch
- Frequentists Fight Back