Bayesian Basics, Explained
This interview between Professor Andrew Gelman of Columbia University and marketing scientist Kevin Gray covers the basics of Bayesian statistics and how it differs from the ordinary statistics most of us learned in college.
KG: Does Bayesian inference have a special role in Big Data or the Internet of Things?
AG: Yes, I think so. The essence of Bayesian statistics is the combination of information from multiple sources. We call this data and prior information, or hierarchical modeling, or dynamic updating, or partial pooling, but in any case it’s all about putting together data to understand a larger structure. Big data, or data coming from the so-called internet of things, are inherently messy: scraped data not random samples, observational data not randomized experiments, available data not constructed measurements. So statistical modeling is needed to put data from these different sources on a common footing. I see this in the analysis of internet surveys, where we fit multilevel Bayesian models to non-random samples to make inferences about the general population, and the same ideas occur over and over again in modern messy-data settings.
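A minimal sketch of the partial pooling idea, assuming the simplest normal-normal model: each group j (say, a demographic cell in an internet survey) has an estimate y_j with standard error se_j, and the population mean mu and between-group standard deviation tau are treated as known. In a full multilevel model, mu and tau would be estimated as well (for example in Stan); the helper name `partial_pool` and all numbers below are hypothetical.

```python
def partial_pool(y, se, mu, tau):
    """Posterior means for group effects under a normal-normal model.

    Hypothetical helper: assumes the population mean mu and the
    between-group SD tau are known rather than estimated.
    """
    pooled = []
    for y_j, se_j in zip(y, se):
        w_data = 1.0 / se_j**2  # precision of the group's own data
        w_pop = 1.0 / tau**2    # precision of the population-level model
        # Precision-weighted average of the local estimate and the population mean
        pooled.append((w_data * y_j + w_pop * mu) / (w_data + w_pop))
    return pooled


# Hypothetical survey cells: estimated support and its standard error
estimates = [0.60, 0.45, 0.52]
std_errors = [0.10, 0.02, 0.05]
print(partial_pool(estimates, std_errors, mu=0.50, tau=0.05))
```

Noisy cells (large se_j) are pulled strongly toward mu, while precisely measured cells mostly keep their own estimate: local inferences are built partly from local data and partly from the population-level model.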
KG: What are the most important things non-statisticians need to know about Bayesian statistics? Are there things we need to be especially careful about when using these methods?
AG: It’s hard to say. You have to learn by doing, and one place to start is to look at some particular problem. One example that interested me recently was a website constructed by the sociologist Pierre-Antoine Kremp, who used the open-source statistics language R and the open-source Bayesian inference engine Stan (named after Stanislaw Ulam, the inventor of the Monte Carlo method) to combine U.S. national and state polls to make daily forecasts of the U.S. presidential election. In an article for Slate, I called this “the open-source poll aggregator that will put all other poll aggregators out of business” because ultimately you can’t beat the positive network effects of free and open-source: the more people who see this model, play with it, and probe its weaknesses, the better it can become. The Bayesian formalism allows a direct integration of data from different sorts of polls in the context of a time-series prediction model.
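To make the idea of integrating several polls concrete, here is a deliberately simplified sketch. It is not Kremp’s model, which is a time-series model written in Stan. It assumes that every poll measures the same fixed support share, ignores house effects and timing, and uses a conjugate Beta prior so that each poll updates the posterior in closed form. The function name `update_beta` and the poll numbers are hypothetical.

```python
import math


def update_beta(a, b, polls):
    """Update a Beta(a, b) prior on a candidate's support share.

    polls: list of (respondents_supporting, sample_size) pairs.
    Each poll adds its supporters to a and its non-supporters to b.
    """
    for supporters, n in polls:
        a += supporters
        b += n - supporters
    return a, b


# Hypothetical polls, combined starting from a uniform Beta(1, 1) prior
polls = [(520, 1000), (260, 500), (1040, 2000)]
a, b = update_beta(1, 1, polls)
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(f"posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

Because each update only adds counts, the order in which polls arrive does not change the result, and the same function can be applied again as each new poll comes in. This is a crude form of dynamic updating. A real aggregator would also model how opinion drifts over time and how pollsters differ.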
You ask if there are things we need to be especially careful about. As a famous cartoon character once said, “With great power comes great responsibility.” Bayesian inference is powerful in the sense that it allows the sophisticated combination of information from multiple sources via partial pooling (that is, local inferences are constructed in part from local information and in part from models fit to non-local data), but the flip side is that when assumptions are very wrong, conclusions can be far off too. That’s why Bayesian methods need to be continually evaluated with calibration checks, comparisons of observed data to simulated replications under the model, and other exercises that give the model an opportunity to fail. Statistical model-building, perhaps especially in its Bayesian form, is an ongoing process of feedback and quality control.
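The check described above, comparing observed data to simulated replications under the model, can be sketched in a few lines. This is a minimal example under stated assumptions: the data are counts, the model is Poisson with a Gamma(1, 1) prior (chosen here for conjugacy, not taken from the interview), and the test statistic is the sample variance. The counts and the helper names `draw_poisson` and `ppc_variance` are hypothetical.

```python
import math
import random
import statistics


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def ppc_variance(y, a0=1.0, b0=1.0, n_rep=2000, seed=0):
    """Posterior predictive check for a Poisson model with a Gamma prior
    (shape a0, rate b0). Returns the share of replicated datasets whose
    sample variance is at least as large as the observed one."""
    rng = random.Random(seed)
    a_post = a0 + sum(y)  # conjugate posterior shape
    b_post = b0 + len(y)  # conjugate posterior rate
    t_obs = statistics.variance(y)
    exceed = 0
    for _ in range(n_rep):
        # random.gammavariate takes a scale parameter, so pass 1 / rate
        lam = rng.gammavariate(a_post, 1.0 / b_post)
        y_rep = [draw_poisson(lam, rng) for _ in y]
        if statistics.variance(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_rep


counts = [0, 0, 1, 0, 12, 0, 2, 0, 9, 0]  # hypothetical, overdispersed counts
print(ppc_variance(counts))
```

These counts are far more spread out than a Poisson model allows, so replicated datasets should almost never reach the observed variance and the returned share should be near 0. That is the model failing the check, a signal to expand it (for example, to a negative binomial or multilevel model). A share well inside (0, 1) would mean this particular statistic reveals no misfit.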
A statistical procedure is a sort of machine that can run for a while on its own, but eventually needs maintenance and adaptation to new conditions. That’s what we’ve seen in the recent replication crisis in psychology and other social sciences: methods of null hypothesis significance testing and p-values, which had been developed for analysis of certain designed experiments in the 1930s, were no longer working in modern settings of noisy data and uncontrolled studies. Savvy observers had realized this for a while (psychologist Paul Meehl was writing acerbically about statistically driven pseudoscience as early as the 1960s), but it took time for researchers in many professions to catch on. I’m hoping that Bayesian modelers will be quicker to recognize their dead ends, and in my own research I’ve put a lot of effort into developing methods for checking model fit and evaluating predictions.
KG: You are one of the principal developers of the Stan software. When statisticians are "shopping around" for software that handles Bayes, what features should they look for first? Conversely, are there features that are now outdated or that they should avoid?
AG: Different software will serve different needs. Many users will not know a lot of statistics and will want to choose among some menu of models or analyses, and I respect that. We have written wrappers in Stan with pre-coded versions of various standard choices such as linear and logistic regression, ordered regression, multilevel models with varying intercepts and slopes, and so forth, and we’re working on tutorials that will allow the new user to fit these models in R or Stata or other familiar software.
Other users come to Stan because they want to build their own models, or, better still, want to explore their data by fitting multiple models, comparing them, and evaluating their fit. Indeed, our motivation in developing Stan was to solve problems in my own applied research, to fit models that I could not easily fit any other way.
Statistics is sometimes divided between graphical or “exploratory” data analysis, and formal or “confirmatory” inference. But I think that division is naive: in my experience, data exploration is most effectively done using models, and, conversely, our most successful models are constructed as the result of an intensive period of exploration and feedback. So, for me, I want model-fitting software that is:
- Flexible (so I can fit the models I want and expand them in often unanticipated ways)
- Fast (so I can fit many models)
- Connected to other software (so I can prepare my datasets before entering them in the model, and I can graphically and otherwise explore the fitted model relative to the data)
- Open (so I can engage my collaborators and the larger scientific community in my work, and conversely so I can contribute by sharing my modeling expertise in a common language)
- Readable and transparent (both so I can communicate my models with others and so I can actually understand what my models are doing).
Our efforts on Stan move us toward these goals.
KG: What are the developments in Bayesian statistics that might have an impact on the behavioral and social sciences in the next few years?
AG: Lots of directions here. From the modeling direction, we have problems such as polling where our samples are getting worse and worse, less and less representative, and we need to do more and more modeling to make reasonable inferences from sample to population. For decision making we need causal inference, which typically requires modeling to adjust for differences between so-called treatment and control groups in observational studies. And just about any treatment effect we care about will vary depending on scenario. The challenge here is to estimate this variation, while accepting that in practice we will have a large residue of uncertainty. We’re no longer in the situation where “p < .05” can be taken as a sign of a discovery. We need to accept uncertainty and embrace variation. And that’s true no matter how “big” our data are.
In practice, much of my thought goes into computing. We know our data are messy, we know we want to fit big models, but the challenge is to do so stably and in reasonable time: in the current jargon, we want “scalable” inference. Efficiency, stability, and speed of computing are essential. And we want more speed than you might think, because, as discussed earlier, when I’m learning from data I want to fit lots and lots of models. Of course then you have to be concerned about overfitting, but that’s another story. For most of the problems I’ve worked on, there are potential big gains from exploration, especially if that exploration is done through substantively based models and controlled with real prior information. That is, Bayesian data analysis.
KG: Thank you, Andrew!
A version of this article was first published in RW Connect on December 2, 2016.