Top 10 Big Ideas in Harvard Statistics 110 Class

The Big Ideas include conditioning (the soul of statistics), random variables and random vectors, stories, symmetry, linearity of expectation, LOTUS, and variance, covariance, and correlation.

By Gregory Piatetsky, Dec 6, 2013.

My recent post Why statistical community is disconnected from Big Data and how to fix it, which presented opinions from the leaders of the ASA (American Statistical Association), generated a vigorous discussion with over 60 comments on LinkedIn, with many opinions on the role of statistics in the current Big Data revolution.

However, the approaches of statisticians and data scientists are very different, and a good example of this is the list of 10 Big Ideas in Stat 110 (Quora) from Harvard Professor Joe Blitzstein, presented on the last day of Statistics 110 classes for the semester.

  1. Conditioning. Conditioning is the soul of statistics. This includes both conditional probability and conditional expectation, and ideas such as Bayes' rule and the law of total probability.
  2. Random variables and their distributions, and random vectors and their joint distributions. If conditioning is the soul of statistics, then random variables are the bread and butter of statistics (basic, nourishing, and delicious). Statistics is about quantifying uncertainty, and random variables/vectors are fundamental for doing this precisely.
  3. Stories. Understanding the stories introduced in Stat 110 makes it much easier to see how the famous distributions are used, how they became famous, and how they are connected.
  4. Symmetry. Several forms of symmetry come up in Stat 110. Often a symmetry argument can reduce ugly, tedious calculations to something short and sweet.
  5. Linearity of expectation. The fact that linearity of expectation holds so generally (for dependent r.v.s, not just independent r.v.s) has far-reaching consequences throughout statistics.
  6. Indicator random variables and the fundamental bridge. The idea behind indicator r.v.s isn't much different from writing 3 as 1+1+1, so it is surprising how powerful they are. The fundamental bridge, which says that the expected value of the indicator r.v. for event A is P(A), bridges between expectation and probability, and between random variables and events.
  7. LOTUS. The law of the unconscious statistician, which to many people seems at first too good to be true, is a powerful tool for finding the expected value of a function of an r.v.
  8. Variance, covariance, and correlation. Statistics is about far more than just "point estimation" (computing or estimating the mean in some problem). We would like to study variability, not just averages. We would like to give predictive intervals, not just single guesses. Variance lets us quantify this. Covariance is the analog for two r.v.s, and is also often needed for computing variances. Correlation is a standardized version of covariance.
  9. Law of large numbers and central limit theorem. These theorems, which may be the two most famous theorems in all of probability, reveal a lot about what happens to the sample mean of a lot of i.i.d. r.v.s. Without them, science as we know it would be impossible.
  10. Markov chains. These are remarkably beautiful and useful stochastic processes. They were first studied by Markov as part of a philosophical debate about religion and free will, as a way to go beyond i.i.d. sequences. In recent years they have proven worthwhile in a vast assortment of problems, especially through Markov chain Monte Carlo (MCMC).
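As a small illustration of idea 1 (conditioning), Bayes' rule combined with the law of total probability can be checked numerically. The sketch below is my own, not from the course; the 1% prevalence, 95% sensitivity, and 5% false-positive rate are hypothetical numbers chosen for the example:

```python
from fractions import Fraction

def bayes(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule, with the denominator expanded by the
    law of total probability:
    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | ~H) P(~H)]."""
    num = p_evidence_given_h * prior
    denom = num + p_evidence_given_not_h * (1 - prior)
    return num / denom

# Hypothetical numbers: 1% prevalence, 95% sensitivity,
# 5% false-positive rate.
posterior = bayes(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
# posterior = 95/590 = 19/118, about 0.161: even after a positive test,
# the conditional probability of having the condition stays small.
```

Using exact fractions rather than floats keeps the conditioning arithmetic transparent.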
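Ideas 5 and 6 (linearity of expectation and the fundamental bridge) combine nicely in the classic matching problem. This Python sketch is my own illustration, not from the course materials; it checks by simulation that the expected number of fixed points of a random permutation is 1, for any n:

```python
import random

def expected_fixed_points(n, trials=100_000, seed=0):
    """Simulated average number of positions j where a random
    permutation satisfies perm[j] == j.

    Analytically: let I_j indicate a match at position j.
    Fundamental bridge: E[I_j] = P(match at j) = 1/n.
    Linearity (the I_j are dependent, but that doesn't matter):
    E[I_1 + ... + I_n] = n * (1/n) = 1.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += sum(1 for j, x in enumerate(perm) if j == x)
    return total / trials
```

The simulated average lands near the exact answer, 1, whether n is 10 or 52.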
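Idea 7 (LOTUS) says E[g(X)] can be computed directly from the distribution of X, without ever finding the distribution of g(X). A minimal sketch of the discrete case, using a fair six-sided die as an assumed example (this ties in with idea 8, since Var(X) = E[X^2] - (E[X])^2):

```python
from fractions import Fraction

def lotus_discrete(pmf, g):
    """LOTUS for a discrete r.v.: E[g(X)] = sum over x of g(x) * P(X = x).
    No need to work out the distribution of g(X) itself."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair six-sided die: P(X = k) = 1/6 for k = 1..6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

ex = lotus_discrete(die, lambda x: x)        # E[X]   = 7/2
ex2 = lotus_discrete(die, lambda x: x * x)   # E[X^2] = 91/6
var = ex2 - ex ** 2                          # Var(X) = 35/12
```

The same function handles any g, which is exactly what makes LOTUS feel "too good to be true."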
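Both theorems in idea 9 show up in a small simulation. The sketch below is an illustration of my own, assuming i.i.d. Uniform(0,1) draws (mean 1/2, variance 1/12): the LLN says the sample means cluster around 1/2, and the CLT says their spread shrinks like sigma / sqrt(n):

```python
import math
import random

def sample_means(n, trials=2000, seed=1):
    # Each entry is the mean of n i.i.d. Uniform(0,1) draws.
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

means = sample_means(n=400)

# LLN: the sample means concentrate near mu = 0.5.
grand_mean = sum(means) / len(means)

# CLT: the standard deviation of the sample mean is about
# sigma / sqrt(n) = sqrt(1/12) / 20, roughly 0.0144.
spread = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / len(means))
```

Doubling n to 800 shrinks the spread by a factor of sqrt(2), not 2, which is the CLT's square-root scaling in action.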
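Finally, for idea 10, a minimal Markov chain sketch. The two-state weather chain below is a hypothetical example of my own, not from Stat 110; it shows the key long-run property that long-run visit frequencies converge to the stationary distribution, regardless of the starting state:

```python
import random

# Hypothetical two-state weather chain:
# P(rain -> rain) = 0.6, P(sun -> rain) = 0.2.
# Solving pi = pi P gives stationary distribution (1/3 rain, 2/3 sun).
P = {"rain": {"rain": 0.6, "sun": 0.4},
     "sun":  {"rain": 0.2, "sun": 0.8}}

def run_chain(start, steps, seed=2):
    """Run the chain and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    state = start
    visits = {"rain": 0, "sun": 0}
    for _ in range(steps):
        # The next state depends only on the current state (Markov property).
        state = "rain" if rng.random() < P[state]["rain"] else "sun"
        visits[state] += 1
    return {s: c / steps for s, c in visits.items()}

freqs = run_chain("sun", steps=200_000)
# freqs["rain"] is close to 1/3 even though we started in "sun".
```

MCMC turns this convergence around: it designs a chain whose stationary distribution is the one you want to sample from.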

Here is the full Quora answer: