# What Statistics Topics are Needed for Excelling at Data Science?

Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.

**By Sergey Feldman, Data Cowboys**.

"Data scientist" is a vague new job and you never know what tools you'll need to succeed. Lots of stuff I do at work I have never done before, but grad school was as much about learning how to learn quickly & think mathematically, as it was about learning specific models & techniques.

In general, I recommend that you be able to (a) think in math and (b) code those thoughts up. Everything else you can teach yourself on the spot. But here is a giant list, roughly in order of increasing complexity.

**Coding.** Be a master of Python and/or R. There are other options but these two are ubiquitous nowadays.

**Know Thy Distributions.** You should have a good intuition of what distribution is used for what. Given some data, you should be able to do something like this for many scenarios:

Q: Is my data well-modeled by a Pareto?

*A: No, the empirical histogram is not monotonically decreasing.*

Q: A Gaussian of course!

*A: Nope, there aren't any negative values.*

Q: How about the Exponential?

*A: No, there are no zeros.*

Q: OK, uh, the von Mises?

*A: Don't be silly, I'm pretty sure this data doesn't reside on the surface of a circle...*

Q: The log-normal!

*A: That sounds good. Better plot it and see...*
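A rough sketch of the "better plot it and see" step, using only the Python standard library. The data here is synthetic (drawn from a log-normal with made-up parameters), and skewness stands in for eyeballing a histogram: if the data is log-normal, its logarithm should look symmetric. This is one sanity check under those assumptions, not a formal goodness-of-fit test.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: the mean of the cubed standardized values."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in xs)


random.seed(0)
# Synthetic stand-in for "some data"; the parameters are arbitrary.
data = [random.lognormvariate(1.0, 0.5) for _ in range(5000)]

# No negative values, so a Gaussian is already a doubtful choice.
print("smallest value:", round(min(data), 3))

# Log-normal data has a long right tail...
print("skewness of data:", round(skewness(data), 2))

# ...but its logarithm should be roughly symmetric (skewness near 0).
logs = [math.log(x) for x in data]
print("skewness of log(data):", round(skewness(logs), 2))
```

A large positive skewness for the raw data next to a near-zero skewness for the logs is what you would hope to see before committing to a log-normal.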

**Fitting.** Once you've got your distributions down, you should know how to fit them to data in slick ways. Start with maximum likelihood and go from there.
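As a minimal illustration of maximum likelihood, here is a sketch that fits a log-normal, where the estimates have a closed form: the mean and the (divide-by-n) standard deviation of the logged data. The function names and the synthetic data are assumptions made for this example; for distributions without a closed form you would hand the log-likelihood to a numerical optimizer instead.

```python
import math
import random
import statistics


def fit_lognormal_mle(xs):
    """Closed-form maximum likelihood estimates (mu, sigma) for a log-normal."""
    logs = [math.log(x) for x in xs]
    mu_hat = statistics.fmean(logs)
    # The MLE divides by n, not n - 1, hence the population std dev.
    sigma_hat = statistics.pstdev(logs, mu=mu_hat)
    return mu_hat, sigma_hat


def lognormal_loglik(xs, mu, sigma):
    """Log-likelihood of the data under a log-normal(mu, sigma)."""
    const = -math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return sum(
        const - math.log(x) - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


random.seed(1)
# Synthetic data with known (arbitrary) parameters mu=1.0, sigma=0.5.
data = [random.lognormvariate(1.0, 0.5) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal_mle(data)
print("mu_hat:", round(mu_hat, 3), "sigma_hat:", round(sigma_hat, 3))

# Nudging the estimates in any direction should lower the log-likelihood.
best = lognormal_loglik(data, mu_hat, sigma_hat)
print(best >= lognormal_loglik(data, mu_hat + 0.05, sigma_hat))
print(best >= lognormal_loglik(data, mu_hat, sigma_hat * 1.1))
```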

**Classical hypothesis testing.** I think p-values and frequentist hypothesis testing in general are really hard to explain & hard to understand (failing to reject null hypotheses &c), but both are still ubiquitous.
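One way to make the logic of a p-value concrete is a permutation test, sketched below with the standard library. It is a frequentist test, though not the textbook t-test: it asks how often a difference in means at least as large as the observed one shows up when the group labels are shuffled, which is what the null hypothesis "the labels don't matter" implies. The function name, sample sizes, and synthetic data are all assumptions for illustration.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of label shufflings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


rng = random.Random(42)
control = [rng.gauss(0.0, 1.0) for _ in range(50)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(50)]    # true mean differs
same = [rng.gauss(0.0, 1.0) for _ in range(50)]       # true mean identical

p_shifted = permutation_test(control, shifted)
p_same = permutation_test(control, same)
print("control vs shifted:", p_shifted)
print("control vs same:", p_same)
```

A small p-value lets you reject the null; a large one only means you failed to reject it, not that the two groups are the same.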

**Markov chains** + bells + whistles.
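For a taste of the basics before the bells and whistles, here is a small sketch that finds the stationary distribution of a two-state chain by repeatedly applying the transition matrix. The matrix is made up for illustration, and this simple iteration is only expected to converge for chains that are irreducible and aperiodic.

```python
def step(dist, P):
    """Advance a distribution over states by one step of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(x - y) for x, y in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist


# Hypothetical weather chain: state 0 = sunny, state 1 = rainy.
# P[i][j] is the probability of moving from state i to state j.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

pi = stationary(P)
print("stationary distribution:", [round(p, 4) for p in pi])
```

For this matrix the answer can be checked by hand: solving pi = pi P gives 5/6 sunny and 1/6 rainy.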

**Basic Bayesian thinking & modeling.** Learn to think of everything as a probability distribution instead of just a single value (if appropriate). Be able to assemble the models & compute with them.
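A minimal sketch of "a distribution instead of a single value": estimating a success rate with a Beta prior and binomial data, where the posterior is again a Beta. The prior, the counts (7 successes, 3 failures), and the function names are assumptions for this example, and the credible interval is approximated by sampling rather than computed exactly.

```python
import random


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def credible_interval(alpha, beta, level=0.95, n_draws=20_000, seed=0):
    """Approximate an equal-tailed credible interval by sampling the posterior."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    lo = draws[int((1 - level) / 2 * n_draws)]
    hi = draws[int((1 + level) / 2 * n_draws) - 1]
    return lo, hi


# Uniform prior Beta(1, 1), then observe 7 successes and 3 failures.
alpha_post, beta_post = update_beta(1, 1, successes=7, failures=3)
post_mean = alpha_post / (alpha_post + beta_post)
lo, hi = credible_interval(alpha_post, beta_post)

print("posterior: Beta(%d, %d)" % (alpha_post, beta_post))
print("posterior mean:", round(post_mean, 3))
print("approx. 95% credible interval:", (round(lo, 2), round(hi, 2)))
```

The point estimate 7/10 hides how uncertain ten observations leave you; the width of the interval makes that uncertainty explicit.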

**Some old-school stats and probability theory.** E.g. "Random variables; transformations, conditional expectation, moment generating functions, convergence, limit theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation, sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Some decision theory."

**Regression!** First linear, then **non-linear.** (Gasp!)
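To show the jump from linear to non-linear, here is a standard-library sketch: ordinary least squares for a straight line, then a bare-bones Gauss-Newton fit of the model y = a * exp(b * x), which is non-linear in b. The model, the synthetic data, and the function names are assumptions for illustration; the Gauss-Newton loop has no damping or convergence checks, so treat it as a teaching sketch rather than a robust solver.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_exponential(xs, ys, n_iter=50):
    """Gauss-Newton least squares for y = a * exp(b * x)."""
    # Initial guess from a straight-line fit to log(y); needs all ys > 0.
    log_a, b = fit_line(xs, [math.log(y) for y in ys])
    a = math.exp(log_a)
    for _ in range(n_iter):
        # Accumulate the entries of J^T J and J^T r for the two parameters.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e            # residual at the current parameters
            da, db = e, a * x * e    # partial derivatives of the model
            s_aa += da * da
            s_ab += da * db
            s_bb += db * db
            g_a += da * r
            g_b += db * r
        # Solve the 2x2 system (J^T J) delta = J^T r by Cramer's rule.
        det = s_aa * s_bb - s_ab ** 2
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b


# Linear case on exact data: y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2], [1, 3, 5])
print("line:", intercept, slope)

# Non-linear case on synthetic noisy data with a=2.0, b=1.3 (arbitrary).
rng = random.Random(0)
xs = [2 * i / 199 for i in range(200)]
ys = [2.0 * math.exp(1.3 * x) + rng.gauss(0, 0.1) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print("exponential:", round(a_hat, 3), round(b_hat, 3))
```

The log-transform starting point is itself a linear regression; Gauss-Newton then refines it so that the squared error is minimized on the original scale rather than the log scale.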

**Machine learning.** I know you said "statistics," but really if you want to be a "data scientist" then machine learning will be an amazingly versatile & useful toolbelt for you. Also, machine learning is broad, so maybe that could be another Quora question. =)

**Writing.** Communicate your ideas clearly, succinctly, & compellingly.

Good luck!

**Bio: Sergey Feldman** is a machine learning and data science consultant. He's the founder of Data Cowboys, and lives in Seattle.

Original. Reposted with permission.
