A Non-comprehensive List of Awesome Things Other People Did in 2015

A top statistics professor and statistical researcher reflects on a number of awesome accomplishments by individuals in, and related to, the fields of statistics and data science, with a focus on the world of academia but with resonance far beyond.

This is the third year I'm making a list of awesome things other people did this year. Just like the lists for 2013 and 2014, I am doing this off the top of my head. I have avoided talking about stuff I worked on or that people here at Hopkins are doing, because this post is supposed to be about other people's awesome stuff. I wrote this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data. This year's list is particularly "off the cuff," so I'd appreciate additions if you have 'em. I have surely missed awesome things people have done.

1. I hear the Tukey conference put on by my former advisor John S. was amazing. Out of it came this really good piece by David Donoho on 50 years of Data Science.

2. Sherri Rose wrote really accurate and readable guides on academic CVs, academic cover letters, and how to be an effective PhD researcher.

3. I am not 100% sold on the deep learning hype, but Michael Nielsen wrote this awesome book on deep learning and neural networks. I like how approachable and un-hypey it is. I also thought Andrej Karpathy's blog post on whether or not you have a good selfie was fun.

4. Thomas Lumley continues to be a must-read regardless of which blog he writes for, with a ton of snarky, fun posts debunking the latest ridiculous health headlines on StatsChat and more in-depth posts, like this one on pre-filtering multiple tests, on notstatschat.

5. David Robinson is making a strong case for top data science blogger with his series of awesome posts on empirical Bayes.

6. Hadley Wickham doing Hadley Wickham things again. readr is the biggie for me this year.

7. I've been really enjoying the solid coverage of science and statistics from @statnews (which, despite what the name might suggest, is not entirely statistics-focused).

8. Ben Goldacre and co. launched OpenTrials, which aims to aggregate all the clinical trial data in the world in an open repository.

9. Christie Aschwanden's piece on why Science Isn't Broken is a must-read and one of the least polemical treatments of the reproducibility/replicability issue I've read. The p-hacking graphic is just icing on the cake.

10. I'm excited about the new R Consortium and the idea of having more organizations that support folks in the R community.

11. Emma Pierson's blog and write-ups in various national news outlets continue to impress. I thought this one on changing the incentives for sexual assault surveys was particularly interesting and good.

12. Amanda Cox and co. created this interactive graphic, which is an amazing way to teach people about the preconceived biases in how we think about relationships and correlations. I love the crowd-sourced view of data analysis that it suggests.

13. As usual, Philip Guo was producing gold over on his blog. I appreciated this piece on twelve tips for data-driven research.

14. I am really excited about the new field of adaptive data analysis. Basically, it is about understanding how we can let people be "real data analysts" (that is, choose their next analysis after looking at the results of earlier ones on the same data) and still get reasonable estimates at the end of the day. This paper from Cynthia Dwork and co. was one of the initial salvos that came out this year.

15. DataCamp added Python to its platform. The idea of interactive education for R, Python, and data science is a very cool one and has tons of potential.

16. I was really into the idea of cross-study validation that was proposed this year. With the growth of public data in a lot of areas, we can really start to get a feel for how well results generalize from one study to another.

17. The Open Science Collaboration did this incredible replication of 100 different studies in psychology, with a level of care and attention to detail that deserves a ton of recognition.

18. Florian's piece "You are not working for me; I am working with you." should be required reading for all students/postdocs/mentors in academia. This is something I still hadn't fully figured out until I read Florian's piece.

19. I think Karl Broman's post on why reproducibility is hard is a great introduction to the real issues in making data analyses reproducible.

20. This was the year of the F1000 post-publication review paper. I thought this one from Yoav and the ensuing fallout were fascinating.

21. I love pretty much everything out of Di Cook's and Heike Hofmann's groups. This year I liked the paper on visual statistical inference in high-dimensional, low-sample-size settings.

22. This is pretty recent, but Nathan Yau's day-in-the-life graphic is mesmerizing.

This was a year in which a number of open source data people described how painful it is when users are demanding or mean to them about their contributions. As the year closes, I just want to give a big thank you to everyone who did awesome stuff that I used this year and have completely ungraciously failed to acknowledge.

Original. Reposted with permission.

Bio: Jeff Leek is a professor at Johns Hopkins, where he does statistical research, writes data analysis software, curates and creates data sets, writes a blog about statistics, and works with amazing students who go do awesome things.