A Non-comprehensive List of Awesome Things Other People Did in 2016

A top statistics professor and statistical researcher reflects on a number of awesome accomplishments by individuals in, and related to, the fields of statistics and data science, with a focus on the world of academia but with resonance far beyond.

Editor’s note: For the last few years I have made a list of awesome things that other people did (2015, 2014, 2013). Like in previous years I’m making a list, again right off the top of my head. If you know of some, you should make your own list or add it to the comments! I have also avoided talking about stuff I worked on or that people here at Hopkins are doing because this post is supposed to be about other people’s awesome stuff. I write this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data.

  • Thomas Lin Pedersen created the tweenr package for interpolating graphs in animations. Check out this awesome logo he made with it.
  • Yihui Xie is still blowing it away with everything he does. First it was bookdown and then the yolo feature in the xaringan package.
  • J Alammar built this great visual introduction to neural networks.
  • Jenny Bryan is working literal world wonders with legos to teach functional programming. I loved her Data Rectangling talk. The analogy between exponential families and data frames is so so good.
  • Hadley Wickham’s book on R for data science is everything you’d expect. Super clear, great examples, just a really nice book.
  • David Robinson is a machine put on this earth to create awesome data science stuff. Here he is analyzing Trump’s tweets and here he is on empirical Bayes modeling explained with baseball.
  • Julia Silge and David created the tidytext package. This is a holy moly big contribution to NLP in R. They also have a killer book on tidy text mining.
  • Julia used the package to do this fascinating post on mining Reddit after the election.
  • It would be hard to pick just five different major contributions from JJ Allaire (great interview here), Joe Cheng, and the rest of the RStudio folks. RStudio is absolutely churning out awesome stuff at a rate that is hard to keep up with. I loved R notebooks and have used them extensively for teaching.
  • Konrad Kording and Brett Mensh full on mic dropped on how to write a paper with their 10 simple rules piece. Figure 1 from that paper should be affixed to the office of every student/faculty in the world permanently.
  • Yaniv Erlich just can’t stop himself from doing interesting things like seeq.io and dna.land.
  • Thomaz Berisa and Joe Pickrell set up a freaking Python API for genomics projects.
  • DataCamp continues to do great things. I love their DataChats series and they have been rolling out tons of new courses.
  • Sean Rife and Michele Nuijten created statcheck.io for checking papers for p-value calculation errors. This was all over the press, but I just like the site as a dummy proofing for myself.
  • This was the artificial intelligence tweet of the year
  • I loved seeing PLoS Genetics start a policy of looking for papers in bioRxiv.
  • Matthew Stephens’s post on his preprint getting pre-accepted and reproducibility is also awesome. Preprints are so hot right now!
  • Lorena Barba made this amazing reproducibility syllabus, then won the Leamer-Rosenthal prize in open science.
  • Colin Dewey continues to do just stellar stellar work, this time on re-annotating genomics samples. This is one of the key open problems in genomics.
  • I love FlowingData sooooo much. Here is one on the changing American diet.
  • If you like computational biology and data science and like super detailed reports of meetings/talks, Michael Hoffman is your man. How he actually summarizes that much information in real time is still beyond me.
  • I really really wish I had been at Alyssa Frazee’s talk at startup.ml but loved this review of it. Sampling, inverse probability weighting? Love that stats flavor!
  • I have followed Cathy O’Neil for a long time in her persona as mathbabedotorg so it is no surprise to me that her new book Weapons of Math Destruction is so good. One of the best works on the ethics of data out there.
  • A related and very important piece is on Machine bias in sentencing by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner at ProPublica.
  • Dimitris Rizopoulos created this stellar integrated Shiny app for his repeated measures class. I wish I could build things half this nice.
  • Daniel Engber’s piece on Who will debunk the debunkers? at fivethirtyeight just keeps getting more relevant.
  • I rarely am willing to watch a talk posted on the internet, but Amelia McNamara’s talk on seeing nothing was an exception. Plus she talks so fast #jealous.
  • Sherri Rose’s post on economic diversity in the academy focuses on statistics but should be required reading for anyone thinking about diversity. Everything about it is impressive.
  • If you like your data science with a side of Python you should definitely be checking out Jake Vanderplas’s data science handbook and the associated Jupyter notebooks.
  • I love Thomas Lumley being snarky about the stats news. It’s a guilty pleasure. If he ever collected them into a book I’d buy it (hint Thomas :)).
  • Dorothy Bishop’s blog is one of the ones I read super regularly. Her post on When is a replication a replication is just one example of her very clearly explaining a complicated topic in a sensible way. I find that so hard to do and she does it so well.
  • Ben Goldacre’s crowd is doing a bunch of interesting things. I really like their OpenPrescribing project.
  • I’m really excited to see what Elizabeth Rhodes does with the experimental design for the Y Combinator Basic Income Experiment.
  • Lucy D’Agostino McGowan made this amazing explanation of Hill’s criteria using xkcd.
  • It is hard to overstate how good Leslie McClure’s blog is. This post on biostatistics is public health should be read aloud at every SPH in the US.
  • The ASA’s statement on p-values is a really nice summary of all the issues around a surprisingly controversial topic. Ron Wasserstein and Nicole Lazar did a great job putting it together.
  • I really liked this piece on the relationship between income and life expectancy by Raj Chetty and company.
  • Christie Aschwanden continues to be the voice of reason on the statistical crises in science.

That’s all I have for now, though I know I’m missing things. Maybe my New Year’s resolution will be to keep better track of the awesome things other people are doing :).

Original. Reposted with permission.

Bio: Jeff Leek is a professor at Johns Hopkins, where he does statistical research, writes data analysis software, curates and creates data sets, writes a blog about statistics, and works with amazing students who go do awesome things.