By Sam Finlayson, MD-PhD student at Harvard-MIT. Original. Reposted with permission. The following is a snapshot of the original, which will be updated over time.
This is a not-particularly-systematic attempt to curate a handful of my favorite resources for learning statistics and machine learning. This isn’t meant to be comprehensive, and in fact is still missing the vast majority of my favorite explainers. Rather, it’s just a smattering of resources I’ve found myself turning to multiple times and thus would like to have in one place. The organization is as follows:
Open Courses and Textbooks: Cover a fairly broad topic reasonably comprehensively, and would take weeks to months to work through start-to-finish.
Tutorials, Overviews, and (Individual) Lecture Notes: Shorter, focused explanations of individual topics (e.g., blog posts and articles like those on distill.pub).
Of the above, the second section is both the most incomplete and the one that I am most excited about. I hope to use it to capture the best explanations of tricky topics that I have read online, to make it easier to re-learn them later when I inevitably forget. (In a perfect world, Chris Olah and/or distill.pub would just write an article on everything, but in the meantime I have to gather scraps from everywhere else.)
If you stumble upon this list and have suggestions for me to add (especially for the middle section!), please feel free to reach out! But I’m only trying to post things here that I’ve read, so a suggestion may sit in my to-read list for a while before it makes it on here. Of course, the source for this webpage is on GitHub, so you can also just take it.
Open Courses and Textbooks
I’m trying to limit this list to things that are legally accessible online, for free.
John Tsitsiklis et al. have put together some great resources. Their classic MIT intro to probability has been archived on OCW and also offered on edX (Part 1, Part 2). The textbook is also excellent.
Joe Blitzstein’s undergrad probability course overlaps heavily in content with 6.041 (the MIT course above). Like 6.041, it also has a great textbook, YouTube videos, and an edX offering. It’s a bit more playful, as well.
This guy is amazing. Some 250 YouTube tutorials on ML, probability, and information theory. What’s great about these playlists is that any individual video could go into section 2!
Tim Roughgarden is one of the most natural teachers I’ve ever seen, and fortunately for the world, he’s decided to make a lot of his algorithms resources public. The first link is to lecture notes in PDF form from many classes – for the data-oriented, his CS 168 course is accessible and amazing. Videos for his Algorithms 2 class (CS 261) are here (PDF notes are in that first link). The second link is to his page for his new textbook, which also has links out to all the YouTube videos from his Coursera version of CS 161 (Algorithms 1).
This is an online visual textbook that has a bunch of cool interactive displays for intro probability/stats ideas. My favorite is the inference visualizations.
This appears to be a pretty fantastic (albeit rather elementary) textbook for a one-quarter intro statistics class (Stat 60 at Stanford). Despite assuming little, it touches on a lot of great topics.
This online textbook is from Susan Holmes and Wolfgang Huber, and provides a nice and accessible intro to the parts of modern data science relevant to computational biologists. It also happens to be a piece of typographic art, created with bookdown.
Beginner (ISL) and advanced (ESL) presentations of classic machine learning from world-class stats professors. Slides and videos for a MOOC on ISL are available here.
Notes from Roger Grosse’s CSC 231 (full website here). Probably the single best intro deep learning course I’ve found from any university. The notes and slides are gorgeous.
Wonderful set of intro lectures + notebooks from Jeremy Howard and Rachel Thomas. In addition, Hiromi Suenaga has released excellent and self-contained notes of the whole series with timestamp links back to videos: FastAI DL Part 1, FastAI DL Part 2, and FastAI ML.
Famous and freely available textbook from Boyd and Vandenberghe, accompanied by slides and YouTube videos. A more advanced follow-up class is here.
NYU Optimization-based Data Analysis 2016 and 2017
Fantastic course notes on optimization-based data analysis from NYU: 2016 website and 2017 website.
Tutorials, Overviews, and (Individual) Lecture Notes
This section is fledgling at best, but it was my real motivation in making this page. Archetypes include basically anything on distill.pub, good blog or Medium posts, etc. Depth-First Learning looks like a great access point here, but I haven’t gotten to do more than skim any of those yet.
The Madry lab is one of the top research groups in robust deep learning. They put together a fantastic intro to these topics on their blog. I hope they keep making posts…
Harvard’s Sasha Rush created a line-by-line annotation of “Attention is All You Need” that also serves as a working notebook. Pedagogical brilliance, and it would be awesome to do this for a couple papers per year.
Andrej Karpathy has a real gift for didactics. This is a self-contained explanation of deep reinforcement learning sufficient to understand a basic Atari agent.
It feels slimy and self-serving to include this, but I wrote this post to better understand how information theory can be used to understand/derive common probability distributions from first principles.
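To give a flavor of that approach (this is the standard maximum-entropy argument, sketched here as my own illustration rather than the post’s exact presentation): among all densities on $[0, \infty)$ with a fixed mean $\mu$, the exponential distribution is the one that maximizes entropy.

```latex
\max_{p} \; H[p] = -\int_0^\infty p(x)\,\ln p(x)\,dx
\quad \text{s.t.} \quad
\int_0^\infty p(x)\,dx = 1, \qquad \int_0^\infty x\,p(x)\,dx = \mu .
```

Setting the functional derivative of the Lagrangian to zero gives $-\ln p(x) - 1 - \lambda_0 - \lambda_1 x = 0$, so $p(x) \propto e^{-\lambda_1 x}$; enforcing the two constraints fixes the multipliers and yields $p(x) = \tfrac{1}{\mu}\,e^{-x/\mu}$. Swapping in different supports and constraints recovers other familiar distributions (e.g., a fixed mean and variance on $\mathbb{R}$ gives the Gaussian).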
Sebastian Ruder has produced a lot of really great explanations, like the one on gradient descent methods linked above. He also maintains a website tracking progress on NLP benchmarks.