Top 10 IPython Notebook Tutorials for Data Science and Machine Learning

A list of 10 useful GitHub repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but the general principles are transferable.



This post collects 10 GitHub repositories consisting, in whole or in part, of IPython (Jupyter) notebooks focused on teaching data science and machine learning concepts. They range from introductory Python material to deep learning with TensorFlow and Theano, with plenty of stops in between.

Oh, and they're all Python-focused. Jupyter now supports a wide range of languages, but this list is old school: straight IPython Notebook material.

So here they are: 10 useful IPython Notebook GitHub repositories, in no particular order:

Example Data Science Notebook

Repository of teaching materials, code, and data for my data analysis and machine learning projects

This warm-up notebook comes from postdoctoral researcher Randal Olson, who applies the common Python data analysis, machine learning, and data science stack to the Iris dataset. It's a single notebook, but a good one to start with: it whets your appetite for the full range of analytic tools, including visualization, and keeps you focused on telling stories with data.
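To give a flavor of that first exploratory step, here is a minimal, dependency-free sketch (not code from Olson's notebook) that groups flowers by species and compares their average petal length. It hard-codes nine of the Iris dataset's 150 rows instead of loading the full file, and the helper name `mean_petal_length_by_species` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Nine rows from the classic Iris dataset:
# (sepal length, sepal width, petal length, petal width, species), in cm.
# The full dataset has 150 rows; this subset is only for illustration.
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def mean_petal_length_by_species(rows):
    """Group rows by species and average the petal-length column."""
    groups = defaultdict(list)
    for _sepal_len, _sepal_wid, petal_len, _petal_wid, species in rows:
        groups[species].append(petal_len)
    return {species: mean(values) for species, values in groups.items()}


summary = mean_petal_length_by_species(IRIS_SAMPLE)
for species, avg in sorted(summary.items()):
    print(f"{species:<11} mean petal length: {avg:.2f} cm")
```

Even on this tiny sample a story starts to emerge: setosa petals are far shorter than those of the other two species, the kind of pattern worth plotting before fitting any model.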

Python Machine Learning Book

The "Python Machine Learning" book code repository and info resource

This is the companion code for Sebastian Raschka's fantastic book Python Machine Learning. I don't vouch for many materials, but I highly recommend this book. The repository is also excellent, and a great resource in its own right. Still, you would be well advised to splurge on a copy of the book, both to fully understand the contents of the repo and to fully embrace machine learning in the Python ecosystem.

Learn Data Science

Open Content for self-directed learning in data science

This is a collection of notebooks and datasets, primarily put together by Nitin Borwankar, covering 4 algorithmic topics: linear regression, logistic regression, random forests, and k-means clustering. These appear to be no-nonsense tutorials, though they are likely most useful to newcomers.
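Since k-means is one of the four topics, here is a minimal plain-Python sketch of Lloyd's algorithm, the classic way to fit it: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is an illustrative implementation, not code from Borwankar's notebooks; the `kmeans` function and the toy points are hypothetical, and in practice you would more likely reach for scikit-learn's `KMeans` class.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of floats) with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: the centroids (and hence assignments) stopped moving.
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Random initialization can land in a poor local optimum on harder data, which is why library implementations typically run several restarts or use smarter seeding such as k-means++.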

Scikit-learn Tutorial

Materials for my scikit-learn tutorial

This repository, by Jake VanderPlas, teaches scikit-learn through several different machine learning algorithms. Validation, density estimation with Gaussian mixture models, and dimensionality reduction with PCA are a few of the more interesting topics covered; don't worry, you also get standards like k-means, regression, and classification. The material is probably best suited to a machine learning beginner, or to someone with some background looking to master scikit-learn.
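The idea behind PCA fits in a few lines: center the data, build its covariance matrix, and find the direction of greatest variance. The sketch below uses power iteration for that last step. This is a deliberately simple substitute (scikit-learn's `PCA` is built on a singular value decomposition), and the function names and sample points are hypothetical, not taken from VanderPlas's materials. It assumes the points are not all identical, since otherwise the covariance is zero and there is no direction to find.

```python
import math


def first_principal_component(points, iterations=200):
    """Return (unit direction of greatest variance, per-coordinate means)."""
    n, d = len(points), len(points[0])
    # Centre the data by subtracting the mean of each coordinate.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d), using the n - 1 denominator.
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise. The vector converges to the dominant eigenvector as long as
    # the starting vector is not orthogonal to it.
    v = [1.0] + [0.5] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means


def project(points, component, means):
    """Reduce each point to one coordinate: its offset along the component."""
    return [
        sum((p[j] - means[j]) * component[j] for j in range(len(component)))
        for p in points
    ]


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
component, means = first_principal_component(points)
scores = project(points, component, means)
print(component, scores)
```

Here the 2-D points collapse to one score each with almost no loss, because they lie close to a single line; the variance of the scores equals the largest eigenvalue of the covariance matrix.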

Machine Learning

Python coded examples and documentation of machine learning algorithms

Aaron Masino has shared a series of very detailed, very technical machine learning IPython Notebook learning resources. The notebooks in this simply titled repository draw inspiration from Andrew Ng's Machine Learning course (Stanford, Coursera), Tom Mitchell's course (Carnegie Mellon), and Christopher M. Bishop's "Pattern Recognition and Machine Learning."

Research Computing Meetup

Slides, code, and other information relating to the Fall 2013 Meetups

From CU Boulder's Research Computing group, this older collection of notebooks (it dates back to Fall 2013) covers a wide range of material, with an apparent focus on data management powered by the Linux command line. A number of the common libraries are covered, along with shell programming and Linux command line basics, and at least one then-current paper is implemented using the Python stack. It also seems to tackle a few Kaggle competitions, so you get a little bit of a lot with this one.

Theano Tutorial

A collection of tutorials on neural networks, using Theano

PhD student Colin Raffel authored this collection of deep learning tutorials using Theano. It contains 2 notebooks: a general Theano neural networks tutorial, and an overview of backpropagation. It's a good introductory resource for getting started with deep learning and Theano.
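Backpropagation is the chain rule applied layer by layer. Theano can derive these gradients symbolically for you, but writing them out once by hand is instructive. The sketch below (not taken from Raffel's notebooks) does so for a hypothetical network with one tanh hidden layer, a linear output unit, and squared-error loss, then checks one gradient against a finite-difference estimate. The weights and the training pair are arbitrary illustrative values.

```python
import copy
import math


def forward(params, x):
    """One tanh hidden layer followed by a single linear output unit."""
    W1, b1, w2, b2 = params["W1"], params["b1"], params["w2"], params["b2"]
    z = [sum(w * xk for w, xk in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [math.tanh(zj) for zj in z]
    y_hat = sum(wj * hj for wj, hj in zip(w2, h)) + b2
    return h, y_hat


def loss(params, x, y):
    """Squared error, halved so its derivative is simply (y_hat - y)."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss for every parameter, via the chain rule."""
    h, y_hat = forward(params, x)
    err = y_hat - y                                    # dL/dy_hat
    grad_w2 = [err * hj for hj in h]                   # dL/dw2_j = err * h_j
    grad_b2 = err                                      # dL/db2
    # Back through w2 and tanh (d tanh(z)/dz = 1 - tanh(z)^2).
    delta = [err * wj * (1 - hj ** 2) for wj, hj in zip(params["w2"], h)]
    grad_W1 = [[dj * xk for xk in x] for dj in delta]  # dL/dW1_jk = delta_j * x_k
    grad_b1 = list(delta)                              # dL/db1_j = delta_j
    return {"W1": grad_W1, "b1": grad_b1, "w2": grad_w2, "b2": grad_b2}


params = {
    "W1": [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.4]],  # 3 hidden units, 2 inputs
    "b1": [0.1, -0.2, 0.05],
    "w2": [0.7, -0.5, 0.3],
    "b2": 0.1,
}
x, y = [1.0, 2.0], 0.5
grads = backprop(params, x, y)

# Finite-difference check on a single weight: if backprop is right, the two
# numbers agree to many decimal places.
eps = 1e-6
plus, minus = copy.deepcopy(params), copy.deepcopy(params)
plus["W1"][1][0] += eps
minus["W1"][1][0] -= eps
numeric = (loss(plus, x, y) - loss(minus, x, y)) / (2 * eps)
print(grads["W1"][1][0], numeric)
```

A gradient check like this is cheap insurance whenever you derive gradients by hand; frameworks that differentiate automatically make it largely unnecessary for day-to-day work.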

IPython Theano Tutorials

A collection of tutorials in ipynb format that illustrate how to do various things in Theano

This is a good follow-up to Colin Raffel's introductory Theano notebooks. James Bergstra takes us deeper into neural network architecture, covering a wider range of Theano exercises. It includes some introductory Python material as well as more advanced topics like autoencoders, and it links to a number of related materials.
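An autoencoder is trained to reproduce its input through a narrower hidden layer, so the bottleneck has to learn a compact code. As a minimal, dependency-free sketch of that idea (not Bergstra's Theano code), the example below trains a purely linear 2-to-1-to-2 autoencoder by gradient descent on points lying along a line. The starting weights, learning rate, and data are hypothetical choices; practical autoencoders add non-linearities and let the framework compute gradients.

```python
def train_linear_autoencoder(points, steps=2000, lr=0.05):
    """Fit a 2 -> 1 -> 2 linear autoencoder with full-batch gradient descent.

    Returns (encoder weights, decoder weights, [initial loss, final loss]).
    """
    w_enc = [0.1, 0.2]   # encoder: h = w_enc . x   (arbitrary small start)
    w_dec = [0.3, -0.1]  # decoder: x_hat = w_dec * h
    n = len(points)

    def mean_squared_reconstruction_error():
        total = 0.0
        for p in points:
            h = w_enc[0] * p[0] + w_enc[1] * p[1]
            total += sum((w_dec[i] * h - p[i]) ** 2 for i in range(2))
        return total / n

    history = [mean_squared_reconstruction_error()]
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for p in points:
            h = w_enc[0] * p[0] + w_enc[1] * p[1]
            residual = [w_dec[i] * h - p[i] for i in range(2)]
            # Gradient of the loss with respect to the decoder weights.
            for i in range(2):
                g_dec[i] += 2 * residual[i] * h / n
            # Back-propagate through the decoder to the code h, then to the encoder.
            d_h = 2 * (residual[0] * w_dec[0] + residual[1] * w_dec[1])
            for i in range(2):
                g_enc[i] += d_h * p[i] / n
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    history.append(mean_squared_reconstruction_error())
    return w_enc, w_dec, history


# Points on the line y = 2x: a single number per point is enough to rebuild them.
data = [(x, 2 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_enc, w_dec, history = train_linear_autoencoder(data)
print(history)
```

Because the data really are one-dimensional, the one-unit bottleneck loses nothing and the reconstruction error falls to nearly zero; with richer data, the bottleneck forces the network to keep only the most important structure.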

IPython Notebooks

A collection of IPython notebooks covering various topics

This is an eclectic mix, put together by John Wittenauer, with notebooks implementing in Python the exercises from Ng's Coursera course, Udacity's TensorFlow-oriented deep learning course, and the Spark edX course. Machine learning, deep learning, and big data processing frameworks: it doesn't get any more "data science" than this, folks.

ISLR Python

An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani, 2013): Python code

This is a great project undertaken by Jordi Warmenhoven to implement, in Python, the concepts from the book An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani (2013). As you may have guessed, the book's practical exercises are in R. The book is freely available as a PDF, which makes this repo even more attractive to those looking to learn.
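The book's linear regression chapter starts from simple least squares, where the estimates have a closed form: the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept is mean_y - slope * mean_x. A stdlib-only sketch of those estimates, plus the R-squared statistic, follows. It is not code from Warmenhoven's repository, and the function names and toy numbers are hypothetical.

```python
from statistics import mean


def least_squares_fit(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - rss / tss


# Toy data scattered around the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = least_squares_fit(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.4f}")
```

These are the same coefficient estimates R's `lm()` reports for a one-predictor model; the book then builds on them with standard errors, multiple predictors, and diagnostics.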
