Top 10 IPython Notebook Tutorials for Data Science and Machine Learning
A list of 10 useful GitHub repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but the general principles are transferable.
This post is a collection of 10 GitHub repositories consisting in part, or in whole, of IPython (Jupyter) Notebooks, focused on teaching data science and machine learning concepts. They go from introductory Python material to deep learning with TensorFlow and Theano, and hit a lot of stops in between.
Oh, and they're all Python-focused. Jupyter now includes support for a wide range of languages, but this list is old school: straight IPython Notebook style material.
So here they are, 10 useful IPython Notebook GitHub repositories, in no particular order:
Repository of teaching materials, code, and data for my data analysis and machine learning projects
This warm-up notebook is from postdoctoral researcher Randal Olson, who uses the common Python ecosystem data analysis/machine learning/data science stack to work with the Iris dataset. It's a single notebook, but it's a good notebook to start with, as it whets your appetite for all tools analytic, including visualization. It also gets you focused on telling stories with data.
The "Python Machine Learning" book code repository and info resource
This is the accompanying code for the fantastic book Python Machine Learning, by Sebastian Raschka. I don't vouch for many materials, but I highly recommend this book. The repository is also fantastic, and a great resource unto itself. However, you would be well-advised to splurge and grab a copy of your own to fully understand the contents of the repo, and to fully embrace machine learning in the Python ecosystem.
Open Content for self-directed learning in data science
This is a collection of notebooks and datasets, primarily put together by Nitin Borwankar, covering 4 algorithmic topics: Linear Regression, Logistic Regression, Random Forests, and k-Means Clustering. These are seemingly no-nonsense tutorials, though likely useful mostly for the newcomer.
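To give a flavor of the first of those topics, here is a minimal sketch of linear regression using plain NumPy least squares. It is not taken from the repository; the toy data and the variable names are made up for illustration.

```python
import numpy as np

# Toy data (made up for illustration): y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# Design matrix with a column of ones so the model can learn an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find [slope, intercept] minimizing ||A @ w - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is a handy sanity check before moving on to real datasets.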
Materials for my scikit-learn tutorial
This repository, by Jake VanderPlas, is aimed at teaching scikit-learn in the context of several different machine learning algorithms. Validation, Density Estimation with Gaussian Mixture Models, and Dimensionality Reduction with PCA are a few of the more interesting topics covered; don't worry, you also get standards like k-Means, Regression, and Classification. The material is likely best-suited to a beginner in machine learning, or someone with some understanding looking to master scikit-learn.
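As a rough idea of what two of those topics look like in practice, the sketch below runs PCA and cross-validation on the Iris dataset with scikit-learn. It is an illustration written for this post, not code from the tutorial; the choice of logistic regression and 5 folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Validation: 5-fold cross-validation of a scaled logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)

print(X_2d.shape, scores.mean())
```

Wrapping the scaler and classifier in a pipeline keeps the scaling step inside each fold, so the held-out data never leaks into preprocessing.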
Python coded examples and documentation of machine learning algorithms
Aaron Masino has shared a series of very detailed, very technical machine learning IPython Notebook learning resources. The notebooks of this simply-titled repository draw inspiration from Andrew Ng's Machine Learning course (Stanford, Coursera), Tom Mitchell's course (Carnegie Mellon), and Christopher M. Bishop's "Pattern Recognition And Machine Learning."
Slides, code, and other information relating to the Fall 2013 Meetups
From CU Boulder's Research Computing group, this older collection of notebooks (it's from way back in Fall 2013) covers a wide range of material, with an apparent focus on Linux command line-powered data management. A number of the common libraries are covered, as well as shell programming and Linux command line basics, and at least one then-current paper was implemented using the Python stack. It also seems to tackle a few Kaggle competitions, so you get a little bit of a lot with this one.
A collection of tutorials on neural networks, using Theano
PhD student Colin Raffel authored this collection of deep learning tutorials using Theano. It contains 2 notebooks: a general Theano neural networks tutorial, and an overview of backpropagation. It's a good introductory resource for getting started with deep learning and Theano.
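For readers who want a feel for backpropagation before opening those notebooks, here is a minimal sketch in plain NumPy rather than Theano (Theano derives gradients symbolically, whereas this writes them out by hand). The network size, learning rate, and XOR toy data are assumptions chosen for illustration, and none of this comes from the repository.

```python
import numpy as np

def forward(params, X):
    """One hidden tanh layer followed by a sigmoid output."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p

def loss(params, X, y):
    """Mean binary cross-entropy."""
    _, p = forward(params, X)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def backprop(params, X, y):
    """Gradients of the loss, obtained by applying the chain rule layer by layer."""
    W1, b1, W2, b2 = params
    h, p = forward(params, X)
    dz2 = (p - y) / X.shape[0]        # gradient at the output pre-activation
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h**2)   # propagate back through tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

# XOR toy problem (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
params = [rng.normal(size=(2, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]

initial_loss = loss(params, X, y)
for _ in range(2000):
    grads = backprop(params, X, y)
    params = [w - 0.5 * g for w, g in zip(params, grads)]  # plain gradient descent
final_loss = loss(params, X, y)

print(initial_loss, final_loss)
```

Comparing one of the hand-written gradients against a finite-difference estimate of the loss is a quick way to confirm the chain rule was applied correctly.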
A collection of tutorials in ipynb format that illustrate how to do various things in Theano
This is a good follow-up to Colin Raffel's introductory Theano notebooks. James Bergstra takes us deeper into neural network architecture with this, covering a wider range of Theano exercises. It includes some introductory Python material, as well as more advanced topics like Autoencoders. It also links to a number of related materials.
A collection of IPython notebooks covering various topics
This is an eclectic mix, put together by John Wittenauer, with notebooks for Python implementations of Ng's Coursera course exercises, Udacity's TensorFlow-oriented deep learning course exercises, and the Spark edX course exercises. Machine learning, deep learning, and big data processing frameworks: it doesn't get any more "data science" than this, folks.
An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani, 2013): Python code
This is a great project undertaken by Jordi Warmenhoven to implement the concepts from the book An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani (2013) in Python (the book has practical exercises in R, as you may have guessed). The book is freely available as a PDF, which makes this repo even more attractive to those looking to learn.

