Scikit-learn and Python Stack Tutorials: Introduction, Implementing Classifiers

A small collection of introductory scikit-learn and Python stack tutorials for those with an existing understanding of machine learning looking to jump right into using a new set of tools.



Are you new to machine learning, or to scikit-learn, the de facto standard general-purpose machine learning library in the Python ecosystem? While the following tutorials will not be of much use to seasoned machine learning practitioners experienced in scikit-learn, they should be a solid introduction for individuals with an understanding of machine learning, whether in theory or in practice in a different environment, who are looking for a quick overview of the basics in scikit-learn and the Python ecosystem.

These tutorials cover only some of the basics of scikit-learn, but they get straight to fitting models in a no-nonsense manner. There are, of course, numerous other tutorials on the subject, including many that have been highlighted on KDnuggets, but different perspectives are never a bad thing.

The material is contained in IPython Notebooks and shared via GitHub.

scikit-learn Map

1. Abridged scikit-learn Official Beginner Tutorials

This is an abridged, to-the-point implementation of the official scikit-learn tutorials. If you are familiar with both Python and machine learning, this may be a quicker way to get through the material, as it omits the lengthy explanations of introductory machine learning concepts. There are a few minor changes to the original material (I believe), but it follows the original quite faithfully.

As such, all credit goes to the creator(s) of the original material. I make no claim of ownership or innovation; my role was simply to peel away what I felt to be unnecessary information for those who are beginners to scikit-learn but not to machine learning.

Classifiers

2. Introduction to Implementing scikit-learn Classifiers

This tutorial is meant to serve as a demonstration of implementing several machine learning classifiers. It draws inspiration from other excellent related works, such as Randal Olson's An Example Machine Learning Notebook and this Common Machine Learning Algorithms Cheat Sheet. It also draws from The Official scikit-learn Documentation and some of the various referenced tutorials included in this KDnuggets Python Machine Learning Guide.

Using example datasets, models are built with a number of machine learning classification algorithms, including logistic regression, decision trees, and SVMs (among others). Both a separate training/testing split and k-fold cross-validation are used as model evaluation methods. Some simple data investigation methods are used as well.
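As a rough idea of what this looks like in practice, here is a minimal sketch, not taken from the notebook itself. It assumes the bundled iris dataset, a 70/30 split, 5 folds, and default hyperparameters (apart from a raised `max_iter`), all of which are illustrative choices rather than the tutorial's own settings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Example dataset (an assumption; the notebook may use different data).
X, y = load_iris(return_X_y=True)

# Evaluation method 1: hold out a separate test set (30% here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three of the classifier families mentioned above.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    # Fit on the training portion and score on the held-out portion.
    model.fit(X_train, y_train)
    holdout_acc = model.score(X_test, y_test)

    # Evaluation method 2: 5-fold cross-validation on the full dataset.
    # cross_val_score clones the estimator, so the fit above is unaffected.
    cv_scores = cross_val_score(model, X, y, cv=5)

    results[name] = (holdout_acc, cv_scores.mean())
    print(f"{name}: hold-out accuracy={holdout_acc:.3f}, "
          f"5-fold CV mean={cv_scores.mean():.3f}")
```

Every scikit-learn classifier exposes the same `fit` and `score` methods, which is why a single loop can cover all three models; swapping in another estimator only means adding an entry to the dictionary.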
