An Introduction to Scikit Learn: The Gold Standard of Python Machine Learning
If you’re going to do Machine Learning in Python, Scikit Learn is the gold standard. Scikit Learn provides a wide selection of supervised and unsupervised learning algorithms. Best of all, it’s widely regarded as one of the easiest and cleanest ML libraries to use.
A comparison of Scikit Learn’s many Machine Learning models
Machine Learning Gold
Scikit Learn was created with a software engineering mindset. Its core API design revolves around being easy to use yet powerful, while still maintaining flexibility for research endeavours. This robustness makes it well suited to any end-to-end ML project, from the research phase right down to production deployments.
What Scikit Learn has to Offer
Scikit Learn is built on top of NumPy and SciPy and is designed to work alongside the rest of the scientific Python stack, which makes it easy to integrate them all. You can pass NumPy arrays and pandas DataFrames directly to Scikit Learn’s ML algorithms! The libraries it is commonly used with are:
 NumPy: For any work with matrices, especially math operations
 SciPy: Scientific and technical computing
 Matplotlib: Data visualisation
 IPython: Interactive console for Python
 SymPy: Symbolic mathematics
 Pandas: Data handling, manipulation, and analysis
Scikit Learn is focused on Machine Learning, i.e. data modelling. It is not concerned with the loading, handling, manipulating, and visualising of data. Thus, it is natural and common practice to use the above libraries, especially NumPy, for those extra steps; they are made for each other!
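To make the interoperability concrete, here is a minimal sketch that fits the same model once on NumPy arrays and once on the equivalent pandas objects. The toy data and the variable names are invented for illustration; LinearRegression is just one convenient estimator to demonstrate with.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration: y = 2*x1 + 3*x2 + 1 exactly
X_array = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_array = 2 * X_array[:, 0] + 3 * X_array[:, 1] + 1

# Fit on plain NumPy arrays
model_np = LinearRegression().fit(X_array, y_array)

# Fit on the same data wrapped in a pandas DataFrame and Series
X_frame = pd.DataFrame(X_array, columns=["x1", "x2"])
y_series = pd.Series(y_array, name="y")
model_pd = LinearRegression().fit(X_frame, y_series)

# Both fits should recover the same coefficients and intercept
print(model_np.coef_, model_np.intercept_)
print(model_pd.coef_, model_pd.intercept_)
```

No conversion step is needed in either case, so data prepared with pandas can go straight into a model.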
Scikit’s robust set of algorithm offerings includes:
 Regression: Fitting linear and nonlinear models
 Clustering: Unsupervised classification
 Decision Trees: Tree induction and pruning for both classification and regression tasks
 Neural Networks: End-to-end training for both classification and regression. Hidden layer sizes can be easily defined in a tuple
 SVMs: Learning decision boundaries
 Naive Bayes: Direct probabilistic modelling
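As a rough illustration of defining layers in a tuple, the sketch below trains a small neural network classifier on the built-in iris dataset. The layer sizes (16, 8), max_iter, and random_state values are arbitrary choices made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

# Built-in dataset: 4 input features, 3 classes
X, y = load_iris(return_X_y=True)

# Two hidden layers with 16 and 8 units, given as a single tuple
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
mlp.fit(X, y)

# One weight matrix per connection between consecutive layers
layer_shapes = [weights.shape for weights in mlp.coefs_]
print(layer_shapes)
```

Changing the architecture only means changing the tuple; the rest of the training code stays the same.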
Even beyond that, it has some very convenient and advanced functions not commonly offered by other libraries:
 Ensemble Methods: Boosting, Bagging, Random Forest, Model voting and averaging
 Feature Manipulation: Dimensionality reduction, feature selection, feature analysis
 Outlier Detection: For detecting outliers and rejecting noise
 Model selection and validation: Cross-validation, hyperparameter tuning, and metrics
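A short sketch of how a couple of these pieces fit together: a Random Forest ensemble scored with cross-validation, followed by a small hyperparameter search. The dataset, the parameter grid, and the number of folds are assumptions picked to keep the example quick to run.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_iris(return_X_y=True)

# Ensemble method: a Random Forest scored with 5-fold cross-validation
forest = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(forest, X, y, cv=5)
print("Mean CV accuracy:", cv_scores.mean())

# Hyperparameter tuning: try a few tree depths, again with 5-fold CV
param_grid = {"max_depth": [2, 4, None]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```

The search object behaves like any other fitted model afterwards, so search.predict(...) uses the best configuration found.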
A Taste Test
To give you a taste of just how easy it is to train and test an ML model using Scikit Learn, here’s an example of how to do just that for a Decision Tree Classifier!
Decision trees for both classification and regression are super easy to use in Scikit Learn with a built-in class. We’ll first load our dataset, which actually comes built into the library. Then we’ll initialise our decision tree for classification, also a built-in class. Running training is then a simple one-liner! The .fit(X, Y) function trains the model, where X is the NumPy array of inputs and Y is the corresponding NumPy array of outputs.
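The steps just described can be sketched as follows. The iris dataset is assumed here as the built-in dataset, and random_state is fixed only to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load a dataset that ships with the library
iris = load_iris()

# Initialise the decision tree classifier
classification_tree = DecisionTreeClassifier(random_state=0)

# Train it: X is the array of inputs, Y the array of class labels
classification_tree.fit(iris.data, iris.target)

# Sanity check on the training data itself
train_accuracy = classification_tree.score(iris.data, iris.target)
print("Training accuracy:", train_accuracy)
```

Scoring on the training data only confirms that the model fitted; a held-out test split would be needed for a fair accuracy estimate.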
Scikit Learn also allows us to visualise our tree using the graphviz library. The export function comes with a few options that help in visualising the decision nodes and splits that the model learned, which is super useful for understanding how it all works. For example, the nodes can be colour-filled according to their majority class, and each node can display its splitting feature and class information.
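A minimal sketch of such an export, assuming the iris dataset and a freshly trained tree. It produces the tree description in Graphviz DOT format; turning that into an image requires the separate graphviz package, which is assumed to be installed and is therefore left as a comment.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
classification_tree = DecisionTreeClassifier(random_state=0)
classification_tree.fit(iris.data, iris.target)

# Export the learned tree as a DOT-format string
dot_data = export_graphviz(
    classification_tree,
    out_file=None,                        # return a string instead of writing a file
    feature_names=iris.feature_names,     # label splits with feature names
    class_names=list(iris.target_names),  # show the majority class in each node
    filled=True,                          # colour nodes by majority class
    rounded=True,
    special_characters=True,
)

# Rendering to a file needs the graphviz Python package and binaries:
# import graphviz
# graphviz.Source(dot_data).render("iris")
print(dot_data[:200])
```

Newer releases of Scikit Learn also provide sklearn.tree.plot_tree, which draws the tree with Matplotlib and avoids the graphviz dependency.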
Beyond that, Scikit Learn’s documentation is exquisite! Each of the algorithm parameters is explained clearly and intuitively named. Moreover, the documentation offers tutorials with example code on how to train and apply each model, its pros and cons, and practical application tips!
Like to learn?
Follow me on Twitter, where I post all about the latest and greatest AI, Technology, and Science!
Bio: George Seif is a Certified Nerd and AI / Machine Learning Engineer.
Original. Reposted with permission.

