Automated Machine Learning in Python


By Derrick Mwiti, Data Scientist on January 18, 2019 in Automated Machine Learning, AutoML, H2O, Keras, Machine Learning, Python, scikit-learn


As we already know, machine learning is a way of automating complex problem-solving. But can machine learning itself be automated? That’s what we’ll explore in this article. By its end, we’ll have answered that question and shown practical ways it can be accomplished.

Automated Machine Learning (AutoML)

When applying machine learning models, we’d usually do data pre-processing, feature engineering, feature extraction, and feature selection. After this, we’d select the best algorithm and tune its hyperparameters in order to obtain the best results. AutoML is a set of concepts and techniques used to automate these processes.
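To make the idea concrete, here is a minimal sketch of the selection step that AutoML automates: fit several candidate models, score each on held-out data, and keep the best. Everything in it is an assumption made for illustration (the toy one-dimensional dataset, the two candidate "algorithms", and the helper names such as auto_select); real AutoML tools search far larger spaces with smarter strategies.

```python
import random

def make_data(n, rng):
    """Toy 1-D dataset (illustrative): the label is 1 when x > 0.6."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.6)) for x in xs]

def threshold_model(threshold):
    """Candidate 1: predict 1 when x exceeds a fixed threshold (ignores training data)."""
    return lambda train: (lambda x: int(x > threshold))

def knn_model(k):
    """Candidate 2: k-nearest-neighbour majority vote over the training points."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            votes = sum(label for _, label in nearest)
            return int(votes * 2 > k)
        return predict
    return fit

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def auto_select(train, valid, candidates):
    """Fit every candidate and keep the one with the best validation accuracy."""
    best_name, best_score = None, -1.0
    for name, fit in candidates.items():
        score = accuracy(fit(train), valid)
        if score > best_score:  # ties keep the earlier candidate
            best_name, best_score = name, score
    return best_name, best_score

rng = random.Random(0)
train, valid = make_data(200, rng), make_data(100, rng)

# The search space: two model families, each with a few hyperparameter values.
candidates = {f"threshold={t}": threshold_model(t) for t in (0.2, 0.4, 0.6, 0.8)}
candidates.update({f"knn(k={k})": knn_model(k) for k in (1, 5, 15)})

best_name, best_score = auto_select(train, valid, candidates)
print(best_name, best_score)
```

The loop in auto_select is an exhaustive search over a tiny space. AutoML packages replace it with techniques such as Bayesian optimization or genetic programming, but the contract is the same: data goes in, the best-scoring configuration comes out.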

Benefits of AutoML

Applying machine learning models to our problems usually requires computer science skills, domain expertise, and mathematical expertise. Getting an expert with all these skills is not always a walk in the park.

AutoML lowers that barrier. It also reduces the bias and errors that can creep in when a human being designs machine learning models by hand. An organization can reduce the cost of hiring many experts by applying AutoML in its data pipeline, and it can cut the time needed to develop and test a machine learning model.

Drawbacks of AutoML

AutoML is a fairly new concept in the machine learning world. It is, therefore, important to exercise caution while applying some of the current AutoML solutions. This is because some of these technologies are still under development.

Another major challenge is the time it takes to run AutoML models. This depends on the computational power of the machine we’re running them on. As we shall see soon, some AutoML solutions run well on our local machines, but others call for an accelerated environment such as Google Colab.

AutoML Concepts

There are two major concepts to grasp as far as AutoML is concerned: neural architecture search and transfer learning.

Neural Architecture Search

Neural architecture search is the process of automating the design of neural networks. Usually, reinforcement learning or evolutionary algorithms are used to design these networks. In the reinforcement learning approach, candidate architectures that reach low accuracy are penalized and those that reach high accuracy are rewarded, so the search is steered toward architectures with higher accuracy.

Several neural architecture search papers have been published, including Learning Transferable Architectures for Scalable Image Recognition, Efficient Neural Architecture Search (ENAS), and Regularized Evolution for Image Classifier Architecture Search.
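The evolutionary flavour of the search can be sketched in a few lines. This is a toy under stated assumptions, not an implementation of any of the papers above: an architecture is just a tuple of layer widths, and proxy_score is a made-up stand-in for the expensive step of training a network and measuring its validation accuracy. The loop loosely follows the tournament-selection-plus-aging idea used in regularized evolution: sample a few candidates, mutate the best one, add the child, and discard the oldest member.

```python
import random
from collections import deque

WIDTHS = (16, 32, 64, 128)  # hypothetical choices for a layer's width
MAX_LAYERS = 4              # hypothetical cap on network depth

def proxy_score(arch):
    """Stand-in for 'train the network and return its validation accuracy'.

    Assumption for illustration only: the score peaks at three layers of width 64.
    """
    depth_penalty = abs(len(arch) - 3) * 0.1
    width_penalty = sum(abs(w - 64) for w in arch) / 1000
    return 1.0 - depth_penalty - width_penalty

def random_arch(rng):
    return tuple(rng.choice(WIDTHS) for _ in range(rng.randint(1, MAX_LAYERS)))

def mutate(arch, rng):
    """Add a layer, remove a layer, or (as the fallback) resize one layer."""
    layers = list(arch)
    op = rng.choice(("resize", "add", "remove"))
    if op == "add" and len(layers) < MAX_LAYERS:
        layers.insert(rng.randrange(len(layers) + 1), rng.choice(WIDTHS))
    elif op == "remove" and len(layers) > 1:
        layers.pop(rng.randrange(len(layers)))
    else:
        layers[rng.randrange(len(layers))] = rng.choice(WIDTHS)
    return tuple(layers)

def evolve(rng, population_size=20, sample_size=5, cycles=200):
    population = deque(random_arch(rng) for _ in range(population_size))
    best = max(population, key=proxy_score)
    for _ in range(cycles):
        # Tournament selection: the best of a random sample becomes the parent.
        parent = max(rng.sample(list(population), sample_size), key=proxy_score)
        child = mutate(parent, rng)
        population.append(child)
        population.popleft()  # aging: drop the oldest member, not the worst
        if proxy_score(child) > proxy_score(best):
            best = child
    return best

best_arch = evolve(random.Random(0))
print(best_arch, round(proxy_score(best_arch), 3))
```

In a real search, each call to proxy_score would mean training a network, which is why neural architecture search is so demanding of compute.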

Transfer Learning

Transfer learning, as the name suggests, is a technique where a pre-trained model transfers what it has learned to a new but similar dataset. This enables us to obtain high accuracy while using less computation time and power. Neural architecture search suits problems that require discovering new architectures, while transfer learning works best when the new dataset is similar to the one used to pre-train the model.
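A common way to apply transfer learning is to freeze the pre-trained layers and train only a small new "head" on the new dataset. The sketch below shows that pattern in plain Python. It rests on loud assumptions: FROZEN_WEIGHTS and FROZEN_BIAS are hard-coded stand-ins for weights that would really come from training on a large source dataset, and the two-feature toy task is invented for illustration.

```python
import math
import random

# Hypothetical "pre-trained" layer. In practice these values would be learned
# on a large, related source dataset. They are never updated below.
FROZEN_WEIGHTS = ((1.0, 1.0), (1.0, -1.0))
FROZEN_BIAS = (-1.0, 0.0)

def extract_features(x):
    """Frozen feature extractor: a fixed linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(FROZEN_WEIGHTS, FROZEN_BIAS)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=300, lr=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    feats = [(extract_features(x), y) for x, y in data]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in feats:
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # Full-batch gradient descent step on the head parameters only.
        w = [wi - lr * g / len(feats) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(feats)
    return w, b

def predict(x, w, b):
    f = extract_features(x)
    return int(w[0] * f[0] + w[1] * f[1] + b > 0)

rng = random.Random(0)

def sample(n):
    """New target task (illustrative): label is 1 when x1 + x2 > 1."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in points]

train, test = sample(200), sample(200)
head_w, head_b = train_head(train)
test_accuracy = sum(predict(x, head_w, head_b) == y for x, y in test) / len(test)
print(round(test_accuracy, 3))
```

Only three numbers are learned here (two head weights and a bias), which is the source of the savings in computation: the expensive part of the model is reused as-is.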

AutoML Solutions

Let’s now look at some of the available automated machine learning solutions.

Auto-Keras

According to the official site:

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.

It can be installed using a simple pip command:

pip install autokeras

Auto-Keras is built on the Keras deep learning package. At the time of writing it is still in pre-release testing, and the official site warns that its developers will not be held liable for any loss incurred as a result of using the libraries on the site.

Auto-Sklearn

Auto-Sklearn is an automated machine learning package based on Scikit-learn. It’s a drop-in replacement for a Scikit-learn estimator. Its installation is also via a simple pip command:

pip install auto-sklearn

On Ubuntu, a C++11 build environment and SWIG are required:

sudo apt-get install build-essential swig

With Anaconda, the same build dependencies can be installed as follows:

conda install gxx_linux-64 gcc_linux-64 swig

It isn’t possible to run Auto-Sklearn natively on Windows. However, one can try workarounds such as a Docker image or a virtual machine.

The Tree-Based Pipeline Optimization Tool (TPOT)

According to its official site:

The goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT makes use of the Python-based scikit-learn library as its ML menu.

This software is open source and is available on GitHub.

Google’s AutoML

Its official site states that:

Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs, by leveraging Google’s state-of-the-art transfer learning, and Neural Architecture Search technology.

Google’s AutoML solution is not open source. Pricing is listed on its official site.

H2O

H2O is an open source distributed in-memory machine learning platform. It is available in both R and Python. This package provides support for statistical & machine learning algorithms.

Applying AutoML to Real-World Problems

Now let’s see how we’d use Auto-Keras and Auto-Sklearn to solve a real problem.

Auto-Keras Implementation

I would highly recommend running the following example on Google Colab unless we’re using a machine with high computational power. It’s also important to ensure that we’re using the GPU runtime on Google Colab. The first step is to install Auto-Keras on our Colab runtime.

!pip install autokeras

We’re going to run image classification on the MNIST dataset. We start by importing the dataset and the image classifier: the dataset comes from Keras, whereas the image classifier comes from Auto-Keras. Because the model learns to recognize handwritten digits from labelled examples, this is a supervised learning problem. We then test the accuracy of the model on images of digits that it hasn’t encountered.

In this example, the images and labels already come as NumPy arrays, the format the classifier expects. The next step is to separate the data we just loaded into a training set and a test set.

Once the data has been separated into a training set and a test set, the next step is to fit the image classifier.

Setting verbose to True means that the search process will be printed on the screen for us to see.
In the fit method, time_limit is the time limit for the search, in seconds.
final_fit is the last round of training, which runs after the search has found the best architecture. Setting retrain to True means that the weights of the model will be reinitialized.
Printing y will show us the accuracy obtained when evaluating the model on the test set.
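Put together, the steps described above can be sketched as follows. This is a reconstruction under assumptions rather than a definitive listing: it assumes the pre-1.0 Auto-Keras interface (the ImageClassifier import path varied between 0.x releases, and the 1.x releases replaced time_limit and final_fit with a different API). The function name run_autokeras_mnist and its default time limit are illustrative choices.

```python
def run_autokeras_mnist(time_limit_seconds=60 * 60):
    """Search for an MNIST classifier, assuming the Auto-Keras 0.x API.

    The default time limit is an arbitrary illustrative value.
    """
    # Imported lazily so that defining this function does not require
    # Keras or Auto-Keras to be installed.
    from keras.datasets import mnist
    # Assumed 0.x import path; it differs in other releases.
    from autokeras.image.image_supervised import ImageClassifier

    # load_data() already returns separate training and test sets.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Add a trailing channel dimension: (n, 28, 28) -> (n, 28, 28, 1).
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

    # Search for an architecture, printing progress as we go.
    clf = ImageClassifier(verbose=True)
    clf.fit(x_train, y_train, time_limit=time_limit_seconds)

    # Train the best architecture found, starting from fresh weights.
    clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)

    # Evaluate on the held-out test set.
    y = clf.evaluate(x_test, y_test)
    print(y)
    return y
```

On a Colab GPU runtime we would simply call run_autokeras_mnist(), passing a larger time limit to let the search explore more architectures.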

This is all we need to classify images using Auto-Keras. Very few lines of code, and Auto-Keras will do all the heavy lifting for us.

Auto-Sklearn Implementation

Implementation on Auto-Sklearn is very similar to the Auto-Keras implementation above. We’ll run a similar classification using the digits dataset. First, we need to get a few imports out of the way:

autosklearn.classification that we’ll use to load the classifier later
sklearn.model_selection for model selection
sklearn.datasets for loading in the dataset
sklearn.metrics for measuring model performance

As usual, we load in the dataset and split it into a training and test set. We then import the AutoSklearnClassifier from autosklearn.classification. Once this is done we fit the classifier to our dataset, make predictions and check the accuracy. That’s all you need to do.
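A minimal sketch of those steps, modelled on Auto-Sklearn’s basic usage. The function name run_autosklearn_digits and the random_state value are illustrative assumptions, and the imports sit inside the function only so that the definition itself does not require Auto-Sklearn to be installed.

```python
def run_autosklearn_digits(random_state=1):
    """Fit Auto-Sklearn on the Scikit-learn digits dataset; return test accuracy."""
    # Imported lazily: Auto-Sklearn is only available on Linux-like systems.
    import autosklearn.classification
    import sklearn.datasets
    import sklearn.metrics
    import sklearn.model_selection

    # Load the dataset and split it into a training set and a test set.
    X, y = sklearn.datasets.load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, random_state=random_state
    )

    # The classifier is used like any other Scikit-learn estimator.
    automl = autosklearn.classification.AutoSklearnClassifier()
    automl.fit(X_train, y_train)

    # Predict on unseen data and measure accuracy.
    y_hat = automl.predict(X_test)
    accuracy = sklearn.metrics.accuracy_score(y_test, y_hat)
    print("Accuracy score", accuracy)
    return accuracy
```

With default settings the search can run for a long time (on the order of an hour), so it is worth checking the classifier’s time budget options in the documentation before running this on a laptop.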

What’s Next?

Automated machine learning packages are still under active development. We expect to see more advancements in that arena in 2019. One can keep tabs on the progress of these packages by following the official documentation sites. One can also contribute to these packages by making a pull request on GitHub. More information and examples about Auto-Keras and Auto-Sklearn can be found on their official sites.


Bio: Derrick Mwiti is a data analyst, a writer, and a mentor. He is driven by delivering great results in every task, and is a mentor at Lapid Leaders Africa.

Original. Reposted with permission.
