Everything You Need to Know About AutoML and Neural Architecture Search
By George Seif, AI / Machine Learning Engineer
AutoML and Neural Architecture Search (NAS) are the new kings of the deep learning castle. They’re the quick and dirty way of getting great accuracy for your machine learning task without much work. Simple and effective; it’s what we want AI to be all about!
So how does it work? How do you use it? What options do you have to harness that power today?
Here’s everything you need to know about AutoML and NAS.
Neural Architecture Search (NAS)
Developing neural network models often requires significant architecture engineering. You can sometimes get by with transfer learning, but if you really want the best possible performance it’s usually best to design your own network. This requires specialised skills (read: expensive from a business standpoint) and is challenging in general; we may not even know the limits of the current state-of-the-art techniques! It’s a lot of trial and error and the experimentation itself is time consuming and expensive.
This is where NAS comes in. NAS is an algorithm that searches for the best neural network architecture. Most NAS algorithms work in the following way: start off by defining a set of “building blocks” that can be used for the network. For example, the state-of-the-art NASNet paper proposes these commonly used blocks for an image recognition network:
NASNet blocks for image recognition network
In the NAS algorithm, a controller Recurrent Neural Network (RNN) samples these building blocks, putting them together to create some kind of end-to-end architecture. This architecture generally embodies the same style as state-of-the-art networks, such as ResNets or DenseNets, but uses a much different combination and configuration of the blocks.
This new network architecture is then trained to convergence to obtain some accuracy on a held-out validation set. The resulting accuracies are used to update the controller so that the controller will generate better architectures over time, perhaps by selecting better blocks or making better connections. The controller weights are updated with policy gradient. The whole end-to-end setup is shown below.
The NAS algorithm
It’s a fairly intuitive approach! In simple terms: have an algorithm grab different blocks and put those blocks together to make a network. Train and test out that network. Based on your results, adjust the blocks you used to make the network and how you put them together!
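To make that loop concrete, here is a minimal sketch of the idea in Python. It is not the NASNet code: the real controller is an RNN trained on real training runs, whereas this toy version keeps a table of per-layer block preferences, uses a stubbed-out train_and_evaluate function, and applies a simple REINFORCE-style policy-gradient update.

```python
import numpy as np

# Toy sketch of the NAS loop: sample blocks, "train" the resulting network,
# and update the controller with a REINFORCE-style policy gradient.
BLOCKS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "identity"]
NUM_LAYERS = 6
logits = np.zeros((NUM_LAYERS, len(BLOCKS)))  # the "controller" (an RNN in the real algorithm)

def sample_architecture(logits):
    """Sample one block per layer from the controller's softmax distribution."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    choices = [np.random.choice(len(BLOCKS), p=p) for p in probs]
    return choices, probs

def train_and_evaluate(architecture):
    """Stub: a real system would build, train, and validate the child network here."""
    return np.random.uniform(0.5, 0.9)  # fake validation accuracy

baseline = 0.0
for step in range(100):
    arch, probs = sample_architecture(logits)
    reward = train_and_evaluate(arch)               # validation accuracy as the reward
    baseline = 0.9 * baseline + 0.1 * reward        # moving-average baseline
    advantage = reward - baseline
    for layer, choice in enumerate(arch):           # REINFORCE update per decision
        grad = -probs[layer]
        grad[choice] += 1.0                         # gradient of log-prob w.r.t. the logits
        logits[layer] += 0.1 * advantage * grad

print("preferred block per layer:", [BLOCKS[i] for i in np.argmax(logits, axis=1)])
```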
Part of the reason this algorithm succeeds, and the paper demonstrates such great results, is the constraints and assumptions built into it. First, the NAS-discovered architecture is trained and tested on a dataset much smaller than the real-world target, because training on something large like ImageNet would take a very long time. The idea is that a network which performs better on a smaller but similarly structured dataset should also perform better on a larger and more complex one, and this has generally held true in the deep learning era.
Second, the search space itself is quite limited. NAS is designed to build architectures that are very similar in style to the current state-of-the-art. For image recognition, that means a set of repeated cells in the network with progressive downsampling in between, as shown on the left below. The set of blocks available for building those repeating cells is also commonly used in current research. The main novel part of the NAS-discovered networks is how the blocks are connected together. Check out the best discovered blocks and structure for the ImageNet network on the right below.
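As a rough illustration of that fixed macro-structure, here is a minimal plain-Keras sketch (not the NASNet code): a placeholder “cell” is repeated several times per stage, with a downsampling step between stages. In the real search, only the internals of the cell are discovered; the outer skeleton stays fixed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cell(x, filters):
    """Placeholder for the searched cell; a plain conv block for illustration."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.BatchNormalization()(x)

def reduction(x, filters):
    """Downsampling step between stacks of repeated cells."""
    return layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
for filters in (32, 64, 128):       # three stages with progressively more filters
    for _ in range(3):              # N repeated (searched) cells per stage
        x = cell(x, filters)
    x = reduction(x, filters * 2)   # progressive downsampling between stages
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```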
Advances in Architecture Search
The NASNet paper was fantastically progressive in that it provided a new direction for deep learning research. Unfortunately, it was quite inefficient and inaccessible to the common user outside of Google: using 450 GPUs, it took 3–4 days of training to find that great architecture. Much of the latest research in NAS has thus focused on making this process more efficient.
Progressive Neural Architecture Search (PNAS) proposes to use what is called a sequential model-based optimisation (SMBO) strategy, rather than the reinforcement learning used in NASNet. With SMBO, instead of randomly grabbing and trying out blocks from our set, we test out blocks and search for structures in order of increasing complexity. This doesn’t shrink the search space, but it does make the search smarter. SMBO is basically saying: instead of trying everything all at once, let’s start off simple and only get complex if we need to. This PNAS method is 5–8 times more efficient (and thus much less expensive) than the original NAS.
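Here is a minimal sketch of that SMBO idea, not the paper’s implementation: architectures grow one block at a time, only a short beam of candidates is actually trained, and a cheap surrogate predictor (a learned RNN/MLP in the paper, a trivial per-block running average here) ranks the expanded candidates so most of them never need training. The Surrogate class and train_and_eval stub are hypothetical stand-ins.

```python
import random

BLOCKS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3"]

def train_and_eval(arch):
    """Stub for actually training a candidate; returns a fake validation accuracy."""
    return random.random()

class Surrogate:
    """Toy stand-in for PNAS's learned accuracy predictor."""
    def __init__(self):
        self.block_scores = {b: 0.5 for b in BLOCKS}
    def fit(self, archs, accuracies):
        for arch, acc in zip(archs, accuracies):
            for b in arch:                          # credit each block with the observed result
                self.block_scores[b] = 0.8 * self.block_scores[b] + 0.2 * acc
    def predict(self, arch):
        return sum(self.block_scores[b] for b in arch) / len(arch)

def pnas_style_search(max_depth=5, beam_size=8):
    surrogate = Surrogate()
    candidates = [[b] for b in BLOCKS]              # start with the simplest architectures
    for _ in range(max_depth - 1):
        accuracies = [train_and_eval(a) for a in candidates]   # train only the shortlist
        surrogate.fit(candidates, accuracies)
        expanded = [a + [b] for a in candidates for b in BLOCKS]
        expanded.sort(key=surrogate.predict, reverse=True)     # rank cheaply, not by training
        candidates = expanded[:beam_size]
    return max(candidates, key=train_and_eval)

print(pnas_style_search())
```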
Efficient Neural Architecture Search (ENAS) is another shot at trying to make the general architecture search more efficient, this time usable by the average practitioner with a GPU. The authors’ hypothesis was that the computational bottleneck of NAS is the training of each model to convergence, only to measure its test accuracy and then throw away all the trained weights.
It’s been repeatedly shown in research and practice that transfer learning helps to achieve high accuracy in a short period of time, since networks trained for somewhat similar tasks discover similar weights; transfer learning is basically just the transfer of network weights. The ENAS algorithm forces all models to share weights instead of training each one from scratch to convergence. Any blocks that we’ve tried before in a previous model reuse those previously learned weights. Thus, we’re essentially doing transfer learning each time we train a new model, converging much faster!
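A minimal sketch of that weight-sharing idea, assuming a made-up get_weights/train_child setup rather than the actual ENAS code: every (layer, block) pair owns a single set of weights, and every child model that picks that block at that position reuses and keeps updating the same weights, so no child ever starts from scratch.

```python
import numpy as np

BLOCKS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3"]
shared = {}                                    # (layer, block) -> shared weight tensor

def get_weights(layer, block, shape=(16, 16)):
    """Create the weights the first time a block is used at a position; reuse them afterwards."""
    key = (layer, block)
    if key not in shared:
        shared[key] = np.random.randn(*shape) * 0.01
    return shared[key]

def train_child(architecture, steps=10):
    """Stub training loop: only nudges the shared weights this child actually uses."""
    for _ in range(steps):
        for layer, block in enumerate(architecture):
            w = get_weights(layer, block)
            w -= 0.01 * np.random.randn(*w.shape)   # stand-in for a real gradient step

# Two sampled children that agree on layer 0 share (and keep improving) that layer's weights.
train_child(["conv3x3", "max_pool3x3", "conv5x5", "sep_conv3x3"])
train_child(["conv3x3", "sep_conv3x3", "conv3x3", "max_pool3x3"])
print(len(shared), "weight tensors shared across all sampled children")
```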
The table from the paper shows just how much more efficient ENAS is: half a day of training with a single 1080Ti GPU.
Performance and efficiency of ENAS
A new way of doing Deep Learning: AutoML
Many people are calling AutoML the new way of doing deep learning, a change in the entire system. Instead of designing complex deep networks, we’ll just run a preset NAS algorithm. Google recently took this to the extreme by offering Cloud AutoML. Just upload your data and Google’s NAS algorithm will find you an architecture, quick and easy!
The idea of AutoML is to simply abstract away all of the complex parts of deep learning. All you need is data; just let AutoML do the hard part of network design! Deep learning then becomes quite literally a plugin tool like any other. Grab some data and automatically create a decision function powered by a complex neural network.
Google Cloud’s AutoML pipeline
Cloud AutoML does have a steep price of $20 USD and unfortunately you can’t export your model once it’s trained; you’ll have to use their API to run your network on the cloud. There are a few other alternatives that are completely free, but do require a tad bit more work.
AutoKeras is a GitHub project that uses the ENAS algorithm. It can be installed using pip. Since it’s written in Keras, it’s quite easy to control and play with, so you can even dive into the ENAS algorithm and try making some modifications. If you prefer TensorFlow or PyTorch, there are also public code projects for those here and here!
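For a feel of how little code this takes, here is a minimal AutoKeras example on MNIST. The exact API has changed across AutoKeras versions, so treat this as a sketch in the style of recent releases rather than a guaranteed recipe:

```python
import autokeras as ak
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Search over up to 3 candidate architectures, then train the best one found.
clf = ak.ImageClassifier(max_trials=3)
clf.fit(x_train, y_train, epochs=5)

print("test loss/accuracy:", clf.evaluate(x_test, y_test))

# Export the best network as a regular Keras model for further use.
model = clf.export_model()
model.summary()
```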
Overall, there are several options for using AutoML today. It just depends on which algorithm you want to play with and how much you’re willing to pay to have more of the code abstracted away.
A future prediction for NAS and AutoML
It’s great to see the big strides that have been made over the past few years in automating deep learning. It makes deep learning more accessible to users and businesses, and its power more accessible to the public in general. But there’s always room to improve.
Architecture search has become far more efficient; finding a network with a single GPU in a single day of training, as with ENAS, is pretty amazing. However, our search space is still really quite limited. The current NAS algorithms still use the structures and building blocks that were hand designed; they just put them together differently!
A strong and potentially groundbreaking future direction would be a far wider-ranging search that really looks for novel architectures. Such algorithms may reveal even more hidden deep learning secrets within these huge and complex networks. Of course, such a search space requires efficient algorithm design.
This new direction of NAS and AutoML provides exciting challenges for the AI community, and really a chance for another breakthrough in the science.
Like to read about tech?
Follow me on Twitter where I post all about the latest and greatest tech!
Bio: George Seif is a Certified Nerd and AI / Machine Learning Engineer.
Original. Reposted with permission.
Related:
- Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code
- Why Automated Feature Engineering Will Change the Way You Do Machine Learning
- AutoKeras: The Killer of Google’s AutoML