Design by Evolution: How to evolve your neural network with AutoML
The gist (tl;dr): Time to evolve! I’m gonna give a basic example (in PyTorch) of using evolutionary algorithms to tune the hyper-parameters of a DNN.
Running the code
Putting all of the above together, we can now run the experiment.
Let’s visualize some of the results!
These are the scores of a population of 50 with a tournament size of 3. The models are trained on only 10,000 examples and then evaluated.
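To make the setup concrete, here is a minimal sketch of tournament selection with a population of 50 and a tournament size of 3. It is not the code from the repository: the genome fields (`n_layers`, `hidden_units`, `lr`) and `toy_fitness` are hypothetical stand-ins, and the toy fitness replaces the real "train on 10,000 examples, then evaluate" step so that the example runs in plain Python.

```python
import random


def random_genome(rng):
    """Sample a hypothetical hyper-parameter genome (field names are illustrative)."""
    return {
        "n_layers": rng.randint(1, 4),
        "hidden_units": rng.choice([32, 64, 128, 256]),
        "lr": 10 ** rng.uniform(-4, -1),
    }


def mutate(genome, rng):
    """Return a copy of the genome with one randomly chosen field resampled."""
    child = dict(genome)
    key = rng.choice(sorted(child))
    child[key] = random_genome(rng)[key]
    return child


def toy_fitness(genome):
    """Stand-in for training and evaluating a model; the best possible score is 0."""
    return -abs(genome["n_layers"] - 2) - abs(genome["hidden_units"] - 128) / 128


def tournament_select(population, scores, k, rng):
    """Draw k distinct individuals at random and return the best-scoring one."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: scores[i])
    return population[winner]


def evolve(pop_size=50, tournament_size=3, generations=10, seed=0):
    """Run the loop: score everyone, then refill the population from mutated winners."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        scores = [toy_fitness(g) for g in population]
        best_per_generation.append(max(scores))
        population = [
            mutate(tournament_select(population, scores, tournament_size, rng), rng)
            for _ in range(pop_size)
        ]
    return population, best_per_generation


if __name__ == "__main__":
    _, best = evolve()
    print(best)
```

A small tournament size keeps the selection pressure low, so weaker individuals still get a chance to reproduce. Note that this sketch has no elitism, which means the best score can drop from one generation to the next.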
At first glance it seems that the evolution is not doing much: solutions are near top performance from the very first epoch (here an epoch is one evolution step of the population). The maximum performance is reached in epoch 7. In the following figure we use a box plot, where the box describes the quartiles of the population. We notice that most individuals perform well, but also that the boxes get tighter as the population evolves.
Left: distribution of solutions. Right: boxplot of the solutions per epoch. The box shows the quartiles of the solutions, while the whiskers extend to show the rest of the distribution. The black dot is the mean value of the solutions; note the increasing trend.
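If you want the numbers behind a box like the ones in the figure, the quartiles and the mean of one epoch's scores can be computed with the standard library. This is a small sketch, not the plotting code used for the figure: `summarize` is a hypothetical helper, and the `"inclusive"` interpolation method is my assumption (plotting libraries may use slightly different quartile conventions).

```python
import statistics


def summarize(scores):
    """Return the quartiles and the mean of one epoch's population scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "mean": statistics.fmean(scores)}


if __name__ == "__main__":
    # Placeholder values, not results from the experiment.
    print(summarize([1, 2, 3, 4, 5]))
```

A shrinking gap between `q1` and `q3` from one epoch to the next is what shows up as the boxes getting tighter.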
A different evolution run.
To get a better understanding of the method’s performance, it is good to compare it against a completely randomized population search. No evolution is performed between epochs; instead, all individuals are reset to a random state.
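The random baseline is easy to sketch: every epoch throws away the whole population and samples a fresh one, so nothing is inherited. As before, the genome fields and `toy_fitness` are hypothetical stand-ins for the real models and their evaluation, not the repository's code.

```python
import random


def random_genome(rng):
    """Sample a hypothetical hyper-parameter genome (field names are illustrative)."""
    return {
        "n_layers": rng.randint(1, 4),
        "hidden_units": rng.choice([32, 64, 128, 256]),
    }


def toy_fitness(genome):
    """Stand-in for training and evaluating a model; the best possible score is 0."""
    return -abs(genome["n_layers"] - 2) - abs(genome["hidden_units"] - 128) / 128


def random_search(pop_size=50, generations=10, seed=0):
    """Resample the entire population each epoch and record every score."""
    rng = random.Random(seed)
    history = []
    for _ in range(generations):
        population = [random_genome(rng) for _ in range(pop_size)]
        history.append([toy_fitness(g) for g in population])
    return history


if __name__ == "__main__":
    history = random_search()
    print([max(scores) for scores in history])
```

Giving both searches the same number of model evaluations (population size times epochs) keeps the comparison fair.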
Left: distribution of solutions. Right: boxplot of the solutions per step of random generations.
The evolution performs better by a small margin (93.66% vs 93.22%), and while the random population search seems to produce some good solutions, the variance of the models is greatly increased. This means that resources are wasted on searching suboptimal architectures. Comparing this with the evolution figures, we see that the evolution clearly generates more competent solutions. It manages to evolve structures that consistently achieve higher performance.
A few things worth mentioning:
- MNIST is a fairly easy dataset - even 1 layer networks can achieve high accuracies.
- Optimizers like Adam are less sensitive to the learning rate; they find a good solution if the network has enough parameters.
- During training the model sees only 10k examples (1/5 of the training set). Good architectures might achieve even higher accuracies if we choose to train them for longer.
- Limiting the number of examples also plays an important role in the number of layers we can learn, since deeper models require more examples. To counter this we also add a layer removal mutation to allow the population to regulate the number of layers.
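The layer removal mutation from the last point can be sketched as one of three possible operations on a list of layer widths. This is an illustration under my own assumptions, not the repository's implementation: the list-of-widths representation, the `MAX_LAYERS` cap and the candidate widths are all made up for the example.

```python
import random

MAX_LAYERS = 6  # assumed upper bound on depth, purely illustrative


def mutate_layers(layers, rng, choices=(32, 64, 128, 256)):
    """Return a mutated copy of a list of layer widths.

    One operation is picked at random: add a layer, remove a layer,
    or resize an existing layer. The input list is left untouched.
    """
    layers = list(layers)
    ops = ["resize"]
    if len(layers) < MAX_LAYERS:
        ops.append("add")
    if len(layers) > 1:  # never remove the last remaining layer
        ops.append("remove")
    op = rng.choice(ops)
    if op == "add":
        layers.insert(rng.randint(0, len(layers)), rng.choice(choices))
    elif op == "remove":
        layers.pop(rng.randrange(len(layers)))
    else:
        layers[rng.randrange(len(layers))] = rng.choice(choices)
    return layers


if __name__ == "__main__":
    rng = random.Random(0)
    print(mutate_layers([64, 128], rng))
```

Because removal is as likely as addition, networks that grew too deep for the small training budget can shrink again in later generations.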
The size of the experiment is not ideal to demonstrate the strength of such methods. Take a look at the related work below, which points to papers using much larger experiments on harder datasets.
We just developed a simple evolutionary algorithm that implements a tournament selection scheme. Our algorithm only considers the winning solutions and mutates them to create offspring. The next step is to implement more advanced methods for generating and evolving the population. Some suggested improvements:
- Reuse the weights of the parent for the common layers
- Merge layers from 2 potential parents
- The architectures don’t have to be sequential; you can explore more connection types between layers (splits, merges, etc.)
- Add extra layers on top and do fine tuning
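As an example of the second suggestion, merging layers from two parents could look like a simple one-point crossover on lists of layer widths. This is only one possible reading of that idea; the representation and the function name `crossover_layers` are hypothetical.

```python
import random


def crossover_layers(parent_a, parent_b, rng):
    """Build a child from a prefix of parent_a followed by a suffix of parent_b."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child = list(parent_a[:cut_a]) + list(parent_b[cut_b:])
    if not child:
        # Both slices were empty: fall back to a single layer taken from a parent.
        child = [rng.choice(list(parent_a) + list(parent_b))]
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    print(crossover_layers([32, 64, 128], [256, 512], rng))
```

Keeping the prefix of one parent intact is also what would make weight reuse possible: the child's first layers have the same shapes as the parent's, so their trained weights could be copied over instead of being reinitialized.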
All of the above have been subjects of interest in the AI research field. One of the popular methods is NEAT (NeuroEvolution of Augmenting Topologies) and its extensions. NEAT variations use evolutionary algorithms not only to create the network but also to set its weights. Evolving the agent’s weights can be successful in a typical sparse-reward RL scenario. On the other hand, when (x, y) input-output pairs are available, gradient descent methods perform better.
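To give a feel for what evolving the weights themselves means, here is a tiny sketch of a (1+λ) evolution strategy. To be clear, this is not NEAT and does not evolve any topology: it only perturbs a fixed weight vector and keeps a child when it scores better. The `reward` function is a made-up black box that stands in for an RL episode return.

```python
import random


def reward(weights):
    """Toy black-box reward, highest (0) when the weights hit a hidden target."""
    target = [0.5, -1.0, 2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve_weights(n_weights=3, offspring=20, generations=200, sigma=0.1, seed=0):
    """(1+lambda) evolution strategy: mutate the parent, keep the best child if it improves."""
    rng = random.Random(seed)
    parent = [0.0] * n_weights
    parent_reward = reward(parent)
    for _ in range(generations):
        children = [
            [w + rng.gauss(0.0, sigma) for w in parent] for _ in range(offspring)
        ]
        best_child = max(children, key=reward)
        best_reward = reward(best_child)
        if best_reward > parent_reward:
            parent, parent_reward = best_child, best_reward
    return parent, parent_reward


if __name__ == "__main__":
    weights, score = evolve_weights()
    print(weights, score)
```

No gradients are used anywhere, which is why this kind of search still works when the reward is sparse or non-differentiable. When labeled pairs are available, gradient descent gets much more information out of every example.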
This is just an introductory post to draw some attention to this very interesting approach to machine learning model exploration. I’ll link some papers of interest that describe the state of the art in this domain.
*In no particular order
- Evolino: Hybrid Neuroevolution / Optimal Linear Search for Sequence Learning http://people.idsia.ch/~daan/papers/gomez-ijcai05.pdf
- NEAT http://www.cs.ucf.edu/~kstanley/neat.html
- Evolving Deep Neural Networks (paper): a very interesting approach of co-evolving whole networks and blocks within the network. It’s very similar to the Evolino method, but for CNNs.
- Large-Scale Evolution of Image Classifiers (paper)
- Convolution by Evolution (paper)
You will need PyTorch to run the project.
Clone the project repository here: https://github.com/offbit/evo-design
Bio: Stathis Vafeias (@techabilly) holds a PhD in Robotics from Edinburgh University. He currently leads machine learning at AimBrain, where he works on deep learning models for mobile biometric authentication. Before joining AimBrain, he was a research engineer at Toshiba Medical Visualization Systems.
Original. Reposted with permission.