Using Genetic Algorithm for Optimizing Recurrent Neural Networks
In this tutorial, we will see how to apply a Genetic Algorithm (GA) for finding an optimal window size and a number of units in Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
By Aaqib Saeed, University of Twente
Recently, there has been a lot of work on automating machine learning, from a selection of appropriate algorithm to feature selection and hyperparameters tuning. Several tools are available (e.g. AutoML and TPOT), that can aid the user in the process of performing hundreds of experiments efficiently. Likewise, the deep neural network architecture is usually designed by experts; through a trial and error approach. Although, this approach resulted in state-of-the-art models in several domains but is very time-consuming. Lately, due to increase in available computing power, researchers are employing Reinforcement Learning and Evolutionary Algorithms to automatically search for optimal neural architectures.
In this tutorial, we will see how to apply a Genetic Algorithm (GA) for finding an optimal window size and a number of units in Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN). For this purpose, we will train and evaluate models for time-series prediction problem using Keras. For GA, a python package called DEAP will be used. The main idea of the tutorial is to familiarize the reader about employing GA, to find optimal settings automatically; hence, only two parameters will be explored. Moreover, reader’s knowledge (theoretical and applied) about RNNs is assumed. If you are unfamiliar with them, please consult following resources  and .
The ipython netbook with the complete code is available at the following link.
The genetic algorithm is a heuristic search and an optimization method inspired by the process of natural selection. They are widely used for finding a near optimal solution to optimization problems with large parameter space. The process of evolution of species (solutions in our case) is mimicked, by depending on biologically inspired components e.g. crossover. Furthermore, as it doesn’t take auxiliary information into account, (e.g. derivatives) it can be used for both discrete and continuous optimization.
For using a GA, two preconditions have to be fulfilled, a) a solution representation or defining a chromosome and b) a fitness function to evaluate produced solutions. In our case, a binary array is a genetic representation of a solution (see Figure 1) and model’s Root-Mean-Square Error (RMSE) on validation set will act a fitness value. Moreover, three basic operations that constitute a GA, are as follows:
- Selection: It defines which solutions to preserve for further reproduction e.g. roulette wheel selection.
- Crossover: It describes how new solutions are created from existing ones e.g. n-point crossover.
- Mutation: Its aim is to introduce diversity and novelty into the solution pool by means of randomly swapping or turning-off solution bits e.g. binary mutation.
Occasionally, a technique called “Elitism” is also used, which preserve few best solutions from the population, and pass on to next generation. Figure 2 depicts a complete genetic algorithm, where, initial solutions (population) are randomly generated. Next, they are evaluated according to a fitness function and selection, crossover and mutation are performed afterwards. This process is repeated for a defined number of iteration (called generations in GA terminology). At the end, a solution with highest fitness score is selected as the best solution. To learn more, please check following resources  and .
Now, we have a fair understanding of what GA is and how it works. Next, let’s get to coding.
We will use wind power forecast data, which is available at the following link. It consists of normalized (between zero and one) wind power measurements from seven wind farms. To keep things simple, we will use first wind farm data (column named wp1) but I encourage the reader to experiment and extend the code to forecast energy for all seven, wind farms.
Let’s import required packages, load the dataset and define two helper functions. The first method
prepare_dataset will segment the data into chunks to create X, Y pair for model training. The X will the wind power values from the past (e.g. 1 to t-1) and Y will be future value at time t. The second method
train_evaluate perform three things, 1) decoding GA solution to get window size and number of units. 2) Prepare the dataset using window size found by GA and divide into train and validation set, and 3) train LSTM model, calculate RMSE on validation set and return it as a fitness score of the current GA solution.
Next, use DEAP package to define things to run GA. We will use a binary representation for the solution of length ten. It will be randomly initialized using Bernoulli distribution. Likewise, ordered crossover, shuffle mutation and roulette wheel selection is used. The GA parameter values are initialized arbitrarily; I will suggest you, to play around with different settings.
The K best-found solution via GA can be seen easily seen using
tools.selBest(population,k = 1). Afterward, the optimal configuration can be used to train on the complete training set and test it on holdout test set.
In this tutorial, we saw how to employ GA to automatically find optimal window size (or lookback) and a number of units to use in RNN. For further learning, I would suggest you, to experiment with different GA parameter configurations, extend genetic representation to include more parameters to explore and share your findings and questions below in the comment section below.
- Understanding LSTM Networks, by Christopher Olah
- Recurrent Neural Networks in Tensorflow I, by R2RT
- Genetic Algorithms: Theory and Applications, by Ulrich Bodenhofer
- Chapter 9, Genetic Algorithms of Machine Learning book, by Tom M. Mitchell
Bio: Aaqib Saeed is a graduate student of Computer Science (specializing in Data Science and Smart Services) at University of Twente (The Netherlands).
Original. Reposted with permission.
- Urban Sound Classification with Neural Networks in Tensorflow
- Implementing a CNN for Human Activity Recognition in Tensorflow
- Today I Built a Neural Network During My Lunch Break with Keras