
A Guide For Time Series Prediction Using Recurrent Neural Networks (LSTMs)


Looking at the strengths of a neural network, especially a recurrent neural network, I came up with the idea of predicting the exchange rate between the USD and the INR.

Time Series Prediction

I was impressed with the strengths of a recurrent neural network and decided to use them to predict the exchange rate between the USD and the INR. The dataset used in this project is the exchange rate data between January 2, 1980 and August 10, 2017. Later, I’ll give you a link to download this dataset and experiment with it.

Table 1. Dataset Example

The dataset gives the value of $1 in rupees. We have a total of 13,730 records, from January 2, 1980 to August 10, 2017.


Over this period, the price to buy $1 in rupees has been rising. One can also see a huge dip during 2007–2008, largely caused by the Great Recession: a period of general economic decline observed in world markets during the late 2000s and early 2010s.

This period was not very good for the world’s developed economies, particularly in North America and Europe (including Russia), which fell into a definitive recession. Many of the newer developed economies suffered far less impact, particularly China and India, whose economies grew substantially during this period.


Train-Test Split

Now, to train the model we need to divide the dataset into training and test sets. With time series, it is very important to split train and test at a specific date, so that none of your test data comes before your training data.

In our experiment, we define a date, say January 1, 2010, as our split date. The training data is the data between January 2, 1980 and December 31, 2009, which comes to about 11,000 training points.

The test dataset is the data between January 1, 2010 and August 10, 2017, about 2,700 points.
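As a sketch in pandas (using a toy stand-in for the real series, since the article does not give the dataset's column names), the date-based split looks like this:

```python
import pandas as pd

# Toy stand-in for the exchange-rate series; the real dataset holds
# the price of $1 in rupees, indexed by date.
dates = pd.date_range("1980-01-02", "2017-08-10", freq="B")
df = pd.DataFrame({"price": range(len(dates))}, index=dates)

# Split on a date, never randomly: test data must not precede training data.
split_date = pd.Timestamp("2010-01-01")
train = df.loc[df.index < split_date]   # Jan 2, 1980 .. Dec 31, 2009
test = df.loc[df.index >= split_date]   # Jan 1, 2010 .. Aug 10, 2017
```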

Train-Test Split

The next thing to do is normalize the dataset. You fit and transform only your training data, and just transform your test data. The reason is that you don't want to assume you know the scale of your test data.

Normalizing the data means that the rescaled training values will lie between zero and one.
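A minimal sketch with scikit-learn's MinMaxScaler (the article does not name the exact scaler used, so this choice is an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[40.0], [45.0], [50.0]])  # toy training prices
test = np.array([[55.0], [60.0]])           # toy test prices

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # fit on training data only
test_scaled = scaler.transform(test)        # reuse the training scale
```

Note that the scaled test values can fall outside [0, 1]; that is expected, since the scaler deliberately knows nothing about the test data's range.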


Neural Network Models

A fully connected model is a simple neural network, built as a plain regression model that takes one input and produces one output. It basically takes the price from the previous day and forecasts the price of the next day.

As the loss function we use mean squared error, with stochastic gradient descent as the optimizer, which after enough epochs will settle into a good local optimum. Below is the summary of the fully connected model.
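In Keras, such a model might be sketched as follows (the single-unit layer is my assumption; the article only describes a simple one-input, one-output regression):

```python
from tensorflow import keras

# One input (yesterday's price) -> one output (today's price).
model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="sgd")
```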

Summary of a Fully Connected Layer

After training this model for 200 epochs, or until the early-stopping callback fires (whichever comes first), the model tries to learn the pattern and behavior of the data. Since we split the data into training and test sets, we can now predict on the test data and compare the predictions with the ground truth.

Ground Truth (blue) vs Prediction (orange)

As you can see, the model is not good. It is essentially repeating the previous day's value with a slight shift. The fully connected model is not able to predict the future from a single previous value. Let us now try a recurrent neural network and see how well it does.


Long Short-Term Memory

The recurrent model we used is a one-layer sequential model with 6 LSTM nodes, given an input of shape (1, 1): one time step with one feature.

Summary of LSTM Model

The last layer is a dense layer; the loss is mean squared error, with stochastic gradient descent as the optimizer. We train this model for 200 epochs with an early-stopping callback. The summary of the model is shown above.
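A sketch of this architecture in Keras (the early-stopping settings are assumptions, since the article does not state them):

```python
from tensorflow import keras

# One-layer sequential model: 6 LSTM nodes on input shape (1, 1),
# followed by a dense output layer.
model = keras.Sequential([
    keras.layers.Input(shape=(1, 1)),
    keras.layers.LSTM(6),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="sgd")

# Stop early once validation loss stops improving (patience is an assumption).
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
# model.fit(X_train, y_train, validation_split=0.1, epochs=200,
#           callbacks=[early_stop])
```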

LSTM Prediction

This model has learned to reproduce the yearly shape of the data and does not have the lag of the simple feed-forward network. It still underestimates some observations, though, so there is definitely room for improvement.


Changes in the model

There are many changes that could make this model better. One can always try changing the configuration, for example by swapping the optimizer. Another important change I see is using the sliding time window method, which comes from the field of stream data management systems.

This approach comes from the idea that only the most recent data are important. One can show the model a year of data and ask it to predict the first day of the next year. Sliding time window methods are very useful for capturing important patterns in datasets that depend strongly on a bulk of past observations.
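One way to sketch the windowing itself, using NumPy (the function name and window length here are illustrative, not from the article):

```python
import numpy as np

def sliding_windows(series, window):
    """Turn a 1-D series into (samples, window) inputs and next-step targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

prices = np.arange(10.0)           # toy stand-in for the exchange-rate series
X, y = sliding_windows(prices, 3)  # each sample: 3 past days -> the next day
```

For an LSTM, `X` would then be reshaped to `(samples, window, 1)` so each window is a sequence of one-feature time steps.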

Try to make changes to this model as you like and see how the model reacts to those changes.



I made the dataset available on my GitHub account, in the deep learning in python repository. Feel free to download the dataset and play with it.


Useful sources

I personally follow some of my favorite data scientists like Kirill Eremenko, Jose Portilla, Dan Van Boxel (better known as Dan Does Data), and many more. Most of them are available on different podcast stations where they talk about current subjects like RNNs, Convolutional Neural Networks, LSTMs, and even the most recent technology, the Neural Turing Machine.

Try to keep up with the news of different artificial intelligence conferences. By the way, if you are interested, then Kirill Eremenko is coming to San Diego this November with his amazing team to give talks on Machine Learning, Neural Networks, and Data Science.



LSTM models are powerful enough to learn the most important past behaviors and to understand whether or not those behaviors are important features for making future predictions. LSTMs are widely used in applications like speech recognition, music composition, and handwriting recognition, and even in my current research on human mobility and travel prediction.

To me, an LSTM is a model with its own memory, one that can behave like an intelligent human in making decisions.
Thank you again and happy machine learning!

Bio: Neelabh Pant loves Data Science. Let’s build some intelligent bots together! ;)

Original. Reposted with permission.