Introduction to Recurrent Networks in TensorFlow
A straightforward, introductory overview of implementing Recurrent Neural Networks in TensorFlow.
By Danijar Hafner, Independent Machine Learning Researcher.
Recurrent networks like LSTM and GRU are powerful sequence models. I will explain how to create recurrent networks in TensorFlow and use them for sequence classification and sequence labelling tasks. If you are not familiar with recurrent networks, I suggest you take a look at Christopher Olah’s great post first. I also assume some basic knowledge of TensorFlow; the official tutorials are a good place to start.
Defining the Network
To use recurrent networks in TensorFlow we first need to define the network architecture consisting of one or more layers, the cell type and possibly dropout between the layers.
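As a sketch, a two-layer GRU network with dropout between the layers might be defined like this, using the 0.8-era tf.nn.rnn_cell module (these names and signatures have changed in later TensorFlow versions, so treat this as illustrative rather than runnable today; num_hidden is an arbitrary choice):

```python
import tensorflow as tf

num_hidden = 200

# A single GRU cell; LSTMCell or BasicRNNCell work the same way.
cell = tf.nn.rnn_cell.GRUCell(num_hidden)
# Dropout on the outputs of each layer.
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
# Stack two of these layers on top of each other.
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * 2)
```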
Unrolling in Time
We can now unroll this network in time using the rnn operation. It takes a placeholder for the input at each timestep and returns the hidden states and output activations for each timestep. For this interface, TensorFlow uses Python lists containing one tensor per timestep. We therefore use tf.unpack() to split our data tensor into a list of frames, and merge the results back into a single tensor afterwards.
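To make the idea of unrolling concrete, here is a minimal NumPy sketch (not TensorFlow code) that applies the same vanilla RNN cell at every timestep of a batch of sequences; rnn builds the analogous chain of operations in the compute graph, and all shapes here are arbitrary choices of mine:

```python
import numpy as np

def rnn_step(x, h, w_x, w_h):
    # One vanilla RNN cell: new hidden state from input frame and previous state.
    return np.tanh(x @ w_x + h @ w_h)

batch_size, num_steps, num_input, num_hidden = 3, 5, 4, 8
rng = np.random.default_rng(0)
data = rng.normal(size=(batch_size, num_steps, num_input))
w_x = rng.normal(size=(num_input, num_hidden))
w_h = rng.normal(size=(num_hidden, num_hidden))

# Split the data tensor into a list of frames, like tf.unpack() along time.
frames = [data[:, t] for t in range(num_steps)]

h = np.zeros((batch_size, num_hidden))
outputs = []
for frame in frames:
    h = rnn_step(frame, h, w_x, w_h)  # the same weights are used at every timestep
    outputs.append(h)

# Merge the list of per-timestep outputs back into one tensor.
outputs = np.stack(outputs, axis=1)  # shape (batch_size, num_steps, num_hidden)
```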
As of version v0.8.0, TensorFlow provides rnn.dynamic_rnn as an alternative to rnn.rnn. Instead of actually unrolling the compute graph, it uses a loop operation within the graph. The interface is the same, except that you don’t need pack_sequence() anymore; it already operates on single tensors. In the following sections, I will mention the modifications you need to make in order to use dynamic_rnn.
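For reference, the two calls might look as follows under the 0.8-era API (illustrative only; later TensorFlow versions moved and renamed these functions, and cell, frames and data are placeholders from the surrounding discussion):

```python
# Static unrolling: takes a Python list of per-timestep tensors.
outputs, state = tf.nn.rnn(cell, frames, dtype=tf.float32)

# Dynamic unrolling: operates on a single (batch, time, features) tensor.
output, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
```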
Sequence Classification
For classification, you might only care about the output activation at the last timestep, which is just outputs[-1]. The code below adds a softmax classifier on top of that and defines the cross entropy error function. For now, we assume sequences of equal length, but I will cover variable length sequences in another post.
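The logic of "take the last output, apply a softmax layer, compute cross entropy" can be sketched in plain NumPy (not the gist's exact code; shapes and the weight, bias and target names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, num_steps, num_hidden, num_classes = 4, 6, 8, 3

# List-like, time-major outputs, as returned by static unrolling.
outputs = rng.normal(size=(num_steps, batch_size, num_hidden))
weight = rng.normal(size=(num_hidden, num_classes))
bias = np.zeros(num_classes)
target = np.eye(num_classes)[rng.integers(num_classes, size=batch_size)]  # one-hot

last = outputs[-1]                # output activation at the final timestep
logits = last @ weight + bias
# Softmax over the classes.
prediction = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# Cross entropy, averaged over the batch.
cross_entropy = -np.mean(np.sum(target * np.log(prediction), axis=1))
```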
With dynamic_rnn, we need a different way to get the last output of the recurrent network. We can’t use outputs[-1] because, unlike Python lists, TensorFlow doesn’t support negative indexing yet. Here is the complete gist for sequence classification.
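One common workaround (sketched here in NumPy rather than TensorFlow, with hypothetical shapes) is to flatten the batch and time axes and gather the rows that correspond to each sequence's last timestep:

```python
import numpy as np

rng = np.random.default_rng(2)
batch_size, num_steps, num_hidden = 4, 6, 8
# Batch-major output tensor, as returned by dynamic unrolling.
output = rng.normal(size=(batch_size, num_steps, num_hidden))

flat = output.reshape(batch_size * num_steps, num_hidden)
# Row i * num_steps + (num_steps - 1) holds the last timestep of example i,
# mirroring a gather on a reshaped tensor instead of negative indexing.
index = np.arange(batch_size) * num_steps + (num_steps - 1)
last = flat[index]
```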
Sequence Labelling
For sequence labelling, we want a prediction for every timestep. However, we share the weights of the softmax layer across all timesteps. This way, we have one softmax layer on top of an unrolled recurrent network, as desired.
If you want to use dynamic_rnn instead, you cannot apply the same weights and biases to all timesteps in a Python list comprehension. Instead, we flatten the outputs of all timesteps, so that each timestep looks like an example in the training batch to the weight matrix. Afterwards, we reshape the result back to the desired shape.
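The flatten, multiply, reshape steps can be sketched in NumPy as follows (hypothetical shapes and variable names, not the gist's exact code):

```python
import numpy as np

rng = np.random.default_rng(3)
batch_size, num_steps, num_hidden, num_classes = 4, 6, 8, 3
output = rng.normal(size=(batch_size, num_steps, num_hidden))
weight = rng.normal(size=(num_hidden, num_classes))
bias = np.zeros(num_classes)

# Flatten: every timestep now looks like an independent example to the weights.
flat = output.reshape(-1, num_hidden)
logits = flat @ weight + bias
# Reshape back so we have per-timestep class scores again.
logits = logits.reshape(batch_size, num_steps, num_classes)
```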
Since this is a classification task as well, we keep using cross entropy as our error function. Here we have a prediction and target for every timestep. We thus compute the cross entropy for every timestep first and then average. Here is the complete gist for sequence labelling.
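The per-timestep averaging can be sketched like this (again a NumPy illustration with made-up shapes, not the gist itself):

```python
import numpy as np

rng = np.random.default_rng(4)
batch_size, num_steps, num_classes = 2, 3, 4
labels = rng.integers(num_classes, size=(batch_size, num_steps))
target = np.eye(num_classes)[labels]              # one-hot targets per timestep
logits = rng.normal(size=(batch_size, num_steps, num_classes))
prediction = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Cross entropy for every timestep of every sequence...
cross_entropy = -np.sum(target * np.log(prediction), axis=-1)  # shape (batch, time)
# ...then averaged over timesteps and the batch.
loss = cross_entropy.mean()
```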
That’s all. We learned how to construct recurrent networks in TensorFlow and use them for sequence learning tasks. Please ask any questions below if you couldn’t follow.
Original. Reposted with permission.