Introduction to Recurrent Networks in TensorFlow
A straightforward, introductory overview of implementing Recurrent Neural Networks in TensorFlow.
By Danijar Hafner, Independent Machine Learning Researcher.
Recurrent networks like LSTM and GRU are powerful sequence models. I will explain how to create recurrent networks in TensorFlow and use them for sequence classification and sequence labelling tasks. If you are not familiar with recurrent networks, I suggest you take a look at Christopher Olah’s great post first. On the TensorFlow part, I also expect some basic knowledge. The official tutorials are a good place to start.
Defining the Network

To use recurrent networks in TensorFlow, we first need to define the network architecture: one or more layers, the cell type, and possibly dropout between the layers.

import tensorflow as tf
from tensorflow.models.rnn import rnn_cell

num_hidden = 200
num_layers = 3
dropout = tf.placeholder(tf.float32)

network = rnn_cell.GRUCell(num_hidden)  # Or LSTMCell(num_hidden)
network = rnn_cell.DropoutWrapper(network, output_keep_prob=dropout)
network = rnn_cell.MultiRNNCell([network] * num_layers)
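Because dropout is defined as a placeholder, you can switch it on during training and off during evaluation by feeding different keep probabilities. A rough sketch; the session, fetches, and feed values below are illustrative placeholders rather than code from this post:

# Training step: keep 50% of each layer's outputs (an arbitrary example value).
sess.run(train_op, {data: train_data, target: train_target, dropout: 0.5})
# Evaluation: keep everything, which effectively disables dropout.
sess.run(error, {data: test_data, target: test_target, dropout: 1.0})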
Unrolling in Time
We can now unroll this network in time using the rnn operation. It takes placeholders for the input at each timestep and returns the hidden states and output activations for each timestep.
from tensorflow.models.rnn import rnn

max_length = 100
# Batch size times time steps times data width.
data = tf.placeholder(tf.float32, [None, max_length, 28])
outputs, states = rnn.rnn(network, unpack_sequence(data), dtype=tf.float32)
output = pack_sequence(outputs)
state = pack_sequence(states)
For its interface, TensorFlow uses Python lists containing one tensor per timestep. We therefore use tf.pack() and tf.unpack() to split our data tensor into a list of frames and to merge the results back into a single tensor.
def unpack_sequence(tensor):
    """Split the single tensor of a sequence into a list of frames."""
    # Swap the batch and time axes, then split along the time axis.
    return tf.unpack(tf.transpose(tensor, perm=[1, 0, 2]))

def pack_sequence(sequence):
    """Combine a list of the frames into a single tensor of the sequence."""
    # Stack the frames along the time axis, then swap batch and time back.
    return tf.transpose(tf.pack(sequence), perm=[1, 0, 2])
As of version v0.8.0, TensorFlow provides rnn.dynamic_rnn as an alternative to rnn.rnn that does not actually unroll the compute graph but uses a loop operation inside the graph. The interface is the same, except that you don't need unpack_sequence() and pack_sequence() anymore; it already operates on single tensors. In the following sections, I will mention the modifications you need to make in order to use dynamic_rnn.
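For example, the unrolling step above would look roughly like this with dynamic_rnn (a sketch that reuses the network, the data placeholder, and the rnn module defined earlier):

# dynamic_rnn consumes and returns tensors of shape
# batch size x time steps x features, so no packing helpers are needed.
output, state = rnn.dynamic_rnn(network, data, dtype=tf.float32)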
Sequence Classification
For classification, you might only care about the output activation at the last timestep, which is just outputs[-1]. The code below adds a softmax classifier on top of that and defines the cross entropy error function. For now we assume sequences to be of equal length, but I will cover variable length sequences in another post.
# The target is assumed to be one-hot encoded per sequence:
# batch size x number of classes.
in_size = num_hidden
out_size = int(target.get_shape()[1])
weight = tf.Variable(tf.truncated_normal([in_size, out_size], stddev=0.1))
bias = tf.Variable(tf.constant(0.1, shape=[out_size]))
prediction = tf.nn.softmax(tf.matmul(outputs[-1], weight) + bias)
cross_entropy = -tf.reduce_sum(target * tf.log(prediction))
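To actually train the classifier, you would minimize this error with one of TensorFlow's optimizers. A minimal sketch; the choice of RMSProp and the learning rate of 0.003 are arbitrary examples, not taken from the original post:

# Any optimizer works here; RMSProp is just one reasonable default.
optimizer = tf.train.RMSPropOptimizer(0.003)
train_op = optimizer.minimize(cross_entropy)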
When using dynamic_rnn, this is how to get the last output of the recurrent network. We can't use outputs[-1] because, unlike Python lists, TensorFlow doesn't support negative indexing yet. Here is the complete gist for sequence classification.
output, _ = rnn.dynamic_rnn(network, data, dtype=tf.float32)
output = tf.transpose(output, [1, 0, 2])
last = tf.gather(output, int(output.get_shape()[0]) - 1)
Sequence Labelling
For sequence labelling, we want a prediction for every timestep. However, we share the weights of the softmax layer across all timesteps. This way, we have one softmax layer on top of the unrolled recurrent network, as desired.
in_size = num_hidden
out_size = int(target.get_shape()[2])
weight = tf.Variable(tf.truncated_normal([in_size, out_size], stddev=0.1))
bias = tf.Variable(tf.constant(0.1, shape=[out_size]))
predictions = [tf.nn.softmax(tf.matmul(x, weight) + bias) for x in outputs]
prediction = pack_sequence(predictions)
If you want to use dynamic_rnn instead, you cannot apply the same weights and biases to all time steps in a Python list comprehension. Instead, we flatten the outputs of all time steps, so that to the weight matrix each time step looks like an example in the training batch. Afterwards, we reshape the result back into the desired shape.
max_length = int(self.target.get_shape()[1])
num_classes = int(self.target.get_shape()[2])
weight, bias = self._weight_and_bias(self._num_hidden, num_classes)
output = tf.reshape(output, [-1, self._num_hidden])
prediction = tf.nn.softmax(tf.matmul(output, weight) + bias)
prediction = tf.reshape(prediction, [-1, max_length, num_classes])
Since this is a classification task as well, we keep using cross entropy as our error function. Here we have a prediction and target for every timestep. We thus compute the cross entropy for every timestep first and then average. Here is the complete gist for sequence labelling.
# Sum over the class axis to get the cross entropy for every timestep,
# then average over batch and time.
cross_entropy = -tf.reduce_sum(
    target * tf.log(prediction), reduction_indices=[2])
cross_entropy = tf.reduce_mean(cross_entropy)
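Besides the loss, you may also want a per-timestep error rate for monitoring training. One way to compute it (a sketch, not part of the original post):

# Fraction of timesteps where the most likely class differs from the target.
mistakes = tf.not_equal(tf.argmax(target, 2), tf.argmax(prediction, 2))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))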
That’s all. We learned how to construct recurrent networks in TensorFlow and use them for sequence learning tasks. If anything was unclear, please ask your questions below.
Bio: Danijar Hafner is a Python and C++ developer from Berlin interested in Machine Intelligence research. He recently released a neural networks library, but he likes creating new things in general.
Original. Reposted with permission.