# Introduction to Recurrent Networks in TensorFlow

Recurrent networks like LSTM and GRU are powerful sequence models. I will explain how to create recurrent networks in TensorFlow and use them for sequence classification and sequence labelling tasks. If you are not familiar with recurrent networks, I suggest you take a look at Christopher Olah’s great article first. I also assume some basic familiarity with TensorFlow; the official tutorials are a good place to start.

## Defining the Network

To use recurrent networks in TensorFlow, we first need to define the network architecture: one or more layers, the cell type, and optionally dropout between the layers. In TensorFlow, we build recurrent networks out of so-called cells that wrap each other.

```
import tensorflow as tf

num_units = 200
num_layers = 3
dropout = tf.placeholder(tf.float32)  # Dropout rate, e.g. 0.0 at test time.

cells = []
for _ in range(num_layers):
    # Create a separate cell per layer so that each layer gets its own weights.
    cell = tf.contrib.rnn.GRUCell(num_units)  # Or tf.contrib.rnn.LSTMCell(num_units).
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=1.0 - dropout)
    cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
```
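To make the idea of cells wrapping each other concrete, here is a minimal NumPy sketch of the pattern. `TanhCell` and `StackedCell` are hypothetical names for illustration, not TensorFlow APIs: each cell maps `(input, state)` to `(output, new_state)`, and the stacked cell feeds the output of one layer into the next, much like `MultiRNNCell`. Dropout is omitted for brevity.

```python
import numpy as np


class TanhCell:
    """Hypothetical minimal cell: new_state = tanh(x @ W + state @ U + b)."""

    def __init__(self, num_inputs, num_units, rng):
        self.W = rng.normal(scale=0.1, size=(num_inputs, num_units))
        self.U = rng.normal(scale=0.1, size=(num_units, num_units))
        self.b = np.zeros(num_units)

    def __call__(self, x, state):
        new_state = np.tanh(x @ self.W + state @ self.U + self.b)
        return new_state, new_state  # For this simple cell, output == state.


class StackedCell:
    """Hypothetical stand-in for MultiRNNCell: layer i's output feeds layer i + 1."""

    def __init__(self, cells):
        self.cells = cells

    def __call__(self, x, states):
        new_states = []
        for cell, state in zip(self.cells, states):
            x, new_state = cell(x, state)
            new_states.append(new_state)
        return x, tuple(new_states)


rng = np.random.default_rng(0)
num_units, num_layers, num_features, batch_size = 200, 3, 28, 4
cells = [TanhCell(num_features if i == 0 else num_units, num_units, rng)
         for i in range(num_layers)]
cell = StackedCell(cells)

x = np.ones((batch_size, num_features))
states = tuple(np.zeros((batch_size, num_units)) for _ in range(num_layers))
out, new_states = cell(x, states)  # One time step through all three layers.
```

The stacked cell exposes the same `(input, state) -> (output, state)` interface as a single cell, which is what lets wrappers nest freely.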

## Simulating Time Steps

We can now add the operations to the graph that simulate the recurrent network over the time steps of the input. We do this using TensorFlow’s `dynamic_rnn()` operation. It takes a tensor holding the input sequences and returns the *output* activations and the last hidden *state* as tensors.

```
# Batch size x time steps x features.
data = tf.placeholder(tf.float32, [None, None, 28])
output, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
```
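Conceptually, `dynamic_rnn()` runs the cell once per time step and threads the state through. The NumPy sketch below spells that loop out in plain Python; `simple_cell` and `unroll` are hypothetical helpers. The real op uses a symbolic while loop and supports per-sequence lengths, which this sketch ignores.

```python
import numpy as np


def simple_cell(x, state, W, U):
    # Hypothetical tanh cell; the output equals the new state.
    new_state = np.tanh(x @ W + state @ U)
    return new_state, new_state


def unroll(data, W, U):
    """Loop over the time axis of a batch x time x features array."""
    batch_size, num_steps, _ = data.shape
    state = np.zeros((batch_size, U.shape[0]))
    outputs = []
    for t in range(num_steps):
        output, state = simple_cell(data[:, t], state, W, U)
        outputs.append(output)
    # Stack per-step outputs back into batch x time x units.
    return np.stack(outputs, axis=1), state


rng = np.random.default_rng(0)
data = rng.normal(size=(2, 5, 28))  # Batch size x time steps x features.
W = rng.normal(scale=0.1, size=(28, 16))
U = rng.normal(scale=0.1, size=(16, 16))
output, state = unroll(data, W, U)
print(output.shape, state.shape)  # (2, 5, 16) (2, 16)
```

For a cell whose output equals its state, the last output frame and the final state coincide when all sequences have the same length.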

## Sequence Classification

For classification, you might only care about the output activation at the last time step. We transpose the output so that the time axis comes first and use `tf.gather()` to select the last frame. We can’t just use `output[-1]` because, unlike Python lists, TensorFlow doesn’t support negative indexing yet.

```
output, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
output = tf.transpose(output, [1, 0, 2])
# Use the dynamic shape, since the number of time steps can be None at graph construction.
last = tf.gather(output, tf.shape(output)[0] - 1)
```
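NumPy does support negative indexing, so we can use it to check that the transpose-then-gather trick really picks the last frame. In this sketch, `np.take` plays the role of `tf.gather()` on the first axis; the toy array is made up for illustration.

```python
import numpy as np

output = np.arange(2 * 4 * 3).reshape(2, 4, 3)  # Batch x time x units.
time_major = np.transpose(output, [1, 0, 2])     # Time x batch x units.
# Select the last time step along the (now first) time axis, like tf.gather().
last = np.take(time_major, time_major.shape[0] - 1, axis=0)

assert np.array_equal(last, output[:, -1])
print(last.shape)  # (2, 3)
```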

The code below adds a softmax classifier on top of the last activation and defines the cross entropy loss function. Here is the complete gist for sequence classification.

```
out_size = target.get_shape()[1].value  # Target: batch size x classes (one-hot).
logit = tf.contrib.layers.fully_connected(last, out_size, activation_fn=None)
prediction = tf.nn.softmax(logit)
loss = tf.losses.softmax_cross_entropy(target, logit)
```
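As a rough picture of what these two ops compute, here is a NumPy sketch of a softmax and a mean softmax cross entropy taken directly from the logits. `softmax` and `softmax_cross_entropy` are hypothetical helpers, assumed to mirror `tf.nn.softmax()` and `tf.losses.softmax_cross_entropy()` with default arguments. Subtracting the row maximum (the log-sum-exp trick) keeps large logits from overflowing, which is one reason to compute the loss from logits rather than from the softmax output.

```python
import numpy as np


def softmax(logit):
    # Subtract the row maximum before exponentiating to avoid overflow.
    shifted = logit - logit.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(target, logit):
    # Log-softmax via log-sum-exp, then average the per-example loss over the batch.
    shifted = logit - logit.max(axis=-1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target * log_prob).sum(axis=-1).mean()


logit = np.array([[2.0, 0.0, 0.0], [0.0, 1000.0, 0.0]])  # Second row would overflow naively.
target = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
prediction = softmax(logit)
loss = softmax_cross_entropy(target, logit)
```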

For now, we assume all sequences to be equal in length. Please refer to my other post on handling sequences of different lengths.

## Sequence Labelling

For sequence labelling, we want a prediction for each time step. However, we want to share the weights of the softmax layer across all time steps. How do we do that? By flattening the first two dimensions of the output tensor. This way, time steps look the same as examples in the batch to the weight matrix. Afterwards, we reshape back to the desired shape. In the code below, `fully_connected()` takes care of this flattening internally for inputs of rank greater than two.

```
out_size = target.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size, activation_fn=None)
prediction = tf.nn.softmax(logit)
```
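The flattening argument can be checked numerically. In this NumPy sketch (shapes and random values are made up), one shared weight matrix is applied to the flattened `(batch * time) x units` array and the result is reshaped back; it matches applying the same layer to each time step separately.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_steps, num_units, out_size = 2, 5, 8, 3
output = rng.normal(size=(batch_size, num_steps, num_units))
weight = rng.normal(size=(num_units, out_size))  # Shared across all time steps.
bias = np.zeros(out_size)

# Flatten batch and time so every time step looks like another example.
flat = output.reshape(-1, num_units)             # (batch * time) x units
flat_logit = flat @ weight + bias                # (batch * time) x out_size
logit = flat_logit.reshape(batch_size, num_steps, out_size)

# Same result as applying the layer to each time step on its own.
per_step = np.stack([output[:, t] @ weight + bias for t in range(num_steps)], axis=1)
assert np.allclose(logit, per_step)
```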

Let’s say we predict a class for each frame, so we keep using cross entropy as our loss function. Here we have a prediction and target for every time step. We thus compute the cross entropy for every time step and sequence in the batch, and then average along these two dimensions. Here is the complete gist for sequence labelling.

```
flat_target = tf.reshape(target, [-1] + target.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss = tf.losses.softmax_cross_entropy(flat_target, flat_logit)
loss = tf.reduce_mean(loss)  # Harmless: the loss above is already averaged to a scalar.
```
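To see that flattening and then averaging gives the mean over both the batch and the time dimension, here is a NumPy sketch with random logits and one-hot targets. `cross_entropy` is a hypothetical helper that returns the per-example loss over the last axis.

```python
import numpy as np


def cross_entropy(target, logit):
    # Per-example cross entropy over the last axis, using log-sum-exp for stability.
    shifted = logit - logit.max(axis=-1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target * log_prob).sum(axis=-1)


rng = np.random.default_rng(0)
batch_size, num_steps, num_classes = 2, 4, 3
logit = rng.normal(size=(batch_size, num_steps, num_classes))
labels = rng.integers(num_classes, size=(batch_size, num_steps))
target = np.eye(num_classes)[labels]  # One-hot, batch x time x classes.

per_step = cross_entropy(target, logit)  # Batch x time.
flat_loss = cross_entropy(target.reshape(-1, num_classes),
                          logit.reshape(-1, num_classes)).mean()
assert np.isclose(flat_loss, per_step.mean())
```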

## Conclusion

That’s all. We have learned how to construct recurrent networks in TensorFlow and use them for sequence learning tasks. Please ask any questions below if you couldn’t follow.

**Updated 2016-08-17:** TensorFlow 0.10 moved the recurrent network operations from `tf.models.rnn` into the `tf.nn` package, where they now live alongside the other neural network operations. Cells can now be found in `tf.nn.rnn_cell`.

**Updated 2016-05-20:** TensorFlow 0.8 introduced `dynamic_rnn()`, which uses a symbolic loop instead of creating a subgraph for each time step. This results in a more compact graph. The function also expects and returns tensors directly, so we no longer need to convert to and from Python lists.

**Updated 2017-06-07:** TensorFlow 1.0 moved recurrent cells into `tf.contrib.rnn`. From TensorFlow 1.2 on, recurrent cells reuse their weights, so we need to create multiple separate `GRUCell` instances in the first code block. Moreover, I switched to using the existing implementation of the cross entropy loss, which is numerically stable and has a more efficient gradient computation.
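Why separate cells matter: a cell object owns its weights, and a pattern like `[cell] * num_layers` puts the same object into the list several times. The plain-Python sketch below uses a hypothetical `Cell` class as a stand-in to show the difference between repeating one object and creating distinct ones in a loop.

```python
class Cell:
    """Hypothetical stand-in for a cell object that owns its weights."""

    def __init__(self, num_units):
        self.num_units = num_units


num_layers = 3
shared = [Cell(200)] * num_layers                  # Three references to ONE cell.
separate = [Cell(200) for _ in range(num_layers)]  # Three distinct cells.

print(len({id(c) for c in shared}))    # 1
print(len({id(c) for c in separate}))  # 3
```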