Bidirectional LSTM
Create and run a bidirectional LSTM model.
Chapter Goals:
- Learn about the bidirectional LSTM and why it's used
A. Forwards and backwards
The language model from the Language Model section of this course used a regular LSTM which read each input sequence in the forwards direction. This meant that the recurrent connections went in the left-to-right direction, i.e. from time step t to time step t + 1.
While regular LSTMs work well for most NLP tasks, they are not always the best option. Specifically, when we have access to a completed text sequence (e.g. text classification), it may be beneficial to look at the sequence in both the forwards and backwards directions.
By looking at a sequence in both directions, the model can take into account both the past and future context of words and phrases in the text sequence, which allows it to obtain a better understanding of the text sequence.
B. Bidirectional LSTM structure
The way we look at an input sequence in both directions is with a model called a bidirectional LSTM, or BiLSTM for short. The model architecture is incredibly simple: it consists of a regular forwards LSTM and a backwards LSTM, which reads the input sequence in reverse.
The above diagram shows an (unrolled) BiLSTM with 3 time steps. On the left is the forwards LSTM and on the right is the backwards LSTM. The sequence shown represents the (embedded) input sequence.
Note: It is also possible to create a bidirectional RNN with general RNN cells rather than LSTM cells. However, since this Lab focuses on LSTM cells, we'll continue using the BiLSTM variant.
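To make the structure concrete, below is a minimal sketch of the idea: a backwards LSTM is simply an LSTM applied to the time-reversed input, with its outputs reversed back so they line up with the forwards outputs. This is an illustration only (it assumes TensorFlow 2.x with eager execution, uses a random tensor as the "embedded" input, and picks arbitrary layer sizes), not the course's code.

import tensorflow as tf

# Toy illustration: a "backwards" LSTM is just an LSTM run over the
# time-reversed sequence. Two separate LSTMs (separate weights) are used,
# one per direction, just like in a BiLSTM.
embeddings = tf.random.normal((4, 3, 8))  # (batch_size, time_steps, embed_dim)

forwards_lstm = tf.keras.layers.LSTM(7, return_sequences=True)
backwards_lstm = tf.keras.layers.LSTM(7, return_sequences=True)

forwards_out = forwards_lstm(embeddings)                          # reads steps 1 -> 3
backwards_out = backwards_lstm(tf.reverse(embeddings, axis=[1]))  # reads steps 3 -> 1
backwards_out = tf.reverse(backwards_out, axis=[1])               # re-align with forwards order

print(forwards_out.shape, backwards_out.shape)  # (4, 3, 7) (4, 3, 7)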
C. BiLSTM in TensorFlow
In TensorFlow, we can create and run a BiLSTM using the tf.keras.layers.Bidirectional function. This function is very similar to the tf.compat.v1.nn.dynamic_rnn function for regular LSTMs.
The code below shows example usage of tf.keras.layers.Bidirectional.
import tensorflow as tf

cell = tf.keras.layers.LSTMCell(7)
rnn = tf.keras.layers.RNN(cell, return_sequences=True,
                          go_backwards=True, return_state=True)

# Embedded input sequences
# Shape: (batch_size, time_steps, embed_dim)
input_embeddings = tf.compat.v1.placeholder(tf.float32, shape=(None, 10, 12))

Bi_rnn = tf.keras.layers.Bidirectional(rnn, merge_mode=None)
outputs = Bi_rnn(input_embeddings)
The tf.keras.layers.Bidirectional function returns a bidirectional RNN object. The input embeddings are passed as an argument when calling this object, which returns a tuple containing the LSTM outputs and the final LSTM states. Since a BiLSTM contains two LSTMs, both the outputs and the final states shown in the example are tuples. We won't worry about the final states for now.
However, note that outputs[0] represents the outputs of the forwards LSTM while outputs[1] represents the outputs of the backwards LSTM. This is important for calculating the model's logits (which we'll do in the next chapter).
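As a quick sanity check, here is a small eager-mode sketch that inspects the shapes of the two per-direction outputs. It is illustrative only: it replaces the placeholder from the example above with a random tensor and uses arbitrary sizes, and it assumes TensorFlow 2.x.

import tensorflow as tf

# Eager-mode sketch; shapes follow (batch_size, time_steps, units).
cell = tf.keras.layers.LSTMCell(7)
rnn = tf.keras.layers.RNN(cell, return_sequences=True,
                          go_backwards=True, return_state=True)
Bi_rnn = tf.keras.layers.Bidirectional(rnn, merge_mode=None)

input_embeddings = tf.random.normal((4, 10, 12))  # (batch_size, time_steps, embed_dim)
outputs = Bi_rnn(input_embeddings)

print(outputs[0].shape)  # (4, 10, 7) -- output sequence from one direction
print(outputs[1].shape)  # (4, 10, 7) -- output sequence from the other direction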
Time to Code!
In this chapter you'll be completing the run_bilstm function, which runs a bidirectional LSTM on input sequences.
The function has already been filled with code that converts the sequences to embeddings and uses the make_lstm_cell function to create the LSTM cell. Your task is to use the tf.keras.layers.Bidirectional function to run the BiLSTM.
We only need to use the first element of the returned tuple from running the BiLSTM. We also use sequence_lengths and tf.float32 for the sequence_length and dtype keyword arguments, respectively.
First, create an rnn object using tf.keras.layers.RNN with the keyword argument go_backwards=True.
Set Bi_rnn equal to a tf.keras.layers.Bidirectional layer, using rnn and merge_mode=None as the required arguments.
Call Bi_rnn with input_embeddings as its required argument, and set lstm_outputs to the first element of the result.
Return a tuple containing lstm_outputs as the first element and sequence_lengths as the second element. (A sketch of one possible completion appears after the starter code below.)
import tensorflow as tf

# Text classification model
class ClassificationModel(object):
    # Model initialization
    def __init__(self, vocab_size, max_length, num_lstm_units):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.num_lstm_units = num_lstm_units
        # See the Word Embeddings Lab for details on the Tokenizer
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=self.vocab_size)

    def make_lstm_cell(self, dropout_keep_prob):
        cell = tf.keras.layers.LSTMCell(self.num_lstm_units, dropout=dropout_keep_prob)
        return cell

    # Use feature columns to create input embeddings
    def get_input_embeddings(self, input_sequences):
        input_col = tf.compat.v1.feature_column \
            .categorical_column_with_identity('inputs', self.vocab_size)
        embed_size = int(self.vocab_size**0.25)
        embed_col = tf.compat.v1.feature_column.embedding_column(input_col, embed_size)
        input_dict = {'inputs': input_sequences}
        input_embeddings = tf.compat.v1.feature_column \
            .input_layer(input_dict, [embed_col])
        sequence_lengths = tf.compat.v1.placeholder("int64", shape=(None,),
            name="input_layer/input_embedding/sequence_length")
        return input_embeddings, sequence_lengths

    # Create and run a BiLSTM on the input sequences
    def run_bilstm(self, input_sequences, is_training):
        input_embeddings, sequence_lengths = self.get_input_embeddings(input_sequences)
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell = self.make_lstm_cell(dropout_keep_prob)
        # CODE HERE
        pass
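For reference, here is a minimal sketch of one way the CODE HERE portion might be completed, following the steps above. It is illustrative rather than the official solution, and it omits the sequence_length and dtype keywords mentioned earlier, which may or may not be required by the course's grader.

    # Sketch of a possible run_bilstm body (illustrative, not the official solution)
    def run_bilstm(self, input_sequences, is_training):
        input_embeddings, sequence_lengths = self.get_input_embeddings(input_sequences)
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell = self.make_lstm_cell(dropout_keep_prob)
        rnn = tf.keras.layers.RNN(cell, return_sequences=True,
                                  go_backwards=True, return_state=True)
        Bi_rnn = tf.keras.layers.Bidirectional(rnn, merge_mode=None)
        outputs = Bi_rnn(input_embeddings)
        lstm_outputs = outputs[0]  # use only the first element of the returned tuple
        return lstm_outputs, sequence_lengths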