Logits
Calculate logits based on the final output of the model.
Chapter Goals:
- Calculate logits from the combined BiLSTM outputs
A. Concatenation
As mentioned in the previous chapter, the BiLSTM returns two outputs: the forwards and backwards outputs. In order to calculate the model’s logits, we need to combine these two outputs. We do this through simple concatenation.
Concatenation in TensorFlow refers to appending tensors along a certain dimension. The function that performs this operation is tf.concat. It takes in two required arguments: a list of tensors to concatenate and the axis (dimension) to concatenate along.
Below we demonstrate an example usage of tf.concat. The variables o1 and o2 are NumPy arrays representing the concatenation outputs.
import tensorflow as tf

# Shape: (2, 2, 3)
t1 = tf.constant([[[1, 2, 3], [4, 5, 6]],
                  [[0, 4, 8], [3, 2, 2]]])
# Shape: (1, 2, 3)
t2 = tf.constant([[[9, 9, 9], [8, 8, 8]]])
# Shape: (2, 2, 2)
t3 = tf.constant([[[9, 9], [1, 1]],
                  [[7, 2], [8, 8]]])

with tf.compat.v1.Session() as sess:
    o1 = sess.run(tf.concat([t1, t2], 0))
    o2 = sess.run(tf.concat([t1, t3], -1))
    print(repr(o1))
    print(repr(o2))
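For reference, the two print statements should produce NumPy arrays along these lines: o1 has shape (3, 2, 3) because t2 is appended to t1 along the first dimension, and o2 has shape (2, 2, 5) because t3 is appended to t1 along the final dimension (exact repr formatting may vary slightly by NumPy version).

array([[[1, 2, 3],
        [4, 5, 6]],

       [[0, 4, 8],
        [3, 2, 2]],

       [[9, 9, 9],
        [8, 8, 8]]], dtype=int32)
array([[[1, 2, 3, 9, 9],
        [4, 5, 6, 1, 1]],

       [[0, 4, 8, 7, 2],
        [3, 2, 2, 8, 8]]], dtype=int32)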
When concatenating tensors, each tensor needs to have the exact same shape, apart from the axis that’s being concatenated. The tensors are concatenated in the same order that they appear in the list.
We can use -1 for the second argument to specify the final tensor dimension as the axis of concatenation. This is a useful shortcut for concatenating along the final dimension, and it is how we combine the BiLSTM outputs.
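As a rough illustration, combining the forwards and backwards outputs along their final dimension might look like the sketch below. The tensor names and shapes here are hypothetical stand-ins, not part of the course code: each output is assumed to have shape (batch_size, max_length, num_lstm_units).

import tensorflow as tf

# Hypothetical BiLSTM outputs: (batch_size=4, max_length=10, num_lstm_units=7)
lstm_outputs_fw = tf.random.normal((4, 10, 7))
lstm_outputs_bw = tf.random.normal((4, 10, 7))

# Concatenate along the final dimension
combined_outputs = tf.concat([lstm_outputs_fw, lstm_outputs_bw], -1)
print(combined_outputs.shape)  # Shape: (4, 10, 14)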
B. Final time step
Unlike the language model from the Language Model section of this course, when we create an LSTM for classification we only use the final time step output for each sequence in the batch. This is because we take into account the entire text sequence for classification, whereas for language modeling we were focused on completing partial sequences.
So after combining the forwards and backwards LSTM outputs, we retrieve the final time step values (using tf.gather_nd), and then pass those values through a final fully-connected layer to obtain the model's logits.
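To make the shapes concrete, here is a minimal sketch of this step with made-up batch size, sequence lengths, and unit counts (the index-building lines mirror the get_gather_indices helper shown later in this chapter):

import tensorflow as tf

batch_size = 3
# Hypothetical combined outputs: (batch_size, max_length=5, 2 * num_lstm_units=8)
combined_outputs = tf.random.normal((batch_size, 5, 8))
# Hypothetical sequence lengths for each batch element
sequence_lengths = tf.constant([5, 3, 4], dtype=tf.int64)

# Pair each row index with the index of its final time step
row_indices = tf.range(batch_size)
final_indexes = tf.cast(sequence_lengths - 1, tf.int32)
gather_indices = tf.transpose([row_indices, final_indexes])

# One final-time-step vector per sequence: (batch_size, 2 * num_lstm_units)
final_outputs = tf.gather_nd(combined_outputs, gather_indices)

# Single-node fully-connected layer -> one logit per sequence
logits = tf.keras.layers.Dense(1)(final_outputs)
print(logits.shape)  # Shape: (3, 1)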
Time to Code!
In this chapter you'll be completing the calculate_logits function, which calculates logits based on the outputs of the BiLSTM.
The function's input, lstm_outputs, is a tuple containing the outputs of the forwards and backwards LSTMs. Our first step is to separate the tuple into two distinct variables.

Set lstm_outputs_fw, lstm_outputs_bw equal to lstm_outputs.
The way we combine the two LSTM outputs is by concatenating the output values along their final dimension. The input list for tf.concat should be [lstm_outputs_fw, lstm_outputs_bw].

Set combined_outputs equal to tf.concat applied with the specified input list as the first argument and -1 as the second argument.
We provide a function, get_gather_indices, which uses code from the Language Model section to calculate the indices of each sequence's final time step. Use that function, along with tf.gather_nd, to retrieve the final time step values from combined_outputs.

Set gather_indices equal to self.get_gather_indices applied with batch_size and sequence_lengths as arguments.

Set final_outputs equal to tf.gather_nd applied with combined_outputs and gather_indices as arguments.
Since our task is binary text classification, we use a final fully-connected layer with a single node to obtain the model’s logits.
Set logits equal to a tf.keras.layers.Dense layer created with 1 as its only argument, applied to final_outputs. Then return logits.
import tensorflow as tf

# Text classification model
class ClassificationModel(object):
    # Model initialization
    def __init__(self, vocab_size, max_length, num_lstm_units):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.num_lstm_units = num_lstm_units
        # See the Word Embeddings Lab for details on the Tokenizer
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=self.vocab_size)

    def make_lstm_cell(self, dropout_keep_prob):
        cell = tf.keras.layers.LSTMCell(self.num_lstm_units, dropout=dropout_keep_prob)
        return cell

    # Use feature columns to create input embeddings
    def get_input_embeddings(self, input_sequences):
        input_col = tf.compat.v1.feature_column \
            .categorical_column_with_identity('inputs', self.vocab_size)
        embed_size = int(self.vocab_size**0.25)
        embed_col = tf.compat.v1.feature_column.embedding_column(input_col, embed_size)
        input_dict = {'inputs': input_sequences}
        input_embeddings = tf.compat.v1.feature_column \
            .input_layer(input_dict, [embed_col])
        sequence_lengths = tf.compat.v1.placeholder(
            "int64", shape=(None,),
            name="input_layer/input_embedding/sequence_length")
        return input_embeddings, sequence_lengths

    # Create and run a BiLSTM on the input sequences
    def run_bilstm(self, input_sequences, is_training):
        input_embeddings, sequence_lengths = self.get_input_embeddings(input_sequences)
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell = self.make_lstm_cell(dropout_keep_prob)
        rnn = tf.keras.layers.RNN(
            cell, return_sequences=True, go_backwards=True, return_state=True)
        input_embeddings = tf.compat.v1.placeholder(tf.float32, shape=(None, 10, 12))
        Bi_rnn = tf.keras.layers.Bidirectional(rnn, merge_mode=None)
        outputs = Bi_rnn(input_embeddings)
        return outputs, sequence_lengths

    def get_gather_indices(self, batch_size, sequence_lengths):
        row_indices = tf.range(batch_size)
        final_indexes = tf.cast(sequence_lengths - 1, tf.int32)
        return tf.transpose([row_indices, final_indexes])

    # Calculate logits based on the outputs of the BiLSTM
    def calculate_logits(self, lstm_outputs, batch_size, sequence_lengths):
        # CODE HERE
        pass
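For reference, one way the completed calculate_logits method might look inside ClassificationModel, following the steps above (this is a sketch of a possible solution, not the only valid one):

    def calculate_logits(self, lstm_outputs, batch_size, sequence_lengths):
        # Separate the forwards and backwards outputs
        lstm_outputs_fw, lstm_outputs_bw = lstm_outputs
        # Combine the outputs along their final dimension
        combined_outputs = tf.concat([lstm_outputs_fw, lstm_outputs_bw], -1)
        # Retrieve each sequence's final time step values
        gather_indices = self.get_gather_indices(batch_size, sequence_lengths)
        final_outputs = tf.gather_nd(combined_outputs, gather_indices)
        # Single-node dense layer for binary classification logits
        logits = tf.keras.layers.Dense(1)(final_outputs)
        return logits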