Logits
Calculate logits based on the final output of the model.
Chapter Goals:
- Calculate logits from the combined BiLSTM outputs
A. Concatenation
As mentioned in the previous chapter, the BiLSTM returns two outputs: the forwards and backwards outputs. In order to calculate the model’s logits, we need to combine these two outputs. We do this through simple concatenation.
Concatenation in TensorFlow refers to appending tensors along a certain dimension. The function that performs this operation is tf.concat. It takes in two required arguments: a list of tensors to concatenate and the axis (dimension) to concatenate along.
Below we demonstrate an example usage of tf.concat. The variables o1 and o2 are NumPy arrays representing the concatenation outputs.
import tensorflow as tf

# Shape: (2, 2, 3)
t1 = tf.constant([[[1, 2, 3], [4, 5, 6]],
                  [[0, 4, 8], [3, 2, 2]]])
# Shape: (1, 2, 3)
t2 = tf.constant([[[9, 9, 9], [8, 8, 8]]])
# Shape: (2, 2, 2)
t3 = tf.constant([[[9, 9], [1, 1]],
                  [[7, 2], [8, 8]]])

with tf.compat.v1.Session() as sess:
    o1 = sess.run(tf.concat([t1, t2], 0))
    o2 = sess.run(tf.concat([t1, t3], -1))
    print(repr(o1))
    print(repr(o2))
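For reference, the two print statements should produce NumPy arrays along these lines: o1 has shape (3, 2, 3) because t2 is appended to t1 along the first dimension, and o2 has shape (2, 2, 5) because t3 is appended to t1 along the final dimension (exact repr formatting may vary slightly by NumPy version).

array([[[1, 2, 3],
        [4, 5, 6]],

       [[0, 4, 8],
        [3, 2, 2]],

       [[9, 9, 9],
        [8, 8, 8]]], dtype=int32)
array([[[1, 2, 3, 9, 9],
        [4, 5, 6, 1, 1]],

       [[0, 4, 8, 7, 2],
        [3, 2, 2, 8, 8]]], dtype=int32)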
When concatenating tensors, each tensor needs to have the exact same shape, apart from the axis that’s being concatenated. The tensors are concatenated in the same order that they appear in the list.
We can use -1 for the second argument to specify the final tensor dimension as the axis of concatenation. This is a useful shortcut for concatenating along the final dimension, and it is how we combine the BiLSTM outputs.
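As a rough illustration, combining the forwards and backwards outputs along their final dimension might look like the sketch below. The tensor names and shapes here are hypothetical stand-ins, not part of the course code: each output is assumed to have shape (batch_size, max_length, num_lstm_units).

import tensorflow as tf

# Hypothetical BiLSTM outputs: (batch_size=4, max_length=10, num_lstm_units=7)
lstm_outputs_fw = tf.random.normal((4, 10, 7))
lstm_outputs_bw = tf.random.normal((4, 10, 7))

# Concatenate along the final dimension
combined_outputs = tf.concat([lstm_outputs_fw, lstm_outputs_bw], -1)
print(combined_outputs.shape)  # Shape: (4, 10, 14)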
B. Final time step
Unlike the language model from the Language Model section of this course, when we create an LSTM for classification we only use the final time step output for each sequence in the batch. This is because we take into account the entire text sequence for classification, whereas for language modeling we were focused on completing partial sequences.
So after combining the forwards and backwards LSTM outputs, we retrieve the final time step values (using tf.gather_nd), and then pass those values through a final fully-connected layer to obtain the model's logits.
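To make the shapes concrete, here is a minimal sketch of this step with made-up batch size, sequence lengths, and unit counts (the index-building lines mirror the get_gather_indices helper shown later in this chapter):

import tensorflow as tf

batch_size = 3
# Hypothetical combined outputs: (batch_size, max_length=5, 2 * num_lstm_units=8)
combined_outputs = tf.random.normal((batch_size, 5, 8))
# Hypothetical sequence lengths for each batch element
sequence_lengths = tf.constant([5, 3, 4], dtype=tf.int64)

# Pair each row index with the index of its final time step
row_indices = tf.range(batch_size)
final_indexes = tf.cast(sequence_lengths - 1, tf.int32)
gather_indices = tf.transpose([row_indices, final_indexes])

# One final-time-step vector per sequence: (batch_size, 2 * num_lstm_units)
final_outputs = tf.gather_nd(combined_outputs, gather_indices)

# Single-node fully-connected layer -> one logit per sequence
logits = tf.keras.layers.Dense(1)(final_outputs)
print(logits.shape)  # Shape: (3, 1)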
Time to Code!
In this chapter you'll be completing the calculate_logits function, which calculates logits based on the outputs of the BiLSTM.
The function's input, lstm_outputs, is a tuple containing the outputs of the forwards and backwards LSTMs. Our first step is to separate the tuple into two distinct variables.

Set lstm_outputs_fw, lstm_outputs_bw equal to lstm_outputs.
The way we combine the two LSTM outputs is by concatenating the output values along their final dimension. The input list for tf.concat should be [lstm_outputs_fw, lstm_outputs_bw].

Set combined_outputs equal to tf.concat applied with the specified input list as the first argument and -1 as the second argument.
We provide a function, get_gather_indices, which uses code from the Language Model section to calculate the indices of each sequence's final time step. Use that function, along with tf.gather_nd, to retrieve the final time step values from combined_outputs.

Set gather_indices equal to self.get_gather_indices applied with batch_size and sequence_lengths as arguments.

Set final_outputs equal to tf.gather_nd applied with combined_outputs and gather_indices as arguments.
Since our task is binary text classification, we use a final fully-connected layer with a single node to obtain the model’s logits.
Set logits equal to a tf.keras.layers.Dense layer created with 1 as its only argument, applied to final_outputs. Then return logits.
import tensorflow as tf

# Text classification model
class ClassificationModel(object):
    # Model initialization
    def __init__(self, vocab_size, max_length, num_lstm_units):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.num_lstm_units = num_lstm_units
        # See the Word Embeddings Lab for details on the Tokenizer
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=self.vocab_size)

    def make_lstm_cell(self, dropout_keep_prob):
        cell = tf.keras.layers.LSTMCell(self.num_lstm_units, dropout=dropout_keep_prob)
        return cell

    # Use feature columns to create input embeddings
    def get_input_embeddings(self, input_sequences):
        input_col = tf.compat.v1.feature_column \
            .categorical_column_with_identity('inputs', self.vocab_size)
        embed_size = int(self.vocab_size**0.25)
        embed_col = tf.compat.v1.feature_column.embedding_column(input_col, embed_size)
        input_dict = {'inputs': input_sequences}
        input_embeddings = tf.compat.v1.feature_column \
            .input_layer(input_dict, [embed_col])
        sequence_lengths = tf.compat.v1.placeholder(
            "int64", shape=(None,),
            name="input_layer/input_embedding/sequence_length")
        return input_embeddings, sequence_lengths

    # Create and run a BiLSTM on the input sequences
    def run_bilstm(self, input_sequences, is_training):
        input_embeddings, sequence_lengths = self.get_input_embeddings(input_sequences)
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell = self.make_lstm_cell(dropout_keep_prob)
        rnn = tf.keras.layers.RNN(
            cell, return_sequences=True, go_backwards=True, return_state=True)
        input_embeddings = tf.compat.v1.placeholder(tf.float32, shape=(None, 10, 12))
        Bi_rnn = tf.keras.layers.Bidirectional(rnn, merge_mode=None)
        outputs = Bi_rnn(input_embeddings)
        return outputs, sequence_lengths

    def get_gather_indices(self, batch_size, sequence_lengths):
        row_indices = tf.range(batch_size)
        final_indexes = tf.cast(sequence_lengths - 1, tf.int32)
        return tf.transpose([row_indices, final_indexes])

    # Calculate logits based on the outputs of the BiLSTM
    def calculate_logits(self, lstm_outputs, batch_size, sequence_lengths):
        # CODE HERE
        pass
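For reference, one way the completed calculate_logits method might look inside ClassificationModel, following the steps above (this is a sketch of a possible solution, not the only valid one):

    def calculate_logits(self, lstm_outputs, batch_size, sequence_lengths):
        # Separate the forwards and backwards outputs
        lstm_outputs_fw, lstm_outputs_bw = lstm_outputs
        # Combine the outputs along their final dimension
        combined_outputs = tf.concat([lstm_outputs_fw, lstm_outputs_bw], -1)
        # Retrieve each sequence's final time step values
        gather_indices = self.get_gather_indices(batch_size, sequence_lengths)
        final_outputs = tf.gather_nd(combined_outputs, gather_indices)
        # Single-node dense layer for binary classification logits
        logits = tf.keras.layers.Dense(1)(final_outputs)
        return logits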