Decoding Output

Decode the model's outputs for training and inference.

Chapter Goals:

  • Retrieve the decoder outputs and return the model's logits during training

After creating the decoder object for our model, we can perform the decoding by calling the decoder object directly; this call runs dynamic decoding, serving the role of the dynamic_decode function.

import tensorflow as tf
extended_vocab_size = 500
batch_size = 10
# decoder is a BasicDecoder object; calling it runs dynamic decoding
outputs = decoder(inputs, initial_state=initial_state,
                  sequence_length=input_seq_lens, training=True)
decoder_output = outputs[0]            # BasicDecoderOutput
logits = decoder_output.rnn_output     # logits (when output_layer is set)
decoder_final_state = outputs[1]       # final decoder state
decoded_sequence_lengths = outputs[2]  # lengths of the decoded sequences
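
Note that, with the decoder's default batch-major output format, the logits tensor has shape (batch size, maximum decoded length, extended_vocab_size).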

The decoder object takes inputs, initial_state, sequence_length, and training as arguments. It returns a tuple containing three elements:

  1. The decoder's output. For a BasicDecoder, the output takes the form of a BasicDecoderOutput object.
  2. The decoder's final state. This isn't used in our encoder-decoder model.
  3. The lengths of each of the decoder's output sequences. This also isn't used in our encoder-decoder model.

If the BasicDecoder was initialized with the output_layer keyword argument, the rnn_output of the BasicDecoderOutput object will contain the model's logits.
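
As a minimal sketch of this (reusing the dec_cell and sampler names from the model code later in this chapter), attaching an output_layer when constructing the BasicDecoder is what makes rnn_output hold vocabulary-sized logits:

import tensorflow as tf
import tensorflow_addons as tfa
# dec_cell and sampler are assumed to come from the model code below.
# A Dense projection onto the extended vocabulary means rnn_output
# will contain logits rather than raw cell outputs.
projection_layer = tf.keras.layers.Dense(extended_vocab_size)
decoder = tfa.seq2seq.BasicDecoder(
    dec_cell, sampler, output_layer=projection_layer)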

B. Limiting the decoded length

A problem that sometimes occurs when decoding, especially in tasks like text summarization, is the decoder returning output sequences that are too long. We can manually limit the decoder output length with the decoder object’s maximum_iterations keyword argument.

By setting this keyword argument to an integer k, we guarantee that the decoder will not output sequences longer than k.
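
For example, here is a sketch (assuming the same dec_cell, sampler, and projection_layer names as above) that caps decoded sequences at 20 tokens:

# maximum_iterations bounds the number of decoding steps,
# so no output sequence can exceed 20 tokens
decoder = tfa.seq2seq.BasicDecoder(
    dec_cell, sampler,
    output_layer=projection_layer,
    maximum_iterations=20)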

Time to Code!

In this chapter, you'll complete the run_decoder function, which is used in the model's Decoder function to run the decoder object.

When we run the decoder object, we only care about the first element of the returned tuple, which contains the output of our decoder.

Set dec_outputs equal to the first element of the tuple returned by the decoder object. Call decoder with the required arguments.

During training, the model's logits are located in the rnn_output property of dec_outputs. The decoder function will return both the model's logits and the ground truth sequence lengths.

Create an if block that checks if is_training is True. Inside the if block, return a tuple containing the model's logits and dec_seq_lens, in that order.

When we're not training, we will return the model's predictions, which are stored in the sample_id attribute of dec_outputs (more on this later).

Outside the if block, return dec_outputs.sample_id.

import tensorflow as tf
import tensorflow_addons as tfa

def run_decoder(decoder, inputs, initial_state, input_seq_lens,
                is_training, dec_seq_lens):
    # CODE HERE
    pass

# Seq2seq model
class Seq2SeqModel(object):
    def __init__(self, vocab_size, num_lstm_layers, num_lstm_units):
        self.vocab_size = vocab_size
        # Extended vocabulary includes start, stop token
        self.extended_vocab_size = vocab_size + 2
        self.num_lstm_layers = num_lstm_layers
        self.num_lstm_units = num_lstm_units
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(
            num_words=vocab_size)

    def make_lstm_cell(self, dropout_keep_prob, num_units):
        # Keras LSTMCell expects a drop *rate*, so convert the keep probability
        cell = tf.keras.layers.LSTMCell(num_units, dropout=1.0 - dropout_keep_prob)
        return cell

    # Create multi-layer LSTM
    def stacked_lstm_cells(self, is_training, num_units):
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell_list = [self.make_lstm_cell(dropout_keep_prob, num_units)
                     for i in range(self.num_lstm_layers)]
        cell = tf.keras.layers.StackedRNNCells(cell_list)
        return cell

    # Get embeddings for input/output sequences
    def get_embeddings(self, sequences, scope_name):
        with tf.compat.v1.variable_scope(scope_name, reuse=tf.compat.v1.AUTO_REUSE):
            cat_column = tf.compat.v1.feature_column \
                .categorical_column_with_identity(
                    'sequences', self.extended_vocab_size)
            embed_size = int(self.extended_vocab_size**0.25)
            embedding_column = tf.compat.v1.feature_column.embedding_column(
                cat_column, embed_size)
            seq_dict = {'sequences': sequences}
            embeddings = tf.compat.v1.feature_column \
                .input_layer(seq_dict, [embedding_column])
            sequence_lengths = tf.compat.v1.placeholder(
                "int64", shape=(None,),
                name=scope_name + "/sinput_layer/sequence_length")
            return embeddings, tf.cast(sequence_lengths, tf.int32)

    # Combine BiLSTM encoder outputs
    def combine_enc_outputs(self, enc_outputs):
        enc_outputs_fw, enc_outputs_bw = enc_outputs
        return tf.concat([enc_outputs_fw, enc_outputs_bw], -1)

    # Create the stacked LSTM cells for the decoder
    def create_decoder_cell(self, enc_outputs, input_seq_lens, is_training):
        num_decode_units = self.num_lstm_units * 2
        dec_cell = self.stacked_lstm_cells(is_training, num_decode_units)
        combined_enc_outputs = self.combine_enc_outputs(enc_outputs)
        attention_mechanism = tfa.seq2seq.LuongAttention(
            num_decode_units, combined_enc_outputs,
            memory_sequence_length=input_seq_lens)
        dec_cell = tfa.seq2seq.AttentionWrapper(
            dec_cell, attention_mechanism,
            attention_layer_size=num_decode_units)
        return dec_cell

    # Create the sampler for decoding
    def create_decoder_sampler(self, decoder_inputs, is_training, batch_size):
        if is_training:
            dec_embeddings, dec_seq_lens = self.get_embeddings(
                decoder_inputs, 'decoder_emb')
            sampler = tfa.seq2seq.sampler.TrainingSampler()
        else:
            # IGNORE FOR NOW
            pass
        return sampler, dec_seq_lens

    def Decoder(self, enc_outputs, input_seq_lens, final_state, batch_size,
                sampler, dec_seq_lens, is_training):
        dec_cell = self.create_decoder_cell(enc_outputs, input_seq_lens, is_training)
        projection_layer = tf.keras.layers.Dense(self.extended_vocab_size)
        batch_s = tf.constant(batch_size)
        initial_state = dec_cell.get_initial_state(
            enc_outputs[0], batch_size=batch_s, dtype=tf.float32)
        decoder = tfa.seq2seq.BasicDecoder(
            dec_cell, sampler,
            output_layer=projection_layer)
        inputs = enc_outputs[0]
        output = run_decoder(decoder, inputs, initial_state, input_seq_lens,
                             is_training, dec_seq_lens)
        return output
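
For reference, here is one possible completion of run_decoder that follows the steps above; treat it as a sketch rather than the only valid answer:

def run_decoder(decoder, inputs, initial_state, input_seq_lens,
                is_training, dec_seq_lens):
    # Run the decoder; the first tuple element is a BasicDecoderOutput
    dec_outputs = decoder(inputs, initial_state=initial_state,
                          sequence_length=input_seq_lens,
                          training=is_training)[0]
    if is_training:
        # rnn_output holds the logits because the BasicDecoder
        # was built with an output_layer
        return dec_outputs.rnn_output, dec_seq_lens
    # At inference time, sample_id holds the predicted token IDs
    return dec_outputs.sample_id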
