Decoding Output
Decode the model's outputs for training and inference.
Chapter Goals:
- Retrieve the decoder outputs and return the model's logits during training
After creating the decoder object for our model, we can perform the decoding by calling the decoder object directly. In TensorFlow Addons, calling the decoder takes the place of the legacy dynamic_decode function.
```python
import tensorflow as tf
extended_vocab_size = 500
batch_size = 10

# decoder is a BasicDecoder object; calling it performs dynamic decoding
outputs = decoder(inputs, initial_state=initial_state,
                  sequence_length=input_seq_lens, training=True)
decoder_output = outputs[0]
logits = decoder_output.rnn_output
decoder_final_state = outputs[1]
decoded_sequence_lengths = outputs[2]
```
The decoder object takes inputs, initial_state, sequence_length, and training as its arguments. It returns a tuple containing three elements:
- The decoder's output. For a BasicDecoder object, the decoder's output takes the form of a BasicDecoderOutput object.
- The decoder's final state. This isn't used in our encoder-decoder model.
- The lengths of each of the decoder's output sequences. This also isn't used in our encoder-decoder model.
If the BasicDecoder object was initialized with the output_layer keyword argument, the rnn_output of the BasicDecoderOutput object will be the model's logits.
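As a minimal sketch (here dec_cell and sampler stand in for the attention-wrapped decoder cell and sampler built in the full model code at the end of this lesson):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Project decoder outputs onto the extended vocabulary so that
# rnn_output contains logits rather than raw LSTM cell outputs
projection_layer = tf.keras.layers.Dense(extended_vocab_size)
decoder = tfa.seq2seq.BasicDecoder(dec_cell, sampler,
                                   output_layer=projection_layer)
```

Without the output_layer argument, rnn_output would contain the raw cell outputs, and we would have to apply the vocabulary projection ourselves.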
Limiting the decoded length
A problem that sometimes occurs when decoding, especially in tasks like text summarization, is that the decoder returns output sequences that are too long. We can manually limit the decoder's output length with the decoder object's maximum_iterations keyword argument.

By setting this keyword argument to an integer n, we guarantee that the decoder will not output sequences longer than n time steps.
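As a sketch, reusing the placeholder names from above (the cap of 50 steps is just an illustrative value):

```python
# Any decoded sequence is cut off after at most 50 time steps
decoder = tfa.seq2seq.BasicDecoder(dec_cell, sampler,
                                   output_layer=projection_layer,
                                   maximum_iterations=50)
```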
Time to Code!
In this chapter you'll be completing the run_decoder function, which is used in the model's Decoder function to run the decoder object.

We run the decoder object, but we only care about the first element of the returned tuple, which contains the output of our decoder.

Set dec_outputs equal to the first element of the tuple returned by the decoder object. Call decoder with the required arguments.
During training, the model's logits are located in the rnn_output property of dec_outputs. The run_decoder function will return both the model's logits and the ground truth sequence lengths.
Create an if block that checks whether is_training is True. Inside the if block, return a tuple containing the model's logits and dec_seq_lens, in that order.
When we're not training, we will return the model's predictions, which are in the sample_id attribute of dec_outputs (more on this later).

Outside the if block, return dec_outputs.sample_id. A complete sketch of the function appears after the code below.
```python
import tensorflow as tf
import tensorflow_addons as tfa

def run_decoder(decoder, inputs, initial_state, input_seq_lens,
                is_training, dec_seq_lens):
    # CODE HERE
    pass

# Seq2seq model
class Seq2SeqModel(object):
    def __init__(self, vocab_size, num_lstm_layers, num_lstm_units):
        self.vocab_size = vocab_size
        # Extended vocabulary includes start, stop token
        self.extended_vocab_size = vocab_size + 2
        self.num_lstm_layers = num_lstm_layers
        self.num_lstm_units = num_lstm_units
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(
            num_words=vocab_size)

    def make_lstm_cell(self, dropout_keep_prob, num_units):
        cell = tf.keras.layers.LSTMCell(num_units, dropout=dropout_keep_prob)
        return cell

    # Create multi-layer LSTM
    def stacked_lstm_cells(self, is_training, num_units):
        dropout_keep_prob = 0.5 if is_training else 1.0
        cell_list = [self.make_lstm_cell(dropout_keep_prob, num_units)
                     for i in range(self.num_lstm_layers)]
        cell = tf.keras.layers.StackedRNNCells(cell_list)
        return cell

    # Get embeddings for input/output sequences
    def get_embeddings(self, sequences, scope_name):
        with tf.compat.v1.variable_scope(scope_name,
                                         reuse=tf.compat.v1.AUTO_REUSE):
            cat_column = tf.compat.v1.feature_column \
                .categorical_column_with_identity(
                    'sequences', self.extended_vocab_size)
            embed_size = int(self.extended_vocab_size**0.25)
            embedding_column = tf.compat.v1.feature_column.embedding_column(
                cat_column, embed_size)
            seq_dict = {'sequences': sequences}
            embeddings = tf.compat.v1.feature_column \
                .input_layer(seq_dict, [embedding_column])
            sequence_lengths = tf.compat.v1.placeholder(
                "int64", shape=(None,),
                name=scope_name + "/sinput_layer/sequence_length")
            return embeddings, tf.cast(sequence_lengths, tf.int32)

    # Combine the forward and backward BiLSTM encoder outputs
    def combine_enc_outputs(self, enc_outputs):
        enc_outputs_fw, enc_outputs_bw = enc_outputs
        return tf.concat([enc_outputs_fw, enc_outputs_bw], -1)

    # Create the stacked LSTM cells for the decoder
    def create_decoder_cell(self, enc_outputs, input_seq_lens, is_training):
        num_decode_units = self.num_lstm_units * 2
        dec_cell = self.stacked_lstm_cells(is_training, num_decode_units)
        combined_enc_outputs = self.combine_enc_outputs(enc_outputs)
        attention_mechanism = tfa.seq2seq.LuongAttention(
            num_decode_units, combined_enc_outputs,
            memory_sequence_length=input_seq_lens)
        dec_cell = tfa.seq2seq.AttentionWrapper(
            dec_cell, attention_mechanism,
            attention_layer_size=num_decode_units)
        return dec_cell

    # Create the sampler for decoding
    def create_decoder_sampler(self, decoder_inputs, is_training, batch_size):
        if is_training:
            dec_embeddings, dec_seq_lens = self.get_embeddings(
                decoder_inputs, 'decoder_emb')
            sampler = tfa.seq2seq.sampler.TrainingSampler()
        else:
            # IGNORE FOR NOW
            pass
        return sampler, dec_seq_lens

    def Decoder(self, enc_outputs, input_seq_lens, final_state, batch_size,
                sampler, dec_seq_lens, is_training):
        dec_cell = self.create_decoder_cell(
            enc_outputs, input_seq_lens, is_training)
        projection_layer = tf.keras.layers.Dense(self.extended_vocab_size)
        batch_s = tf.constant(batch_size)
        initial_state = dec_cell.get_initial_state(
            enc_outputs[0], batch_size=batch_s, dtype=tf.float32)
        decoder = tfa.seq2seq.BasicDecoder(
            dec_cell, sampler, output_layer=projection_layer)
        inputs = enc_outputs[0]
        output = run_decoder(decoder, inputs, initial_state, input_seq_lens,
                             is_training, dec_seq_lens)
        return output
```
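Putting the steps together, one possible completion of run_decoder (a sketch of the intended logic, not necessarily the official solution) looks like this:

```python
def run_decoder(decoder, inputs, initial_state, input_seq_lens,
                is_training, dec_seq_lens):
    # The first element of the returned tuple is the BasicDecoderOutput
    dec_outputs = decoder(inputs,
                          initial_state=initial_state,
                          sequence_length=input_seq_lens,
                          training=is_training)[0]
    if is_training:
        # rnn_output holds the logits, since Decoder attaches an output_layer
        return dec_outputs.rnn_output, dec_seq_lens
    return dec_outputs.sample_id
```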