Key Concepts of Transformers
Represent text with embeddings and positional encodings so that it can be passed into a transformer.
Most of the difficulty in understanding transformers comes from confusion around secondary concepts. To avoid this, we will discuss each fundamental concept step by step and then build a holistic view of transformers.
With recurrent neural networks (RNNs), we used to process sequences one step at a time to preserve the order of the sentence. In that design, each RNN step needs the previous hidden output, so stacked LSTM computations had to be performed sequentially.
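To make that dependency concrete, here is a minimal sketch (assuming PyTorch, with toy dimensions chosen purely for illustration) of an LSTM cell consuming one token at a time; the loop is exactly the sequential bottleneck transformers remove.

```python
# Illustrative sketch only: toy dimensions, random "token embeddings".
import torch
import torch.nn as nn

embed_dim, hidden_dim = 8, 16
lstm_cell = nn.LSTMCell(embed_dim, hidden_dim)

tokens = torch.randn(4, 1, embed_dim)   # (seq_len, batch, embed_dim): a toy 4-token sentence
h = torch.zeros(1, hidden_dim)          # initial hidden state
c = torch.zeros(1, hidden_dim)          # initial cell state

for x_t in tokens:                      # the loop itself is the bottleneck:
    h, c = lstm_cell(x_t, (h, c))       # step t cannot run until step t-1 has produced h, c
```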
Then, transformers came out.
The fundamental building block of a transformer is self-attention. To begin, we need to get rid of sequential processing, recurrence, and LSTMs. We can do that simply by changing the input representation.
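As a rough sketch of what "changing the input representation" means (again assuming PyTorch, with toy sizes and made-up token ids), the whole sequence is embedded and handed to the model as a single tensor, so no step has to wait for a previous hidden state:

```python
# Illustrative sketch only: hypothetical token ids and toy sizes.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 8
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([[5, 20, 37, 42]])  # (batch, seq_len): made-up ids for "Hello I love you"
x = embedding(token_ids)                     # (batch, seq_len, embed_dim): the entire sequence at once

# Self-attention can now relate all positions of x to each other in parallel,
# with no recurrence between time steps.
```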
Representing the input sentence
Sets and tokenization
The transformer revolution started with a simple question:
Why don’t we feed the entire input sequence so there are no dependencies between hidden states? That might be cool!
Take, for example, the sentence “Hello I love you”:
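For illustration, here is a minimal tokenization sketch that simply splits on whitespace and builds a toy vocabulary; real tokenizers (e.g., subword tokenizers) are more sophisticated:

```python
# Illustrative sketch only: whitespace tokenization and a toy vocabulary.
sentence = "Hello I love you"
tokens = sentence.split()              # ['Hello', 'I', 'love', 'you']

vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]   # [0, 1, 2, 3] with this toy vocabulary

print(tokens, token_ids)
```

The key point is that the sentence is treated as a collection of tokens fed to the model together, rather than one token after another.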