Build a Transformer Encoder

It’s finally time to apply everything we have learned about Transformers. The best way to do that is to build a Transformer Encoder from scratch. We will start by developing each subcomponent, and in the end, we will combine them to form the encoder. Let’s start with something simple.

Disclaimer: PyTorch has its own built-in Transformer and attention modules. However, we believe that you can get a solid understanding only if you develop them yourself.

Linear layers

A good first step is to build the linear subcomponent. A two-layer feedforward network with dropout between the layers is good enough. Here is what the forward pass should look like:

  1. Linear layer
  2. ReLU as an activation function
  3. Dropout
  4. Second linear layer

You can implement this yourself: jump to the code below and finish the FeedForward module. Note that this is not an exercise; it is only intended to solidify your understanding by revisiting how to build simple PyTorch modules.
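A minimal sketch of a completed FeedForward module, following the four steps above, might look like this. The hyperparameter names d_model, d_ff, and the default dropout rate are our own choices for illustration, not fixed by the text:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise feedforward network: Linear -> ReLU -> Dropout -> Linear."""

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.linear_1 = nn.Linear(d_model, d_ff)  # 1. linear layer
        self.relu = nn.ReLU()                     # 2. ReLU activation
        self.dropout = nn.Dropout(dropout)        # 3. dropout
        self.linear_2 = nn.Linear(d_ff, d_model)  # 4. second linear layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_2(self.dropout(self.relu(self.linear_1(x))))


# Quick sanity check on random input
ff = FeedForward(d_model=512, d_ff=2048)
x = torch.randn(32, 10, 512)  # (batch, sequence length, d_model)
print(ff(x).shape)            # torch.Size([32, 10, 512])
```

Note that the module maps tensors of shape (batch, seq_len, d_model) back to the same shape, which is exactly what the encoder's residual connections will require later on.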
