Yes, this note is about how to implement a transformer, but there will still be some theory. Implementing a transformer is hard, and understanding the theory is vital.

You really need to know how to implement a normal NN and understand its theoretical foundations first; see Model Implementation (Pytorch).

What makes up a transformer?

Embeddings
Self-Attention Mechanism
Positional Encoding
Transformer final layers. Which layers I use here is very important, and I still need to figure out when to use which layer. Related to Model Selection.

PyTorch implementation

Basically, we are just putting the parts together. The code is explained in the individual parts, so there is no need to repeat it here. I will just link a working transformer .ipynb file.

![[simple_transformer.ipynb]]
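
For orientation, here is a minimal sketch of how the four parts could be wired together in PyTorch. This is not the code from the notebook: the class name `TinyTransformer`, all hyperparameters, and the mean-pooled linear classification head are assumptions chosen for illustration. Self-attention comes from PyTorch's built-in `nn.TransformerEncoderLayer` rather than a hand-written version, and the positional encoding is the fixed sinusoidal variant.

```python
import math

import torch
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Hypothetical example: embeddings -> positions -> self-attention -> final layer."""

    def __init__(self, vocab_size=100, d_model=32, n_heads=4, n_layers=2,
                 n_classes=2, max_len=64):
        super().__init__()
        # 1) Embeddings: token ids -> vectors of size d_model.
        self.embedding = nn.Embedding(vocab_size, d_model)

        # 2) Positional encoding: fixed sinusoids, stored as a non-trainable buffer.
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )  # (d_model / 2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

        # 3) Self-attention blocks (multi-head attention plus feed-forward).
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        # 4) Final layer: a linear classification head (an assumption, task dependent).
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        # tokens: (batch, seq_len) tensor of integer token ids
        x = self.embedding(tokens) * math.sqrt(self.embedding.embedding_dim)
        x = x + self.pe[: tokens.size(1)]   # add position information
        x = self.encoder(x)                 # (batch, seq_len, d_model)
        return self.head(x.mean(dim=1))     # mean-pool over the sequence


torch.manual_seed(0)
model = TinyTransformer()
tokens = torch.randint(0, 100, (8, 16))  # batch of 8 sequences, 16 tokens each
logits = model(tokens)
print(logits.shape)  # torch.Size([8, 2])
```

Which final layer fits depends on the task: a pooled classification head as in this sketch, or for example a per-token linear projection onto the vocabulary for language modelling.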