Welcome to Module 6 — transformers (core of GenAI)
Before we begin
Module 4 showed LSTMs reading text one step at a time. Transformers changed the field by letting every token look at every other token in parallel — the architecture behind GPT, BERT, Claude, and most modern GenAI.
This is the turning point. You will focus on concepts, not heavy matrix proofs.
Figure: Module 6 at a glance
What Module 6 covers
| Topic | What you will understand |
|---|---|
| Attention | Query, Key, Value — soft lookup between positions |
| Self-attention | Contextual token vectors, multi-head, causal masks |
| Transformer blocks | Attention + FFN + residuals |
| Encoder vs decoder | BERT-style reading vs GPT-style generation |
| Tokenization | Subwords, IDs, context window limits |
| Vectorization | Token embeddings, positions, retrieval vectors |
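To make the Attention and Self-attention rows concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes toy vectors stand in directly for queries, keys, and values (a real transformer first applies learned projections), and the names `softmax` and `self_attention` are illustrative, not part of any library.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights). With causal=True, position i may only
    attend to positions j <= i, as in GPT-style decoders.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                # Masked position: gets zero weight after softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy "token vectors"; using them as Q, K, and V at once is an assumption
# made for brevity, not how trained models compute them.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full_out, full_w = self_attention(x, x, x)
causal_out, causal_w = self_attention(x, x, x, causal=True)
```

Each row of weights sums to 1, which is the "soft lookup" idea: every position retrieves a blend of all values instead of a single entry. With the causal mask, the first position can only see itself, so its output equals its own value vector.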
Before you start
Required: the Module 4 project, or comfort with embeddings and sequence models.
Optional depth: Module 5 (Image segmentation), if you want hands-on CNN dense prediction (U-Net, DeepLab, Mask R-CNN) before transformers.
Install before the project:
pip install torch tiktoken
(or use a simple word-level tokenizer for learning)
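If you go the word-level route, a tokenizer for learning could look like the sketch below. The `WordTokenizer` class and its `<unk>` token are hypothetical names chosen for illustration. Unlike tiktoken, which splits text into subword pieces, this version maps whole lowercase words to IDs, so any unseen word collapses to a single unknown ID.

```python
class WordTokenizer:
    """Hypothetical word-level tokenizer for learning; not a library class."""

    def __init__(self, corpus, unk_token="<unk>"):
        self.unk_token = unk_token
        # ID 0 is reserved for words that never appeared in the corpus.
        self.vocab = {unk_token: 0}
        words = sorted({w for text in corpus for w in text.lower().split()})
        for w in words:
            self.vocab.setdefault(w, len(self.vocab))
        self.id_to_word = {i: w for w, i in self.vocab.items()}

    def encode(self, text, max_len=None):
        ids = [self.vocab.get(w, 0) for w in text.lower().split()]
        # Truncation mimics a fixed context window limit.
        return ids if max_len is None else ids[:max_len]

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)


tok = WordTokenizer(["the cat sat", "the dog sat"])
ids = tok.encode("the cat sat")
```

Sorting the vocabulary keeps the IDs deterministic between runs, which makes results easier to compare while you experiment.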
Lessons 1–7 are reading. Lesson 8 is the coding project.