Module 6 — Transformers (core of GenAI)

Welcome to Module 6

Why transformers matter, how Module 6 connects to Modules 4–5, and what to install before the text-generation project.

~25 min read + exercises

Before we begin

Module 4 showed LSTMs reading text one step at a time. Transformers changed the field by letting every token attend to every other token in parallel. This architecture underlies GPT, BERT, Claude, and most modern generative AI systems.

This module is the turning point of the course. It focuses on concepts rather than heavy matrix proofs.

Figure: Module 6 at a glance

Steps: 1 Welcome (Module 6), 2 Attention (Q, K, V), 3 Self-attention (context), 4 Transformer (blocks), 5 Encoder/decoder (GPT/BERT), 6 Tokens (BPE), 7 Vectors (embeddings), 8 Quiz (check), 9 Project (predict).
Caption: Attention through vectorization, quiz, then a mini transformer on your blog corpus.

What Module 6 covers

Topic                 What you will understand
Attention             Query, Key, Value: soft lookup between positions
Self-attention        Contextual token vectors, multi-head, causal masks
Transformer blocks    Attention + FFN + residuals
Encoder vs decoder    BERT-style reading vs GPT-style generation
Tokenization          Subwords, IDs, context window limits
Vectorization         Token embeddings, positions, retrieval vectors
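
As a preview of the "soft lookup" idea in the Attention and Self-attention rows, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not the project code: the function names `softmax` and `attention` are made up for this example, and the raw token vectors are used directly as queries, keys, and values, whereas a real transformer first passes them through learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of vectors (hypothetical helper).

    Each query scores every key, the scores become weights via softmax,
    and the output is the weighted average of the values.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # With a causal mask, position i may only look at positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys[:limit]
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Toy self-attention: the same three 2-d vectors act as Q, K, and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = attention(x, x, x)                   # every token sees every token
masked = attention(x, x, x, causal=True)    # GPT-style: no looking ahead
```

In the masked run the first position can only attend to itself, so its output is its own vector. Later lessons replace these hand-written loops with matrix operations and add multiple heads.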

Before you start

Required: Module 4 project or comfort with embeddings and sequence models.

Optional depth: Module 5 — Image segmentation if you want hands-on CNN dense prediction (U-Net, DeepLab, Mask R-CNN) before transformers.

Install before the project:

  • pip install torch tiktoken (or use a simple word-level tokenizer for learning)
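
If you take the word-level route instead of tiktoken, the tokenizer can be as small as the sketch below. This is one possible learning aid, not the tokenizer the project prescribes: the class name `WordTokenizer`, the `<unk>` token, and the lowercasing rule are all assumptions made for illustration.

```python
import re

class WordTokenizer:
    """Hypothetical word-level tokenizer: one integer ID per word or punctuation mark."""

    def __init__(self, corpus):
        words = sorted(set(self._split(corpus)))
        # ID 0 is reserved for words that never appeared in the corpus.
        self.itos = ["<unk>"] + words
        self.stoi = {w: i for i, w in enumerate(self.itos)}

    @staticmethod
    def _split(text):
        # Lowercase, then keep runs of word characters and single punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def encode(self, text):
        return [self.stoi.get(w, 0) for w in self._split(text)]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

tok = WordTokenizer("Transformers read tokens. Tokens become vectors.")
ids = tok.encode("tokens become transformers")
text = tok.decode(ids)
```

A word-level vocabulary is easy to inspect but maps every unseen word to `<unk>`, which is the limitation that subword schemes such as BPE (covered in the Tokenization lesson) are designed to reduce.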

Lessons 1–7 are reading. Lesson 8 is the coding project.


Ready?

Lesson 1 — Attention mechanism