Welcome to Module 6 — transformers (core of GenAI)

Before we begin

Module 4 showed LSTMs reading text one step at a time. Transformers changed the field by letting every token look at every other token in parallel — the architecture behind GPT, BERT, Claude, and most modern GenAI.

This is the turning point. You will focus on concepts, not heavy matrix proofs.

Figure: Module 6 at a glance. Attention through vectorization, a quiz, then a mini transformer on your blog corpus.

What Module 6 covers

Topic	What you will understand
Attention	Query, Key, Value — soft lookup between positions
Self-attention	Contextual token vectors, multi-head, causal masks
Transformer blocks	Attention + FFN + residuals
Encoder vs decoder	BERT-style reading vs GPT-style generation
Tokenization	Subwords, IDs, context window limits
Vectorization	Token embeddings, positions, retrieval vectors
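
As a preview of the first two rows, here is a minimal sketch of scaled dot-product attention with an optional causal mask. It uses plain Python so it runs before torch is installed. The names softmax and self_attention are illustrative, not from the course code, and the toy example reuses the same vectors as queries, keys, and values. A real transformer would first pass the inputs through three learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score this position's query against every key (the "soft lookup").
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if causal:
            # Causal mask: position i may not look at later positions.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three token vectors of dimension 2 (assumed values, for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x, causal=True)
for row in weights:
    print([round(w, 3) for w in row])
```

With causal=True, each row of weights has zeros for later positions, which is the GPT-style constraint. With causal=False, every token attends to every other token, as in BERT-style reading.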

Before you start

Required: Module 4 project or comfort with embeddings and sequence models.

Optional depth: Module 5 — Image segmentation if you want hands-on CNN dense prediction (U-Net, DeepLab, Mask R-CNN) before transformers.

Install before the project:

pip install torch tiktoken

Instead of tiktoken, you can use a simple word-level tokenizer for learning.
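
A word-level tokenizer of the kind mentioned above could look like the following sketch. It is an assumption about what "simple" means here, not the course's own implementation. The function names, the `<unk>` token, and the max_len default are hypothetical choices for illustration.

```python
def build_vocab(texts):
    """Assign an integer ID to every distinct lowercase word; ID 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, max_len=8):
    """Map words to IDs and truncate to a fixed context window."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]
    return ids[:max_len]

def decode(ids, vocab):
    """Map IDs back to words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

corpus = ["transformers read tokens in parallel", "attention links tokens"]
vocab = build_vocab(corpus)
ids = encode("attention reads tokens", vocab)
print(ids)                 # [6, 0, 3]
print(decode(ids, vocab))  # attention <unk> tokens
```

The word "reads" never appears in the corpus, so it maps to the unknown ID. Subword tokenizers such as tiktoken largely avoid this by splitting unseen words into smaller known pieces.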

Lessons 1–7 are reading. Lesson 8 is the coding project.

Ready?

Lesson 1 — Attention mechanism