Forward propagation
Before we begin
Forward propagation (forward pass) means: given current weights and an input, compute the output layer by layer until you have a prediction.
No learning happens during the forward pass: the weights stay fixed and you only compute.
Figure: Layer stack for MNIST
What you will learn
- Trace one MNIST digit through a 2-layer network.
- View a layer as matrix multiply + bias + activation.
- Connect flattened images to input vectors (Module 1).
Step-by-step story
- Flatten a 28×28 digit image → vector x of length 784.
- Hidden layer: h = ReLU(W₁ x + b₁) — e.g. 128 hidden units.
- Output layer: z = W₂ h + b₂ — 10 numbers (logits), one per digit.
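- Softmax (optional if you only need the predicted digit): turn logits into probabilities. Softmax preserves the ordering of the logits, so it never changes which digit wins.
- Prediction: argmax, the digit with the highest score.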
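The five steps above can be sketched in NumPy. This is a minimal illustration rather than the course's reference code: the random `image` stands in for a real MNIST digit, the weights are untrained (a small Gaussian initialization is assumed), and the names `forward` and `softmax` are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Stand-in for one MNIST digit: a 28x28 grayscale image with values in [0, 1).
image = rng.random((28, 28))

# Untrained parameters (assumption: small Gaussian weights, zero biases).
W1 = rng.normal(0.0, 0.01, size=(128, 784))
b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.01, size=(10, 128))
b2 = np.zeros(10)

def softmax(z):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(image, W1, b1, W2, b2):
    """Return the 10 logits for one image."""
    x = image.reshape(-1)              # flatten 28x28 -> 784
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU -> 128
    z = W2 @ h + b2                    # output layer -> 10 logits
    return z

logits = forward(image, W1, b1, W2, b2)
probs = softmax(logits)                # optional: only needed for probabilities
prediction = int(np.argmax(logits))    # index of the highest score

print(logits.shape, prediction)
```

Because the weights are random, the predicted digit is meaningless here; the point is only that the shapes line up and that no parameter is modified along the way.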
Matrix view
If x is (784×1), W₁ is (128×784), and b₁ is (128×1):
h_pre = W₁ x + b₁
h = ReLU(h_pre)
Then W₂ is (10×128) and b₂ is (10×1):
z = W₂ h + b₂
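This is the same math as the Module 1 dot products: each row of W₁ holds one neuron's weights, so the matrix multiply computes the dot products for all 128 neurons at once.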
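To make the link to dot products concrete, the sketch below computes the hidden pre-activations twice, once neuron by neuron and once as a single matrix-vector product, and checks that the results agree. The random values are placeholders, and the vectors are stored as 1-D arrays of length 784 and 128 rather than explicit column matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(784)                  # flattened input
W1 = rng.normal(size=(128, 784))     # one row of weights per hidden neuron
b1 = rng.normal(size=128)            # one bias per hidden neuron

# One neuron at a time: dot product of its weight row with x, plus its bias.
h_pre_loop = np.array([np.dot(W1[i], x) + b1[i] for i in range(128)])

# All neurons at once: a single matrix-vector product.
h_pre_matrix = W1 @ x + b1

same = np.allclose(h_pre_loop, h_pre_matrix)
print(same)
```

The loop and the matrix form describe the same computation; the matrix form is simply what libraries execute efficiently.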
Worked size check
| Tensor | Shape |
|---|---|
| Input x | 784 |
| W₁ | 128 × 784 |
| h | 128 |
| W₂ | 10 × 128 |
| Output logits | 10 |
Parameter count: 784×128 + 128 + 128×10 + 10 = 101,770, or roughly 102k weights and biases. That is tiny by modern standards but enough for MNIST.
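The arithmetic can be checked in a few lines. The helper name `dense_params` is made up for this sketch; it counts one weight per input-output pair plus one bias per output unit.

```python
def dense_params(n_in, n_out):
    """Weights (n_out x n_in) plus one bias per output unit."""
    return n_in * n_out + n_out

hidden = dense_params(784, 128)   # 100,352 weights + 128 biases
output = dense_params(128, 10)    # 1,280 weights + 10 biases
total = hidden + output

print(hidden, output, total)      # 100480 1290 101770
```

Almost all of the parameters sit in W₁, because the 784-wide input is by far the largest dimension in the network.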
Inference vs training
- Inference: forward pass only (app UI predicting your drawing).
- Training: forward pass → compute loss → backprop → update weights.
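The difference can be sketched in code. The example below runs the same forward pass for both cases and then, for training, adds only the loss step. Cross-entropy is assumed as the loss because it is a common choice for digit classification; the lesson does not name one. The `label` value is hypothetical, and backprop and the weight update are left out.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(784)       # stand-in for a flattened digit
label = 3                 # hypothetical true class for this input

# Untrained parameters (assumption: small Gaussian weights, zero biases).
W1 = rng.normal(0.0, 0.01, size=(128, 784))
b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.01, size=(10, 128))
b2 = np.zeros(10)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU
    return W2 @ h + b2                 # 10 logits

# Inference: forward pass, then pick the top-scoring class.
logits = forward(x)
prediction = int(np.argmax(logits))

# Training, first two steps only: forward pass, then a loss against the label.
# Cross-entropy = negative log of the softmax probability of the true class.
shifted = logits - logits.max()                       # for numerical stability
log_probs = shifted - np.log(np.exp(shifted).sum())   # log-softmax
loss = float(-log_probs[label])

print(prediction, round(loss, 3))
```

With untrained weights the logits are nearly equal, so the loss lands close to ln(10) ≈ 2.30, the value for a uniform guess over ten classes.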
Checkpoint
You have 784 inputs, one hidden layer of 128 units, and 10 outputs. How many logits does the output layer produce?
Answer sketch
10 — one score per digit class before softmax.