
Module 3 — Neural networks basics

Forward propagation

Layer-by-layer computation, matrix view of fully connected layers, and tracing a digit through a small network.

~65 min read + exercises


Before we begin

Forward propagation (the forward pass) means taking the current weights and an input and computing the output layer by layer until you have a prediction.

No learning happens during the forward pass: the weights are only read, never updated.

Figure: Layer stack for MNIST. 784 flattened pixels → hidden ReLU → 10 class scores (logits).

What you will learn

Trace one MNIST digit through a 2-layer network.
View a layer as matrix multiply + bias + activation.
Connect flattened images to input vectors (Module 1).


Step-by-step story

Flatten a 28×28 digit image → vector x of length 784.
Hidden layer: h = ReLU(W₁ x + b₁) — e.g. 128 hidden units.
Output layer: z = W₂ h + b₂ — 10 numbers (logits), one per digit.
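Softmax (optional for the prediction itself): turn the logits into probabilities that sum to 1. Softmax preserves the ordering of the logits, so it never changes which digit wins.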
Prediction: argmax — digit with highest score.
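The five steps above can be sketched in plain Python. This is a minimal illustration, not the course's own code: it assumes nested lists instead of a tensor library, and the image and weights are hypothetical random values (an untrained network), so the predicted digit is meaningless. Only the shapes and the order of operations matter here.

```python
import math
import random

def flatten(image):
    """Row-major flatten of a 2-D list (28x28 here) into a 1-D list of 784."""
    return [pixel for row in image for pixel in row]

def linear(W, b, x):
    """Compute W x + b, where W is a list of rows (out x in) and b has one entry per row."""
    assert len(b) == len(W) and all(len(row) == len(x) for row in W), "shape mismatch"
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, t) for t in v]

def softmax(z):
    m = max(z)  # subtract the max before exp() for numerical stability
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, W1, b1, W2, b2):
    x = flatten(image)            # step 1: 784 inputs
    h = relu(linear(W1, b1, x))   # step 2: 128 hidden activations
    z = linear(W2, b2, h)         # step 3: 10 logits
    probs = softmax(z)            # step 4: probabilities (optional)
    pred = max(range(len(z)), key=lambda k: z[k])  # step 5: argmax
    return z, probs, pred

# Hypothetical data: a random "digit" and small random, untrained weights.
random.seed(0)
image = [[random.random() for _ in range(28)] for _ in range(28)]
W1 = [[random.gauss(0.0, 0.05) for _ in range(784)] for _ in range(128)]
b1 = [0.0] * 128
W2 = [[random.gauss(0.0, 0.05) for _ in range(128)] for _ in range(10)]
b2 = [0.0] * 10

z, probs, pred = forward(image, W1, b1, W2, b2)
print(len(z), sum(probs), pred)
```

Because softmax preserves ordering, taking the argmax of `probs` instead of `z` would give the same prediction.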

Matrix view

If x is (784×1) and W₁ is (128×784), then b₁ is (128×1) and:

h_pre = W₁ x + b₁
h = ReLU(h_pre)

Then, with W₂ of shape (10×128) and b₂ of shape (10×1):

z = W₂ h + b₂

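This is the same math as the Module 1 dot products, batched across neurons: row i of W₁ dotted with x, plus entry i of b₁, gives the pre-activation of hidden unit i.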
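To see that a matrix-vector product is just one dot product per neuron, here is a tiny layer with 2 neurons and 3 inputs. The helper names `dot` and `matvec_plus_bias` and the numbers are illustrative assumptions; the sketch computes W x + b row by row and then applies ReLU.

```python
def dot(u, v):
    """Module 1-style dot product of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))

def matvec_plus_bias(W, x, b):
    """W x + b, computed one neuron (one row of W) at a time."""
    return [dot(row, x) + b_i for row, b_i in zip(W, b)]

# Tiny layer: W is 2x3, so 2 neurons each see 3 inputs.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 1.0, -1.0]
b = [-1.0, 0.5]

h_pre = matvec_plus_bias(W, x, b)   # row 0: 1 + 2 - 3 - 1 = -1; row 1: 4 + 5 - 6 + 0.5 = 3.5
h = [max(0.0, t) for t in h_pre]    # ReLU zeroes the negative pre-activation
print(h_pre)  # [-1.0, 3.5]
print(h)      # [0.0, 3.5]
```

The MNIST hidden layer is the same computation with 128 rows of length 784.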

Worked size check

Tensor	Shape
Input x	784 × 1
W₁	128 × 784
b₁	128 × 1
h (after ReLU)	128 × 1
W₂	10 × 128
b₂	10 × 1
Output logits z	10 × 1
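A quick way to check these shapes is the rule (m × k)(k × n) → (m × n): the inner dimensions must match, and the bias must have the same shape as the product it is added to. The sketch below encodes that rule with a hypothetical helper, `matmul_shape`, working on shape tuples only (no actual numbers).

```python
def matmul_shape(a, b):
    """Shape of A @ B for 2-D shapes a=(m, k) and b=(k, n); raises if inner dims differ."""
    (m, k1), (k2, n) = a, b
    if k1 != k2:
        raise ValueError(f"inner dimensions differ: {a} @ {b}")
    return (m, n)

x_shape = (784, 1)
W1_shape, b1_shape = (128, 784), (128, 1)
W2_shape, b2_shape = (10, 128), (10, 1)

h_shape = matmul_shape(W1_shape, x_shape)   # (128, 1); ReLU keeps the shape
assert h_shape == b1_shape                  # bias must match to be added
z_shape = matmul_shape(W2_shape, h_shape)   # (10, 1)
assert z_shape == b2_shape
print(h_shape, z_shape)

# A common bug: feeding the raw input to the second layer.
try:
    matmul_shape(W2_shape, x_shape)
except ValueError as err:
    print("shape error:", err)
```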

Parameter count: 784×128 + 128 + 128×10 + 10 = 101,770 weights and biases (about 102k). That is tiny by modern standards, yet enough for MNIST.
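The same arithmetic as a short sketch: each fully connected layer with n_in inputs and n_out outputs has an n_out × n_in weight matrix plus n_out biases. The function name `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights (n_out x n_in) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [784, 128, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [100480, 1290] 101770
```

Nearly all of the parameters sit in W₁, because every one of the 784 pixels connects to every one of the 128 hidden units.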

Inference vs training

Inference: forward pass only (e.g., an app UI predicting the digit you draw).
Training: forward pass → compute loss → backprop → update weights.
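The difference can be sketched with a hypothetical tiny network (3 inputs → 2 hidden → 2 classes). Inference runs the forward pass and takes the argmax; a training step starts with the same forward pass and then computes a loss. The loss here is assumed to be cross-entropy on the logits, a common choice for classification that this lesson does not specify. Backprop and the weight update are deliberately left out, since they are the subject of Lesson 4.

```python
import math

def logits_from(x, W1, b1, W2, b2):
    """Forward pass: z = W2 ReLU(W1 x + b1) + b2 (nested lists, no tensor library)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + bi for row, bi in zip(W2, b2)]

def predict(x, params):
    """Inference: forward pass + argmax. The weights are only read."""
    z = logits_from(x, *params)
    return max(range(len(z)), key=lambda k: z[k])

def cross_entropy(z, label):
    """Assumed loss: -log softmax(z)[label], computed stably via log-sum-exp."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(t - m) for t in z))
    return log_sum_exp - z[label]

def training_loss(x, label, params):
    """The first two training steps: forward pass, then loss.
    Backprop and the weight update would come next."""
    z = logits_from(x, *params)
    return cross_entropy(z, label)

# Hypothetical hand-picked weights so the numbers are easy to follow.
params = (
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0],  # W1, b1
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],            # W2, b2
)
x = [2.0, 0.0, 5.0]              # hidden h = [2, 0], logits z = [2, 0]
print(predict(x, params))        # 0
print(training_loss(x, 0, params))
```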

Checkpoint

You have 784 inputs and 10 outputs with one hidden layer of 128. How many logits does the output layer produce?

Answer sketch

10 — one score per digit class before softmax.

What's next

Lesson 4 — Backpropagation