Module 3 — Neural networks basics

Forward propagation

Layer-by-layer computation, matrix view of fully connected layers, and tracing a digit through a small network.

~65 min read + exercises

Before we begin

Forward propagation (forward pass) means: given current weights and an input, compute the output layer by layer until you have a prediction.

No learning happens during the forward pass: the weights stay fixed and you only compute.

Figure: Layer stack for MNIST

Input (784 px) → Hidden (ReLU) → Output (10 logits). Forward: input → … → prediction.
784 flattened pixels → hidden ReLU → 10 class scores (logits).

What you will learn

  • Trace one MNIST digit through a 2-layer network.
  • View a layer as matrix multiply + bias + activation.
  • Connect flattened images to input vectors (Module 1).

Step-by-step story

  1. Flatten a 28×28 digit image → vector x of length 784.
  2. Hidden layer: h = ReLU(W₁ x + b₁) — e.g. 128 hidden units.
  3. Output layer: z = W₂ h + b₂ — 10 numbers (logits), one per digit.
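  4. Softmax (optional if you only need the predicted digit): turn logits into probabilities that sum to 1. Softmax keeps the ordering of the logits, so the argmax is the same with or without it.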
  5. Prediction: argmax — digit with highest score.
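
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the image is random noise standing in for a real MNIST digit, the weights are small random values, and the names `relu`, `softmax`, and `forward` are chosen here for illustration. Vectors are stored as 1-D arrays rather than explicit column vectors.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumption: a random 28x28 array stands in for a real MNIST digit.
image = rng.random((28, 28))

# Assumption: untrained parameters (small random weights, zero biases).
W1 = rng.normal(0.0, 0.01, size=(128, 784))
b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.01, size=(10, 128))
b2 = np.zeros(10)


def relu(v):
    """Element-wise max(v, 0)."""
    return np.maximum(v, 0.0)


def softmax(z):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()


def forward(image):
    x = image.reshape(-1)        # step 1: flatten 28x28 -> 784
    h = relu(W1 @ x + b1)        # step 2: hidden layer, 128 units
    z = W2 @ h + b2              # step 3: output layer, 10 logits
    probs = softmax(z)           # step 4: logits -> probabilities
    pred = int(np.argmax(z))     # step 5: index of the highest score
    return x, h, z, probs, pred


x, h, z, probs, prediction = forward(image)
print(x.shape, h.shape, z.shape)  # shapes of input, hidden, and logit vectors
print("predicted digit:", prediction)
```

Because the weights are untrained, the predicted digit is meaningless here; the point is only the order of operations and the shapes at each step.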

Matrix view

If x is (784×1), W₁ is (128×784), and b₁ is (128×1), then h_pre and h are both (128×1):

h_pre = W₁ x + b₁
h = ReLU(h_pre)

Then W₂ is (10×128) and b₂ is (10×1), so z is (10×1):

z = W₂ h + b₂

Same math as the Module 1 dot products: each row of W₁ holds one neuron's weights, so W₁ x computes all 128 dot products at once.
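
To make the link to dot products concrete, the sketch below computes the hidden pre-activation two ways: one neuron at a time with a dot product per row, and all at once with a matrix multiply. The input and weights are random placeholders (an assumption for illustration), and vectors are 1-D arrays rather than explicit (784×1) columns.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: random placeholder input and weights, just to compare results.
x = rng.random(784)
W1 = rng.normal(0.0, 0.01, size=(128, 784))
b1 = rng.normal(0.0, 0.01, size=128)

# One neuron at a time: dot product of row i of W1 with x, plus that neuron's bias.
h_pre_loop = np.array([np.dot(W1[i], x) + b1[i] for i in range(128)])

# All 128 neurons at once: a single matrix-vector product plus the bias vector.
h_pre_matrix = W1 @ x + b1

same = np.allclose(h_pre_loop, h_pre_matrix)
print("loop and matrix versions agree:", same)
```

The two results match up to floating-point rounding, which is why the matrix form is just a compact way of writing 128 separate dot products.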


Worked size check

Tensor           Shape
Input x          784
W₁               128 × 784
h                128
W₂               10 × 128
Output logits    10

Parameter count: 784×128 + 128 + 128×10 + 10 = 101,770 weights and biases (about 102k). That is tiny by modern standards, but enough for MNIST.
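
The same count can be checked with a short script. This is a small sketch in plain Python; the helper name `count_parameters` is hypothetical and assumes every layer is fully connected with one bias per output unit.

```python
# Layer widths from the size check: input, hidden, output.
layer_sizes = [784, 128, 10]


def count_parameters(sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out
    return total


total = count_parameters(layer_sizes)
print(total)  # 101770
```

Almost all of the parameters (100,352 of them) sit in W₁, because the input vector is much wider than any other layer.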


Inference vs training

  • Inference: forward pass only (for example, an app predicting the digit you drew).
  • Training: forward pass → compute loss → backprop → update weights.

Checkpoint

You have 784 inputs and 10 outputs with one hidden layer of 128 units. How many logits does the output layer produce?

Answer sketch

10 — one score per digit class before softmax.


What's next

Lesson 4 — Backpropagation