Module 1 — Math & intuition

Project: predict shading on an image patch

Build gradient descent in NumPy, plot error vs epoch, visualize residuals, and compare with a closed-form solver.

~120 min read + exercises

Before we begin

This is your first end-to-end training exercise. You will not use PyTorch yet — only Python, NumPy, and Matplotlib. That is intentional: you will see every step that later frameworks hide behind .backward() and .step().

What problem are we solving?

Many photos contain smooth regions where brightness changes gradually — sky, a painted wall, a cheek highlight, a desk surface. If you pick a small patch from such a region, brightness often increases (or decreases) roughly in a straight line as you move left/right or up/down.

We will fit a simple model to that pattern:

predicted brightness = w0 + w1 × x + w2 × y

  • x, y — pixel column and row in the patch
  • w0 — baseline brightness (like an intercept)
  • w1 — how fast brightness changes horizontally
  • w2 — how fast brightness changes vertically

This is linear regression — the simplest form of “learning from data.” The same loop (predict → measure error → compute slopes → update weights) powers far larger models.
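As a quick illustration of the formula, here is a minimal sketch that evaluates the model for a single pixel. The weight values are hypothetical, chosen only for the example; they are not learned from any image.

```python
import numpy as np

# Hypothetical weights for illustration only (not learned values).
w = np.array([80.0, 1.2, 0.8])   # w0 (baseline), w1 (horizontal), w2 (vertical)

# One pixel at column x=10, row y=5. The leading 1 pairs with w0.
x, y = 10, 5
row = np.array([1.0, x, y])

# predicted brightness = w0*1 + w1*x + w2*y
predicted = row @ w
print(predicted)   # about 96 (80 + 12 + 4)
```

Moving one pixel to the right adds w1 to the prediction, and moving one pixel down adds w2.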

Figure: What you are building. A smooth patch, a learned prediction, and the difference between them. A smooth lighting gradient is almost linear in (x, y), so the predicted plane brightness ≈ w0 + w1×x + w2×y is fitted by minimizing MSE over pixels.

What you will build

  1. A dataset of pixels from a real or synthetic patch.
  2. A training loop with gradient descent.
  3. A plot of error vs epoch (should trend downward).
  4. A comparison with NumPy’s built-in solver (sanity check).
  5. A three-panel figure: original patch | prediction | residual (error map).

Estimated time: 2–3 hours if Python is new; 1–2 hours if comfortable with NumPy.


How this connects to Module 1

Lesson | Where you use it in code
Vectors & matrices | Each row [1, x, y] is a small vector; X is a matrix
Dot products | pred = X @ w, one weighted sum per pixel
Probability / noise | Synthetic patch adds Gaussian noise; MSE averages over pixels
Gradient descent | The w = w - lr * grad loop you implement by hand

Folder layout:

text
phase1-linear-regression/
  your_photo.jpg          # optional
  train_patch.py          # load data + training loop + plots
  outputs/
    loss_curve.png
    three_panel.png

Before you start

  • Finish Lessons 1–4 and the quiz.
  • Install Python 3.10+, then: pip install numpy matplotlib pillow
  • Optional: a .jpg with a smooth sky or wall — or use synthetic data (no photo required).

Choose a photo patch (recommended path)

Why patch choice matters

The model assumes brightness is approximately linear in x and y. That is true for smooth gradients — not for busy texture (grass, hair, sharp edges). A bad patch does not break Python; it produces large residuals — a useful learning moment.

Steps

  1. Load an image (Pillow or similar) and convert to grayscale.
  2. Crop 32×32 (or 24×24) from a smooth region.
  3. For each pixel, record:
    • Inputs: [1, x, y] — the leading 1 lets w0 act as baseline
    • Target: brightness I at that pixel

You will have 32 × 32 = 1024 rows — 1024 tiny training examples from one patch.

Build the input table in code (sketch)

python
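import numpy as np
from PIL import Image

# Load the photo as grayscale ("L" mode) and convert to floats.
img = np.array(Image.open("your_photo.jpg").convert("L"), dtype=float)

# Top-left corner of the crop. These values are placeholders: pick row0, col0
# so the 32×32 window lands on a smooth region of your photo.
row0, col0 = 0, 0
patch = img[row0:row0+32, col0:col0+32]
H, W = patch.shape

rows = []
targets = []
for y in range(H):          # y = row index inside the patch
    for x in range(W):      # x = column index inside the patch
        rows.append([1.0, x, y])       # leading 1 pairs with w0
        targets.append(patch[y, x])    # brightness at row y, column x
X = np.array(rows)       # shape (H*W, 3), i.e. (1024, 3) for a 32×32 patch
I = np.array(targets)    # shape (H*W,), i.e. (1024,)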
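The double loop is easy to read, but the same table can be built without loops. The sketch below is one possible vectorized version, not a required part of the project. It uses a random stand-in patch (an assumption, so the snippet runs without a photo) and checks that both constructions give identical arrays.

```python
import numpy as np

H, W = 32, 32
# Stand-in patch so the snippet runs without a photo (assumption).
patch = np.random.default_rng(0).uniform(0, 255, size=(H, W))

# Vectorized version: ys[i, j] = i (row), xs[i, j] = j (column).
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X_vec = np.column_stack([np.ones(H * W), xs.ravel(), ys.ravel()])
I_vec = patch.ravel()   # row-major order matches the loop below

# Loop version, as in the sketch above.
rows, targets = [], []
for y in range(H):
    for x in range(W):
        rows.append([1.0, x, y])
        targets.append(patch[y, x])
X_loop = np.array(rows)
I_loop = np.array(targets)

print(np.array_equal(X_vec, X_loop), np.array_equal(I_vec, I_loop))
```

Both approaches visit pixels row by row, so row k of the table always describes the same pixel.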

Synthetic patch (no camera needed)

python
import numpy as np
H, W = 32, 32
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
I_grid = 80 + 1.2 * xs + 0.8 * ys
I_grid += np.random.normal(0, 2, size=I_grid.shape)
I_grid = np.clip(I_grid, 0, 255)
 
rows, targets = [], []
for y in range(H):
    for x in range(W):
        rows.append([1.0, x, y])
        targets.append(I_grid[y, x])
X = np.array(rows)
I = np.array(targets)

The true slopes are 1.2 and 0.8 (with baseline 80), so you can check whether learning recovers values close to them.


The model in plain English

Each row [1, x, y] produces one prediction:

predicted = w0×1 + w1×x + w2×y

All 1024 predictions at once: NumPy can multiply table X by weight list w in one line (X @ w).

This is Lesson 2’s dot product repeated — same weights, different x and y each time.
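To make "dot product repeated" concrete, here is a small sketch comparing a per-row loop with the one-line matrix product. The four rows and the weights are made-up values for illustration.

```python
import numpy as np

# Four made-up rows [1, x, y] and hypothetical weights (illustration only).
X = np.array([
    [1.0, 0, 0],
    [1.0, 1, 0],
    [1.0, 0, 1],
    [1.0, 2, 3],
])
w = np.array([80.0, 1.2, 0.8])

# One dot product per row, written out as a loop.
pred_loop = np.array([np.dot(row, w) for row in X])

# The same predictions in a single matrix-vector product.
pred_all = X @ w

print(pred_loop)
print(pred_all)
```

The two arrays agree; X @ w simply performs all the row-by-row dot products at once.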


The loss: mean squared error

For each pixel, error = predicted − true.

Squared error = error² (big mistakes count more).

Mean squared error (MSE) = average of all squared errors across the patch.

You want MSE to go down as weights improve.
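The three steps above (error, squared error, mean) can be sketched on three made-up pixels. The numbers are invented for the example.

```python
import numpy as np

# Made-up predicted and true brightness values for three pixels.
predicted = np.array([100.0, 110.0, 120.0])
true = np.array([98.0, 110.0, 125.0])

error = predicted - true      # [ 2.,  0., -5.]
squared = error ** 2          # [ 4.,  0., 25.]
mse = squared.mean()          # (4 + 0 + 25) / 3

print(mse)
```

Notice that the single error of -5 contributes 25 of the 29 total, which is what "big mistakes count more" means in practice.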

Checkpoint: If you multiply every squared error by 10, do the best weights change?

No. The best weights stay the same, but the gradients (slopes) also become 10 times larger, so you may need a smaller learning rate.
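The checkpoint answer can be verified numerically. This is a minimal sketch on made-up data; the helper grad_mse and its scale argument are hypothetical names introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 50 rows of [1, x, y] with x, y in 0..1 and slightly noisy targets.
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)])
I = X @ np.array([0.3, 0.5, 0.2]) + rng.normal(0, 0.01, 50)

def grad_mse(w, scale=1.0):
    """Gradient of scale * MSE with respect to w (hypothetical helper)."""
    err = X @ w - I
    return scale * (2 / len(I)) * (X.T @ err)

# Best weights according to the closed-form solver.
w_best, *_ = np.linalg.lstsq(X, I, rcond=None)

# At the best weights, both gradients are (numerically) zero: same minimum.
print(grad_mse(w_best), grad_mse(w_best, scale=10))

# Away from the minimum, the scaled gradient is exactly 10 times larger.
w_test = np.zeros(3)
print(grad_mse(w_test, scale=10) / grad_mse(w_test))
```

Because each update is learning_rate times the gradient, a loss scaled by 10 takes steps 10 times larger at the same learning rate.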


Gradients: what your code must compute

For each weight, ask: “If I increase this weight slightly, does average error go up or down, and how fast?”

That answer is the slope for that weight. With mean squared error, a compact NumPy form is:

gradient = (2 / N) × (X.transpose() @ errors)

where errors = predictions − targets and N is the number of rows.

Beginner path: implement slopes for 10 rows only in a loop, print them, compare to the formula above, then scale to the full patch.
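One way to follow the beginner path is sketched below. It uses 10 made-up rows (an assumption, so the snippet runs on its own) and compares the gradient accumulated pixel by pixel with the compact formula.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
# Ten made-up pixels: random coordinates in a 32×32 patch, noisy linear brightness.
xs = rng.integers(0, 32, size=N)
ys = rng.integers(0, 32, size=N)
X = np.column_stack([np.ones(N), xs, ys])
I = 80 + 1.2 * xs + 0.8 * ys + rng.normal(0, 2, size=N)

w = np.zeros(3)   # current weights (all zero at the start of training)

# Loop version: each row adds 2 * error * input to the slope of each weight.
grad_loop = np.zeros(3)
for i in range(N):
    err_i = X[i] @ w - I[i]
    for j in range(3):
        grad_loop[j] += 2 * err_i * X[i, j]
grad_loop /= N    # average over rows, as MSE does

# Compact version from the formula above.
grad_formula = (2 / N) * (X.T @ (X @ w - I))

print(grad_loop)
print(grad_formula)
```

If the two printouts match on 10 rows, the same formula can be trusted on all 1024.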


Training loop (full skeleton)

python
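import numpy as np
import matplotlib.pyplot as plt

# X (N rows of [1, x, y]) and I (N brightness targets) come from the data step above.
N = X.shape[0]
w = np.zeros(3)        # start with w0 = w1 = w2 = 0
learning_rate = 1e-4   # tune if needed
losses = []

for epoch in range(800):
    pred = X @ w                    # predict: one brightness per pixel
    err = pred - I                  # measure error: predicted minus true
    grad = (2 / N) * (X.T @ err)    # compute slopes of MSE for each weight
    loss = (err @ err) / N          # MSE for the weights used in this epoch
    losses.append(loss)
    w = w - learning_rate * grad    # update: step against the gradient

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("mean squared error")
plt.title("Training error should trend downward")
plt.show()

print("Learned weights w0, w1, w2:", w)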

If training misbehaves

Symptom | Try
Error explodes or becomes NaN | Divide learning_rate by 10
Error barely moves | Multiply learning_rate by 3 (carefully)
Hard to tune | Scale x, y to 0–1 and brightness to 0–1
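The last row of the table can be sketched as follows. This is one possible way to scale, using a synthetic patch like the one built earlier. The learning rate of 0.5 is an assumption that happens to suit this scaled problem, not a universal value.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 32
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
I_grid = 80 + 1.2 * xs + 0.8 * ys + rng.normal(0, 2, size=(H, W))

# Scale coordinates to 0..1 and brightness to roughly 0..1.
Xs = np.column_stack([np.ones(H * W), xs.ravel() / (W - 1), ys.ravel() / (H - 1)])
Is = I_grid.ravel() / 255.0

N = Xs.shape[0]
w = np.zeros(3)
learning_rate = 0.5   # assumption: a much larger step is stable after scaling
for epoch in range(800):
    err = Xs @ w - Is
    w = w - learning_rate * (2 / N) * (Xs.T @ err)

# Closed-form answer on the same scaled data, for comparison.
w_exact, *_ = np.linalg.lstsq(Xs, Is, rcond=None)
print("Gradient descent (scaled):", w)
print("Exact (scaled):           ", w_exact)

# Convert back to original units: brightness per pixel step.
w0 = w[0] * 255
w1 = w[1] * 255 / (W - 1)
w2 = w[2] * 255 / (H - 1)
print("Original units w0, w1, w2:", w0, w1, w2)
```

Scaling puts all three input columns on a similar range, so a single learning rate suits every weight. On raw 0 to 31 coordinates the baseline w0 tends to move far more slowly than the slopes.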

Figure: What a healthy error curve looks like. Loss vs epochs (your project output), with epoch on the horizontal axis and MSE on the vertical axis. Downward trend that flattens. Spikes usually mean learning rate is too high.

Sanity check with NumPy

python
w_exact, _, _, _ = np.linalg.lstsq(X, I, rcond=None)
print("Gradient descent:", w)
print("NumPy exact:      ", w_exact)

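The two results should be close. A large gap usually means too few epochs, a poorly chosen learning rate, or a bug in grad. With raw pixel coordinates (0–31), the baseline w0 typically converges much more slowly than the slopes, so check w0 first; scaling x and y to 0–1 usually closes the gap.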


Visualize and interpret

  1. Original patch — grayscale image you cropped.
  2. Predicted patch — reshape X @ w back to 32×32.
  3. Residual — original minus predicted (shows where the linear model fails).
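The three panels listed above could be produced along these lines. This is a sketch under stated assumptions: it uses a synthetic patch and takes weights from np.linalg.lstsq as a stand-in for your trained w, so the snippet runs on its own. The Agg backend is selected only so the figure renders to a file without a display.

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")   # assumption: write to file; remove to use plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
H, W = 32, 32
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
patch = 80 + 1.2 * xs + 0.8 * ys + rng.normal(0, 2, size=(H, W))
X = np.column_stack([np.ones(H * W), xs.ravel(), ys.ravel()])

# Stand-in for the weights from your training loop.
w, *_ = np.linalg.lstsq(X, patch.ravel(), rcond=None)

predicted = (X @ w).reshape(H, W)   # back to image shape
residual = patch - predicted        # original minus predicted

fig, axes = plt.subplots(1, 3, figsize=(10, 3.5))
vmin, vmax = patch.min(), patch.max()   # same gray scale for the first two panels
axes[0].imshow(patch, cmap="gray", vmin=vmin, vmax=vmax)
axes[0].set_title("Original patch")
axes[1].imshow(predicted, cmap="gray", vmin=vmin, vmax=vmax)
axes[1].set_title("Predicted patch")
im = axes[2].imshow(residual, cmap="coolwarm")
axes[2].set_title("Residual")
fig.colorbar(im, ax=axes[2], fraction=0.046)
for ax in axes:
    ax.axis("off")

os.makedirs("outputs", exist_ok=True)
fig.savefig("outputs/three_panel.png", dpi=150, bbox_inches="tight")
```

Sharing the gray scale between the first two panels keeps them visually comparable, while the residual gets its own diverging color map so positive and negative errors are easy to tell apart.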

Write 3–5 sentences: What region did you use? Did the ramp fit well? Where are residuals large, and why (texture, shadow, edge)?

A perfect residual map is not the goal — understanding when the model fits and when it breaks is.


Deliverables checklist

  • Training loop that updates three weights
  • Error vs epoch plot (downward trend)
  • Comparison with lstsq
  • Three-panel visualization
  • Short written reflection

What you accomplished

Idea from Module 1 | Where you used it
Lists / tables | Rows [1, x, y], targets I
Dot products | Each prediction = weights dot one row
Noise / averaging | MSE averages over all pixels
Gradient descent | You implemented the update loop

Module 1 complete. Return to the curriculum. Module 2 will add classification metrics and more projects; a later PyTorch module will automate slopes while you keep control of learning rate and data.