Module 1 — Math & intuition

Probability — when measurements lie a little

Signal and noise, averages and spread, grain in photos, histograms, and why training averages error over many samples.

~65 min read + exercises

Before we begin

Take two photos of the same white wall, one second apart, same phone, same settings. They look identical. Now zoom in until you can read individual pixel values: they will not match exactly. A pixel that reads 142 in one capture might read 139 or 145 in the next.

That is not a bug in your eyes. Every measurement carries noise — from light, from the sensor, from compression. If you treat a single pixel as absolute truth, you will misunderstand both photography and machine learning.

Probability (used lightly here) gives us language for uncertainty:

  • Average — what value we expect over many repeats
  • Spread — how much individual readings jump around
  • Noise model — a simple story: “true value + random wobble”

Models rarely see one perfect number. They see many noisy examples and learn patterns that stay true even when individual pixels lie a little.

Figure: Same scene, slightly different numbers every time
Panels: true signal + Gaussian noise (pixel = true value + random noise; grain increases in low light) and a bell curve showing the typical noise shape.
Caption: Each pixel can be written as true brightness plus random noise. More noise means a grainier look.

What you will learn

  • Describe a pixel reading as signal + noise in plain English.
  • Compute a simple average from a table of chances.
  • Explain spread and why grainy photos have more of it.
  • Connect averaging error over many samples to how models train.


Random does not mean “anything goes”

We say a pixel value is random when it can change between trials even if the scene is fixed. Random here does not mean “completely unpredictable” — it means “follows a pattern of likelihoods.”

Example readings at the same wall pixel:

  • Photo 1: 142
  • Photo 2: 139
  • Photo 3: 145

We might summarize: “usually near 140, rarely below 120 or above 160.” That summary is a distribution — a description of which values are common and which are rare.
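To make "distribution" concrete, here is a minimal sketch that tallies repeated readings of one pixel into a table of shares. The list of readings is made-up illustration data (only the first three values come from the example above), and the names `readings` and `distribution` are just labels chosen for this sketch.

```python
from collections import Counter

# Hypothetical repeated readings of the same wall pixel.
# Only the first three values come from the lesson; the rest are invented.
readings = [142, 139, 145, 140, 141, 140, 138, 140, 143, 139, 141, 140]

counts = Counter(readings)
total = len(readings)

# A distribution in its simplest form: value -> share of all readings.
distribution = {value: count / total for value, count in sorted(counts.items())}

for value, share in distribution.items():
    print(f"{value}: {share:.0%}")

# The value that showed up most often.
most_common_value = counts.most_common(1)[0][0]
print("most common reading:", most_common_value)
```

With only twelve readings the table is rough, but the idea scales: more readings give a steadier picture of which values are common and which are rare.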

Checkpoint: Why can’t you trust a single pixel as perfect ground truth?

Sensors, exposure, and processing all add variation. One sample is informative but not exact.


Average and spread: a toy sensor story

Imagine a broken sensor that only outputs 0 (black) or 255 (white), nothing in between:

Value    Chance
0        30%
255      70%

Average (expected value)

If you took millions of readings, what single number would they cluster around?

0 × 30% + 255 × 70% = 178.5

We call 178.5 the average or expected value. Individual readings are still only 0 or 255 — extreme — but the long-run center is 178.5.
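The same calculation can be sketched in a few lines of Python. The first part multiplies each value by its chance, exactly as in the sum above. The second part is an optional cross-check that draws many simulated readings; the sample size and the seed are arbitrary choices for this sketch.

```python
import random

# The toy sensor's table: value -> chance, written as a fraction of 1.
chances = {0: 0.30, 255: 0.70}

def expected_value(table):
    """Weighted average: each value times its chance, summed."""
    return sum(value * chance for value, chance in table.items())

exact = expected_value(chances)

# Cross-check: simulate many readings and take their plain average.
random.seed(1)  # fixed seed so the sketch is repeatable
samples = random.choices(list(chances), weights=list(chances.values()), k=100_000)
simulated = sum(samples) / len(samples)

print("from the table:    ", round(exact, 1))
print("from 100k readings:", round(simulated, 1))
```

Every simulated reading is still 0 or 255, yet their average lands close to the value computed from the table.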

Spread

Spread asks: how far do typical readings sit from that average?

Here spread is huge — values jump between extremes. A stable sensor might read 140, 141, 139, 142 — low spread. A grainy night photo might swing more — high spread.

Tools compute standard deviation as one measure of spread; you do not need the formula today. Remember the idea: low spread = trustworthy individual readings; high spread = noisy data.
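If you are curious what a tool reports, this sketch uses Python's built-in `statistics` module to compare the two sensors. The stable readings are the four from the text; the broken-sensor list is an invented sample of ten readings with three 0s and seven 255s to match the 30% / 70% table.

```python
import statistics

# Four readings from the stable sensor in the text.
stable = [140, 141, 139, 142]

# Invented sample for the broken sensor: 30% zeros, 70% 255s.
broken = [0, 255, 255, 0, 255, 255, 255, 0, 255, 255]

# pstdev treats the list as the whole population of readings.
stable_spread = statistics.pstdev(stable)
broken_spread = statistics.pstdev(broken)

print(f"stable: average {statistics.mean(stable):.1f}, spread {stable_spread:.1f}")
print(f"broken: average {statistics.mean(broken):.1f}, spread {broken_spread:.1f}")
```

The stable sensor's spread is about 1 brightness level, while the broken sensor's is over 100, which matches the intuition of "tight cluster" versus "jumping between extremes".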


Bell-shaped noise (the usual wobble)

In many real systems, the noise follows a simple model, and a plot of how often each error size occurs looks like a bell curve:

measurement = true value + random noise

Read it aloud: “What we record equals the real brightness, plus a small random error.”

  • Most errors are small (near zero).
  • Large positive or negative errors are rarer.

That is the bell shape — many small wobbles, few big ones.
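A quick simulation shows the "many small, few big" pattern. This is a sketch under assumptions: the true value of 140 and the noise size `SIGMA = 3.0` are made up, and `random.gauss` is used as the bell-shaped noise source.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 140.0  # assumed true brightness
SIGMA = 3.0         # assumed typical noise size (standard deviation)

def measure():
    """measurement = true value + random noise"""
    return TRUE_VALUE + random.gauss(0.0, SIGMA)

# Recover the error of each measurement by subtracting the true value.
errors = [measure() - TRUE_VALUE for _ in range(50_000)]

small = sum(abs(e) <= SIGMA for e in errors) / len(errors)
large = sum(abs(e) > 3 * SIGMA for e in errors) / len(errors)

print(f"errors within one SIGMA:     {small:.1%}")
print(f"errors beyond three SIGMAs:  {large:.2%}")
```

For bell-shaped (Gaussian) noise, roughly two thirds of errors fall within one standard deviation and well under 1% fall beyond three.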

Grain in dark photos (why beginners notice noise at night)

In dim light, each pixel collects fewer photons. Relative to that weak signal, random error looks larger — the image looks grainy. Bright, well-exposed regions often look cleaner because the signal is stronger compared to the noise. Same math story, different feel in the image.
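Here is a rough sketch of why the same wobble looks worse in the dark. It assumes the noise has the same size (`NOISE_SIGMA`, a made-up value) in both regions and only the true brightness differs. Real sensors are more complicated, so treat this as an illustration of "noise relative to signal", not a camera model.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

NOISE_SIGMA = 4.0  # assumption: identical noise size in dim and bright regions

def relative_noise(true_value, n_readings=20_000):
    """Spread of the readings divided by their average."""
    readings = [true_value + random.gauss(0.0, NOISE_SIGMA)
                for _ in range(n_readings)]
    return statistics.pstdev(readings) / statistics.mean(readings)

dim = relative_noise(20.0)      # hypothetical dark region
bright = relative_noise(200.0)  # hypothetical well-lit region

print(f"dim region:    noise is about {dim:.0%} of the signal")
print(f"bright region: noise is about {bright:.0%} of the signal")
```

The wobble is the same size in both cases, but it is a much larger fraction of the dim signal, which is what the eye reads as grain.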


Histograms: see the distribution without formulas

A histogram counts how often each brightness appears in a patch:

  • A tall bar at 140 → many pixels near that brightness (flat gray wall).
  • A wide histogram → many different values (high contrast or heavy noise).
  • A narrow histogram → values clustered tightly (uniform region or blur).

Histograms let your eyes see average and spread without calculating them — useful when debugging datasets.
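The sketch below builds a text histogram for two simulated patches so you can see "narrow" versus "wide" directly. The patches are generated, not real image data; `make_patch`, `histogram`, and `show` are hypothetical helper names, and the bin width of 5 is an arbitrary choice.

```python
import random
from collections import Counter

random.seed(3)  # fixed seed so the sketch is repeatable

def make_patch(center, sigma, n_pixels=1000):
    """Simulated patch: readings around `center`, clipped to 0..255."""
    return [min(255, max(0, round(random.gauss(center, sigma))))
            for _ in range(n_pixels)]

def histogram(pixels, bin_width=5):
    """Count how many pixels fall into each brightness bin."""
    counts = Counter((p // bin_width) * bin_width for p in pixels)
    return dict(sorted(counts.items()))

def show(title, bins, bin_width=5, pixels_per_mark=20):
    """Print one row per bin, with one '#' per `pixels_per_mark` pixels."""
    print(title)
    for start, count in bins.items():
        bar = "#" * (count // pixels_per_mark)
        print(f"  {start:3d}-{start + bin_width - 1:3d} {bar}")

flat_bins = histogram(make_patch(center=140, sigma=2))
noisy_bins = histogram(make_patch(center=140, sigma=15))

show("Flat gray wall (narrow histogram)", flat_bins)
show("Noisy patch (wide histogram)", noisy_bins)
```

Both patches share the same center, so the average is similar; only the width of the printed shape changes, and that width is the spread.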


Why training uses average error

Call the model’s predicted brightness predicted and the actual brightness true. One pixel’s squared error is:

(true − predicted)²

Squaring makes big mistakes count more than small ones — a reasonable choice when large errors are worse.

If you stopped at one pixel, noise would dominate: the model might chase random wobble instead of real structure. So training uses the average over many pixels and many images:

average error = (sum of all squared errors) ÷ (number of samples)

This is often called mean squared error (MSE).
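As a small worked example, the formula can be written as a short function. The four true and predicted values are invented for illustration, and `mean_squared_error` is simply a name chosen for this sketch.

```python
def mean_squared_error(true_values, predicted_values):
    """Average of (true - predicted)**2 over all samples."""
    if len(true_values) != len(predicted_values):
        raise ValueError("both lists must have the same length")
    if not true_values:
        raise ValueError("need at least one sample")
    squared_errors = [(t - p) ** 2
                      for t, p in zip(true_values, predicted_values)]
    return sum(squared_errors) / len(squared_errors)

# Invented example: four pixels, four predictions.
true_values = [140, 142, 139, 141]
predicted_values = [141, 140, 139, 145]

mse = mean_squared_error(true_values, predicted_values)
print(mse)  # squared errors are 1, 4, 0, 16, so this prints 5.25
```

Notice that the single prediction that is off by 4 contributes 16 of the total 21, which is the "big mistakes count more" effect of squaring.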

Averaging:

  • Smooths random noise — errors partly cancel instead of steering the model randomly.
  • Gives one number to improve each step — “how wrong are we overall?”
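The smoothing effect can be checked with a small experiment. This sketch assumes a hypothetical model that always predicts 138 for a pixel whose true value is 140, measured with made-up noise of size 5. It compares how much the error estimate jumps around when it is computed from 1 noisy sample versus 1000.

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 140.0  # assumed true brightness
NOISE_SIGMA = 5.0   # assumed noise size
PREDICTED = 138.0   # hypothetical model output, off by 2

def average_error(n_samples):
    """Mean squared error of the fixed prediction over n noisy measurements."""
    total = 0.0
    for _ in range(n_samples):
        measured = TRUE_VALUE + random.gauss(0.0, NOISE_SIGMA)
        total += (measured - PREDICTED) ** 2
    return total / n_samples

# Repeat each estimate 200 times to see how much it varies.
single = [average_error(1) for _ in range(200)]
batched = [average_error(1000) for _ in range(200)]

spread_single = statistics.pstdev(single)
spread_batched = statistics.pstdev(batched)

print(f"1 sample per estimate:     spread {spread_single:.1f}")
print(f"1000 samples per estimate: spread {spread_batched:.1f}")
```

The model is equally wrong in both cases; what changes is how steady the measurement of that wrongness is. A steadier number is a more useful signal for deciding what to adjust.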

Your Module 1 project plots this average error over training steps. When the curve goes down, the model is getting better on average across many samples, not just matching one noisy pixel.


Summary

Term                  Plain meaning
Random variable       A number that can change between repeated measurements
Average               Long-run typical value
Spread                How much individual values scatter
Bell-shaped noise     Small errors common, large errors rare
Mean squared error    Average squared difference between prediction and truth

What's next

Derivatives and gradient descent: how learning works
You can now measure error; next you learn how models reduce it automatically.