Light, sensors, and the imaging pipeline
This lesson builds a mental model of what a digital image is before you touch convolutions or neural networks. If you know what is being measured (and what noise is), every later algorithm makes more sense.
Figure: Photons to pixels at a glance
Learning objectives
By the end of this lesson you should be able to:
- Trace the path from scene radiance to stored pixel values in plain language.
- Explain quantization, demosaicing, and gamma at a high level.
- List the main sources of sensor noise and why preprocessing matters for vision pipelines.
Prerequisites
- Comfort with basic algebra (ratios, averages).
- Optional: any first exposure to RGB images as 3D arrays (height × width × channels).
Step 1 — What a camera actually measures
A camera does not “capture reality.” It samples light that reaches a sensor plane over a finite exposure time, through optics with a particular spectral response.
- Radiance along a ray depends on surface materials, lighting, and geometry.
- The lens focuses a bundle of rays onto the sensor so that (ideally) each small region on the sensor corresponds to a direction in the scene (the pinhole / thin-lens story).
- Photons are converted to electrical charge per pixel well; that charge is read out as a voltage and digitized.
Checkpoint (conceptual): In one sentence, what physical quantity is ultimately turned into an integer in your image file?
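To make the bullets above concrete, here is a minimal simulation of one exposure. The mean photon count and quantum efficiency are made-up illustrative values, not from any real sensor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy values, not from a real datasheet.
mean_photons = 500.0       # expected photons per pixel during the exposure
quantum_efficiency = 0.6   # fraction of photons converted to electrons

# Photon arrival is random (Poisson); each arriving photon is converted
# to an electron with some probability, accumulating charge in the well.
photons = rng.poisson(mean_photons, size=(4, 4))
electrons = rng.binomial(photons, quantum_efficiency)

print(electrons)  # integer charge counts per pixel well, before readout
```

The integer in your file is ultimately a digitized version of this accumulated charge, which Step 2 takes up.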
Step 2 — From analog signal to discrete pixels
After readout, the analog signal is amplified and passed through an ADC (analog-to-digital converter).
- Bit depth (e.g., 10–14 bits in many RAW pipelines) sets how finely intensity is quantized before compression.
- Saturation happens when the well fills: highlights clip.
- Black level offsets exist: “zero light” is not always digital zero after processing.
Checkpoint: Why do two different phones sometimes show different brightness for the same scene even before “filters”?
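A toy sketch of digitization makes bit depth, clipping, and black level tangible. All parameter values below (full-well capacity, 12-bit ADC, black level of 64) are illustrative only:

```python
import numpy as np

def adc(electrons, full_well=10_000, bit_depth=12, black_level=64):
    """Toy ADC: map electron counts to digital numbers (DNs)."""
    levels = 2 ** bit_depth
    gain = (levels - 1 - black_level) / full_well   # DN per electron
    dn = black_level + np.asarray(electrons) * gain
    # Saturation: a full well (and the ADC range) clips highlights.
    return np.clip(np.round(dn), 0, levels - 1).astype(np.uint16)

signal = np.array([0, 100, 5_000, 10_000, 20_000])  # electrons
print(adc(signal))  # "zero light" maps to the black level, not to 0
```

Different devices choose different gains, exposures, and tone curves, which is one reason the same scene yields different digital numbers.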
Step 3 — Color filters and demosaicing (Bayer)
Most color cameras place a CFA (color filter array) over the sensor. A common pattern is the Bayer mosaic: each pixel measures mostly red, green, or blue.
Because each spatial location does not have full RGB immediately, the camera (or RAW developer) interpolates missing colors — this is demosaicing.
Figure: Bayer color filter array (RGGB)
- Demosaicing choices affect edges (zippering, false color) and fine texture.
- For vision: aggressive sharpening after demosaicing can create artifacts that downstream detectors latch onto.
Exercise (paper / notes): Sketch a tiny 4×4 Bayer pattern and mark which cells are R, G, and B. For one green pixel location, list which neighbors you would use to estimate missing R and B in a naive interpolation.
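As a check on that exercise, here is the same idea sketched in code, assuming the RGGB layout from the figure caption above and naive neighbor averaging only (real demosaicers are edge-aware):

```python
import numpy as np

# RGGB Bayer tile: even rows alternate R,G; odd rows alternate G,B.
pattern = np.array([["R", "G", "R", "G"],
                    ["G", "B", "G", "B"],
                    ["R", "G", "R", "G"],
                    ["G", "B", "G", "B"]])
print(pattern)

# Naive interpolation at the green pixel (row 1, col 2), which sits on a
# blue row: missing R comes from the vertical neighbors, missing B from
# the horizontal ones.
r, c = 1, 2
assert pattern[r, c] == "G"
red_neighbors = [(r - 1, c), (r + 1, c)]    # pixels above/below are R
blue_neighbors = [(r, c - 1), (r, c + 1)]   # pixels left/right are B
print("R from", red_neighbors, "| B from", blue_neighbors)
```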
Step 4 — Gamma and “non-linear” pixel values
Stored 8-bit JPEGs are often not linear in light. A gamma curve (or a more general tone curve) maps the sensor's roughly linear response to perceptually spaced values that suit display constraints.
- Many classical vision algorithms assume linear intensity for physically meaningful operations (e.g. shading, photometric stereo).
- Deep networks often train on display-referred images anyway — but understanding linear vs non-linear helps debug weird failures on edges and bloom.
Checkpoint: If you blur a JPEG in Photoshop and the edges look “glowy,” what non-linearity might be involved?
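The checkpoint hints at the answer: blurring gamma-encoded values is not the same as blurring linear light. A minimal sketch, assuming a pure power-law gamma of 2.2 (real sRGB adds a small linear toe segment):

```python
def encode(linear):            # linear light -> display-referred value
    return linear ** (1 / 2.2)

# A hard edge between a dark and a bright patch, in linear light.
dark, bright = 0.05, 0.95

# A blur averages neighboring pixels; the space you do it in matters.
blur_in_gamma = (encode(dark) + encode(bright)) / 2   # average encoded values
blur_in_linear = encode((dark + bright) / 2)          # average light, then encode
print(blur_in_gamma, blur_in_linear)  # ~0.62 vs ~0.73: transitions differ
```

Blurring in the wrong space shifts edge transitions, which is one source of the halo-like artifacts around high-contrast edges.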
Step 5 — Noise: where it comes from
Common sources (not mutually exclusive):
- Photon shot noise — photon arrival is random; a Poisson count's variance equals its mean, so SNR grows like √N and relative noise shrinks as signal increases.
- Read noise — electronics add uncertainty even at short exposures.
- Thermal / dark current — matters more for long exposures.
- Quantization noise — rounding to discrete levels.
Figure: Noise regimes vs signal level
Exercise: For a dark indoor frame vs a bright outdoor frame, which noise source tends to dominate visually in each case?
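You can also see the regimes numerically. The sketch below adds Gaussian read noise (an illustrative 3 electrons RMS) to Poisson shot noise: at low signal, read noise dominates; at high signal, the SNR approaches the shot-noise-only √N:

```python
import numpy as np

rng = np.random.default_rng(0)
read_noise_e = 3.0  # electrons RMS, an illustrative value

for mean_signal in [5, 50, 500, 5_000]:  # electrons
    shot = rng.poisson(mean_signal, 100_000).astype(float)
    noisy = shot + rng.normal(0.0, read_noise_e, shot.shape)
    snr = mean_signal / noisy.std()
    print(f"signal={mean_signal:>5} e-  SNR={snr:6.1f}  "
          f"(shot-only sqrt(N)={mean_signal ** 0.5:6.1f})")
```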
Step 6 — A minimal “pipeline” picture
Putting it together as a block diagram you can redraw from memory:
Scene → optics → CFA + exposure → RAW → (black level, white balance) → demosaic → color correction → gamma / tone → compression → stored file.
You do not need to implement each stage yet. You need the vocabulary to read datasheets, papers, and failure cases.
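Still, a skeletal version helps fix the vocabulary. The sketch below chains crude placeholder stages in the order of the diagram; every parameter value is illustrative, and demosaicing and color correction are skipped by pretending the input is already three-channel:

```python
import numpy as np

def develop(raw, black_level=64, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Toy 'RAW developer' following the block diagram stage by stage."""
    x = (raw.astype(float) - black_level).clip(min=0)  # black level
    x /= x.max()                                       # normalize exposure
    x *= np.array(wb_gains)                            # white balance, per channel
    # (demosaic and color correction would slot in here on real RAW data)
    x = x.clip(0, 1) ** (1 / gamma)                    # gamma / tone
    return (x * 255).round().astype(np.uint8)          # quantize for storage

# Pretend "RAW": already three-channel, just to exercise the stages.
raw = np.random.default_rng(0).integers(64, 4096, size=(4, 4, 3))
print(develop(raw))
```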
Check your understanding
- Why does a single RAW “pixel” not immediately give you an RGB triple at that location?
- Name two reasons identical scenes might produce different digital numbers on two devices.
- Why might edge-aware vision algorithms behave differently on JPEG vs RAW-derived linear images?
Lab-style stretch goal (optional)
If you have Python and OpenCV handy: load an image, split channels, and plot intensity histograms for R, G, and B separately. Write three observations about skew and clipping.
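A starter sketch, assuming a hypothetical file path "your_image.jpg" and that OpenCV and Matplotlib are installed (note that OpenCV loads channels in BGR order):

```python
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("your_image.jpg")  # hypothetical path; returns None on failure
assert img is not None, "image not found"
b, g, r = cv2.split(img)            # OpenCV stores images as BGR

for channel, color, name in [(r, "r", "R"), (g, "g", "G"), (b, "b", "B")]:
    plt.hist(channel.ravel(), bins=256, range=(0, 255),
             histtype="step", color=color, label=name)
plt.legend(); plt.xlabel("intensity"); plt.ylabel("pixel count")
plt.show()
# Look for: skew toward shadows or highlights, spikes at 0 or 255 (clipping).
```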