Light, sensors, and the imaging pipeline
This lesson builds a mental model of what a digital image is before you touch convolutions or neural networks. If you know what is being measured (and what noise is), every later algorithm makes more sense.
Figure: Photons to pixels at a glance
Learning objectives
By the end of this lesson you should be able to:
- Trace the path from scene radiance to stored pixel values in plain language.
- Distinguish radiance, irradiance, and sensor response — and explain why algorithms care about linearity.
- Relate exposure settings (ISO, shutter, aperture) to signal-to-noise ratio.
- Explain quantization, demosaicing, and gamma with enough precision to debug artifacts.
- List the main sources of sensor noise and model shot noise with a simple formula.
- Sketch an ISP (image signal processor) pipeline and name what each stage changes.
Prerequisites
- Comfort with basic algebra (ratios, averages, square roots).
- Optional: any first exposure to RGB images as 3D arrays (height × width × channels).
Step 1 — What a camera actually measures
A camera does not “capture reality.” It samples light that reaches a sensor plane over a finite exposure time, through optics with a particular spectral response.
Radiance, irradiance, and the pixel well
- Radiance (spectral units: W·sr⁻¹·m⁻²·nm⁻¹) describes how much power leaves a surface patch along a direction, per unit projected area, solid angle, and wavelength. It depends on material (BRDF), lighting, and geometry.
- At the sensor, you care about irradiance — power per unit area hitting the photodiode. The lens maps scene radiance to sensor irradiance; aperture and focal length set how much light is collected.
- Photons are converted to electrical charge in each pixel well. More photons → more electrons, up to full well capacity.
The lens focuses a bundle of rays so that (ideally) each small region on the sensor corresponds to a direction in the scene (pinhole / thin-lens story). Defocus spreads one scene point across multiple pixels — that blur is optical, not algorithmic.
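The photon-to-electron step described above can be sketched in a few lines. This is a minimal illustration, not a sensor model: the function name `electrons_collected` and the default values for `quantum_efficiency` and `full_well` are assumptions chosen for the example, not figures from any datasheet.

```python
def electrons_collected(photon_count, quantum_efficiency=0.6, full_well=60000):
    """Convert photons reaching one pixel into stored electrons.

    quantum_efficiency (fraction of photons converted) and full_well
    (maximum electrons the pixel can hold) are illustrative assumptions.
    """
    electrons = photon_count * quantum_efficiency
    # The well cannot hold more than full_well electrons: extra charge is lost,
    # which is why blown highlights carry no information.
    return min(electrons, full_well)


if __name__ == "__main__":
    for photons in (1_000, 50_000, 200_000):
        print(photons, "photons ->", electrons_collected(photons), "electrons")
```

The response is linear in photon count until the well fills, then flat. That linear region is what later steps mean by "linear light".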
Checkpoint (conceptual): In one sentence, what physical quantity is ultimately turned into an integer in your image file?
Answer sketch: Electrons accumulated in a pixel well during exposure, then amplified, digitized, and heavily processed — the stored integer is only loosely “brightness in the scene.”
Step 2 — Exposure triangle and signal-to-noise
Three controls dominate how many photons you collect:
| Control | Effect on photons | Typical side effect |
|---|---|---|
| Shutter time | Linear: 2× time ≈ 2× photons | Motion blur if scene or camera moves |
| Aperture (f-number) | Area ∝ 1/(f-number)²; f/2.8 vs f/5.6 is ~4× photons | Depth of field, vignetting |
| ISO / gain | Amplifies electronic signal after collection | Does not add photons; amplifies read noise too |
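The table's photon arithmetic can be checked with a small sketch. It assumes photon count is proportional to shutter time divided by the square of the f-number, and it deliberately has no ISO parameter because gain adds no photons. The helper names `relative_photons` and `stops_between` are hypothetical.

```python
import math


def relative_photons(shutter_s, f_number):
    """Photon count up to a constant factor: proportional to time / f_number^2."""
    return shutter_s / (f_number ** 2)


def stops_between(shutter_a, f_a, shutter_b, f_b):
    """Exposure change in stops (factors of two) going from setting A to setting B."""
    ratio = relative_photons(shutter_b, f_b) / relative_photons(shutter_a, f_a)
    return math.log2(ratio)


if __name__ == "__main__":
    # Same shutter time, f/5.6 opened up to f/2.8: about 4x the photons (2 stops).
    print(relative_photons(1 / 30, 2.8) / relative_photons(1 / 30, 5.6))
    print(stops_between(1 / 30, 5.6, 1 / 30, 2.8))
```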
Signal-to-noise ratio (SNR) in a single pixel (schematic):
$$\mathrm{SNR} \approx \frac{N}{\sqrt{N + \sigma_r^2}}$$
where $N$ is the mean electron count from photons and $\sigma_r$ is the read noise (electrons). At high light, shot noise dominates; at low light, read noise sets the floor, so boosting ISO does not fix missing photons.
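A short sketch makes the two regimes concrete. It assumes the common shot-plus-read-noise model, SNR = N / sqrt(N + read_noise²), with everything in electrons. The second function is an idealization: it treats gain as multiplying signal and noise alike, ignoring any noise added after the gain stage. Both function names are hypothetical.

```python
import math


def pixel_snr(mean_electrons, read_noise_e):
    """SNR = N / sqrt(N + sigma_r^2): shot variance N plus read variance."""
    return mean_electrons / math.sqrt(mean_electrons + read_noise_e ** 2)


def pixel_snr_with_gain(mean_electrons, read_noise_e, gain):
    """Idealized ISO/gain: scales signal and noise by the same factor."""
    signal = gain * mean_electrons
    noise = gain * math.sqrt(mean_electrons + read_noise_e ** 2)
    return signal / noise  # the gain cancels, so SNR is unchanged


if __name__ == "__main__":
    for n in (9, 100, 10_000):
        print(n, "electrons -> SNR", round(pixel_snr(n, read_noise_e=3.0), 2))
```

With 3 electrons of read noise, a 10,000-electron pixel sits close to the shot-noise limit of sqrt(N) = 100, while a 9-electron pixel loses a large share of its SNR to read noise.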
Exercise: You shoot indoors at 1/30 s, f/2.0, ISO 3200 and see grain. List two capture changes that increase photons without raising ISO, and one change that only amplifies electronically.
Step 3 — From analog signal to discrete pixels
After readout, the analog signal is amplified and passed through an ADC (analog-to-digital converter).
- Bit depth (e.g. 10–14 bits on many RAW pipelines) sets how finely intensity is quantized before compression. Quantization with step size $\Delta$ adds noise on the order of $\Delta/\sqrt{12}$ (uniform quantizer intuition).
- Saturation happens when the well fills: highlights clip — no recovery in a single exposure without HDR fusion.
- Black level offsets exist: “zero light” is not digital zero after processing. RAW developers subtract a black level per channel before scaling.
Worked example: A 12-bit ADC gives $2^{12} = 4096$ levels. If full well is 60,000 electrons mapped to 3800 codes, one ADU ≈ 16 electrons. Clipping at code 4095 loses highlight detail permanently in that frame.
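The worked example can be turned into a small quantizer sketch. The conversion factor comes from the numbers above (60,000 electrons over 3800 codes); the `black_level` of 64 and the function name `electrons_to_code` are assumptions for illustration only.

```python
def electrons_to_code(electrons, electrons_per_adu=60000 / 3800,
                      black_level=64, bit_depth=12):
    """Map collected electrons to an integer ADC code.

    black_level is an assumed offset: zero light does not map to code 0.
    """
    max_code = 2 ** bit_depth - 1  # 4095 for a 12-bit ADC
    code = round(electrons / electrons_per_adu) + black_level
    # Codes outside the ADC range are clipped; that information is gone.
    return max(0, min(code, max_code))


if __name__ == "__main__":
    for e in (0, 16, 30_000, 60_000, 1_000_000):
        print(e, "electrons -> code", electrons_to_code(e))
```

A RAW developer would undo the offset by subtracting the black level before scaling, as described in the bullet list above.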
Checkpoint: Why do two different phones sometimes show different brightness for the same scene even before “filters”?
Answer sketch: Different metering, tone mapping, color matrices, and auto-exposure targets, not necessarily different photon counts.
Step 4 — Color filters and demosaicing (Bayer)
Most color cameras place a CFA (color filter array) over the sensor. A common pattern is the Bayer mosaic (often RGGB): each pixel measures mostly one spectral band.
Because each spatial location does not have full RGB immediately, the camera (or RAW developer) interpolates missing colors — demosaicing.
Figure: Bayer color filter array (RGGB)
What demosaicing must infer
At a green site, R and B are unknown; algorithms use spatial and sometimes spectral correlation:
- Bilinear: average available neighbors — fast, zippering on edges.
- Edge-directed or gradient-corrected (e.g. Malvar-He-Cutler): use local gradients to steer or correct the interpolation; fewer color fringes, more compute.
Demosaicing choices affect edges (zippering, false color) and fine texture. Aggressive sharpening after demosaicing can create halos that downstream detectors treat as real structure.
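Bilinear interpolation, the simplest option listed above, can be sketched on a plain nested list. This is a minimal version under stated assumptions: an RGGB layout with red at even rows and even columns, an image of at least 2×2 pixels, and borders handled by clipping the 3×3 window. The helper names `cfa_color` and `bilinear_at` are hypothetical.

```python
def cfa_color(row, col):
    """Color measured at (row, col) in an RGGB mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def bilinear_at(raw, row, col):
    """Estimate (R, G, B) at one pixel from same-color samples in its 3x3 window."""
    sums = {"R": 0.0, "G": 0.0, "B": 0.0}
    counts = {"R": 0, "G": 0, "B": 0}
    height, width = len(raw), len(raw[0])
    for r in range(max(0, row - 1), min(height, row + 2)):
        for c in range(max(0, col - 1), min(width, col + 2)):
            color = cfa_color(r, c)
            sums[color] += raw[r][c]
            counts[color] += 1
    # The channel actually measured at this site is kept as-is, not averaged.
    measured = cfa_color(row, col)
    sums[measured], counts[measured] = float(raw[row][col]), 1
    return tuple(sums[k] / counts[k] for k in "RGB")


if __name__ == "__main__":
    # Uniform colored patch: R=10, G=20, B=30 sampled through the mosaic.
    truth = {"R": 10, "G": 20, "B": 30}
    raw = [[truth[cfa_color(r, c)] for c in range(4)] for r in range(4)]
    print(bilinear_at(raw, 1, 2))  # a green site
```

On a uniform patch the estimate is exact. Errors appear where neighbors straddle an edge, because the average mixes samples from both sides.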
Exercise (paper / notes): Sketch a 4×4 RGGB pattern. For the center green pixel, write which neighbors you would use in bilinear R and B estimates. Why does a red–blue edge confuse naive interpolation?
Step 5 — Gamma, linear light, and display encoding
Linear light: pixel value (after black-level correction) proportional to photoelectrons / irradiance.
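Display-referred / sRGB: a transfer function that spends more code values on shadows and fewer on highlights, matching human perception and making 8-bit storage workable:
$$V \approx L^{1/2.2}$$
where $L$ is linear light in $[0, 1]$ and $V$ is the encoded value. (Approximate power-law form; the exact sRGB spec is piecewise, with a linear toe near black.)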
| Domain | Use in vision |
|---|---|
| Linear RAW | Photometric stereo, HDR merge, physically motivated shading |
| sRGB / JPEG | What most datasets and pretrained nets see |
| Log / PQ (video) | Wide dynamic range display pipelines |
Many classical algorithms assume linear intensity for physically meaningful operations. Deep networks often train on display-referred images anyway — but blending, sharpening, or shadow recovery in sRGB is not the same as in linear space.
Checkpoint: If you blur a JPEG in an editor and edges look “glowy,” what non-linearity might be involved?
Answer sketch: Averaging encoded values gives a darker result than averaging linear light and then re-encoding (gamma bleeding).
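The difference between blending in encoded space and in linear light is easy to measure. The sketch below uses the piecewise sRGB transfer function (linear toe plus a 2.4 power segment) on values normalized to [0, 1]. The function names are hypothetical, and the example blends just two pixel values rather than a full blur.

```python
def srgb_encode(linear):
    """Linear light in [0, 1] -> sRGB-encoded value in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear  # linear toe near black
    return 1.055 * linear ** (1 / 2.4) - 0.055


def srgb_decode(encoded):
    """sRGB-encoded value in [0, 1] -> linear light in [0, 1]."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def blend_encoded(a, b):
    """Average two encoded values directly (what a naive blur does)."""
    return (a + b) / 2


def blend_linear(a, b):
    """Decode, average in linear light, then re-encode."""
    return srgb_encode((srgb_decode(a) + srgb_decode(b)) / 2)


if __name__ == "__main__":
    # A black pixel next to a white pixel, both given as encoded values.
    print("encoded-space blend:", blend_encoded(0.0, 1.0))
    print("linear-light blend: ", round(blend_linear(0.0, 1.0), 4))
```

For a black/white pair the encoded-space blend lands at 0.5, while the linear-light blend re-encodes to roughly 0.735, so the naive result is visibly darker along edges.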
Step 6 — Noise: models you can use
Shot (Poisson) noise
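Photon arrivals are random. If the mean count is $N$, the variance is also $N$:
$$\sigma_{\text{shot}} = \sqrt{N}, \qquad \mathrm{SNR}_{\text{shot}} = \frac{N}{\sqrt{N}} = \sqrt{N}$$
Relative noise improves with more light. Exposing to the right (ETTR) in RAW, without clipping highlights, exploits this fact.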
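The Poisson property (variance equal to the mean) can be checked empirically with the standard library alone. The sketch below draws samples with Knuth's multiplication method, which is adequate for small means; the function names, the sample count, and the seed are illustrative assumptions.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson sample with Knuth's multiplication method (small means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def shot_noise_stats(mean, n_samples=10000, seed=0):
    """Return (sample mean, sample variance) of simulated photoelectron counts."""
    rng = random.Random(seed)
    samples = [poisson_sample(mean, rng) for _ in range(n_samples)]
    sample_mean = sum(samples) / n_samples
    variance = sum((s - sample_mean) ** 2 for s in samples) / (n_samples - 1)
    return sample_mean, variance


if __name__ == "__main__":
    m, v = shot_noise_stats(50.0)
    print("mean", round(m, 2), "variance", round(v, 2), "SNR", round(m / math.sqrt(v), 2))
```

Both statistics should land near 50, giving an SNR near sqrt(50) ≈ 7.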
Read noise
Additive Gaussian in electrons, independent of signal. Dominates in shadows and short exposures.
Other sources
- Thermal / dark current — grows with exposure time and temperature.
- Fixed pattern noise (FPN) — column/row offsets; often calibrated out in ISP.
- Quantization noise — from ADC and 8-bit export.
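The random sources above are often combined as a variance budget. The sketch below assumes they are independent, so their variances add, and it leaves out fixed pattern noise because that is a repeatable offset rather than frame-to-frame randomness. The function name and parameters are hypothetical; all quantities are in electrons.

```python
import math


def total_noise_electrons(signal_e, read_noise_e, dark_current_e_per_s,
                          exposure_s, electrons_per_adu):
    """Total temporal noise (electrons RMS), assuming independent sources."""
    shot_var = signal_e                           # Poisson: variance equals mean
    dark_var = dark_current_e_per_s * exposure_s  # dark charge is also Poisson
    read_var = read_noise_e ** 2                  # additive Gaussian
    quant_var = electrons_per_adu ** 2 / 12       # uniform quantizer, step in electrons
    return math.sqrt(shot_var + dark_var + read_var + quant_var)


if __name__ == "__main__":
    for signal in (20, 2_000, 40_000):
        noise = total_noise_electrons(signal, read_noise_e=3.0,
                                      dark_current_e_per_s=1.0,
                                      exposure_s=1 / 30, electrons_per_adu=16.0)
        print(signal, "e- signal -> noise", round(noise, 1), "e-")
```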
Figure: Noise regimes vs signal level
Exercise: For a dark indoor frame vs a bright outdoor frame, which noise source tends to dominate visually in each case? How would stacking identical frames change SNR (approximately)?
Answer sketch: Indoor: read + quantization noise; outdoor: shot noise. Stacking $M$ frames improves SNR by about $\sqrt{M}$ if the noise is independent between frames.
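The stacking claim can be tested with a quick simulation. For independent zero-mean noise, averaging M frames should shrink the noise standard deviation by about the square root of M. The sketch assumes Gaussian noise and a static scene; the function name, trial count, and seed are illustrative choices.

```python
import random
import statistics


def stacked_noise_std(n_frames, sigma=10.0, n_trials=4000, seed=1):
    """Std of the per-pixel average of n_frames noisy readings of the same signal."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        frames = [rng.gauss(0.0, sigma) for _ in range(n_frames)]
        averages.append(sum(frames) / n_frames)
    return statistics.stdev(averages)


if __name__ == "__main__":
    single = stacked_noise_std(1)
    stacked = stacked_noise_std(16)
    print("1 frame:", round(single, 2), " 16 frames:", round(stacked, 2),
          " ratio:", round(single / stacked, 2))
```

With 16 frames the ratio should come out near 4. Correlated noise, such as fixed pattern offsets, does not average down this way.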
Step 7 — The ISP: from RAW to what algorithms see
A phone ISP typically runs these stages on-sensor or immediately after readout:
| Stage | What it does | Vision impact |
|---|---|---|
| Black level / OB | Subtract offsets | Prevents color cast in shadows |
| Lens shading | Per-channel vignette correction | Uniform illumination for photometry |
| Demosaic | CFA → RGB | Edge artifacts if aggressive |
| White balance | Diagonal color scaling | Changes “true” color ratios |
| Color matrix | Sensor RGB → display RGB | Dataset color statistics |
| Tone / gamma | Dynamic range compression | Non-linear; affects gradients |
| Sharpen / NR | High-frequency boost or suppression | Fake edges, texture loss |
| JPEG encode | Lossy compression | Blocking, ringing |
Putting it together:
Scene → optics → CFA + exposure → RAW → ISP chain → stored file.
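To make the ordering tangible, here is a toy chain applied to one already-demosaiced RGB triple. It covers only black level, white balance, color matrix, gamma, and 8-bit quantization; lens shading, demosaicing, sharpening, noise reduction, and JPEG are left out. Every default value (black and white levels, gains, matrix entries, gamma) is an illustrative assumption, not data from a real device, and `toy_isp` is a hypothetical name.

```python
def toy_isp(raw_rgb, black_level=64, white_level=4095,
            wb_gains=(2.0, 1.0, 1.5),
            color_matrix=((1.6, -0.4, -0.2),
                          (-0.3, 1.5, -0.2),
                          (-0.1, -0.5, 1.6)),
            gamma=2.2):
    """Run one RGB triple of RAW codes through a simplified ISP chain."""
    # 1. Black level: subtract the offset and normalize to [0, 1].
    scale = white_level - black_level
    linear = [max(0.0, (v - black_level) / scale) for v in raw_rgb]
    # 2. White balance: per-channel (diagonal) gains.
    balanced = [v * g for v, g in zip(linear, wb_gains)]
    # 3. Color matrix: mix sensor RGB into display RGB (rows sum to 1).
    mixed = [sum(m * v for m, v in zip(row, balanced)) for row in color_matrix]
    # 4. Tone / gamma: clip to [0, 1], then apply a simple power law.
    encoded = [min(1.0, max(0.0, v)) ** (1 / gamma) for v in mixed]
    # 5. Quantize to 8 bits, as stored in a typical output file.
    return tuple(round(255 * v) for v in encoded)


if __name__ == "__main__":
    print(toy_isp((64, 64, 64)))        # zero light
    print(toy_isp((800, 1500, 900)))    # some mid-tone patch
```

Everything before step 4 is linear; after it, pixel differences no longer correspond to proportional differences in light.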
You do not need to implement each stage yet. You need the vocabulary to read datasheets, papers, and failure cases.
Deep dive — HDR, rolling shutter, and metrology
Multi-exposure HDR fuses short (highlights) and long (shadows) frames with alignment — ghosting if objects move.
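A per-pixel merge of two exposures might look like the sketch below. It assumes both inputs are linear, black-level corrected, normalized to [0, 1], and already aligned, so it does nothing about ghosting. The saturation threshold of 0.95 and the name `merge_hdr` are assumptions for illustration.

```python
def merge_hdr(short_px, long_px, short_time, long_time, saturation=0.95):
    """Merge two linear pixel values into one relative radiance estimate."""
    short_radiance = short_px / short_time
    long_radiance = long_px / long_time
    if long_px >= saturation:
        # The long exposure clipped here: only the short frame is trustworthy.
        return short_radiance
    # Otherwise weight each estimate by its exposure time, since the longer
    # exposure collected more photons and is less noisy.
    total_time = short_time + long_time
    return (short_time * short_radiance + long_time * long_radiance) / total_time


if __name__ == "__main__":
    print(merge_hdr(0.1, 0.5, short_time=0.01, long_time=0.05))   # unclipped pixel
    print(merge_hdr(0.4, 1.0, short_time=0.01, long_time=0.05))   # long frame clipped
```

Dividing by exposure time puts both frames on the same radiance scale, which only works because the inputs are linear.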
Rolling shutter reads rows sequentially; fast motion or vibration skews geometry (wobbly buildings, bent propellers). Global shutter sensors avoid this at higher cost.
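The size of the skew follows from simple arithmetic: each row is read slightly later than the one above it, so a horizontally moving object shifts a little more on every row. The sketch below assumes constant velocity and a constant per-row readout time; the function name and the example numbers are hypothetical.

```python
def rolling_shutter_shift(row, line_time_s, velocity_px_per_s):
    """Horizontal displacement (pixels) of a moving object at a given row.

    Row 0 is read first; row k is read k * line_time_s seconds later.
    """
    return row * line_time_s * velocity_px_per_s


if __name__ == "__main__":
    # Assumed numbers: 10 microseconds per row, object moving at 2000 px/s.
    for row in (0, 500, 1000):
        print("row", row, "shift", rolling_shutter_shift(row, 10e-6, 2000.0), "px")
```

With these numbers a vertical pole spanning 1000 rows leans by about 20 pixels from top to bottom.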
Radiometric calibration maps digital numbers to irradiance via flat-field panels and known lights — required for quantitative vision (agriculture, medical imaging, satellite).
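One building block of such calibration is flat-field correction, sketched here on flat lists of pixel values. It assumes a dark frame (no light) and a flat frame (uniform illumination) captured at matching settings, and that every pixel responds to the flat, so no gain is zero. The function name is hypothetical, and converting to absolute irradiance would need a further scale factor from a known light source.

```python
def flat_field_correct(raw, dark, flat):
    """Remove per-pixel offset and gain variation from a frame.

    corrected = (raw - dark) / (flat - dark), rescaled by the mean flat response.
    """
    gains = [f - d for f, d in zip(flat, dark)]
    mean_gain = sum(gains) / len(gains)
    return [(r - d) * mean_gain / g for r, d, g in zip(raw, dark, gains)]


if __name__ == "__main__":
    dark = [10, 10, 10, 10]
    flat = [110, 90, 110, 90]   # pixels 1 and 3 are less sensitive
    raw = [60, 50, 60, 50]      # a uniform scene at half the flat brightness
    print(flat_field_correct(raw, dark, flat))
```

After correction the uniform scene reads the same value at every pixel, which is the property photometric measurements rely on.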
When things go wrong (debugging checklist)
| Symptom | Often caused by |
|---|---|
| Purple/green fringes on edges | Demosaic + chromatic aberration |
| Banding in smooth skies | 8-bit + aggressive tone mapping |
| Flickering exposure between frames | Auto-exposure hunting — breaks optical flow / SLAM |
| “Crunchy” fine detail | Oversharpening after NR |
| Color shift under LED lights | Narrow-band spectra vs daylight-trained AWB |
Check your understanding
- Why does a single RAW “pixel” not immediately give you an RGB triple at that location?
- Name two reasons identical scenes might produce different digital numbers on two devices.
- Why might edge-aware vision algorithms behave differently on JPEG vs RAW-derived linear images?
- If you double shutter time and halve ISO, what happens to photon count and read-noise contribution?
- Why is averaging three JPEGs not equivalent to averaging three linear RAW frames?
Lab-style stretch goals
Histograms: Load an image, split channels, plot R/G/B histograms. Note clipping at 0 and 255 and skew — relate to exposure and tone curve.
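As a starting point for the histogram exercise, the counting and clipping check can be written without any imaging library. The sketch assumes you have already loaded one channel as a flat list of 8-bit integers (0 to 255) using whatever image reader you prefer; loading and plotting are not shown, and both function names are hypothetical.

```python
def channel_histogram(values, bins=256):
    """Count how many pixels take each integer value in [0, bins - 1]."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    return hist


def clipping_fractions(values):
    """Fraction of pixels stuck at 0 (crushed shadows) and at 255 (blown highlights)."""
    hist = channel_histogram(values)
    n = len(values)
    return hist[0] / n, hist[255] / n


if __name__ == "__main__":
    red_channel = [0, 0, 12, 128, 200, 255, 255, 255]  # stand-in for real pixel data
    print(clipping_fractions(red_channel))
```

A large spike at 255 suggests overexposure or an aggressive tone curve; a spike at 0 suggests underexposure or crushed blacks.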
RAW vs sRGB (if available): Develop the same RAW twice — “linear 16-bit” vs “camera JPEG.” Run Sobel magnitude on both (next lesson) and compare edge energy in shadows.