Module 1 — Imaging & digital images

Light, sensors, and the imaging pipeline

Radiance vs irradiance, exposure and SNR, Bayer demosaicing, gamma/linear light, noise models, and the full ISP from RAW to JPEG.

~75 min read + exercises

This lesson builds a mental model of what a digital image is before you touch convolutions or neural networks. If you know what is being measured (and what noise is), every later algorithm makes more sense.

Figure: Photons to pixels at a glance. Scene radiance → optics (lens + aperture) → sensor (CFA + exposure) → ADC (quantization) → RAW (linear bits) → demosaic (interpolate RGB) → gamma + tone (non-linear) → stored file (JPEG / HEIC). Stages before gamma + tone carry linear, scene-referred values; from gamma + tone onward, values are display-referred and perceptually spaced.
A typical RGB pipeline: every stage is a choice that can later show up as an artifact.

Learning objectives

By the end of this lesson you should be able to:

  • Trace the path from scene radiance to stored pixel values in plain language.
  • Distinguish radiance, irradiance, and sensor response — and explain why algorithms care about linearity.
  • Relate exposure settings (ISO, shutter, aperture) to signal-to-noise ratio.
  • Explain quantization, demosaicing, and gamma with enough precision to debug artifacts.
  • List the main sources of sensor noise and model shot noise with a simple formula.
  • Sketch an ISP (image signal processor) pipeline and name what each stage changes.

Prerequisites

  • Comfort with basic algebra (ratios, averages, square roots).
  • Optional: any first exposure to RGB images as 3D arrays (height × width × channels).

Step 1 — What a camera actually measures

A camera does not “capture reality.” It samples light that reaches a sensor plane over a finite exposure time, through optics with a particular spectral response.

Radiance, irradiance, and the pixel well

  • Radiance $L$ (here spectral radiance, units W·sr⁻¹·m⁻²·nm⁻¹) describes how much power leaves a surface patch in a given direction, per unit solid angle, projected area, and wavelength. It depends on material (BRDF), lighting, and geometry.
  • At the sensor, you care about irradiance $E$: power per unit area hitting the photodiode. The lens maps scene radiance to sensor irradiance; aperture and focal length set how much light is collected.
  • Photons are converted to electrical charge in each pixel well. More photons → more electrons, up to full well capacity.

The lens focuses a bundle of rays so that (ideally) each small region on the sensor corresponds to a direction in the scene (pinhole / thin-lens story). Defocus spreads one scene point across multiple pixels — that blur is optical, not algorithmic.

Checkpoint (conceptual): In one sentence, what physical quantity is ultimately turned into an integer in your image file?

Answer sketch: Electrons accumulated in a pixel well during exposure, then amplified, digitized, and heavily processed — the stored integer is only loosely “brightness in the scene.”


Step 2 — Exposure triangle and signal-to-noise

Three controls dominate how many photons you collect:

| Control | Effect on photons | Typical side effect |
|---|---|---|
| Shutter time | Linear: 2× time ≈ 2× photons | Motion blur if scene or camera moves |
| Aperture (f-number $N$) | Area $\propto 1/N^2$; f/2.8 vs f/5.6 is ~4× photons | Depth of field, vignetting |
| ISO / gain | Amplifies electronic signal after collection | Does not add photons; amplifies read noise too |
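To make the table concrete, here is a minimal Python sketch (standard library only) that compares settings by relative photon count. It assumes the idealized model photons $\propto t / N^2$, with scene, lens transmission, and focal length held fixed. The helper names `relative_photons` and `stops_between` are hypothetical, not from any library.

```python
import math

def relative_photons(shutter_s: float, f_number: float) -> float:
    """Photons per unit sensor area, up to a constant: proportional to t / N^2."""
    return shutter_s / f_number**2

def stops_between(a: float, b: float) -> float:
    """How many stops (factors of 2) more light setting a collects than setting b."""
    return math.log2(a / b)

wide = relative_photons(1 / 60, 2.8)
narrow = relative_photons(1 / 60, 5.6)
print(f"f/2.8 vs f/5.6: {wide / narrow:.1f}x photons, {stops_between(wide, narrow):.1f} stops")
# prints: f/2.8 vs f/5.6: 4.0x photons, 2.0 stops
```

ISO is deliberately absent from `relative_photons`: in this model, gain changes the recorded numbers, not the photon count.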

Signal-to-noise ratio (SNR) in a single pixel (schematic):

$$\mathrm{SNR} \approx \frac{N_{\text{signal}}}{\sqrt{N_{\text{signal}} + \sigma_{\text{read}}^2}}$$

where $N_{\text{signal}}$ is the mean electron count from photons and $\sigma_{\text{read}}$ is the read noise (in electrons). At high light, shot noise $\sqrt{N_{\text{signal}}}$ dominates; at low light, read noise sets the floor, and boosting ISO does not fix missing photons.
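The formula can be evaluated directly to see the two regimes. This is a sketch that assumes a read noise of 3 electrons (an illustrative number, not a real sensor spec). Shot and read variances are equal at $N_{\text{signal}} = \sigma_{\text{read}}^2$, and above that point SNR grows roughly like $\sqrt{N_{\text{signal}}}$. `snr_after_gain` is a hypothetical helper that models ISO as a single gain applied after all read noise. Real sensors also add some read noise after the gain stage, and this simplification ignores that.

```python
import math

def snr(n_signal: float, read_noise_e: float) -> float:
    """Schematic single-pixel SNR: Poisson shot variance (= n_signal) plus read variance."""
    return n_signal / math.sqrt(n_signal + read_noise_e**2)

def snr_after_gain(n_signal: float, read_noise_e: float, gain: float) -> float:
    """ISO-like gain applied after collection. Simplified: all read noise occurs before
    the gain, so signal and noise are scaled by the same factor."""
    return (gain * n_signal) / (gain * math.sqrt(n_signal + read_noise_e**2))

READ_NOISE_E = 3.0                 # assumed read noise in electrons (illustrative)
crossover = READ_NOISE_E**2        # shot variance equals read variance at N = sigma_read^2
for n in (4.0, crossover, 1_000.0, 10_000.0):
    print(f"N = {n:8.0f} e-:  SNR = {snr(n, READ_NOISE_E):6.2f}")

# Gain cancels out of the ratio in this model.
print(math.isclose(snr_after_gain(4.0, READ_NOISE_E, gain=8.0), snr(4.0, READ_NOISE_E)))
```

Under this simplified model, the gain cancels out of the ratio. This is the formula's way of saying that ISO cannot replace photons.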

Exercise: You shoot indoors at 1/30 s, f/2.0, ISO 3200 and see grain. List two capture changes that increase photons without raising ISO, and one change that only amplifies electronically.


Step 3 — From analog signal to discrete pixels

After readout, the analog signal is amplified and passed through an ADC (analog-to-digital converter).

  • Bit depth (e.g. 10–14 bits on many RAW pipelines) sets how finely intensity is quantized before compression. A quantization step $\Delta$ adds noise with an RMS of about $\Delta/\sqrt{12}$ (uniform-quantizer intuition: the rounding error is spread evenly over one step).
  • Saturation happens when the well fills. Highlights clip, and the clipped detail cannot be recovered from that single exposure; getting it back requires fusing several exposures (HDR).
  • Black level offsets exist: “zero light” does not read as digital zero in RAW, because the readout adds a positive offset (pedestal). RAW developers subtract a black level per channel before scaling.
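The $\Delta/\sqrt{12}$ figure can be checked by simulation. This minimal sketch uses an idealized rounding quantizer, not a model of any real ADC. `quantize` and `quantization_rms` are illustrative names, and the input is assumed to span many steps so that the rounding error is roughly uniform.

```python
import math
import random

def quantize(x: float, step: float) -> float:
    """Uniform quantizer: round to the nearest multiple of step."""
    return step * round(x / step)

def quantization_rms(step: float, n: int = 100_000, seed: int = 0) -> float:
    """Empirical RMS quantization error for inputs spread over many steps."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 100.0 * step)   # many steps wide, so the error is ~uniform
        sq_err += (quantize(x, step) - x) ** 2
    return math.sqrt(sq_err / n)

step = 1.0
print(quantization_rms(step), step / math.sqrt(12))   # both close to 0.289
```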

Worked example: A 12-bit ADC gives $2^{12} = 4096$ levels (codes 0 to 4095). If a full well of 60,000 electrons maps to 3800 codes, one ADU (one code step) ≈ 16 electrons (60,000 / 3800 ≈ 15.8). Clipping at code 4095 loses highlight detail permanently in that frame.
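The worked example translates into a toy readout model. This sketch uses the example's numbers and ignores noise and nonlinearity. The function name and the `black_level` value of 512 in the last call are hypothetical. That offset is chosen only to show how a black-level offset can push the well's maximum past the top ADC code.

```python
FULL_WELL_E = 60_000        # electrons, from the worked example
CODES_AT_FULL_WELL = 3800   # ADU that a full well maps to, from the worked example
ADC_MAX = 2**12 - 1         # 12-bit ADC: codes 0..4095

def electrons_to_adu(electrons: float, black_level: int = 0) -> int:
    """Toy linear readout: well saturation, fixed gain, black-level offset, ADC clip."""
    e = min(max(electrons, 0.0), FULL_WELL_E)            # the well cannot hold more
    code = black_level + round(e * CODES_AT_FULL_WELL / FULL_WELL_E)
    return min(code, ADC_MAX)                             # the ADC cannot output more

print(FULL_WELL_E / CODES_AT_FULL_WELL)            # electrons per ADU, about 15.8
print(electrons_to_adu(30_000))                    # 1900
print(electrons_to_adu(90_000))                    # 3800: clipped at full well
print(electrons_to_adu(90_000, black_level=512))   # 4095: clipped by the ADC
```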

Checkpoint: Why do two different phones sometimes show different brightness for the same scene even before “filters”?

Different metering, tone mapping, color matrices, and auto-exposure targets — not necessarily different photon counts.


Step 4 — Color filters and demosaicing (Bayer)

Most color cameras place a CFA (color filter array) over the sensor. A common pattern is the Bayer mosaic (often RGGB): each pixel measures mostly one spectral band.

Because each spatial location does not have full RGB immediately, the camera (or RAW developer) interpolates missing colors — demosaicing.

Figure: Bayer color filter array (RGGB). A 6×6 mosaic of repeating 2×2 tiles (R G / G B); green is sampled twice per 2×2 block to match human luminance sensitivity.
Each sensor pixel measures only one color; the missing two are filled in by demosaicing from neighbors.

What demosaicing must infer

At a green site, R and B are unknown; algorithms use spatial and sometimes spectral correlation:

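  • Bilinear: average the available same-color neighbors. This is fast, but produces zippering on edges.
  • Edge-directed: steer interpolation along the estimated edge direction. This gives fewer color fringes at the cost of more compute.
  • Gradient-corrected linear (Malvar–He–Cutler): fixed linear filters that also use the local gradient of the color measured at the site. This is cheap and noticeably better than bilinear.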
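The bilinear rule fits in a few lines. This minimal sketch is in plain Python and assumes an RGGB layout with R at the top-left and images stored as nested lists. `mosaic` and `demosaic_bilinear` are hypothetical helpers. The sketch implements only bilinear interpolation, not the edge-directed or Malvar–He–Cutler variants.

```python
def cfa_color(y: int, x: int) -> str:
    """Channel sampled at row y, column x of an RGGB Bayer mosaic."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"

def mosaic(rgb):
    """Simulate the CFA: keep only the channel each sensor pixel actually measures."""
    idx = {"R": 0, "G": 1, "B": 2}
    return [[px[idx[cfa_color(y, x)]] for x, px in enumerate(row)]
            for y, row in enumerate(rgb)]

def demosaic_bilinear(raw):
    """Fill each missing channel with the mean of same-color samples in the 3x3
    neighborhood (clipped at borders). Assumes at least 2x2, so every window has R, G, B."""
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            pixel = []
            for ch in "RGB":
                if cfa_color(y, x) == ch:            # measured directly at this site
                    pixel.append(float(raw[y][x]))
                    continue
                samples = [raw[yy][xx]
                           for yy in range(max(0, y - 1), min(h, y + 2))
                           for xx in range(max(0, x - 1), min(w, x + 2))
                           if cfa_color(yy, xx) == ch]
                pixel.append(sum(samples) / len(samples))
            row.append(tuple(pixel))
        out.append(row)
    return out

flat = [[(200, 100, 50)] * 4 for _ in range(4)]
print(demosaic_bilinear(mosaic(flat))[1][2])   # (200.0, 100.0, 50.0)
```

A flat colored scene comes back exactly. If you feed `mosaic` a sharp red/blue edge instead, you get false color: red sites next to the edge receive a blue estimate averaged across the boundary.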

Demosaicing choices affect edges (zippering, false color) and fine texture. Aggressive sharpening after demosaicing can create halos that downstream detectors treat as real structure.

Exercise (paper / notes): Sketch a 4×4 RGGB pattern. For the center green pixel, write which neighbors you would use in bilinear R and B estimates. Why does a red–blue edge confuse naive interpolation?


Step 5 — Gamma, linear light, and display encoding

Linear light: the pixel value (after black-level correction) is proportional to the photoelectron count, and hence to irradiance.

Display-referred / sRGB: a non-linear transfer function allocates more code values to shadows and fewer to highlights. This roughly matches human lightness perception and keeps shadow banding low in 8-bit storage:

$$V_{\text{sRGB}} = \begin{cases} 12.92\, L & L \le 0.0031308 \\ 1.055\, L^{1/2.4} - 0.055 & L > 0.0031308 \end{cases}$$

(This piecewise form, with its short linear toe near black, is the standard sRGB encoding for linear $L \in [0, 1]$. A pure power law $V \approx L^{1/2.2}$ is a common approximation.)
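Here are the encoding and its inverse in code. This sketch works on scalar values already normalized to [0, 1], and the function names are illustrative. The decode threshold 0.04045 and its constants are the standard inverse of the piecewise curve above.

```python
def srgb_encode(lin: float) -> float:
    """Linear light in [0, 1] -> sRGB-encoded value in [0, 1]."""
    if lin <= 0.0031308:
        return 12.92 * lin                      # linear toe near black
    return 1.055 * lin ** (1 / 2.4) - 0.055     # power segment

def srgb_decode(v: float) -> float:
    """sRGB-encoded value in [0, 1] -> linear light in [0, 1] (inverse of srgb_encode)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

mid_grey = 0.18
v = srgb_encode(mid_grey)
print(round(v, 3), round(v * 255))   # 0.461 118: 18% linear grey lands near code 118 of 255
```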

| Domain | Use in vision |
|---|---|
| Linear RAW | Photometric stereo, HDR merge, physically motivated shading |
| sRGB / JPEG | What most datasets and pretrained nets see |
| Log / PQ (video) | Wide dynamic range display pipelines |

Many classical algorithms assume linear intensity for physically meaningful operations. Deep networks often train on display-referred images anyway — but blending, sharpening, or shadow recovery in sRGB is not the same as in linear space.

Checkpoint: If you blur a JPEG in an editor and edges look “glowy,” what non-linearity might be involved?

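The sRGB (gamma) encoding. Averaging encoded values gives a darker result than averaging linear light and then re-encoding, so blurred bright/dark transitions differ from a physically correct blur.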
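A two-pixel example makes the checkpoint answer concrete. This self-contained sketch repeats the sRGB helpers (names are illustrative) and blurs a black/white edge down to one average. It does this once on encoded values and once in linear light.

```python
def srgb_encode(lin: float) -> float:
    """Linear light in [0, 1] -> sRGB-encoded value in [0, 1]."""
    return 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055

def srgb_decode(v: float) -> float:
    """sRGB-encoded value in [0, 1] -> linear light in [0, 1]."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

black, white = 0.0, 1.0   # sRGB-encoded pixel values on either side of an edge

naive = (black + white) / 2                                            # blur the codes
correct = srgb_encode((srgb_decode(black) + srgb_decode(white)) / 2)   # blur the light

print(f"average of codes: {naive:.3f}, average of light: {correct:.3f}")
# average of codes: 0.500, average of light: 0.735
```

The naive blend of 0.5 decodes to only about 21% linear light. Blurring encoded values therefore darkens the transition compared with mixing the actual light.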


Step 6 — Noise: models you can use

Shot (Poisson) noise

Photon arrivals are random. If the mean count is $\mu$, the variance is also $\mu$:

$$\sigma_{\text{shot}} = \sqrt{\mu}$$

Relative noise $\sigma/\mu = 1/\sqrt{\mu}$ shrinks as light increases. “Expose to the right” (ETTR) in RAW, which means exposing as brightly as possible without clipping highlights, exploits this fact.
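A quick simulation confirms that variance ≈ mean for Poisson counts. The standard library has no Poisson sampler, so this sketch uses Knuth's multiplication method. That method is simple but only practical for small means. `poisson` and `sample_stats` are hypothetical helper names.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method: count uniforms until their product drops below e^-lam.
    Fine for small means like the ones used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_stats(mean: float, n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Sample mean and (unbiased) variance of n simulated photon counts."""
    rng = random.Random(seed)
    counts = [poisson(mean, rng) for _ in range(n)]
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    return mu, var

for mean in (4, 100):
    mu, var = sample_stats(mean)
    print(f"mean={mu:.2f} var={var:.2f} "
          f"sigma/mu={math.sqrt(var) / mu:.3f} 1/sqrt(mean)={1 / math.sqrt(mean):.3f}")
```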

Read noise

Additive Gaussian in electrons, independent of signal. Dominates in shadows and short exposures.

Other sources

  • Thermal / dark current — grows with exposure time and temperature.
  • Fixed pattern noise (FPN) — column/row offsets; often calibrated out in ISP.
  • Quantization noise — from ADC and 8-bit export.

Figure: Noise regimes vs signal level. Axes: noise σ vs signal (photons); curves: total σ, shot noise (√S), read noise (≈ const).
Schematic: read noise sets a floor; shot noise grows like √signal. At low light the floor dominates; at bright light shot noise wins.

Exercise: For a dark indoor frame vs a bright outdoor frame, which noise source tends to dominate visually in each case? How would stacking $N$ identical frames change SNR (approximately)?

Indoor: read noise (plus quantization); outdoor: shot noise. Averaging $N$ frames improves SNR by roughly $\sqrt{N}$ if the noise is independent between frames.
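You can check the $\sqrt{N}$ rule by averaging simulated frames. This sketch assumes a static, perfectly aligned scene. Independent zero-mean Gaussian noise per frame stands in for combined shot and read noise, with sigma = 10 in arbitrary units chosen only for illustration. `stacked_noise` is a hypothetical name.

```python
import random
import statistics

def stacked_noise(n_frames: int, sigma: float = 10.0, n_pixels: int = 4000, seed: int = 0) -> float:
    """Std of the per-pixel average of n_frames noisy exposures of a flat signal."""
    rng = random.Random(seed)
    signal = 100.0
    averages = []
    for _ in range(n_pixels):
        frames = [signal + rng.gauss(0.0, sigma) for _ in range(n_frames)]
        averages.append(sum(frames) / n_frames)
    return statistics.stdev(averages)

for n in (1, 4, 16):
    print(n, round(stacked_noise(n), 2))   # roughly 10, 5, 2.5: noise falls like 1/sqrt(N)
```

The signal stays at 100 while the noise falls like $1/\sqrt{N}$, so SNR rises like $\sqrt{N}$.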


Step 7 — The ISP: from RAW to what algorithms see

A phone ISP typically runs on-sensor or immediately after readout:

| Stage | What it does | Vision impact |
|---|---|---|
| Black level / OB (optical black) | Subtract offsets | Prevents color cast in shadows |
| Lens shading | Per-channel vignette correction | Uniform illumination for photometry |
| Demosaic | CFA → RGB | Edge artifacts if aggressive |
| White balance | Diagonal color scaling | Changes “true” color ratios |
| Color matrix | Sensor RGB → display RGB | Dataset color statistics |
| Tone / gamma | Dynamic range compression | Non-linear; affects gradients |
| Sharpen / NR (noise reduction) | High-frequency boost or suppression | Fake edges, texture loss |
| JPEG encode | Lossy compression | Blocking, ringing |
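To tie a few rows of the table together, here is a toy ISP for a single, already demosaiced pixel. It is a sketch only. The black level, the white level (a 14-bit range), and the white-balance gains are made-up illustrative values, and lens shading, the color matrix, sharpening/NR, and JPEG encoding are omitted. `toy_isp` is a hypothetical name.

```python
def srgb_encode(lin: float) -> float:
    """Linear light in [0, 1] -> sRGB-encoded value in [0, 1]."""
    return 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055

def toy_isp(raw_rgb, black_level=512, white_level=16383, wb_gains=(2.0, 1.0, 1.5)):
    """Black level -> normalize -> white balance -> clip -> sRGB tone curve -> 8-bit.
    All default parameter values are illustrative, not from any real camera."""
    out = []
    for value, gain in zip(raw_rgb, wb_gains):
        lin = max(value - black_level, 0) / (white_level - black_level)  # black level, normalize
        lin = min(lin * gain, 1.0)                                          # white balance, clip
        out.append(round(255 * srgb_encode(lin)))                           # tone curve, 8-bit
    return tuple(out)

grey_card = (1412, 2312, 1712)   # hypothetical RAW values of a neutral patch
print(toy_isp(grey_card))        # three equal values: the grey card renders neutral
```

The example input is chosen so that, after black-level subtraction and white balance, all three channels equal 1800 counts. The grey card therefore comes out neutral. Changing `wb_gains` alone shifts the output color, which is the “changes true color ratios” row in action.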

Putting it together:

Scene → optics → CFA + exposure → RAW → ISP chain → stored file.

You do not need to implement each stage yet. You need the vocabulary to read datasheets, papers, and failure cases.


Deep dive — HDR, rolling shutter, and metrology

Multi-exposure HDR fuses short exposures (for highlights) and long exposures (for shadows) after aligning them. Objects that move between frames cause ghosting.
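This is a minimal merge. It assumes linear frames normalized to [0, 1], already aligned, of a static scene, so it does nothing about ghosting. Each frame divided by its exposure time estimates the same relative radiance. Clipped or near-black samples are skipped, and the rest are averaged with weight proportional to exposure time (longer exposures collect more photons). This weighting is one simple choice, not a specific published method, and `merge_hdr` with its `clip`/`floor` thresholds is hypothetical.

```python
def merge_hdr(frames, exposure_times, clip=0.98, floor=0.02):
    """Merge aligned linear frames (lists of values in [0, 1]) into relative radiance."""
    merged = []
    for samples in zip(*frames):                      # one pixel across all exposures
        num = den = 0.0
        for value, t in zip(samples, exposure_times):
            if floor < value < clip:                  # skip clipped and near-black samples
                num += value                          # = t * (value / t): weight t on each estimate
                den += t
        if den > 0.0:
            merged.append(num / den)
        else:
            # No usable sample: a clipped pixel takes the shortest exposure (a lower bound),
            # a too-dark pixel takes the longest one.
            pick = min if any(v >= clip for v in samples) else max
            i = pick(range(len(exposure_times)), key=lambda k: exposure_times[k])
            merged.append(samples[i] / exposure_times[i])
    return merged

# Hypothetical scene: three pixels spanning a 100x radiance range, shot at two exposure times.
radiance = [0.05, 0.5, 5.0]
times = [0.1, 1.0]
frames = [[min(r * t, 1.0) for r in radiance] for t in times]  # simulated, noise-free, clipped at 1
print(merge_hdr(frames, times))   # close to [0.05, 0.5, 5.0]
```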

Rolling shutter reads rows sequentially; fast motion or vibration skews geometry (wobbly buildings, bent propellers). Global shutter sensors avoid this at higher cost.
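The size of the skew follows from simple arithmetic. Suppose row $r$ starts exposing $r \cdot t_{\text{line}}$ seconds after row 0. Then an object moving sideways at $v$ pixels per second shifts by $v \cdot t_{\text{line}} \cdot r$ pixels in that row. This sketch uses made-up numbers (3000 rows, 10 µs per row, 2000 px/s), and `rolling_shutter_offsets` is a hypothetical helper.

```python
def rolling_shutter_offsets(n_rows: int, line_time_s: float, speed_px_per_s: float):
    """Horizontal shift of each row for an object moving sideways at constant speed,
    when row r is exposed line_time_s * r seconds after row 0."""
    return [speed_px_per_s * line_time_s * r for r in range(n_rows)]

offsets = rolling_shutter_offsets(3000, 10e-6, 2000.0)
print(f"top-to-bottom skew: {offsets[-1]:.1f} px")   # top-to-bottom skew: 60.0 px
```

Setting the line time to zero models a global shutter: every row gets zero offset, so vertical edges stay vertical.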

Radiometric calibration maps digital numbers to irradiance via flat-field panels and known lights — required for quantitative vision (agriculture, medical imaging, satellite).
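Flat-field correction is one building block of radiometric calibration. This sketch applies the classic formula to per-pixel lists. It assumes a dark frame (no light) and a flat frame (a uniform panel) taken with the same settings. The correction equalizes per-pixel gain such as vignetting but yields relative values only. Mapping to absolute irradiance also needs a source of known radiance, which is not shown. `flat_field_correct` is a hypothetical name.

```python
def flat_field_correct(raw, dark, flat):
    """corrected = (raw - dark) * mean(flat - dark) / (flat - dark), per pixel."""
    response = [f - d for f, d in zip(flat, dark)]     # per-pixel gain (vignetting, dust, PRNU)
    mean_response = sum(response) / len(response)
    return [(r - d) * mean_response / g for r, d, g in zip(raw, dark, response)]

# Hypothetical 3-pixel sensor whose right-hand pixels are dimmer (vignetting).
sensitivity = [1.0, 0.8, 0.5]
dark = [10.0, 10.0, 12.0]
flat = [d + 1000.0 * s for d, s in zip(dark, sensitivity)]   # uniform calibration panel
raw = [d + 500.0 * s for d, s in zip(dark, sensitivity)]     # uniform scene at half the panel level
print(flat_field_correct(raw, dark, flat))                   # three (nearly) equal values
```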


When things go wrong (debugging checklist)

| Symptom | Often caused by |
|---|---|
| Purple/green fringes on edges | Demosaic + chromatic aberration |
| Banding in smooth skies | 8-bit + aggressive tone mapping |
| Flickering exposure between frames | Auto-exposure hunting (also breaks optical flow / SLAM) |
| “Crunchy” fine detail | Oversharpening after NR |
| Color shift under LED lights | Narrow-band spectra vs daylight-trained AWB |

Check your understanding

  1. Why does a single RAW “pixel” not immediately give you an RGB triple at that location?
  2. Name two reasons identical scenes might produce different digital numbers on two devices.
  3. Why might edge-aware vision algorithms behave differently on JPEG vs RAW-derived linear images?
  4. If you double shutter time and halve ISO, what happens to photon count and read-noise contribution?
  5. Why is averaging three JPEGs not equivalent to averaging three linear RAW frames?

Lab-style stretch goals

Histograms: Load an image, split channels, plot R/G/B histograms. Note clipping at 0 and 255 and skew — relate to exposure and tone curve.
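If you want to start without a plotting library, the clipping part of this goal is simple counting. This sketch assumes the pixels are already loaded as a list of 8-bit (R, G, B) tuples by whatever image library you use. `channel_histograms` and `clipping_report` are hypothetical helpers. You can then pass the 256-bin histograms to any plotting tool.

```python
def channel_histograms(pixels):
    """256-bin histogram per channel for 8-bit (R, G, B) tuples."""
    hists = [[0] * 256 for _ in range(3)]
    for px in pixels:
        for c in range(3):
            hists[c][px[c]] += 1
    return hists

def clipping_report(pixels):
    """Fraction of pixels at 0 (crushed shadows) and 255 (clipped highlights) per channel."""
    n = len(pixels)
    hists = channel_histograms(pixels)
    return {name: (h[0] / n, h[255] / n) for name, h in zip("RGB", hists)}

pixels = [(255, 240, 10), (255, 255, 0), (128, 128, 128), (0, 20, 40)]
for name, (low, high) in clipping_report(pixels).items():
    print(f"{name}: {low:.0%} at 0, {high:.0%} at 255")
```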

RAW vs sRGB (if available): Develop the same RAW twice — “linear 16-bit” vs “camera JPEG.” Run Sobel magnitude on both (next lesson) and compare edge energy in shadows.