Camera models and projection
You will connect 3D geometry to 2D image coordinates using the standard pinhole camera model, then understand what intrinsics, extrinsics, and distortion mean in practice.
Figure: Pinhole projection — rays through the camera center
Learning objectives
- Write the pinhole projection equation in homogeneous coordinates.
- Decompose a camera matrix into intrinsic and extrinsic parameters.
- Explain radial distortion qualitatively and when it matters for calibration-heavy pipelines.
Prerequisites
- Basic linear algebra: matrix–vector multiply, inverse (conceptually).
- Comfort with “rays in 3D” intuition.
Step 1 — The pinhole idealization
A pinhole camera maps a 3D point to the place where the straight line through that point and the camera center intersects the image plane.
In camera coordinates (often $z$ forward, $x$ right, $y$ down or up depending on convention), a common projection is:

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z},$$

where $f$ is the focal length in the same units as $X$, $Y$, $Z$ (not “mm vs pixels” yet).
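To make the division by depth concrete, here is a minimal numeric sketch (the function name and the 35 mm focal length are illustrative, not prescribed by the text):

```python
def project_pinhole(point_cam, f):
    """Project a 3D point given in camera coordinates onto the image plane.

    point_cam : (X, Y, Z) with Z > 0 (point in front of the camera)
    f         : focal length, in the same units as X, Y, Z
    Returns (x, y) on the image plane, still in metric units (not pixels).
    """
    X, Y, Z = point_cam
    return f * X / Z, f * Y / Z

# A point 2 m in front of the camera, 0.5 m to the right, with f = 35 mm:
print(project_pinhole((0.5, 0.0, 2.0), f=0.035))  # ~ (0.00875, 0.0)
```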
Checkpoint: What happens mathematically as $Z \to 0$? Why do real lenses not behave like this at macro distances?
Step 2 — Homogeneous coordinates
Homogeneous 4-vectors let you bundle rotation, translation, and projection into chained matrices.
- A 3D point in world coordinates: homogeneous vector $\mathbf{X}_w = (X, Y, Z, 1)^\top$.
- Extrinsics map world → camera: $\mathbf{X}_c = [\,R \mid \mathbf{t}\,]\,\mathbf{X}_w$ (stack the rotation matrix and translation vector next to each other); see the sketch after this list.
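A minimal sketch of the world → camera step with NumPy. The rotation (90° about the camera's $y$-axis) and translation are made-up illustrative values:

```python
import numpy as np

# Illustrative rotation (90 degrees about the y-axis) and translation.
theta = np.pi / 2
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.0, 0.0, 4.0])

# Extrinsics as a 3x4 matrix [R | t]: world -> camera.
Rt = np.hstack([R, t.reshape(3, 1)])

# Homogeneous world point (X, Y, Z, 1).
X_w = np.array([1.0, 0.5, 0.0, 1.0])

X_c = Rt @ X_w          # 3-vector in camera coordinates
print(X_c)              # ~ [0.  0.5  3. ]
```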
Exercise: Why do we use 4 components even though the point is 3D?
Step 3 — Intrinsics: from meters to pixels
Focal lengths in pixels $(f_x, f_y)$ and the principal point $(c_x, c_y)$ encode the affine part of the mapping from normalized camera coordinates $(x, y) = (X/Z, Y/Z)$ to pixel indices:

$$u = f_x\,x + c_x, \qquad v = f_y\,y + c_y.$$

The intrinsic matrix packages this:

$$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}.$$

The skew $s$ is often ~0 for modern sensors.
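A small sketch with made-up intrinsic values, showing how $K$ carries normalized coordinates to pixel indices:

```python
import numpy as np

# Illustrative values: focal lengths in pixels, principal point near image center.
fx, fy = 800.0, 800.0
cx, cy = 320.0, 240.0
s = 0.0                      # skew, usually ~0 for modern sensors

K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])

# Normalized camera coordinates (x, y) = (X/Z, Y/Z) -> pixel coordinates.
x, y = 0.25, -0.1
u, v, _ = K @ np.array([x, y, 1.0])
print(u, v)                  # 520.0, 160.0
```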
Figure: World → camera → pixels
Checkpoint: If you crop the center of the image digitally (not physically recentering the sensor), which intrinsic parameters change?
Step 4 — The full projection (compact form)
Let $P = K\,[\,R \mid \mathbf{t}\,]$. For a world point $\mathbf{X}_w$,

$$\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = P\,\mathbf{X}_w,$$

where $\lambda$ corresponds to depth in camera coordinates (up to sign conventions).
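A compact numeric sketch (illustrative $K$, identity rotation, translation along $z$) showing that the third component of $P\,\mathbf{X}_w$ is the depth $\lambda$:

```python
import numpy as np

# Illustrative intrinsics and extrinsics (identity rotation, camera 4 units back).
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0.,   0.,   1.]])
Rt = np.hstack([np.eye(3), np.array([[0.], [0.], [4.]])])

P = K @ Rt                               # 3x4 camera matrix

X_w = np.array([1.0, 0.5, 0.0, 1.0])     # homogeneous world point
uvw = P @ X_w                            # = lambda * (u, v, 1)
u, v = uvw[:2] / uvw[2]
print(u, v, "lambda (depth):", uvw[2])   # 520.0 340.0 lambda (depth): 4.0
```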
Exercise (conceptual): Why can you not recover absolute scene scale from a single image without additional constraints?
Step 5 — Lens distortion (radial first pass)
Real lenses bend rays; the pinhole model is an approximation. Radial distortion pulls pixels inward or outward as a function of radius from the principal point.
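One common way to write this (an assumption here; the text does not fix a specific model) uses two radial coefficients that scale normalized coordinates by $1 + k_1 r^2 + k_2 r^4$. A minimal sketch:

```python
def apply_radial(xy_norm, k1, k2):
    """Apply a two-coefficient radial distortion model to normalized coordinates.

    x_d = x * (1 + k1*r^2 + k2*r^4), same for y, with r^2 = x^2 + y^2.
    k1 < 0 tends toward barrel distortion, k1 > 0 toward pincushion.
    """
    x, y = xy_norm
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# Illustrative coefficients: points away from the center get pulled inward (barrel).
print(apply_radial((0.4, 0.3), k1=-0.2, k2=0.05))  # ~ (0.381, 0.286)
```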
Figure: Pinhole vs barrel vs pincushion
- SLAM and photogrammetry pipelines often estimate distortion coefficients jointly with intrinsics.
- Many “AI” datasets ignore distortion; that is fine until you try to fuse CAD models with pixels.
Checkpoint: Where in the image are radial distortion effects usually most noticeable?
Step 6 — Calibration in one paragraph
Camera calibration estimates $K$ and distortion coefficients (and sometimes extrinsics per view) from images of a known pattern (checkerboard, AprilTag grid, etc.).
You do not need the full optimization derivation yet — you need to know what is being estimated and why reprojection error is the usual loss.
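As one concrete (but not prescribed) example, here is a minimal OpenCV sketch. It assumes a 9×6 inner-corner checkerboard and a hypothetical calib/ folder containing at least one usable view; the folder name and pattern size are placeholders:

```python
import glob
import cv2
import numpy as np

# Object points for a 9x6 inner-corner checkerboard (units: squares; scale arbitrary).
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob("calib/*.jpg"):          # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# Returns RMS reprojection error, K, distortion coefficients, and per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("K:\n", K, "\ndistortion:", dist.ravel())
```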
Check your understanding
- What is the difference between extrinsics and intrinsics?
- Why is homogeneous scaling arbitrary in projection, yet pixel coordinates are unique?
- Give one application where ignoring distortion would break downstream geometry.
Lab-style stretch goal (optional)
Use any calibration toolkit on printed checkerboard images. Report RMS reprojection error and show one “before vs after undistortion” crop.
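One possible starting point for the undistortion comparison, assuming K and dist come from the calibration sketch in Step 6 and that the file paths are placeholders:

```python
import cv2

# K and dist as estimated by cv2.calibrateCamera in the sketch above.
img = cv2.imread("calib/view01.jpg")            # hypothetical example view
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("view01_undistorted.jpg", undistorted)
```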