
Geometry & correspondence

Camera models and projection

Intrinsic and extrinsic parameters, the pinhole model, lens distortion, and projecting 3D points into pixels.

~65 min read + exercises


You will connect 3D geometry to 2D image coordinates using the standard pinhole camera model, and learn what intrinsics, extrinsics, and distortion mean in practice.

Figure

Pinhole projection — rays through the camera center

Each 3D point's ray hits the image plane at (x, y) = (f·X/Z, f·Y/Z). Scale is lost — depth and size are entangled in a single image.

Learning objectives

  • Write the pinhole projection equation in homogeneous coordinates.
  • Decompose a camera matrix into intrinsic and extrinsic parameters.
  • Explain radial distortion qualitatively and when it matters for calibration-heavy pipelines.

Prerequisites

  • Basic linear algebra: matrix–vector multiply, inverse (conceptually).
  • Comfort with “rays in 3D” intuition.

Step 1 — The pinhole idealization

A pinhole camera maps a 3D point X to the point where the straight line through X and the camera center meets the image plane.

In camera coordinates (often Z forward, X right, Y down or up depending on convention), a common projection is:

x = f \frac{X}{Z}, \quad y = f \frac{Y}{Z}

where f is the focal length in the same units as X, Y, Z (not “mm vs pixels” yet).
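The scale ambiguity is easy to verify numerically. Here is a minimal plain-Python sketch (the function name is illustrative, not from any library):

```python
def project_pinhole(X, Y, Z, f=1.0):
    """Project a 3D point given in camera coordinates onto the plane z = f."""
    if Z <= 0:
        raise ValueError("point must be in front of the camera center")
    return (f * X / Z, f * Y / Z)

# A point twice as far away and twice as large projects to the same pixel:
p_near = project_pinhole(1.0, 0.5, 2.0)   # (0.5, 0.25)
p_far  = project_pinhole(2.0, 1.0, 4.0)   # (0.5, 0.25)
```

Both calls return the same image point, which is exactly the depth/size entanglement described above.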

Checkpoint: What happens mathematically as Z \to 0^{+}? Why do real lenses not behave like this at macro distances?


Step 2 — Homogeneous coordinates

Homogeneous 4-vectors let you bundle rotation, translation, and projection into chained matrices.

  • A 3D point in world coordinates: homogeneous vector \tilde{X}_w = [X_w, Y_w, Z_w, 1]^\top.
  • Extrinsics map world → camera: X_c = [R\,|\,t]\,\tilde{X}_w (stack rotation R and translation t side by side).
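A sketch of the extrinsic step in plain Python (illustrative names; the appended 1 in the homogeneous vector is what lets translation ride along with rotation in a single matrix product):

```python
def apply_extrinsics(R, t, Xw):
    """Compute Xc = [R | t] @ [Xw, 1]^T, i.e. Xc = R @ Xw + t."""
    return [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]

# Identity rotation, camera translated 2 units along the Z axis:
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -2.0]
Xc = apply_extrinsics(R, t, [1.0, 0.5, 5.0])   # [1.0, 0.5, 3.0]
```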

Exercise: Why do we use 4 components even though the point is 3D?


Step 3 — Intrinsics: from meters to pixels

Focal length in pixels (f_x, f_y) and principal point (c_x, c_y) encode the affine part of the mapping from normalized camera coordinates to pixel indices:

u = f_x \frac{X_c}{Z_c} + c_x, \quad v = f_y \frac{Y_c}{Z_c} + c_y

The intrinsic matrix K packages this:

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

The skew s is often ~0 for modern sensors.
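Applying K to normalized coordinates (X_c/Z_c, Y_c/Z_c) is just an affine map. A minimal sketch (plain Python; the function name and the intrinsic values are illustrative):

```python
def normalized_to_pixels(K, xn, yn):
    """Map normalized camera coordinates (xn, yn, 1) to pixel indices via K."""
    fx, s, cx = K[0]
    _, fy, cy = K[1]
    u = fx * xn + s * yn + cx
    v = fy * yn + cy
    return (u, v)

# Hypothetical intrinsics for a 640x480 sensor, square pixels, zero skew:
K = [[800.0,   0.0, 320.0],
     [  0.0, 800.0, 240.0],
     [  0.0,   0.0,   1.0]]

normalized_to_pixels(K, 0.0, 0.0)   # (320.0, 240.0): the principal point
```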

Figure

World → camera → pixels

Extrinsics [R | t] move a 3D point into the camera frame; intrinsics K turn normalized coordinates into pixels.

Checkpoint: If you crop the center of the image digitally (not physically recentering the sensor), which intrinsic parameters change?


Step 4 — The full projection (compact form)

Let M = K\,[R\,|\,t]. For a world point \tilde{X}_w,

\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \tilde{X}_w

where \lambda corresponds to the depth Z_c in camera coordinates (up to sign conventions).
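Chaining the pieces gives the full projection. A minimal sketch under the same conventions (plain Python, illustrative names and values):

```python
def project(K, R, t, Xw):
    """Compute lambda * [u, v, 1]^T = K [R | t] [Xw, 1]^T, then divide by lambda."""
    # World -> camera frame.
    Xc = [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera frame -> homogeneous pixel coordinates.
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    lam = h[2]   # equals Zc when K's last row is [0, 0, 1]
    return (h[0] / lam, h[1] / lam)

K  = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uv = project(K, I3, [0.0, 0.0, 0.0], [1.0, 0.5, 5.0])   # (480.0, 320.0)
```

Note that the division by \lambda is where depth disappears, matching the single-image scale ambiguity in the exercise below.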

Exercise (conceptual): Why can you not recover absolute scene scale from a single image without additional constraints?


Step 5 — Lens distortion (radial first pass)

Real lenses bend rays; the pinhole model is an approximation. Radial distortion pulls pixels inward or outward as a function of radius from the principal point.

Figure

Pinhole vs barrel vs pincushion

Panels (left to right): pinhole (no distortion), barrel (k₁ < 0), pincushion (k₁ > 0). The effect is strongest at the periphery; calibration estimates the coefficients.
A perfectly straight world grid bends near the edges under real lenses. Calibration estimates the coefficients that undo it.
  • SLAM and photogrammetry pipelines often estimate distortion coefficients jointly with intrinsics.
  • Many “AI” datasets ignore distortion; that is fine until you try to fuse CAD models with pixels.
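A common polynomial radial model scales normalized coordinates by a radius-dependent factor before the intrinsics are applied. A minimal sketch (plain Python; function name and coefficient values are illustrative):

```python
def distort_radial(xn, yn, k1, k2=0.0):
    """Apply x_d = x * (1 + k1*r^2 + k2*r^4) to normalized coordinates."""
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xn * scale, yn * scale)

# Barrel distortion (k1 < 0) pulls a peripheral point toward the center:
xd, yd = distort_radial(0.5, 0.5, k1=-0.2)   # roughly (0.45, 0.45)
# The principal point itself (r = 0) is unaffected:
distort_radial(0.0, 0.0, k1=-0.2)            # (0.0, 0.0)
```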

Checkpoint: Where in the image are radial distortion effects usually most noticeable?


Step 6 — Calibration in one paragraph

Camera calibration estimates K and distortion coefficients (and sometimes extrinsics per view) from images of a known pattern (checkerboard, AprilTag grid, etc.).

You do not need the full optimization derivation yet — you need to know what is being estimated and why reprojection error is the usual loss.
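Reprojection error is simply the pixel distance between where pattern corners were detected and where the current parameter estimates say they should project. An RMS sketch (plain Python, illustrative names, synthetic numbers):

```python
import math

def rms_reprojection_error(observed, reprojected):
    """RMS pixel distance between detected and model-predicted points."""
    sq = [(u - up) ** 2 + (v - vp) ** 2
          for (u, v), (up, vp) in zip(observed, reprojected)]
    return math.sqrt(sum(sq) / len(sq))

# Two synthetic corners, each 0.5 px off along one axis:
obs = [(100.0, 100.0), (200.0, 100.0)]
rep = [(100.5, 100.0), (200.0, 99.5)]
err = rms_reprojection_error(obs, rep)   # 0.5
```

Calibration toolkits minimize this quantity over intrinsics, distortion coefficients, and per-view extrinsics.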


Check your understanding

  1. What is the difference between extrinsics and intrinsics?
  2. Why is homogeneous scaling arbitrary in projection, yet pixel coordinates are unique?
  3. Give one application where ignoring distortion would break downstream geometry.

Lab-style stretch goal (optional)

Use any calibration toolkit on printed checkerboard images. Report RMS reprojection error and show one “before vs after undistortion” crop.