
Geometry & correspondence

Camera models and projection

Intrinsic and extrinsic parameters, the pinhole model, lens distortion, and projecting 3D points into pixels.

~65 min read + exercises


You will connect 3D geometry to 2D image coordinates using the standard pinhole camera model, and learn what intrinsics, extrinsics, and distortion mean in practice.

Figure

Pinhole projection — rays through the camera center

Each 3D point's ray hits the image plane at (x, y) = (f·X/Z, f·Y/Z). Scale is lost — depth and size are entangled in a single image.

Learning objectives

  • Write the pinhole projection equation in homogeneous coordinates.
  • Decompose a camera matrix into intrinsic and extrinsic parameters.
  • Explain radial distortion qualitatively and when it matters for calibration-heavy pipelines.

Prerequisites

  • Basic linear algebra: matrix–vector multiply, inverse (conceptually).
  • Comfort with “rays in 3D” intuition.

Step 1 — The pinhole idealization

A pinhole camera maps a 3D point X to the point where the straight line through X and the camera center meets the image plane.

In camera coordinates (often Z forward, X right, Y down or up depending on convention), a common projection is:

x = f \frac{X}{Z}, \quad y = f \frac{Y}{Z}

where f is the focal length in the same units as X, Y, Z (not “mm vs pixels” yet).
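The scale ambiguity is easy to verify numerically. Here is a minimal plain-Python sketch (the function name is illustrative, not from any library):

```python
def project_pinhole(X, Y, Z, f=1.0):
    """Project a 3D point given in camera coordinates onto the plane z = f."""
    if Z <= 0:
        raise ValueError("point must be in front of the camera center")
    return (f * X / Z, f * Y / Z)

# A point twice as far away and twice as large projects to the same pixel:
p_near = project_pinhole(1.0, 0.5, 2.0)   # (0.5, 0.25)
p_far  = project_pinhole(2.0, 1.0, 4.0)   # (0.5, 0.25)
```

Both calls return the same image point, which is exactly the depth/size entanglement described above.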

Checkpoint: What happens mathematically as Z \to 0^{+}? Why do real lenses not behave like this at macro distances?


Step 2 — Homogeneous coordinates

Homogeneous 4-vectors let you bundle rotation, translation, and projection into chained matrices.

  • A 3D point in world coordinates: homogeneous vector \tilde{X}_w = [X_w, Y_w, Z_w, 1]^\top.
  • Extrinsics map world → camera: X_c = [R\,|\,t]\,\tilde{X}_w (stack rotation R and translation t side by side).
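A sketch of the extrinsic step in plain Python (illustrative names; the appended 1 in the homogeneous vector is what lets translation ride along with rotation in a single matrix product):

```python
def apply_extrinsics(R, t, Xw):
    """Compute Xc = [R | t] @ [Xw, 1]^T, i.e. Xc = R @ Xw + t."""
    return [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]

# Identity rotation, camera translated 2 units along the Z axis:
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -2.0]
Xc = apply_extrinsics(R, t, [1.0, 0.5, 5.0])   # [1.0, 0.5, 3.0]
```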

Exercise: Why do we use 4 components even though the point is 3D?


Step 3 — Intrinsics: from meters to pixels

Focal length in pixels (f_x, f_y) and principal point (c_x, c_y) encode the affine part of the mapping from normalized camera coordinates to pixel indices:

u = f_x \frac{X_c}{Z_c} + c_x, \quad v = f_y \frac{Y_c}{Z_c} + c_y

The intrinsic matrix K packages this:

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

The skew s is often ~0 for modern sensors.
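Applying K to normalized coordinates (X_c/Z_c, Y_c/Z_c) is just an affine map. A minimal sketch (plain Python; the function name and the intrinsic values are illustrative):

```python
def normalized_to_pixels(K, xn, yn):
    """Map normalized camera coordinates (xn, yn, 1) to pixel indices via K."""
    fx, s, cx = K[0]
    _, fy, cy = K[1]
    u = fx * xn + s * yn + cx
    v = fy * yn + cy
    return (u, v)

# Hypothetical intrinsics for a 640x480 sensor, square pixels, zero skew:
K = [[800.0,   0.0, 320.0],
     [  0.0, 800.0, 240.0],
     [  0.0,   0.0,   1.0]]

normalized_to_pixels(K, 0.0, 0.0)   # (320.0, 240.0): the principal point
```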

Figure

World → camera → pixels

Extrinsics [R | t] move a 3D point into the camera frame; intrinsics K turn normalized coordinates into pixels.

Checkpoint: If you crop the center of the image digitally (not physically recentering the sensor), which intrinsic parameters change?


Step 4 — The full projection (compact form)

Let M = K\,[R\,|\,t]. For a world point \tilde{X}_w,

\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \tilde{X}_w

where \lambda corresponds to the depth Z_c in camera coordinates (up to sign conventions).
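Chaining the pieces gives the full projection. A minimal sketch under the same conventions (plain Python, illustrative names and values):

```python
def project(K, R, t, Xw):
    """Compute lambda * [u, v, 1]^T = K [R | t] [Xw, 1]^T, then divide by lambda."""
    # World -> camera frame.
    Xc = [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera frame -> homogeneous pixel coordinates.
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    lam = h[2]   # equals Zc when K's last row is [0, 0, 1]
    return (h[0] / lam, h[1] / lam)

K  = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uv = project(K, I3, [0.0, 0.0, 0.0], [1.0, 0.5, 5.0])   # (480.0, 320.0)
```

Note that the division by \lambda is where depth disappears, matching the single-image scale ambiguity in the exercise below.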

Exercise (conceptual): Why can you not recover absolute scene scale from a single image without additional constraints?


Step 5 — Lens distortion (radial first pass)

Real lenses bend rays; the pinhole model is an approximation. Radial distortion pulls pixels inward or outward as a function of radius from the principal point.

Figure

Pinhole vs barrel vs pincushion

Panels (left to right): pinhole (no distortion), barrel (k₁ < 0), pincushion (k₁ > 0). The effect is strongest at the periphery; calibration estimates the coefficients.
A perfectly straight world grid bends near the edges under real lenses. Calibration estimates the coefficients that undo it.
  • SLAM and photogrammetry pipelines often estimate distortion coefficients jointly with intrinsics.
  • Many “AI” datasets ignore distortion; that is fine until you try to fuse CAD models with pixels.
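A common polynomial radial model scales normalized coordinates by a radius-dependent factor before the intrinsics are applied. A minimal sketch (plain Python; function name and coefficient values are illustrative):

```python
def distort_radial(xn, yn, k1, k2=0.0):
    """Apply x_d = x * (1 + k1*r^2 + k2*r^4) to normalized coordinates."""
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xn * scale, yn * scale)

# Barrel distortion (k1 < 0) pulls a peripheral point toward the center:
xd, yd = distort_radial(0.5, 0.5, k1=-0.2)   # roughly (0.45, 0.45)
# The principal point itself (r = 0) is unaffected:
distort_radial(0.0, 0.0, k1=-0.2)            # (0.0, 0.0)
```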

Checkpoint: Where in the image are radial distortion effects usually most noticeable?


Step 6 — Calibration in one paragraph

Camera calibration estimates K and distortion coefficients (and sometimes extrinsics per view) from images of a known pattern (checkerboard, AprilTag grid, etc.).

You do not need the full optimization derivation yet — you need to know what is being estimated and why reprojection error is the usual loss.
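Reprojection error is simply the pixel distance between where pattern corners were detected and where the current parameter estimates say they should project. An RMS sketch (plain Python, illustrative names, synthetic numbers):

```python
import math

def rms_reprojection_error(observed, reprojected):
    """RMS pixel distance between detected and model-predicted points."""
    sq = [(u - up) ** 2 + (v - vp) ** 2
          for (u, v), (up, vp) in zip(observed, reprojected)]
    return math.sqrt(sum(sq) / len(sq))

# Two synthetic corners, each 0.5 px off along one axis:
obs = [(100.0, 100.0), (200.0, 100.0)]
rep = [(100.5, 100.0), (200.0, 99.5)]
err = rms_reprojection_error(obs, rep)   # 0.5
```

Calibration toolkits minimize this quantity over intrinsics, distortion coefficients, and per-view extrinsics.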


Check your understanding

  1. What is the difference between extrinsics and intrinsics?
  2. Why is homogeneous scaling arbitrary in projection, yet pixel coordinates are unique?
  3. Give one application where ignoring distortion would break downstream geometry.

Lab-style stretch goal (optional)

Use any calibration toolkit on printed checkerboard images. Report RMS reprojection error and show one “before vs after undistortion” crop.