Camera models and projection
You will connect 3D geometry to 2D image coordinates using the pinhole camera model, then understand intrinsics, extrinsics, distortion, and calibration well enough to implement projection in code.
Figure
Pinhole projection — rays through the camera center
Learning objectives
- Write pinhole projection in camera coordinates and in homogeneous form.
- Build the intrinsic matrix $K$ and compose $P = K\,[R \mid \mathbf{t}]$.
- Project a 3D world point to pixels with a worked numeric example.
- Explain radial and tangential distortion and the Brown–Conrady model qualitatively.
- Describe Zhang's calibration method and reprojection error.
- State why a single view cannot recover metric scale.
Prerequisites
- Basic linear algebra: matrix–vector multiply, homogeneous coordinates.
- Convolution lesson (helpful for understanding image planes as grids).
Step 1 — The pinhole idealization
A pinhole camera maps a 3D point $(X, Y, Z)$ in camera coordinates ($Z$ along the optical axis) to the image plane:
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
$f$ is the focal length in the same units as $x$ and $y$ (meters on the sensor plane before pixel scaling).
Checkpoint: What happens as $Z \to 0$? Why do real lenses fail at macro distances?
Division blows up; real lenses have minimum focus distance and finite aperture — not a true pinhole.
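A minimal sketch of the pinhole equations in plain Python may help make the division by depth concrete. The function name `pinhole_project` and the numeric values (a 50 mm focal length, a point 2 m ahead) are assumptions chosen for illustration, not values from this lesson.

```python
def pinhole_project(point_c, f):
    """Project a camera-frame point (X, Y, Z) onto the image plane.

    f and the returned (x, y) share the same length unit (meters here).
    """
    X, Y, Z = point_c
    if Z <= 0:
        # The ideal model divides by Z, so points at or behind the
        # camera center have no valid projection.
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


# Assumed example: f = 0.05 m, point 0.2 m right, 0.1 m down, 2 m ahead.
x, y = pinhole_project((0.2, 0.1, 2.0), f=0.05)
print(x, y)  # roughly 5 mm and 2.5 mm on the sensor plane
```

Rejecting `Z <= 0` explicitly is one design choice; another reasonable option is to return `None` and let the caller skip the point.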
Step 2 — Homogeneous coordinates
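Augment 3D points: $\tilde{\mathbf{X}}_w = (X_w, Y_w, Z_w, 1)^\top$. Then
$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}_w$$
(with $K$ the intrinsic matrix from Step 3).
- Extrinsics $[R \mid \mathbf{t}]$: rigid transform world → camera, $\mathbf{X}_c = R\,\mathbf{X}_w + \mathbf{t}$.
- Homogeneous scale $\lambda$ is arbitrary until you divide; $\lambda = Z_c > 0$ for the usual forward camera.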
Exercise: Why do we use 4 components for a 3D point?
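So translation becomes matrix multiplication: $R\,\mathbf{X}_w + \mathbf{t}$ is $[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}_w$, where $\tilde{\mathbf{X}}_w = (X_w, Y_w, Z_w, 1)^\top$.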
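The equivalence between the affine form and the homogeneous form can be checked numerically. The sketch below uses an assumed pose (a 90 degree rotation about the Z axis plus a translation) and a hypothetical helper `matvec`; none of these values come from the lesson.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Assumed pose: rotate 90 degrees about Z, then translate.
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
t = [0.1, 0.0, 2.0]
X_w = [1.0, 0.0, 0.0]

# Affine form: X_c = R X_w + t
X_c_affine = [a + b for a, b in zip(matvec(R, X_w), t)]

# Homogeneous form: X_c = [R | t] (X_w, 1)
Rt = [row + [ti] for row, ti in zip(R, t)]  # 3x4 matrix
X_c_homog = matvec(Rt, X_w + [1.0])

print(X_c_affine, X_c_homog)  # the two results agree up to rounding
```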
Step 3 — Intrinsics: meters to pixels
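$$K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
- $(c_x, c_y)$: principal point, the intersection of the optical axis with the sensor (often near the image center, rarely the exact center pixel).
- $f_x, f_y$ in pixels: $f_x = f\,m_x$ and $f_y = f\,m_y$, where $m_x, m_y$ are the pixel densities in pixels per mm (with $f$ in mm).
- Skew $s$: shear if the sensor axes are not perpendicular; ~0 on modern phones.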
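The intrinsic parameters listed above can be assembled into a 3×3 matrix and applied to normalized coordinates. This is a sketch under assumed values (a 4 mm lens, 250 pixels per mm, a principal point at (960, 540)); `make_K` and `normalized_to_pixel` are hypothetical helper names.

```python
def make_K(fx, fy, cx, cy, s=0.0):
    """Build the 3x3 intrinsic matrix from pixel-unit parameters."""
    return [[fx, s, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def normalized_to_pixel(K, xn, yn):
    """Map normalized image coordinates (X/Z, Y/Z) to pixel coordinates."""
    u = K[0][0] * xn + K[0][1] * yn + K[0][2]
    v = K[1][1] * yn + K[1][2]
    return u, v


# Assumed sensor: 4 mm focal length, 250 px/mm, so fx = fy = 1000 px.
f_mm, px_per_mm = 4.0, 250.0
K = make_K(f_mm * px_per_mm, f_mm * px_per_mm, cx=960.0, cy=540.0)

u, v = normalized_to_pixel(K, 0.1, -0.05)
print(u, v)  # about 1060 and 490
```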
Figure
World → camera → pixels
Checkpoint: If you crop the center 512×512 from a 4K frame in software (no sensor change), what changes in $K$?
$(c_x, c_y)$ shift; $f_x, f_y$ are unchanged in pixel units if the crop is a pure translation on the grid.
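The crop checkpoint can be sketched in a few lines. The frame size (3840×2160), the focal length of 3000 px, and a principal point at the exact frame center are assumptions for illustration; `crop_intrinsics` is a hypothetical helper.

```python
def crop_intrinsics(K, x0, y0):
    """Intrinsics for a crop whose top-left corner sits at pixel (x0, y0)."""
    K_crop = [row[:] for row in K]  # copy so the original K is untouched
    K_crop[0][2] -= x0  # principal point shifts with the crop origin
    K_crop[1][2] -= y0
    return K_crop


# Assumed full-frame intrinsics for a 3840x2160 image.
K = [[3000.0, 0.0, 1920.0],
     [0.0, 3000.0, 1080.0],
     [0.0, 0.0, 1.0]]

# Top-left corner of a centered 512x512 crop.
x0 = (3840 - 512) // 2
y0 = (2160 - 512) // 2
K_crop = crop_intrinsics(K, x0, y0)

print(K_crop[0][2], K_crop[1][2])  # principal point in crop coordinates
print(K_crop[0][0], K_crop[1][1])  # focal lengths are unchanged
```

Resizing the image, unlike cropping, would scale $f_x, f_y, c_x, c_y$ together; the sketch covers only the pure-translation case from the checkpoint.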
Step 4 — Worked projection example
World point m. Suppose
Camera coords = world coords (identity rotation, zero translation). Normalized: $x_n = X/Z$, $y_n = Y/Z$.
Exercise: Move the point to half its depth ($Z \to Z/2$, same $X$ and $Y$). How do $x_n, y_n$ change? What does that say about apparent size vs depth?
Coordinates double — closer points project larger; scale ambiguity in a single image.
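A numeric run-through of this step, using stand-in values, shows the doubling directly. The point (0.5, 0.25, 2.0) m and the intrinsics ($f_x = f_y = 800$ px, principal point (320, 240)) are assumptions chosen for round arithmetic; they are not the values of the worked example above.

```python
def project_camera_point(point_c, fx, fy, cx, cy):
    """Return normalized coords and pixel coords for a camera-frame point."""
    X, Y, Z = point_c
    xn, yn = X / Z, Y / Z            # perspective division
    u, v = fx * xn + cx, fy * yn + cy  # intrinsics, zero skew assumed
    return (xn, yn), (u, v)


# Assumed intrinsics and point (world frame == camera frame).
fx = fy = 800.0
cx, cy = 320.0, 240.0

far_norm, far_px = project_camera_point((0.5, 0.25, 2.0), fx, fy, cx, cy)
near_norm, near_px = project_camera_point((0.5, 0.25, 1.0), fx, fy, cx, cy)

print(far_norm, far_px)    # (0.25, 0.125) (520.0, 340.0)
print(near_norm, near_px)  # (0.5, 0.25) (720.0, 440.0)
```

Halving the depth doubles the normalized coordinates, and the pixel offsets from the principal point double as well (200 to 400 and 100 to 200 in this example).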
Step 5 — Full projection matrix
$$P = K\,[R \mid \mathbf{t}]$$
$P$ is a $3 \times 4$ matrix with 11 degrees of freedom (12 entries minus one for the overall scale). Calibration estimates $K$ and distortion; pose estimation finds $(R, \mathbf{t})$ per view.
Checkpoint: Why can you not recover absolute metric scale from one image?
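The reconstruction is defined only up to a similarity transform of the world (scaling every point and $\mathbf{t}$ by the same factor gives the same image) unless you fix scale with a known object size or multi-view triangulation with a known baseline.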
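The composition $P = K\,[R \mid \mathbf{t}]$ and the scale ambiguity can both be checked with a short sketch. The intrinsics, pose, and point below are assumed values; `matmul` and `project` are hypothetical helpers written in plain Python.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def project(P, X_w):
    """Project a 3D world point with a 3x4 matrix P, then dehomogenize."""
    Xh = list(X_w) + [1.0]
    x, y, w = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return x / w, y / w


def projection_matrix(K, R, t):
    """Compose P = K [R | t]."""
    Rt = [row + [ti] for row, ti in zip(R, t)]
    return matmul(K, Rt)


# Assumed intrinsics and pose.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1.0]
X_w = [0.5, 0.25, 1.0]

uv = project(projection_matrix(K, R, t), X_w)

# Scale the whole scene (point and translation) by the same factor.
s = 3.0
uv_scaled = project(projection_matrix(K, R, [s * ti for ti in t]),
                    [s * c for c in X_w])

print(uv, uv_scaled)  # both give the same pixel
```

Because the two scenes produce identical pixels, no amount of processing on the single image can tell them apart.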
Step 6 — Lens distortion
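Real lenses bend rays. Common Brown–Conrady radial model (2D normalized coords $(x_n, y_n)$, with $r^2 = x_n^2 + y_n^2$):
$$x_d = x_n\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
(similarly for $y_d$; tangential terms $p_1, p_2$ model decentering).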
Figure
Pinhole vs barrel vs pincushion
- Undistortion: solve for undistorted normalized coords (iterative or closed-form approximations), then apply $K$.
- SLAM / AR: ignore distortion at your peril on wide-angle phone lenses.
Checkpoint: Where are radial effects usually strongest?
Image periphery where $r$ is large.
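The distortion and undistortion steps above can be sketched as follows. The formulas use the radial plus tangential form with coefficients $(k_1, k_2, k_3)$ and $(p_1, p_2)$ in the ordering commonly used by OpenCV; the coefficient values are assumptions for a mild barrel lens, and the fixed-point loop is one simple iterative scheme, not necessarily the one any particular library uses.

```python
def distort(xn, yn, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion
    to normalized coordinates."""
    k1, k2, k3 = k
    p1, p2 = p
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xn * radial + 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
    return xd, yd


def undistort(xd, yd, k=(0.0, 0.0, 0.0), p=(0.0, 0.0), iterations=20):
    """Invert distort() by fixed-point iteration, starting from (xd, yd).

    Converges for mild distortion; strong distortion may need a
    better solver or initial guess.
    """
    k1, k2, k3 = k
    p1, p2 = p
    xn, yn = xd, yd
    for _ in range(iterations):
        r2 = xn * xn + yn * yn
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
        dy = p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
        xn = (xd - dx) / radial
        yn = (yd - dy) / radial
    return xn, yn


# Assumed coefficients: negative k1 gives barrel distortion.
k = (-0.2, 0.05, 0.0)
p = (0.001, -0.0005)

xd, yd = distort(0.3, 0.2, k, p)
xr, yr = undistort(xd, yd, k, p)
print((xd, yd), (xr, yr))  # the round trip should recover about (0.3, 0.2)
```

To get pixels, apply $K$ to the distorted normalized coordinates; to undistort a pixel, first map it back to normalized coordinates with $K^{-1}$.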
Step 7 — Calibration with Zhang's method (outline)
Given a planar checkerboard of known square size:
- Detect corners in multiple images at different poses.
- Each view gives homography constraints linking board plane to image.
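- Solve for $K$, distortion coefficients $\mathbf{k} = (k_1, k_2, p_1, p_2, \dots)$, and per-image poses $(R_i, \mathbf{t}_i)$ via nonlinear least squares.
- Minimize reprojection error:
$$E = \sum_i \sum_j \left\lVert \mathbf{x}_{ij} - \pi(K, \mathbf{k}, R_i, \mathbf{t}_i, \mathbf{X}_j) \right\rVert^2$$
where $\mathbf{x}_{ij}$ is observed corner $j$ in image $i$, $\mathbf{X}_j$ is the corresponding board point, and $\pi$ is projection including distortion.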
| RMS reprojection (px) | Typical interpretation |
|---|---|
| < 0.3 | Excellent |
| 0.3 – 0.7 | Usable for many apps |
| > 1.0 | Re-check board size, focus, motion blur |
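The RMS figure in the table can be computed from matched observed and reprojected corners. This sketch divides the summed squared distance by the number of points; conventions differ between tools (some divide by the number of scalar coordinates), so treat that choice as an assumption. The corner coordinates are made-up illustrative values.

```python
import math


def rms_reprojection_error(observed, projected):
    """RMS of Euclidean pixel distances between matched 2D points."""
    if len(observed) != len(projected) or not observed:
        raise ValueError("need two non-empty point lists of equal length")
    squared = [(uo - up) ** 2 + (vo - vp) ** 2
               for (uo, vo), (up, vp) in zip(observed, projected)]
    return math.sqrt(sum(squared) / len(squared))


# Assumed detections and reprojections for two corners (pixels).
observed = [(100.0, 100.0), (200.0, 150.0)]
projected = [(100.3, 99.6), (199.5, 150.0)]

rms = rms_reprojection_error(observed, projected)
print(rms)  # about 0.5 px for these values
```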
Deep dive — coordinate conventions and hand–eye
OpenCV vs OpenGL vs robotics: $y$ may point down in image rows but up in world; always document which frames $[R \mid \mathbf{t}]$ maps between.
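As one concrete instance of a convention change, the sketch below converts a camera-frame point from the usual OpenCV-style axes (x right, y down, z forward) to the usual OpenGL-style axes (x right, y up, z toward the viewer). The axis descriptions are the commonly documented ones; confirm them against the libraries you actually use. `CV_TO_GL` and `cv_to_gl` are hypothetical names.

```python
# Flip y and z: a 180 degree rotation about the camera x axis.
CV_TO_GL = [[1.0, 0.0, 0.0],
            [0.0, -1.0, 0.0],
            [0.0, 0.0, -1.0]]


def cv_to_gl(point_cv):
    """Re-express an OpenCV-style camera-frame point in OpenGL-style axes."""
    return [sum(m * c for m, c in zip(row, point_cv)) for row in CV_TO_GL]


# Assumed point: 2 m in front of the camera, slightly right and below center.
p_cv = [0.2, 0.1, 2.0]
p_gl = cv_to_gl(p_cv)
print(p_gl)  # y and z change sign; the point is now at negative z

# The flip is its own inverse, so applying it twice returns the input.
p_back = cv_to_gl(p_gl)
```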
Hand–eye calibration (robotics track preview): relates camera frame to gripper frame so pixels → grasp poses. Needs known motion and calibration target or structure.
Check your understanding
- What is the difference between extrinsics and intrinsics?
- Why is homogeneous scale arbitrary in projection, yet pixel coordinates are unique after division?
- Give one application where ignoring distortion would break downstream geometry.
- If , what physical imperfection might that encode?
- Why are at least two views needed to triangulate a 3D point?
Lab-style stretch goals
Calibrate with a printed checkerboard (OpenCV calibrateCamera or similar). Report RMS error and show undistorted lines on a straight-edge scene.
Code sketch: Implement project(K, R, t, X_w) returning $(u, v)$; verify on one corner of your board.