← Back to curriculum

Module 2 — Geometry & correspondence

Features, matching, and robust estimation

Harris corners, SIFT/ORB, Lowe's ratio test, homography vs essential matrix, RANSAC iteration math, and epipolar geometry.

~90 min read + exercises

Features, matching, and robust estimation

This lesson is the bridge between pixels and discrete geometric constraints: detect repeatable keypoints, build descriptors, propose matches, then fit models (homography, essential matrix) despite outliers with RANSAC.

Figure

The classical feature pipeline

The classical feature pipelineDetectkeypointsDescribevector / patchMatchnearest neighborRobust fitRANSAC + refine
Four stages: detect repeatable keypoints, describe them with a vector, find candidate matches, then keep only the geometry-consistent ones.

Learning objectives

  • Derive the Harris corner criterion from the structure tensor.
  • Compare SIFT, ORB, and learned local features at a practical level.
  • Apply Lowe's ratio test and explain mutual nearest neighbor matching.
  • Fit a homography and essential matrix; know when each model applies.
  • Compute RANSAC iteration counts from outlier fraction.
  • Outline epipolar geometry for two calibrated views.

Prerequisites

  • Convolution / gradients lesson (structure tensor uses Ix,IyI_x, I_y).
  • Camera projection lesson (homogeneous coordinates, KK).

Step 1 — Structure tensor and Harris corners

In a window WW, accumulate gradient statistics:

M=(x,y)W[Ix2IxIyIxIyIy2]M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

Eigenvalues λ1,λ2\lambda_1, \lambda_2 of MM characterize local structure:

λ1,λ2\lambda_1, \lambda_2Region type
both smallflat — no corner
one large, one smalledge — ambiguous along edge
both largecorner — distinctive

Harris response (one common form):

R=det(M)k(traceM)2=λ1λ2k(λ1+λ2)2R = \det(M) - k\,(\mathrm{trace}\,M)^2 = \lambda_1\lambda_2 - k(\lambda_1+\lambda_2)^2

with k0.04k \approx 0.040.060.06. Peaks in RR are corner candidates; non-maximum suppression thins them.

Figure

Flat patch vs edge vs corner

What makes a “good” local feature?A useful patch must change when shifted in any direction.Flatλ₁ ≈ λ₂ ≈ 0ambiguous — shifts in any direction look the sameEdgeλ₁ ≫ λ₂ambiguous along the edge directionCornerλ₁ ≈ λ₂ ≫ 0uniquely localizable in 2D
A patch only counts as a distinctive feature if shifting it in any direction makes the patch content change.

Checkpoint: Why is a straight step edge a poor unique landmark?

Sliding along the edge does not change appearance — only one large eigenvalue.


Step 2 — Detect → describe → match

StageOutputFailure mode
Detector(x,y,σ)(x,y,\sigma) keypointsRepeatability under blur / exposure
Descriptorvector dRD\mathbf{d} \in \mathbb{R}^DNot invariant to desired transforms
Matcherpairs (i,j)(i,j)Ambiguity on repetitive texture

SIFT (classical gold standard)

  • DoG scale-space extrema for scale selection.
  • Dominant orientation from gradient histogram.
  • 128-D histogram-of-gradients descriptor, normalized for illumination.

ORB (fast, binary)

  • FAST corners + oriented BRIEF (binary tests) → Hamming distance.
  • Common on mobile; less robust to large scale change than SIFT.

Learned features (SuperPoint, etc.)

  • CNN predicts keypoints + descriptors end-to-end; often better on texture-poor scenes at compute cost.

Exercise: List three transforms descriptors target (e.g. rotation) and one that still breaks matchers (e.g. strong specular highlight).


Step 3 — Matching and Lowe's ratio test

Nearest neighbor in descriptor space: j=argminjdidjj^* = \arg\min_j \|\mathbf{d}_i - \mathbf{d}'_j\|.

Lowe's ratio test: accept match iji \to j^* only if

didjdidj<ρ\frac{\|\mathbf{d}_i - \mathbf{d}'_{j^*}\|}{\|\mathbf{d}_i - \mathbf{d}'_{j^{**}}\|} < \rho

(e.g. ρ=0.7\rho = 0.70.80.8), where jj^{**} is the second-best match. Rejects ambiguous matches on repetitive brick.

Mutual consistency: keep iji \to j only if jij \to i under the same metric — removes many one-way false positives.

Checkpoint: Why do brick walls break naive matching?

Many descriptors are equidistant — ratio test fails without distinct second-nearest gap.


Step 4 — Homography (planar / rotating camera)

If scene is planar or camera rotates about its center, 2D points relate by a homography HR3×3H \in \mathbb{R}^{3\times 3} (8 DOF):

λ[uv1]=H[uv1]\lambda \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}

DLT: each match gives 2 linear equations in entries of HH; 4 matches minimum, more via least squares on inliers.

Exercise: Panorama stitching of a flat mural — homography or essential matrix? Why?

Homography — plane induces projective warp between views.


Step 5 — RANSAC and iteration budget

RANSAC loop:

  1. Sample minimal set (4 for homography, 5 for essential matrix in calibrated case).
  2. Fit model; count inliers within tolerance ϵ\epsilon (pixels or Sampson distance).
  3. Keep best model; refine on all inliers.

Probability that at least one sample is all-inlier in kk iterations:

Psuccess=1(1(1e)s)kP_{\text{success}} = 1 - (1 - (1-e)^s)^k

where ee = outlier fraction, ss = sample size. Solve for kk given desired PsuccessP_{\text{success}}.

Example: e=0.5e=0.5, s=4s=4, want 99% success: (10.54)k=0.01k35(1-0.5^4)^k = 0.01 \Rightarrow k \approx 35.

Figure

RANSAC keeps the line with the most votes

RANSAC keeps the line that earns the most votesOutliers can dominate least-squares; RANSAC samples small subsets and counts agreement.Legendinlier (within ε)outlierbest modelinlier band ±ε
Inliers fall inside an ε-tolerance band around the candidate model. Outliers don't influence the chosen line — that's why RANSAC beats least-squares here.

Checkpoint: Outlier fraction doubles — what happens to required kk?

Grows quickly — RANSAC cost is why good descriptor + ratio test front-ends matter.


Step 6 — Epipolar geometry (two views, general 3D)

For calibrated cameras, normalized coords x^,x^\hat{\mathbf{x}}, \hat{\mathbf{x}}' satisfy the epipolar constraint:

x^Ex^=0\hat{\mathbf{x}}'^\top E \hat{\mathbf{x}} = 0

E=[t]×RE = [\mathbf{t}]_\times R (essential matrix, 5 DOF). Uncalibrated case uses fundamental matrix FF (7 DOF).

  • Epipolar line: in image 2, match for x^\hat{\mathbf{x}} lies on line l=Ex^l' = E\hat{\mathbf{x}} — search 1D instead of 2D.
  • Pose from EE: decompose EE into four (R,t)(R,\mathbf{t}) pairs; disambiguate with cheirality (points in front of both cameras).

Triangulation: with known P,PP, P' and correspondences, least-squares triangulation (DLT or midpoint) yields 3D points — scale fixed if baseline metric.

ModelDOFWhen valid
Homography8Planar scene / pure rotation
Essential EE5General 3D, calibrated
Fundamental FF7General 3D, uncalibrated

Deep dive — failure modes in production

SymptomLikely cause
Panorama tearsParallax — non-planar scene, homography wrong
Few inliersMotion blur, exposure change, repetitive texture
Ghost duplicatesSymmetric structures, wrong second-best in ratio test
Drift in VOPure rotation mistaken as translation without parallax

Check your understanding

  1. What is the difference between a feature detector and a descriptor?
  2. Why does least-squares homography on all matches fail?
  3. Name two sources of false matches unrelated to descriptor distance.
  4. How many DOF does EE have, and why fewer than 9 entries?
  5. When is a homography exactly valid for a 3D scene?

Lab-style stretch goals

Match two desk photos with ORB or SIFT: visualize all matches, then RANSAC homography inliers vs outliers. Repeat with a scene that has depth variation — watch inlier count drop.

Stretch: Estimate FF with RANSAC, draw epipolar lines on a few points (OpenCV findFundamentalMat).