Features, matching, and robust estimation

This lesson is the bridge between pixels and geometric constraints: you detect repeatable keypoints, describe them with local descriptors, propose matches, then fit models (such as homographies) despite outliers using RANSAC.

Figure: The classical feature pipeline. Four stages: detect repeatable keypoints, describe them with a vector, find candidate matches, then keep only the geometry-consistent ones.

Learning objectives

Explain why corners (high gradient variation in multiple directions) make stronger features than flat regions or straight edges alone.
Describe the pipeline: detect → describe → match → robust fit.
State what RANSAC optimizes for and why outliers break least-squares.

Prerequisites

Convolution / gradients lesson.
Camera projection lesson (helpful for geometric interpretation of matches).

Step 1 — What makes a “good” local feature?

A local feature is a point neighborhood that is:

Detectable reliably under viewpoint and lighting changes (within limits).
Describable with a vector summarizing local appearance for matching.

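The classic Harris corner detector scores local intensity structure using the second-moment matrix (structure tensor) of image gradients. Two large eigenvalues indicate a corner, one large eigenvalue indicates an edge, and two small eigenvalues indicate a flat region.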

Figure: Flat patch vs edge vs corner. A patch counts as a distinctive feature only if shifting it in every direction changes the patch content.
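To make the flat / edge / corner distinction concrete, here is a minimal sketch that computes a Harris-style score on three synthetic patches. It is not a full detector: the function names, the 7×7 patch size, the constant k = 0.05, central-difference gradients, and the uniform (non-Gaussian) window are all assumptions chosen for illustration.

```python
def harris_response(patch, k=0.05):
    """Harris-style score R = det(M) - k * trace(M)^2 for one grayscale patch.

    M is the second-moment matrix of the gradients, summed over the patch
    interior. Assumptions of this sketch: central-difference gradients and
    a uniform window instead of Gaussian weighting.
    """
    h, w = len(patch), len(patch[0])
    sxx = syy = sxy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0  # horizontal gradient
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def make_patch(kind, size=7):
    """Synthetic patch: 'flat' (all zeros), 'edge' (vertical step), or 'corner'."""
    half = size // 2
    patch = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if kind == "edge" and x >= half:
                patch[y][x] = 1.0
            elif kind == "corner" and x >= half and y >= half:
                patch[y][x] = 1.0
    return patch


scores = {kind: harris_response(make_patch(kind))
          for kind in ("flat", "edge", "corner")}
for kind, score in scores.items():
    print(kind, score)
```

On these patches the flat score is zero, the edge score is negative (gradient energy in one direction only), and the corner score is positive (gradient energy in two directions). A real detector would also threshold the score and apply non-maximum suppression.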

Checkpoint: Why is a straight step edge a poor unique landmark compared to a corner?

Step 2 — From corners to descriptors (SIFT / ORB intuition)

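Classical SIFT (Scale-Invariant Feature Transform) searches a difference-of-Gaussians scale space for extrema, assigns each keypoint a dominant orientation, and builds its descriptor from histograms of local gradient orientations.

Faster alternatives such as ORB use binary descriptors and trade some robustness for speed, which makes them common on mobile devices.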

Exercise: List three transformations a descriptor tries to be invariant to, and one transformation that still commonly breaks matchers.

Step 3 — Nearest-neighbor matching and ratio test

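Given descriptors dᵢ in image A and d′ⱼ in image B, a naive matcher pairs each dᵢ with its nearest neighbor in descriptor space (typically Euclidean distance for float descriptors such as SIFT, Hamming distance for binary descriptors such as ORB).

Lowe’s ratio test: accept a match only if the best distance is clearly smaller than the second-best distance (Lowe suggested a ratio threshold of about 0.8). This rejects ambiguous matches, for example on repetitive texture.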
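The ratio test can be sketched in a few lines. The following is a brute-force illustration on toy 3-D descriptors, not an optimized matcher; the function names, the toy data, and the 0.8 threshold are assumptions for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where descriptor i in A has an unambiguous match j in B."""
    matches = []
    for i, d in enumerate(desc_a):
        # Distances from d to every descriptor in B, sorted ascending.
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the test needs a second-best candidate
        (best, j), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches


desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
desc_b = [
    [0.9, 0.1, 0.0],   # clear match for desc_a[0]
    [0.0, 0.9, 0.1],   # two near-identical candidates for desc_a[1],
    [0.0, 0.9, -0.1],  # as you would get from repeated texture
    [0.0, 0.0, 1.0],
]
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

Here the first descriptor keeps its match, while the second is dropped because its two best candidates are equally close. Discarding an ambiguous match is usually cheaper than passing a likely outlier on to the geometric stage.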

Checkpoint: Why do brick walls break naive feature matching?

Step 4 — Geometric consistency: homography as an example

If the scene is approximately planar (or the camera purely rotates), corresponding points can be related by a projective homography $H$ (a $3 \times 3$ matrix up to scale):

$$\lambda \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

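Here $\lambda$ is a nonzero scale factor that is divided out to recover pixel coordinates. Matches that agree with the same $H$ (within a small reprojection tolerance) are inliers; mismatches are outliers.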
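As a small illustration of the inlier test, the sketch below maps a point through a given $H$, divides out the scale factor, and compares the result with the matched point. The matrix values, the 3-pixel tolerance, and the function names are assumptions made up for this example.

```python
import math


def apply_homography(H, u, v):
    """Map (u, v) through the 3x3 matrix H and divide out the scale factor."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]  # plays the role of lambda
    return x / w, y / w  # assumes w != 0 (point not mapped to infinity)


def is_inlier(H, match, tol=3.0):
    """True if H maps the first point to within tol pixels of the second."""
    (u, v), (u2, v2) = match
    pu, pv = apply_homography(H, u, v)
    return math.hypot(pu - u2, pv - v2) <= tol


# Hypothetical homography: a translation plus a mild perspective term.
H = [[1.0, 0.0, 10.0],
     [0.0, 1.0, 5.0],
     [0.001, 0.0, 1.0]]

matches = [
    ((0.0, 0.0), (10.0, 5.0)),       # consistent with H
    ((100.0, 50.0), (100.0, 50.0)),  # consistent with H: (110, 55) / 1.1
    ((100.0, 50.0), (300.0, 20.0)),  # a mismatch
]
flags = [is_inlier(H, m) for m in matches]
print(flags)

# H is only defined up to scale: multiplying it by 2 maps points identically.
H2 = [[2.0 * h for h in row] for row in H]
print(apply_homography(H, 100.0, 50.0), apply_homography(H2, 100.0, 50.0))
```

The first two matches pass the test and the third does not. The last line shows why $H$ has only 8 degrees of freedom despite having 9 entries.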

Exercise (conceptual): Give a real scene where the planar homography assumption fails badly.

Step 5 — RANSAC in plain language

RANSAC (Random Sample Consensus):

1. Randomly sample the minimum set of matches needed to estimate a model (e.g. 4 point correspondences for a homography).
2. Count how many other matches agree within a tolerance (inliers).
3. Repeat for many iterations; keep the model with the most inliers.
4. Optionally refine with all inliers.
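The steps above can be sketched on a simpler model than a homography: a 2D line, where the minimal sample is 2 points. The data, tolerance, iteration count, seed, and function names below are assumptions for illustration; the same loop structure applies to homographies with 4-point samples.

```python
import math
import random


def fit_line_least_squares(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def ransac_line(points, eps=0.5, iterations=200, seed=0):
    """Return ((a, b), inliers) for the line with the largest consensus set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Step 1: minimal sample (2 points define a line).
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, not representable as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Step 2: count points within perpendicular distance eps of the line.
        norm = math.hypot(a, 1.0)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b - y) / norm <= eps]
        # Step 3: keep the model with the most inliers.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Step 4: refine with a least-squares fit on the inliers only.
    return fit_line_least_squares(best_inliers), best_inliers


# 20 points near y = 2x + 1 (small alternating noise) plus 8 gross outliers.
good = [(float(x), 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(20)]
bad = [(1.0, 30.0), (3.0, -12.0), (5.0, 40.0), (8.0, -5.0),
       (11.0, 45.0), (14.0, 2.0), (16.0, 60.0), (18.0, -20.0)]
points = good + bad

ls_all = fit_line_least_squares(points)
(ransac_a, ransac_b), inliers = ransac_line(points)
print("least squares on all points:", ls_all)
print("RANSAC + refit:", (ransac_a, ransac_b), "inliers:", len(inliers))
```

On this data the least-squares fit over all points is pulled far from slope 2 because squared error lets each gross outlier dominate the sum, while the RANSAC estimate stays close to y = 2x + 1 because the outliers never enter the final fit.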

Figure: RANSAC keeps the line with the most votes. Inliers fall inside an ε-tolerance band around the candidate model. Outliers don't influence the chosen line, which is why RANSAC beats least-squares here.

Checkpoint: What happens to required iterations if the outlier fraction increases?

Step 6 — Epipolar geometry (preview pointer)

When you have two views of a general 3D scene, matches must satisfy the epipolar constraint rather than a single homography (unless the scene is planar or the camera only rotates). Later modules in more advanced courses develop the essential matrix (calibrated cameras) and the fundamental matrix (uncalibrated cameras).

For now, remember that geometry narrows the search: once the epipolar geometry is known, the match for a point must lie on its epipolar line, so searching along that line is cheaper and more robust than searching the whole image.
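As a preview only, here is a tiny numeric check of the epipolar constraint for the simplest possible setup. Everything in it is an assumption made for illustration: two identical cameras with normalized image coordinates, no rotation, the second camera's coordinates given by X2 = X1 + t with t along the x-axis, and the essential matrix written as E = [t]ₓ (the cross-product matrix of t).

```python
def cross_matrix(t):
    """3x3 matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def mat_vec(M, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[r][c] * x[c] for c in range(3)) for r in range(3)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Assumed setup: no rotation, camera-2 coordinates are X2 = X1 + t.
t = (1.0, 0.0, 0.0)
E = cross_matrix(t)  # essential matrix for R = I

# One 3D point, projected into both views (normalized coordinates).
X1 = (0.5, 0.2, 4.0)
X2 = tuple(a + b for a, b in zip(X1, t))
x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]
x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]

# Epipolar line of x1 in image 2, as coefficients (a, b, c) of a*u + b*v + c = 0.
line = mat_vec(E, x1)
residual = dot(x2, line)  # x2^T E x1, zero for a true correspondence

# A candidate shifted off the line violates the constraint.
wrong = [x2[0], x2[1] + 0.1, 1.0]
wrong_residual = dot(wrong, line)
print(line, residual, wrong_residual)
```

With a sideways translation the epipolar line is horizontal (constant v), so the true match lies on the same image row. This is the 1D search that stereo matching exploits.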

Check your understanding

What is the difference between a feature detector and a descriptor?
Why does least-squares fitting of a homography to all matches (including wrong ones) fail?
Name two sources of false matches unrelated to the descriptor itself.

Lab-style stretch goal (optional)

Run ORB or SIFT matching between two photos of the same desk. Visualize matches, then run homography RANSAC and visualize inliers vs outliers.