Features, matching, and robust estimation
This lesson is the bridge between pixels and discrete geometric constraints: you extract repeatable local descriptors, propose matches, then fit models (like homographies) despite outliers using RANSAC.
Figure: The classical feature pipeline
Learning objectives
- Explain why corners (high gradient variation in multiple directions) make stronger features than flat regions or straight edges alone.
- Describe the pipeline: detect → describe → match → robust fit.
- State what RANSAC optimizes for and why outliers break least-squares.
Prerequisites
- Convolution / gradients lesson.
- Camera projection lesson (helpful for geometric interpretation of matches).
Step 1 — What makes a “good” local feature?
A local feature is a point neighborhood that is:
- Detectable reliably under viewpoint and lighting changes (within limits).
- Describable with a vector summarizing local appearance for matching.
Harris corners (classic) score intensity structure using the second-moment matrix of image gradients.
Figure: Flat patch vs edge vs corner
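As a rough illustration of the flat / edge / corner distinction, the sketch below computes a Harris-style response R = det(M) − k·trace(M)² from the second-moment matrix M of image gradients, summed over a small square window. The window radius `r`, the constant `k = 0.05`, the helper names, and the synthetic test image are assumptions chosen for illustration; a practical detector would typically add Gaussian weighting and non-maximum suppression.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1)x(2r+1) window around each pixel (zero padding)."""
    p = np.pad(a, r)
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, r=2, k=0.05):
    """Harris-style corner score per pixel (illustrative, unweighted window)."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    iy, ix = np.gradient(img.astype(float))
    # Entries of the second-moment matrix M = [[sxx, sxy], [sxy, syy]].
    sxx = box_sum(ix * ix, r)
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic image: a bright square filling the lower-right quadrant.
img = np.zeros((20, 20))
img[10:, 10:] = 1.0

R = harris_response(img)
corner_score = R[10, 10]  # at the corner of the square
edge_score = R[15, 10]    # on the vertical edge, away from the corner
flat_score = R[3, 3]      # in the uniform dark region
```

On this synthetic image the score is positive at the corner, negative on the straight edge, and zero in the flat region: only the corner has strong gradient energy in two independent directions.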
Checkpoint: Why is a straight step edge a poor unique landmark compared to a corner?
Step 2 — From corners to descriptors (SIFT / ORB intuition)
Classical SIFT (Scale-Invariant Feature Transform) searches scale-space for extrema, assigns orientation, and forms a histogram-of-gradients descriptor.
Faster modern alternatives (e.g. ORB) trade some invariance for speed — common on mobile.
Exercise: List three transformations a descriptor tries to be invariant to, and one transformation that still commonly breaks matchers.
Step 3 — Nearest-neighbor matching and ratio test
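Given descriptors dᵢ in image A and d′ⱼ in image B, a naive matcher pairs each dᵢ with its nearest neighbor in descriptor space (typically Euclidean distance for floating-point descriptors such as SIFT, Hamming distance for binary descriptors such as ORB).
Lowe’s ratio test: compare the best distance to the second-best distance and keep the match only if the best is clearly smaller. This rejects ambiguous matches (e.g. repetitive texture), where several candidates are almost equally close.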
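The matching rule above can be sketched in a few lines of NumPy. This is a brute-force version for small descriptor sets; the function name, the toy 2-D descriptors, and the threshold `ratio=0.75` are assumptions for illustration (the threshold is a tunable parameter, not a fixed constant).

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbour of desc_a[i]
    and the best distance is below `ratio` times the second-best distance."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

# Toy descriptors: desc_a[0] has one clear partner in desc_b,
# desc_a[1] has two equally close candidates (an ambiguous, "brick wall" case).
desc_a = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
desc_b = np.array([[0.9, 0.1],
                   [0.0, 5.0],
                   [0.1, 1.0],
                   [-0.1, 1.0]])

matches = ratio_test_matches(desc_a, desc_b)
```

Here only the unambiguous pair survives: the second descriptor is dropped because its two nearest candidates are equally distant, so neither can be trusted.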
Checkpoint: Why do brick walls break naive feature matching?
Step 4 — Geometric consistency: homography as an example
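If the scene is approximately planar (or the camera purely rotates), corresponding points are related by a projective homography H (a 3×3 matrix defined up to scale):
x′ ∼ H x
Here x and x′ are matched points in homogeneous coordinates, and ∼ denotes equality up to a nonzero scale factor.
Matches that agree with the same H are inliers; mismatches are outliers.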
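To make the inlier / outlier distinction concrete, the sketch below writes the homography as a 3×3 matrix H with x′ ∼ H x, estimates it from four point pairs with the direct linear transform (DLT), and then measures the reprojection error of two further matches. The matrix `H_true`, the point coordinates, and all function names are made up for illustration; the sketch also skips the coordinate normalization usually recommended for noisy real data.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT estimate of H (3x3, up to scale) with dst ~ H @ src, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the scale; assumes h[2, 2] is not close to zero

def apply_homography(h, pts):
    """Map Nx2 points through H and convert back from homogeneous coordinates."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return p[:, :2] / p[:, 2:3]

def reprojection_error(h, src, dst):
    """Distance between H-mapped source points and their claimed matches."""
    return np.linalg.norm(apply_homography(h, src) - dst, axis=1)

# Hypothetical ground-truth homography and four exact correspondences.
H_true = np.array([[1.2, 0.1, 0.5],
                   [-0.05, 0.9, -0.3],
                   [0.1, 0.2, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst = apply_homography(H_true, src)
H_est = fit_homography(src, dst)

# Two extra matches: the first is consistent with H, the second is a mismatch.
extra_src = np.array([[0.5, 0.5], [0.2, 0.8]])
extra_dst = apply_homography(H_true, extra_src)
extra_dst[1] += np.array([0.5, -0.4])
errors = reprojection_error(H_est, extra_src, extra_dst)
```

A match would be counted as an inlier when its reprojection error falls below a chosen tolerance; here the consistent match has essentially zero error and the corrupted one does not.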
Exercise (conceptual): Give a real scene where the planar homography assumption fails badly.
Step 5 — RANSAC in plain language
RANSAC (Random Sample Consensus):
- Randomly sample the minimum set of matches needed to estimate a model (e.g. 4 point correspondences for a homography).
- Count how many other matches agree within a tolerance (inliers).
- Repeat many iterations; keep the model with the most inliers.
- Optionally refine with all inliers.
Figure: RANSAC keeps the line with the most votes
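The loop described above can be sketched on the simplest possible model, a 2-D line y = m·x + b, where the minimal sample is two points. The data, tolerance `tol`, iteration count `n_iters`, and function name are assumptions for illustration; for a homography the same structure applies with a 4-match sample and reprojection error as the residual.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, tol=0.1, seed=0):
    """Fit y = m*x + b with RANSAC; returns (m, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(x), dtype=bool)
    for _ in range(n_iters):
        # 1. Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical pair: slope undefined in this parameterization
        m = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - m * x[i]
        # 2. Consensus set: points whose vertical residual is within tolerance.
        mask = np.abs(y - (m * x + b)) < tol
        # 3. Keep the candidate with the most inliers.
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # 4. Refine with ordinary least squares on the inliers only.
    m, b = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return m, b, best_mask

# Synthetic data: 30 points near y = 2x + 1, plus 15 gross outliers.
rng = np.random.default_rng(1)
x_in = rng.uniform(0, 10, 30)
y_in = 2.0 * x_in + 1.0 + rng.normal(0, 0.02, 30)
x_out = rng.uniform(0, 10, 15)
y_out = rng.uniform(-5, 25, 15)
x = np.concatenate([x_in, x_out])
y = np.concatenate([y_in, y_out])

m_ls, b_ls = np.polyfit(x, y, deg=1)            # least squares on everything
m_ransac, b_ransac, inliers = ransac_line(x, y)  # robust fit
print(f"least squares: m={m_ls:.2f}, b={b_ls:.2f}")
print(f"RANSAC:        m={m_ransac:.2f}, b={b_ransac:.2f}, inliers={inliers.sum()}")
```

The least-squares fit on all points is included only for comparison: every outlier contributes a squared residual, so a few gross errors can pull the line away from the true structure, whereas RANSAC scores candidates by inlier count and ignores them.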
Checkpoint: What happens to required iterations if the outlier fraction increases?
Step 6 — Epipolar geometry (preview pointer)
When you have two views of a general 3D scene, correct matches must satisfy the epipolar constraint rather than a single homography (unless the scene is planar). More advanced modules develop the essential matrix (for calibrated cameras) and the fundamental matrix (for uncalibrated cameras).
For now: remember that geometry narrows search — matching along epipolar lines is cheaper and more robust than global search.
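As a small preview, the sketch below checks the epipolar constraint numerically for two calibrated views. It assumes the standard form of the essential matrix, E = [t]× R, for a relative pose X₂ = R X₁ + t, and uses normalized image coordinates (pixel coordinates with the intrinsics removed). The pose, the random 3D points, and the helper names are made up for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose between the two cameras: X2 = R @ X1 + t.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])  # rotation about the y axis
t = np.array([1.0, 0.0, 0.2])
E = skew(t) @ R

# Random 3D points in front of both cameras, expressed in camera-1 coordinates.
rng = np.random.default_rng(0)
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
X2 = X1 @ R.T + t

# Normalized homogeneous image coordinates (x, y, 1) in each view.
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

# Epipolar residual x2^T E x1 for each correspondence.
residual_true = np.einsum("ij,jk,ik->i", x2, E, x1)
# Deliberately wrong matches: pair each x1 with a different point's x2.
residual_wrong = np.einsum("ij,jk,ik->i", np.roll(x2, 1, axis=0), E, x1)
```

Correct correspondences give residuals at the level of floating-point rounding, while the shuffled pairs generally do not; the constraint confines each point's match to a line in the other image, which is what narrows the search.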
Check your understanding
- What is the difference between a feature detector and a descriptor?
- Why does least-squares fitting of a homography to all matches (including wrong ones) fail?
- Name two sources of false matches unrelated to the descriptor itself.
Lab-style stretch goal (optional)
Run ORB or SIFT matching between two photos of the same desk. Visualize matches, then run homography RANSAC and visualize inliers vs outliers.