Features, matching, and robust estimation
This lesson is the bridge between pixels and discrete geometric constraints: detect repeatable keypoints, build descriptors, propose matches, then fit models (homography, essential matrix) despite outliers with RANSAC.
Figure: The classical feature pipeline
Learning objectives
- Derive the Harris corner criterion from the structure tensor.
- Compare SIFT, ORB, and learned local features at a practical level.
- Apply Lowe's ratio test and explain mutual nearest neighbor matching.
- Fit a homography and essential matrix; know when each model applies.
- Compute RANSAC iteration counts from outlier fraction.
- Outline epipolar geometry for two calibrated views.
Prerequisites
- Convolution / gradients lesson (structure tensor uses $I_x$, $I_y$).
- Camera projection lesson (homogeneous coordinates, intrinsics $K$).
Step 1 — Structure tensor and Harris corners
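In a window $W$, accumulate gradient statistics:
$$M = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
Eigenvalues $\lambda_1, \lambda_2$ of $M$ characterize local structure:
| Eigenvalues $\lambda_1, \lambda_2$ | Region type |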
|---|---|
| both small | flat — no corner |
| one large, one small | edge — ambiguous along edge |
| both large | corner — distinctive |
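Harris response (one common form):
$$R = \det(M) - k\,\operatorname{tr}(M)^2 = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2$$
with $k \approx 0.04$–$0.06$. Peaks in $R$ are corner candidates; non-maximum suppression thins them.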
Figure: Flat patch vs edge vs corner
Checkpoint: Why is a straight step edge a poor unique landmark?
Sliding along the edge does not change appearance — only one large eigenvalue.
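The structure tensor and Harris response from this step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production detector: the function name `harris_response`, the box window (real implementations usually use a Gaussian), the value `k=0.05`, and the synthetic square image are all choices made here for the example.

```python
import numpy as np

def harris_response(img, k=0.05, win=1):
    """Harris response R = det(M) - k * trace(M)^2 using a box window (assumed)."""
    img = img.astype(float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(img)

    def box_sum(a):
        # Sum over a (2*win+1) x (2*win+1) window via shifted views of a zero-padded copy.
        p = np.pad(a, win)
        h, w = a.shape
        out = np.zeros_like(a)
        for dy in range(2 * win + 1):
            for dx in range(2 * win + 1):
                out += p[dy:dy + h, dx:dx + w]
        return out

    # Entries of the structure tensor M, accumulated over the window.
    Sxx, Syy, Sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    tr = Sxx + Syy
    return det - k * tr ** 2

# Synthetic image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)

corner = R[5, 5]    # corner of the square: both eigenvalues large, R > 0
edge = R[10, 5]     # middle of the left edge: one large eigenvalue, R < 0
flat = R[10, 10]    # interior of the square: no gradient, R = 0
```

The sign pattern mirrors the table: positive at the corner, negative along the edge, zero in the flat interior. Non-maximum suppression is left out of the sketch.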
Step 2 — Detect → describe → match
| Stage | Output | Failure mode |
|---|---|---|
| Detector | keypoints | Repeatability under blur / exposure |
| Descriptor | vector | Not invariant to desired transforms |
| Matcher | pairs | Ambiguity on repetitive texture |
SIFT (classical gold standard)
- DoG scale-space extrema for scale selection.
- Dominant orientation from gradient histogram.
- 128-D histogram-of-gradients descriptor, normalized for illumination.
ORB (fast, binary)
- FAST corners + oriented BRIEF (binary tests) → Hamming distance.
- Common on mobile; less robust to large scale change than SIFT.
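Binary descriptors such as ORB's are compared by counting differing bits. A minimal sketch of that Hamming distance follows; the helper name `hamming` and the toy 4-byte descriptors are illustrative assumptions (OpenCV's default ORB descriptors are 32 bytes).

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have equal length")
    # XOR leaves a 1 wherever the bits differ; count those 1s byte by byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Toy 4-byte descriptors, shortened for readability.
d1 = bytes([0b10110010, 0xFF, 0x00, 0x0F])
d2 = bytes([0b10010011, 0xFF, 0x01, 0x0F])
dist = hamming(d1, d2)  # 3 bits differ
```

Because the distance reduces to XOR and popcount, it is cheap on CPUs without floating-point work, which is one reason binary descriptors suit mobile hardware.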
Learned features (SuperPoint, etc.)
- A CNN predicts keypoints and descriptors end-to-end; often more robust on texture-poor scenes, at higher compute cost.
Exercise: List three transforms descriptors target (e.g. rotation) and one that still breaks matchers (e.g. strong specular highlight).
Step 3 — Matching and Lowe's ratio test
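Nearest neighbor in descriptor space: $j_1 = \arg\min_j \lVert d_i - d'_j \rVert$.
Lowe's ratio test: accept match only if
$$\frac{\lVert d_i - d'_{j_1} \rVert}{\lVert d_i - d'_{j_2} \rVert} < \tau$$
(e.g. $\tau \approx 0.7$–$0.8$), where $j_2$ is the second-best match. This rejects ambiguous matches, such as those on repetitive brick.
Mutual consistency: keep $(i, j)$ only if $i$ is also the nearest neighbor of $j$ under the same metric; this removes many one-way false positives.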
Checkpoint: Why do brick walls break naive matching?
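Many descriptors are nearly equidistant, so there is no distinct gap between the best and second-best match. The ratio test then rejects the true match along with the false ones, and without it the nearest neighbor is close to arbitrary.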
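The ratio test and mutual check of this step can be combined in one brute-force matcher. The sketch below assumes float descriptors compared with Euclidean distance; the function name `match_ratio_mutual`, the threshold `ratio=0.8`, and the tiny 2-D "descriptors" are illustrative only.

```python
import numpy as np

def match_ratio_mutual(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs that pass Lowe's ratio test and the mutual-NN check."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)  # best match in B for each descriptor in A
    nn_ba = d.argmin(axis=0)  # best match in A for each descriptor in B
    matches = []
    for i, j in enumerate(nn_ab):
        # Smallest and second-smallest distance in row i (needs >= 2 candidates).
        best, second = np.partition(d[i], 1)[:2]
        passes_ratio = best < ratio * second
        is_mutual = nn_ba[j] == i
        if passes_ratio and is_mutual:
            matches.append((i, int(j)))
    return matches

desc_a = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
# The last two rows of desc_b are equally close to desc_a[2], like repeated bricks.
desc_b = np.array([[0.1, 0.0], [10.0, 0.2], [5.0, 5.5], [5.0, 4.5]])
matches = match_ratio_mutual(desc_a, desc_b)  # [(0, 0), (1, 1)]
```

The third descriptor in `desc_a` has two equidistant candidates, so its ratio is 1 and it is dropped, which is the brick-wall situation in miniature.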
Step 4 — Homography (planar / rotating camera)
If the scene is planar or the camera rotates about its center, 2D points relate by a homography (8 DOF):
$$\mathbf{x}' \sim H\mathbf{x}, \quad H \in \mathbb{R}^{3 \times 3}$$
DLT: each match gives 2 linear equations in the entries of $H$; 4 matches minimum, more via least squares on inliers.
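A minimal DLT sketch in NumPy shows how the two equations per match stack into a linear system whose null vector is $H$. The names `homography_dlt` and `apply_h`, the test homography `H_true`, and the sample points are assumptions for illustration. Coordinate normalization (Hartley-style), which matters with noisy pixel data, is omitted, and the final division assumes the bottom-right entry of $H$ is nonzero.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale (assumes H[2, 2] != 0)

def apply_h(H, pts):
    """Map Nx2 points through H using homogeneous coordinates."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

# Hypothetical ground-truth homography and noise-free correspondences.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], dtype=float)
dst = apply_h(H_true, src)
H_est = homography_dlt(src, dst)
```

With exact correspondences `H_est` reproduces `H_true` up to numerical precision; with real matches the same function would be called on the inlier set found by RANSAC.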
Exercise: Panorama stitching of a flat mural — homography or essential matrix? Why?
Homography — plane induces projective warp between views.
Step 5 — RANSAC and iteration budget
RANSAC loop:
- Sample minimal set (4 for homography, 5 for essential matrix in calibrated case).
- Fit model; count inliers within tolerance (pixels or Sampson distance).
- Keep best model; refine on all inliers.
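The loop above can be sketched on the simplest model, a 2D line with a minimal sample of two points. Swapping in a homography or essential-matrix solver changes only the fit and residual steps. The function name `ransac_line`, the tolerance, the iteration count, and the synthetic data are assumptions for this example, and the final refinement on all inliers is left out.

```python
import random

def ransac_line(points, n_iters=200, tol=0.5, seed=0):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) by RANSAC; return (line, inliers)."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        # 1. Sample a minimal set: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2            # normal vector of the line
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue                       # degenerate sample (identical points)
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # 2. Count inliers by perpendicular distance to the line.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) < tol]
        # 3. Keep the model with the most votes.
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

# 30 points on y = 2x + 1 plus 15 scattered outliers (synthetic data).
true_inliers = [(float(x), 2.0 * x + 1.0) for x in range(30)]
outliers = [(2.0 * i, 55.0 - 1.5 * i + 7.0 * (i % 3)) for i in range(15)]
line, inliers = ransac_line(true_inliers + outliers)
a, b, c = line
slope, intercept = -a / b, -c / b
```

A full pipeline would follow this with a least-squares refit on `inliers`, as in the last bullet of the loop.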
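Probability that at least one sample is all-inlier in $N$ iterations:
$$p = 1 - \left(1 - (1 - \epsilon)^s\right)^N$$
where $\epsilon$ = outlier fraction, $s$ = sample size. Solve for $N$ given desired $p$:
$$N = \frac{\log(1 - p)}{\log\left(1 - (1 - \epsilon)^s\right)}$$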
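Example: for a given $\epsilon$ and $s$, wanting 99% success means $p = 0.99$, so $N = \lceil \log(0.01) / \log(1 - (1 - \epsilon)^s) \rceil$.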
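The iteration budget is easy to tabulate. The sketch below uses the standard formula for the number of samples needed to draw at least one all-inlier set. The function name `ransac_iterations` and the specific outlier fractions are illustrative choices, not values taken from the lesson.

```python
import math

def ransac_iterations(outlier_frac, sample_size, success_prob=0.99):
    """Smallest N such that 1 - (1 - (1 - eps)^s)^N >= success_prob."""
    w = (1.0 - outlier_frac) ** sample_size  # P(a single sample is all inliers)
    if w <= 0.0:
        raise ValueError("no all-inlier sample is possible")
    if w >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - success_prob) / math.log(1.0 - w))

n_quarter = ransac_iterations(0.25, 4)  # homography, 25% outliers -> 13
n_half = ransac_iterations(0.50, 4)     # homography, 50% outliers -> 72
n_half_e = ransac_iterations(0.50, 5)   # essential matrix, 50% outliers -> 146
```

Under these assumed values, doubling the outlier fraction from 25% to 50% raises the budget for a 4-point model from 13 to 72 iterations, and a larger minimal sample makes it worse still.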
Figure: RANSAC keeps the line with the most votes
Checkpoint: Outlier fraction doubles — what happens to required ?
Grows quickly — RANSAC cost is why good descriptor + ratio test front-ends matter.
Step 6 — Epipolar geometry (two views, general 3D)
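For calibrated cameras, normalized coords $\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2$ satisfy the epipolar constraint:
$$\hat{\mathbf{x}}_2^\top E\, \hat{\mathbf{x}}_1 = 0, \quad E = [\mathbf{t}]_\times R$$
(essential matrix, 5 DOF). The uncalibrated case uses the fundamental matrix $F$ (7 DOF).
- Epipolar line: in image 2, the match for $\hat{\mathbf{x}}_1$ lies on the line $\ell_2 = E\hat{\mathbf{x}}_1$, so the search is 1D instead of 2D.
- Pose from $E$: decompose into four $(R, \mathbf{t})$ pairs; disambiguate with cheirality (points in front of both cameras).
Triangulation: with known $(R, \mathbf{t})$ and correspondences, least-squares triangulation (DLT or midpoint) yields 3D points; the reconstruction has metric scale only if the baseline length is known in metric units.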
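A quick numerical check makes the epipolar constraint concrete. The sketch builds an essential matrix from an assumed relative pose, projects a few synthetic 3D points into both views, and evaluates the constraint. The pose convention (camera-2 coordinates equal `R @ X1 + t`), the rotation angle, the translation, and the points are all assumptions made for this example.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed relative pose: a 10-degree rotation about the y axis plus a translation.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.2])
E = skew(t) @ R

# Synthetic 3D points in the camera-1 frame, in front of both cameras.
X1 = np.array([[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.2, 0.8, 5.0]])
X2 = X1 @ R.T + t
x1 = X1 / X1[:, 2:3]  # normalized homogeneous coordinates (x, y, 1)
x2 = X2 / X2[:, 2:3]

# Epipolar residual x2^T E x1 for each correspondence (zero up to rounding).
residuals = np.einsum("ni,ij,nj->n", x2, E, x1)
# Epipolar lines in image 2, one row (a, b, c) per point in image 1.
lines2 = x1 @ E.T
```

Each row of `lines2` is the line on which the corresponding `x2` must lie, which is what turns the match search into a 1D problem. With real data, the residual (or the Sampson distance) is the quantity RANSAC thresholds.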
| Model | DOF | When valid |
|---|---|---|
| Homography | 8 | Planar scene / pure rotation |
| Essential | 5 | General 3D, calibrated |
| Fundamental | 7 | General 3D, uncalibrated |
Deep dive — failure modes in production
| Symptom | Likely cause |
|---|---|
| Panorama tears | Parallax — non-planar scene, homography wrong |
| Few inliers | Motion blur, exposure change, repetitive texture |
| Ghost duplicates | Symmetric structures, wrong second-best in ratio test |
| Drift in VO (visual odometry) | Near-pure rotation: with no parallax, translation is unobservable and gets mis-estimated |
Check your understanding
- What is the difference between a feature detector and a descriptor?
- Why does least-squares homography on all matches fail?
- Name two sources of false matches unrelated to descriptor distance.
- How many DOF does $H$ have, and why fewer than its 9 entries?
- When is a homography exactly valid for a 3D scene?
Lab-style stretch goals
Match two desk photos with ORB or SIFT: visualize all matches, then RANSAC homography inliers vs outliers. Repeat with a scene that has depth variation — watch inlier count drop.
Stretch: Estimate $F$ with RANSAC, draw epipolar lines on a few points (OpenCV findFundamentalMat).