
Module 4 — Object detection

IoU, NMS, mAP & evaluation

Worked IoU examples, NMS step-by-step, precision-recall and AP, COCO mAP@0.5:0.95, threshold tuning, and qualitative failure analysis.

~95 min read + exercises

Before we begin

Training loss going down does not guarantee a shippable detector. Products care about: Did we find the objects? Are boxes tight? Are there duplicate false alarms?

IoU, NMS, and mAP are how the field answers those questions consistently. This lesson is as detailed as Module 2 metrics for classification — but for boxes.


What you will learn

  • Compute IoU by hand and in code.
  • Implement NMS step-by-step and interpret failures.
  • Build a precision–recall curve for one class from multiple score thresholds.
  • Explain AP and COCO mAP@0.5:0.95.
  • Tune score threshold and NMS IoU on validation data.


IoU — definition and worked example

$$\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

Figure: IoU geometry. Box A is the prediction, box B the ground truth, and A ∩ B their overlap. IoU = area(A ∩ B) ÷ area(A ∪ B), ranging from 0 to 1; it is used to match predicted boxes to ground truth during training and evaluation.

Pred xyxy: [10, 10, 50, 50] → area $40^2 = 1600$
GT xyxy: [30, 30, 70, 70] → area $1600$
Intersection: [30, 30, 50, 50] → area $20^2 = 400$
Union: $1600 + 1600 - 400 = 2800$
IoU $= 400/2800 \approx 0.143$

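At an IoU threshold of 0.5, this prediction does not match the ground-truth box. It counts as a false positive, and the ground-truth box counts as a false negative unless another prediction matches it.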

python
def iou_xyxy(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Corners of the intersection rectangle
    ix1, iy1 = max(xa1, xb1), max(ya1, yb1)
    ix2, iy2 = min(xa2, xb2), min(ya2, yb2)
    # Clamp at 0 so disjoint boxes get zero intersection area
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

GIoU / DIoU (preview)

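When boxes do not overlap, IoU is 0 no matter how far apart they are, so an IoU-based loss gives no gradient to pull them together. GIoU subtracts a penalty based on the smallest box enclosing both boxes; DIoU penalizes the distance between box centers instead. Losses from this family are still used in YOLO-style training.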
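
As a rough illustration of the GIoU idea, the sketch below computes GIoU for two xyxy boxes as IoU minus the fraction of the smallest enclosing box that the union leaves uncovered. The name `giou_xyxy` and the example boxes are assumptions made for this example, not part of any library.

```python
def giou_xyxy(box_a, box_b):
    """Generalized IoU for two xyxy boxes (sketch; assumes x1 < x2 and y1 < y2)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    # Smallest axis-aligned box that encloses both inputs
    enclose = (max(xa2, xb2) - min(xa1, xb1)) * (max(ya2, yb2) - min(ya1, yb1))
    if union <= 0 or enclose <= 0:
        return 0.0
    iou = inter / union
    # Penalty: share of the enclosing box not covered by the union
    return iou - (enclose - union) / enclose


overlapping = giou_xyxy([10, 10, 50, 50], [30, 30, 70, 70])  # boxes from the worked example
near = giou_xyxy([0, 0, 10, 10], [12, 0, 22, 10])            # disjoint, small gap
far = giou_xyxy([0, 0, 10, 10], [90, 0, 100, 10])            # disjoint, large gap
```

Both disjoint pairs have IoU 0, but GIoU is lower for the distant pair, so a loss such as 1 − GIoU still tells them apart.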


Matching predictions to ground truth (per image)

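For class $c$, at score threshold $t$:

  1. Filter predictions with score ≥ $t$ and class $c$.
  2. Sort the remaining predictions by score, descending.
  3. Greedy match: assign each prediction, in score order, to the still-unmatched GT box of class $c$ with the highest IoU, provided that IoU ≥ the IoU threshold.
  4. Count matched predictions as TP and unmatched predictions as FP; every unmatched GT box is an FN.

| Outcome | Meaning |
| --- | --- |
| TP | Pred matched GT, correct class, IoU ok |
| FP | Pred unmatched or wrong class |
| FN | GT with no matching pred |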
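
The four matching steps above can be sketched in plain Python for one image and one class. This is a simplified illustration: the function name `match_one_class` and the example boxes are assumptions, and official COCO evaluation adds details (such as crowd regions) that are left out here.

```python
def iou_xyxy(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_one_class(pred_boxes, pred_scores, gt_boxes, score_thresh=0.5, iou_thresh=0.5):
    """Return (tp, fp, fn) for one image and one class."""
    # Steps 1-2: filter by score, then sort by score descending
    kept = [i for i, s in enumerate(pred_scores) if s >= score_thresh]
    kept.sort(key=lambda i: pred_scores[i], reverse=True)
    matched_gt = set()
    tp = fp = 0
    for i in kept:
        # Step 3: best still-unmatched ground-truth box for this prediction
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            iou = iou_xyxy(pred_boxes[i], gt)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_thresh:
            matched_gt.add(best_j)
            tp += 1
        else:
            fp += 1
    # Step 4: ground-truth boxes that nothing matched are false negatives
    fn = len(gt_boxes) - len(matched_gt)
    return tp, fp, fn


# Invented example: one good hit, one duplicate of it, one stray box, one missed object
gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [[0, 0, 10, 9], [1, 0, 10, 10], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
result = match_one_class(preds, scores, gts)
```

Note that the duplicate box counts as a false positive even though it overlaps the object well, because its GT box was already claimed by a higher-scoring prediction.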

Non-maximum suppression (NMS)

Problem: dense detectors emit 5–50 overlapping boxes on one person.

Figure: NMS intuition. Before NMS, three boxes with scores 0.92, 0.78, and 0.71 cover one object; after NMS only the 0.92 box is kept. NMS keeps the highest-scoring box and suppresses overlapping lower-score duplicates.
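python
import torch

def nms_xyxy(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: (N, 4) xyxy tensor; scores: (N,) tensor. Returns kept indices."""
    order = scores.argsort(descending=True)  # candidate indices, best score first
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)  # the highest-scoring remaining box always survives
        if order.numel() == 1:
            break
        # IoU of the kept box against every remaining candidate (uses iou_xyxy above)
        ious = torch.tensor([float(iou_xyxy(boxes[i], boxes[j])) for j in order[1:]])
        # Keep only candidates whose overlap with the kept box is at most iou_thresh
        remaining = (ious <= iou_thresh).nonzero().squeeze(1)
        order = order[1:][remaining]
    return keep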

Soft-NMS: instead of deleting overlapping boxes, lower their scores according to how much they overlap the kept box. This tends to work better in crowded scenes.
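
One common way to lower the scores is a Gaussian decay in the IoU. The sketch below is a minimal plain-Python version under that assumption; `soft_nms_gaussian`, `sigma`, and `score_floor` are names and defaults chosen for this example, not a library API.

```python
import math


def iou_xyxy(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_floor=0.001):
    """Return [(index, decayed_score)] in the order boxes were selected."""
    scores = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda k: scores[k])
        remaining.remove(best)
        selected.append((best, scores[best]))
        # Decay, rather than delete, every box that overlaps the selected one
        for k in remaining:
            overlap = iou_xyxy(boxes[best], boxes[k])
            scores[k] *= math.exp(-(overlap ** 2) / sigma)
        # Boxes whose score has decayed below the floor are finally dropped
        remaining = [k for k in remaining if scores[k] >= score_floor]
    return selected


# Invented example: boxes 0 and 1 overlap heavily, box 2 is elsewhere
demo_boxes = [[0, 0, 10, 10], [1, 0, 11, 10], [50, 50, 60, 60]]
demo_scores = [0.9, 0.8, 0.7]
soft_result = soft_nms_gaussian(demo_boxes, demo_scores)
```

In this example the overlapping box survives with a reduced score instead of disappearing, so a later score threshold decides whether it is shown.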

Failure: two people hugging → high mutual IoU → one person suppressed. Mitigations: raise the NMS IoU threshold (at the cost of more duplicates elsewhere), specialized crowd NMS, higher input resolution.
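
To make the effect of the NMS IoU threshold concrete, here is a small plain-Python sketch with two close-together boxes. The helper `nms_indices` and the box coordinates are invented for illustration; the two boxes have an IoU of roughly 0.45.

```python
def iou_xyxy(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_indices(boxes, scores, iou_thresh):
    """Plain-Python greedy NMS; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one by more than iou_thresh
        order = [k for k in order if iou_xyxy(boxes[best], boxes[k]) <= iou_thresh]
    return keep


# Two hypothetical people standing close together
people = [[0, 0, 40, 100], [15, 0, 55, 100]]
people_scores = [0.9, 0.85]
kept_strict = nms_indices(people, people_scores, iou_thresh=0.3)  # aggressive suppression
kept_loose = nms_indices(people, people_scores, iou_thresh=0.6)   # tolerant of overlap
```

With the lower threshold the second person is suppressed; with the higher threshold both boxes survive, which is the trade-off to tune on validation data.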


Precision and recall (detection)

At fixed score threshold:

$$\mathrm{precision} = \frac{TP}{TP+FP}, \quad \mathrm{recall} = \frac{TP}{TP+FN}$$

| Raise score threshold | Usually |
| --- | --- |
| Precision | ↑ fewer junk boxes |
| Recall | ↓ missed objects |

Product mapping:

  • Safety-critical (miss = bad) → favor recall — lower threshold
  • Spammy UI overlays → favor precision — higher threshold

Average Precision (AP)

Sweep the score threshold from high to low. Each threshold gives one (recall, precision) point, and together the points trace the precision–recall curve.

Figure: Precision–recall curve for one class (x-axis: recall, y-axis: precision). Lowering the score threshold moves right along the curve, toward higher recall. AP is the area under the curve, computed with VOC or COCO interpolation rules.

AP (one class): area under PR curve (COCO uses 101-point interpolation).
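
The following is a minimal sketch of AP with 101-point interpolation, assuming each prediction for the class has already been labeled TP or FP by the matching procedure at a fixed IoU threshold. The function name `average_precision_101` and the toy inputs are assumptions for this example; real evaluators handle more edge cases.

```python
def average_precision_101(scores, is_tp, num_gt):
    """AP for one class at one IoU threshold, sampled at 101 recall levels."""
    if num_gt == 0:
        return 0.0
    # Walk predictions from highest to lowest score, accumulating TP/FP counts
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sample the envelope at recall = 0.00, 0.01, ..., 1.00
    total = 0.0
    for step in range(101):
        r = step / 100
        # Precision at the first point whose recall reaches r; 0 if recall never gets there
        total += next((precisions[k] for k in range(len(recalls)) if recalls[k] >= r), 0.0)
    return total / 101


# Toy inputs, invented for illustration: 4 predictions, 2 ground-truth objects
ap_mixed = average_precision_101([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
ap_perfect = average_precision_101([0.9, 0.8], [True, True], num_gt=2)
```

The envelope step is what "interpolation" refers to here: a dip in precision is replaced by the best precision achievable at any higher recall.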

mAP: mean AP over classes. COCO mAP also averages over the IoU thresholds $0.5, 0.55, \ldots, 0.95$.

| Metric | Strictness |
| --- | --- |
| AP@0.5 | Loose boxes OK |
| AP@0.75 | Tight localization |
| mAP@[.5:.95] | Industry standard for papers |

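Never compare your AP@0.5 number to someone else's COCO mAP@[.5:.95]. They are on different scales: for the same model, AP@0.5 is typically much higher because it accepts looser boxes.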
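
A small sketch of the averaging shows why the two numbers differ. The AP values below are invented purely for illustration (they fall linearly as the IoU threshold gets stricter), and `coco_style_map` is a hypothetical helper name.

```python
def coco_style_map(ap_table):
    """ap_table[class_name][iou_threshold] -> AP. Mean over thresholds, then over classes."""
    per_class = []
    for ap_by_iou in ap_table.values():
        per_class.append(sum(ap_by_iou.values()) / len(ap_by_iou))
    return sum(per_class) / len(per_class)


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95
iou_thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]

# Toy AP values, NOT real results
toy_ap = {
    "person": {t: 0.9 - (t - 0.5) for t in iou_thresholds},
    "bicycle": {t: 0.7 - (t - 0.5) for t in iou_thresholds},
}

map_50_95 = coco_style_map(toy_ap)                                # averaged over all ten thresholds
map_50 = sum(toy_ap[c][0.5] for c in toy_ap) / len(toy_ap)        # IoU 0.5 only
```

With these toy numbers the same detector scores 0.8 at IoU 0.5 but about 0.575 on the averaged metric, so quoting one against the other would be misleading.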


Computing AP in the project (simplified)

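python
# torchmetrics (extension)
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="bbox")  # boxes are xyxy by default
# preds / targets: lists with one dict of tensors per image.
#   preds:   {"boxes": (N, 4), "scores": (N,), "labels": (N,)}
#   targets: {"boxes": (M, 4), "labels": (M,)}
metric.update(preds, targets)
out = metric.compute()
print(out["map"], out["map_50"], out["map_75"])  # mAP@[.5:.95], AP@0.5, AP@0.75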

For the course project, reporting mAP@0.5 on the validation set is acceptable; state which metric you used.


Tuning inference knobs (validation only)

| Knob | Default-ish | Effect |
| --- | --- | --- |
| score_thresh | 0.5–0.7 | FP vs FN trade-off |
| nms_thresh | 0.5 | duplicate removal aggressiveness |
| max_detections | 100 | cap boxes per image |
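python
import torch

# torchvision inference: the model applies its own score filter and NMS internally.
# Assumes `model` is a torchvision detection model and `img` is a CxHxW float tensor in [0, 1].
model.eval()
with torch.no_grad():
    pred = model([img])[0]  # dict with "boxes", "scores", "labels"
mask = pred["scores"] > 0.6  # extra score threshold, tuned on validation data
boxes = pred["boxes"][mask]
labels = pred["labels"][mask]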

While sweeping the score threshold, log precision and recall on the validation set, then plot the resulting curve in the project README.
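
A threshold sweep can be sketched as below, assuming every validation prediction for one class has already been labeled TP or FP by greedy matching at a fixed IoU threshold. Because matching runs in score order, a prediction's label does not depend on lower-scoring predictions, so filtering by threshold afterwards is consistent. The function name and the toy data are assumptions for this example.

```python
def sweep_score_threshold(scores, is_tp, num_gt, thresholds):
    """Return [(threshold, precision, recall)] for one class at a fixed IoU threshold."""
    rows = []
    for t in thresholds:
        kept = [hit for s, hit in zip(scores, is_tp) if s >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        # Convention assumed here: precision is 1.0 when nothing is predicted
        precision = tp / (tp + fp) if kept else 1.0
        recall = tp / num_gt if num_gt else 0.0
        rows.append((t, precision, recall))
    return rows


# Toy validation data, invented for illustration: 5 predictions, 4 ground-truth objects
val_scores = [0.95, 0.9, 0.6, 0.4, 0.3]
val_is_tp = [True, True, False, True, False]
rows = sweep_score_threshold(val_scores, val_is_tp, num_gt=4, thresholds=[0.3, 0.5, 0.7])
for t, p, r in rows:
    print(f"score_thresh={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

The printed rows are the points to plot; pick the threshold on validation data only, then report test numbers once.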


Qualitative evaluation (mandatory)

Build a failure gallery:

| Failure type | Likely fix to try |
| --- | --- |
| Missed small objects | FPN / higher resolution |
| Duplicate boxes | NMS / score threshold |
| Class confusion | more data / hard negatives |
| Jittery boxes on video | temporal smoothing (Module 6) |

mAP only aggregates; looking at individual images shows you why the detector fails.


Check your understanding

  1. IoU 0.45 at threshold 0.5 — TP or FP?
  2. Lower NMS IoU threshold — more or fewer boxes kept?
  3. Why report AP@0.75 in addition to AP@0.5?

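Answer sketches: (1) FP, since 0.45 is below the 0.5 threshold. (2) Fewer: a lower threshold suppresses more aggressively. (3) AP@0.5 accepts loose boxes; AP@0.75 shows whether localization is tight enough for tasks such as grasping or measurement.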


What's next

Lesson 5 — On-device detection