Segmentation losses & metrics
Before we begin
Whether you train U-Net, DeepLab, or a fine-tuned Mask R-CNN mask head, the work comes down to two choices: a loss that scores the predictions and an evaluation metric that matches the task. A forward pass on a 256×256 image produces one logit per pixel per class: 65,536 × C values, or 196,608 for three classes. Training needs a loss that scores every pixel; evaluation needs overlap metrics, not a misleading “99% accuracy” on empty background.
This lesson covers:
- Per-pixel cross-entropy — default for semantic segmentation
- IoU — standard validation metric
- Dice — common in medical imaging and imbalanced foreground
- What to log so you catch broken models early
Figure: IoU intuition
What you will learn
- Wire `CrossEntropyLoss` for `(N, C, H, W)` logits and `(N, H, W)` targets.
- Compute IoU and Dice by hand and in code.
- Explain why pixel accuracy fails on imbalanced masks.
- Build a minimal evaluation loop for mIoU.
Per-pixel cross-entropy
This is the same cross-entropy formula used for MNIST classification, applied independently at each pixel.
For pixel i with target class y_i and logits z_i (length C):
loss_i = -log( softmax(z_i)[y_i] )
total loss = mean (or sum) over all pixels
PyTorch expects:
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # optional: ignore_index=255 for void pixels
logits = model(images)             # (N, C, H, W) float
targets = masks.long()             # (N, H, W) int64, values in 0 .. C-1
loss = criterion(logits, targets)
Common bugs:
| Bug | Symptom |
|---|---|
| Target shape (N, 1, H, W) without squeeze | Runtime error or wrong broadcast |
| Target floats 0.0–1.0 | CE expects class indices |
| Logits (N, H, W, C) wrong dim order | Use (N, C, H, W) for Conv2d heads |
Instance note: Mask R-CNN applies BCE or CE on mask pixels inside each positive RoI — same per-pixel idea, smaller spatial crops.
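To make the per-pixel formula concrete, here is a minimal sketch in plain Python rather than PyTorch, so the arithmetic stays visible. The helper names `pixel_ce` and `mean_ce` are made up for this illustration, and the void label 255 is assumed from the `ignore_index=255` comment above. Void pixels are dropped before averaging, which is how the mean behaves when an ignore index is set.

```python
import math

IGNORE_INDEX = 255  # assumed void label, taken from the ignore_index=255 comment


def pixel_ce(logits, target):
    """Cross-entropy for one pixel: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_ce(pixel_logits, pixel_targets, ignore_index=IGNORE_INDEX):
    """Mean loss over all pixels whose target is not ignore_index."""
    losses = [
        pixel_ce(z, y)
        for z, y in zip(pixel_logits, pixel_targets)
        if y != ignore_index
    ]
    return sum(losses) / len(losses)


# Four pixels, C = 3 classes; the last pixel is void and is skipped.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0], [5.0, 0.0, 0.0]]
targets = [0, 2, 1, IGNORE_INDEX]
loss = mean_ce(logits, targets)
print(round(loss, 4))
```

A pixel with uniform logits costs log(C), which is a handy reference point: a three-class model whose loss sits near log(3) ≈ 1.10 has not learned anything yet.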
Class imbalance — the accuracy trap
Pet / portrait masks are mostly background.
Example: 90% background, 10% pet.
A model that predicts background everywhere gets 90% pixel accuracy, which looks great in a spreadsheet, but its pet IoU is 0 because it never overlaps a single pet pixel.
| Metric | What it rewards |
|---|---|
| Pixel accuracy | Majority class (background) |
| Foreground IoU | Overlap on the class you care about |
| Mean IoU (mIoU) | Average IoU across classes — fairer |
Rule for this course: always report mIoU or per-class IoU on validation; treat pixel accuracy as optional.
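The 90/10 example can be checked in a few lines. This is a small sketch on flat lists of class IDs; the helpers `pixel_accuracy` and `class_iou` are illustrative names, and the two IDs are chosen only for this demonstration.

```python
def pixel_accuracy(preds, targets):
    """Fraction of pixels whose predicted class equals the target class."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)


def class_iou(preds, targets, cls):
    """IoU of one class, treating the problem as 'is class cls' vs not."""
    inter = sum(p == cls and t == cls for p, t in zip(preds, targets))
    union = sum(p == cls or t == cls for p, t in zip(preds, targets))
    return inter / union if union else float("nan")


PET, BACKGROUND = 0, 1  # illustrative IDs for a two-class version of the task

targets = [BACKGROUND] * 90 + [PET] * 10  # 90% background, 10% pet
preds = [BACKGROUND] * 100                # model predicts background everywhere

acc = pixel_accuracy(preds, targets)
iou_bg = class_iou(preds, targets, BACKGROUND)
iou_pet = class_iou(preds, targets, PET)
miou = (iou_bg + iou_pet) / 2
print(acc, iou_bg, iou_pet, miou)  # 0.9 0.9 0.0 0.45
```

Pixel accuracy reports 0.9 while mIoU drops to 0.45, because the missing pet class counts as much as the background in the average.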
IoU (Intersection over Union)
Also called the Jaccard index. For a binary prediction mask P and ground truth G:
IoU = |P ∩ G| / |P ∪ G|
- 1.0 — perfect overlap
- 0.0 — no overlap
- Union in denominator penalizes both missed pixels and false alarm pixels
Worked example (counts of pixels)
| Region | Count |
|---|---|
| Predicted foreground only | 10 |
| Ground truth foreground only | 10 |
| Both foreground | 40 |
Intersection = 40. Union = 10 + 10 + 40 = 60.
IoU = 40/60 ≈ 0.67
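The same worked example can be reproduced with Python sets standing in for masks. The pixel indices are arbitrary; only the three region counts come from the table.

```python
# Pixel indices are arbitrary; only the counts match the worked example.
both = set(range(0, 40))        # predicted and ground truth foreground
pred_only = set(range(40, 50))  # false alarms
gt_only = set(range(50, 60))    # missed pixels

P = both | pred_only  # predicted foreground as a set of pixel indices
G = both | gt_only    # ground truth foreground

intersection = len(P & G)
union = len(P | G)
iou = intersection / union
print(intersection, union, round(iou, 2))  # 40 60 0.67
```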
Multi-class mIoU
Compute IoU for each class c separately (binary: “pixel is class c” vs not), then:
mIoU = mean( IoU_c for c in classes )
Void / ignore_index classes are often excluded from the average.
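A minimal sketch of the averaging step, assuming classes with an empty union are marked as NaN and should not count toward the mean. The function name `mean_iou` and the sample values are hypothetical.

```python
import math


def mean_iou(per_class_ious):
    """Average the per-class IoUs, skipping NaN entries (classes with empty union)."""
    present = [v for v in per_class_ious if not math.isnan(v)]
    return sum(present) / len(present) if present else float("nan")


# Hypothetical per-class values: the third class never appears, so it is NaN.
miou = mean_iou([0.9, 0.5, float("nan")])
print(round(miou, 2))
```

Counting an absent class as 0 instead of skipping it would drag the average down for reasons unrelated to model quality.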
Code sketch
def iou_per_class(preds, targets, num_classes):
    """preds, targets: (N, H, W) int64"""
    ious = []
    for c in range(num_classes):
        p = preds == c
        t = targets == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        if union == 0:
            ious.append(float("nan"))  # class absent in batch
        else:
            ious.append(inter / union)
    return ious  # nanmean for mIoU
Dice coefficient
For binary masks, Dice = 2|P ∩ G| / (|P| + |G|), which is identical to the F1 score computed over pixels.
| IoU | Dice (binary) |
|---|---|
| Emphasizes union | Emphasizes overlap vs total pred+GT mass |
| Standard in detection/seg benchmarks | Very common in medical segmentation |
Dice loss is 1 - Dice, so its gradients push directly toward overlap. It is sometimes mixed with CE:
loss = CE + λ * (1 - Dice)
For the pet project, CE alone is enough to start; add Dice if foreground IoU plateaus.
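The sketch below ties Dice back to the IoU worked example and shows one way the combined loss could be assembled for a binary mask. It is an illustration under assumptions, not the project's implementation: the smoothing term `eps`, the weight `lam`, the sample probabilities, and all function names are hypothetical, and binary cross-entropy stands in for the CE term.

```python
import math


def dice_from_counts(both, pred_only, gt_only):
    """Dice = 2|P ∩ G| / (|P| + |G|), computed from region pixel counts."""
    return 2 * both / ((both + pred_only) + (both + gt_only))


def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - Dice on foreground probabilities (flat lists).

    eps is an assumed smoothing term that keeps the ratio defined
    when prediction and ground truth are both empty.
    """
    inter = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def binary_ce(probs, targets):
    """Mean binary cross-entropy, standing in for the CE term."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, targets)
    ) / len(probs)


# Counts from the IoU worked example: 40 shared, 10 + 10 disagreeing pixels.
dice = dice_from_counts(both=40, pred_only=10, gt_only=10)
iou = 40 / 60
print(dice)                                     # 0.8
print(abs(dice - 2 * iou / (1 + iou)) < 1e-12)  # True: Dice = 2 * IoU / (1 + IoU)

# Combined loss on four pixels with hypothetical foreground probabilities.
probs = [0.9, 0.8, 0.1, 0.2]
targets = [1, 1, 0, 0]
lam = 0.5  # hypothetical weight for the Dice term
combined = binary_ce(probs, targets) + lam * soft_dice_loss(probs, targets)
```

For the same masks Dice is never lower than IoU (0.8 versus 0.67 here), so the two numbers should not be compared across papers or runs without checking which one was reported.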
Oxford-IIIT Pet — three classes in the project
Trimap labels (after remapping in the project):
| ID | Meaning |
|---|---|
| 0 | Pet (foreground) |
| 1 | Background |
| 2 | Border / trimap transition |
Border pixels form a thin band, so they are easy to miss. A model can score high IoU on background and pet while border IoU stays low. A per-class IoU table in the README tells the full story.
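The remapping step itself is not shown in this lesson. As a hedged sketch, assume the raw trimap files use 1 for pet, 2 for background, and 3 for the border region, and that the project's remap is a plain subtract-one; both points are assumptions, and the project code may differ. The name `remap_trimap` is hypothetical.

```python
# Assumption: raw trimap values are 1 = pet, 2 = background, 3 = border,
# and the remap is a plain subtract-one. The project's own code may differ.
RAW_TO_ID = {1: 0, 2: 1, 3: 2}


def remap_trimap(raw_mask):
    """Map a 2-D list of raw trimap values to class IDs 0..2.

    An unexpected raw value raises KeyError instead of passing through silently.
    """
    return [[RAW_TO_ID[v] for v in row] for row in raw_mask]


raw = [
    [2, 2, 3, 1],
    [2, 3, 1, 1],
]
mask = remap_trimap(raw)
print(mask)  # [[1, 1, 2, 0], [1, 2, 0, 0]]
```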
What to log every epoch
| Log | Why |
|---|---|
| train_loss | Optimization progress |
| val_mIoU | Generalization — pick best checkpoint |
| Per-class val IoU | Spot weak classes (often border) |
| 3–5 overlay PNGs | Human eyes catch failures metrics miss |
Overlay = 0.6 * RGB image + 0.4 * colorized pred mask
If loss drops but overlays look worse, suspect a bug, overfitting, or a miscomputed metric.
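The overlay formula can be sketched without any imaging library by blending one pixel at a time. The palette colors and function names below are hypothetical; a real pipeline would do the same arithmetic on whole arrays and then save the result as a PNG.

```python
# Hypothetical palette: one RGB color per class ID (0 = pet, 1 = background, 2 = border).
PALETTE = {0: (255, 0, 0), 1: (0, 0, 0), 2: (0, 255, 0)}

IMAGE_WEIGHT = 0.6  # weights from the overlay formula
MASK_WEIGHT = 0.4


def blend_pixel(rgb, class_id):
    """Return 0.6 * image + 0.4 * class color for one pixel, as 0..255 ints."""
    color = PALETTE[class_id]
    return tuple(
        round(IMAGE_WEIGHT * c_img + MASK_WEIGHT * c_mask)
        for c_img, c_mask in zip(rgb, color)
    )


def make_overlay(image, pred_mask):
    """image: H x W list of RGB tuples; pred_mask: H x W list of class IDs."""
    return [
        [blend_pixel(px, cls) for px, cls in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, pred_mask)
    ]


image = [[(100, 100, 100), (200, 200, 200)]]
pred_mask = [[0, 1]]
overlay = make_overlay(image, pred_mask)
print(overlay)  # [[(162, 60, 60), (120, 120, 120)]]
```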
Train vs val discipline
- Tune thresholds / early stopping on validation mIoU.
- Touch test set once at the end for honest reporting.
- Same rule as Module 2 spam project — segmentation is no different.
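A validation pass that produces the mIoU used for checkpoint selection might look like the sketch below. It works on flat lists of class IDs to stay dependency-free; in PyTorch the predictions would typically come from `logits.argmax(dim=1)`. The function name `evaluate_miou` and the void label 255 are assumptions. Intersections and unions are summed over the whole validation set before dividing, which is one reasonable design choice rather than the only one.

```python
def evaluate_miou(batches, num_classes, ignore_index=255):
    """batches: iterable of (preds, targets), each a flat list of class IDs.

    Returns (mIoU, per-class IoU list). Classes that never appear are NaN
    and are left out of the mean.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for preds, targets in batches:
        for p, t in zip(preds, targets):
            if t == ignore_index:  # void pixel: skip entirely
                continue
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:  # a mismatch adds to the union of both classes involved
                union[p] += 1
                union[t] += 1
    per_class = [i / u if u else float("nan") for i, u in zip(inter, union)]
    present = [v for v in per_class if v == v]  # NaN != NaN, so this drops NaN
    miou = sum(present) / len(present) if present else float("nan")
    return miou, per_class


# Two tiny hypothetical validation batches; 255 marks a void pixel.
val_batches = [
    ([0, 0, 1, 2], [0, 1, 1, 255]),
    ([1, 1, 0, 0], [1, 1, 0, 1]),
]
val_miou, val_per_class = evaluate_miou(val_batches, num_classes=3)
print(round(val_miou, 2), val_per_class[:2])
```

Accumulating counts across batches avoids the noise of averaging per-batch IoUs, where a class that appears in only a few pixels of one batch can swing the result.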
Failure modes checklist
| Observation | Likely cause |
|---|---|
| mIoU ≈ 0 always | Wrong mask encoding; CE on wrong value range |
| mIoU high, overlays wrong | Metric on subset; color map bug |
| All-background predictions | Class imbalance — check foreground IoU |
| Striped masks | Augmentation misalignment |
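Two of the rows above can be caught with cheap checks before or during training. This is a sketch with hypothetical helper names, again on flat lists of class IDs: one check rejects mask values outside the expected range, the other reports how predictions are distributed across classes.

```python
from collections import Counter


def check_mask_values(targets, num_classes, ignore_index=255):
    """Raise if any target value is outside 0..num_classes-1 (or ignore_index)."""
    allowed = set(range(num_classes)) | {ignore_index}
    bad = set(targets) - allowed
    if bad:
        raise ValueError(f"unexpected mask values: {sorted(bad)}")


def predicted_class_fractions(preds):
    """Fraction of pixels assigned to each predicted class."""
    counts = Counter(preds)
    return {cls: n / len(preds) for cls, n in counts.items()}


check_mask_values([0, 1, 2, 255], num_classes=3)     # passes silently
fractions = predicted_class_fractions([1, 1, 1, 1])  # one class everywhere
print(fractions)  # {1: 1.0}
```

A fraction of 1.0 for a single class on every validation image is the all-background failure from the table, and a `ValueError` from the first check points at a mask encoding problem before any loss is computed.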
Checkpoint
- Logits shape `(4, 3, 128, 128)`: what is `3`?
- 1000 GT foreground pixels, model predicts 0: what is the IoU?
- Why log overlays if loss already decreases?
Answers: (1) `num_classes`, the channel dimension. (2) 0, because the intersection is empty. (3) Loss can improve on easy background pixels while borders stay wrong; overlays reveal that.
What's next
Module 5 quiz — then the U-Net project.