Segmentation losses & metrics

Before we begin

Whether you train U-Net, DeepLab, or a fine-tuned Mask R-CNN mask head, the work comes down to two choices: a loss that training can optimize, and evaluation metrics that match the task. A forward pass on a 256×256 image produces 65,536 × C logits (one score per class at every pixel). Training needs a loss that scores every pixel. Evaluation needs overlap metrics, not a misleading “99% accuracy” earned on empty background.

This lesson covers:

Per-pixel cross-entropy — default for semantic segmentation
IoU — standard validation metric
Dice — common in medical imaging and imbalanced foreground
What to log so you catch broken models early

Figure: IoU intuition (overlap divided by union; 1.0 is perfect, 0.0 is no overlap).

What you will learn

Wire CrossEntropyLoss for (N, C, H, W) logits and (N, H, W) targets.
Compute IoU and Dice by hand and in code.
Explain why pixel accuracy fails on imbalanced masks.
Build a minimal evaluation loop for mIoU.


Per-pixel cross-entropy

Same formula as MNIST — applied independently at each pixel.

For pixel i with target class y_i and logits z_i (length C):

text

loss_i = -log( softmax(z_i)[y_i] )
total loss = mean (or sum) over all pixels
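As a sanity check on the formula, the sketch below computes loss_i by hand. It takes the log-softmax over the class dimension, picks the target class with `gather`, and then compares the mean and the sum with PyTorch's built-in `torch.nn.functional.cross_entropy`. It assumes PyTorch is installed. The tensor sizes and random seed are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 3, 4, 4
logits = torch.randn(N, C, H, W)            # raw scores z_i for every pixel
targets = torch.randint(0, C, (N, H, W))    # class index y_i for every pixel (int64)

# loss_i = -log softmax(z_i)[y_i]: log-softmax over classes, then pick the target channel
log_probs = F.log_softmax(logits, dim=1)                            # (N, C, H, W)
per_pixel = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # (N, H, W)

manual_mean = per_pixel.mean()
builtin_mean = F.cross_entropy(logits, targets)   # default reduction="mean"
print(manual_mean.item(), builtin_mean.item())    # the two numbers match
```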

PyTorch expects:

python

import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # optional: ignore_index=255 for void pixels

# model, images, masks come from your own training loop / DataLoader
logits = model(images)       # (N, C, H, W) float
targets = masks.long()       # (N, H, W) int64, values in 0 .. C-1

loss = criterion(logits, targets)

Common bugs:

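Bug	Symptom / fix
Target shape `(N, 1, H, W)` without squeeze	Runtime shape-mismatch error; call `masks.squeeze(1)`
Target floats 0.0–1.0	Runtime error: CE expects int64 class indices in `0 .. C-1` (float targets are read as per-class probabilities shaped `(N, C, H, W)`)
Logits `(N, H, W, C)` wrong dim order	Shape-mismatch error; permute to `(N, C, H, W)`, the layout `Conv2d` heads already produce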
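The sketch below applies all three fixes in one place. It assumes PyTorch and a void label of 255. `prepare_targets` is a hypothetical helper, not a PyTorch function. The channels-last logits stand in for a head that happens to output `(N, H, W, C)`.

```python
import torch
import torch.nn as nn

def prepare_targets(masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return (N, H, W) int64 class indices for CrossEntropyLoss."""
    if masks.dim() == 4 and masks.size(1) == 1:
        masks = masks.squeeze(1)      # (N, 1, H, W) -> (N, H, W)
    # .long() only helps if values are already whole class IDs (0.0, 1.0, 2.0, ...).
    # Masks scaled into 0.0-1.0 by an image transform must be re-read without scaling.
    return masks.long()

criterion = nn.CrossEntropyLoss(ignore_index=255)   # pixels labeled 255 are left out of the mean

logits_nhwc = torch.randn(2, 8, 8, 3)               # channels-last scores from a custom head
logits = logits_nhwc.permute(0, 3, 1, 2)            # -> (N, C, H, W), the layout CE expects

masks = torch.randint(0, 3, (2, 1, 8, 8)).float()   # the buggy (N, 1, H, W) float layout
masks[0, 0, :2, :] = 255                            # mark a strip of void pixels
targets = prepare_targets(masks)                    # (2, 8, 8) int64
loss = criterion(logits, targets)
```

With `ignore_index=255`, `CrossEntropyLoss` averages only over the non-void pixels. Void regions therefore neither help nor hurt the loss.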

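Instance note: Mask R-CNN applies a per-pixel binary cross-entropy (sigmoid) loss to the mask of each positive RoI, using only the mask channel of that RoI's ground-truth class. It is the same per-pixel idea on small fixed-size crops (28×28 with the FPN head).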

Class imbalance — the accuracy trap

Pet / portrait masks are mostly background.

Example: 90% background, 10% pet.

A model that predicts background everywhere scores 90% pixel accuracy, which looks great in a spreadsheet, yet its pet IoU is 0.
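Here is the 90% / 10% example in code. It is a minimal sketch on a made-up 10×10 mask that uses the project's class IDs (0 = pet, 1 = background). The mask layout exists only for illustration.

```python
import torch

# Toy 10x10 mask: 90 background pixels (class 1), 10 pet pixels (class 0).
target = torch.ones(10, 10, dtype=torch.long)
target[0, :] = 0                      # one row of pet pixels = 10% of the image

pred = torch.ones_like(target)        # lazy model: background everywhere

pixel_acc = (pred == target).float().mean().item()

pet_pred, pet_gt = pred == 0, target == 0
pet_iou = (pet_pred & pet_gt).sum().item() / (pet_pred | pet_gt).sum().item()

print(f"pixel accuracy = {pixel_acc:.2f}, pet IoU = {pet_iou:.2f}")
# pixel accuracy = 0.90, pet IoU = 0.00
```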

Metric	What it rewards
Pixel accuracy	Majority class (background)
Foreground IoU	Overlap on the class you care about
Mean IoU (mIoU)	Average of per-class IoUs: every class counts equally, however few pixels it has

Rule for this course: always report mIoU or per-class IoU on validation; treat pixel accuracy as optional.

IoU (Intersection over Union)

Also called Jaccard index. For binary prediction mask P and ground truth G:

\text{IoU} = \frac{|P \cap G|}{|P \cup G|}

1.0 — perfect overlap
0.0 — no overlap
Union in denominator penalizes both missed pixels and false alarm pixels

Worked example (counts of pixels)

Region	Count
Predicted foreground only	10
Ground truth foreground only	10
Both foreground	40

Intersection = 40. Union = 10 + 10 + 40 = 60.

IoU = 40/60 ≈ 0.67
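The same arithmetic works on actual boolean masks. This small sketch builds flattened masks that reproduce the table's counts. The 40 pixels that are background in both masks are an added assumption. They change nothing, because IoU never counts true negatives.

```python
import torch

# Flattened masks reproducing the table: 40 in both, 10 predicted only,
# 10 ground-truth only, plus 40 pixels that are background in both (no effect on IoU).
pred = torch.tensor([1] * 40 + [1] * 10 + [0] * 10 + [0] * 40, dtype=torch.bool)
gt   = torch.tensor([1] * 40 + [0] * 10 + [1] * 10 + [0] * 40, dtype=torch.bool)

intersection = (pred & gt).sum().item()   # 40
union = (pred | gt).sum().item()          # 10 + 10 + 40 = 60
iou = intersection / union
print(f"IoU = {intersection}/{union} = {iou:.2f}")   # IoU = 40/60 = 0.67
```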

Multi-class mIoU

Compute IoU for each class c separately (binary: “pixel is class c” vs not), then:

text

mIoU = mean( IoU_c for c in classes )

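Void pixels (ignore_index) are usually excluded from both intersection and union. Classes absent from both prediction and ground truth are dropped from the average instead of counting as 0.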

Code sketch

python

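import torch

def iou_per_class(preds, targets, num_classes, ignore_index=None):
    """preds, targets: (N, H, W) int64 tensors; returns one IoU per class."""
    if ignore_index is not None:
        valid = targets != ignore_index        # drop void pixels from P and G
        preds, targets = preds[valid], targets[valid]
    ious = []
    for c in range(num_classes):
        p = preds == c
        t = targets == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        if union == 0:
            ious.append(float("nan"))  # class absent from both pred and GT
        else:
            ious.append(inter / union)
    return ious  # mIoU = nanmean, e.g. torch.tensor(ious).nanmean()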
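Below is a minimal evaluation loop for mIoU, sketched under a few assumptions:

- the loader yields `(images, masks)` with `(N, H, W)` int64 masks;
- 255 marks void pixels;
- `nn.Identity` stands in for a real model in the usage lines.

Instead of averaging per-batch IoUs, it accumulates one confusion matrix over the whole validation set and computes IoU once at the end. Each class's IoU then reflects every validation pixel, not whichever classes happened to appear in each batch. `evaluate_miou` is a hypothetical name.

```python
import torch

@torch.no_grad()
def evaluate_miou(model, loader, num_classes, ignore_index=255, device="cpu"):
    """Accumulate a confusion matrix over the whole loader, then compute per-class IoU."""
    model.eval()
    conf = torch.zeros(num_classes, num_classes, dtype=torch.long)  # rows = GT, cols = pred
    for images, masks in loader:
        logits = model(images.to(device))        # (N, C, H, W)
        preds = logits.argmax(dim=1).cpu()       # (N, H, W) predicted class IDs
        masks = masks.cpu()
        valid = masks != ignore_index            # drop void pixels
        idx = masks[valid] * num_classes + preds[valid]
        conf += torch.bincount(idx, minlength=num_classes ** 2).view(num_classes, num_classes)
    inter = conf.diag().double()
    union = conf.sum(0).double() + conf.sum(1).double() - inter
    iou = inter / union                          # NaN where a class never appears
    return iou.tolist(), torch.nanmean(iou).item()

# Usage sketch: nn.Identity stands in for a model whose "images" are already one-hot logits.
gt = torch.tensor([[[0, 0, 1], [1, 2, 255]]])        # (1, 2, 3) masks, one void pixel
pred = torch.tensor([[[0, 1, 1], [1, 2, 2]]])
images = torch.nn.functional.one_hot(pred, 3).permute(0, 3, 1, 2).float()
ious, miou = evaluate_miou(torch.nn.Identity(), [(images, gt)], num_classes=3)
print([round(x, 3) for x in ious], round(miou, 3))   # [0.5, 0.667, 1.0] 0.722
```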

Dice coefficient

\text{Dice} = \frac{2|P \cap G|}{|P| + |G|}

For binary masks, Dice is the same as F1 score on pixels.
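Here the worked IoU example is plugged into the Dice formula, as a plain-Python sketch. The counts are the same: 40 overlapping pixels, 10 predicted-only, 10 ground-truth-only. Dice and pixel F1 agree, and for binary masks Dice also follows from IoU via Dice = 2·IoU / (1 + IoU).

```python
# Same counts as the IoU worked example.
tp, fp, fn = 40, 10, 10

dice = 2 * tp / ((tp + fp) + (tp + fn))     # 2|P∩G| / (|P| + |G|) = 80 / 100
iou = tp / (tp + fp + fn)                   # 40 / 60

precision, recall = tp / (tp + fp), tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Dice = {dice:.2f}, F1 = {f1:.2f}, IoU = {iou:.2f}")   # Dice = 0.80, F1 = 0.80, IoU = 0.67
print(f"2*IoU/(1+IoU) = {2 * iou / (1 + iou):.2f}")          # 2*IoU/(1+IoU) = 0.80
```

Because Dice is a monotone function of IoU on binary masks, both metrics rank models the same way. They differ only in how harshly a partial overlap is scored.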

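IoU	Dice (binary)
Counts the overlap once against the union; harsher on partial overlap (0.67 in the worked example)	Counts the overlap twice against |P| + |G|; always ≥ IoU (0.80 for the same counts)
Standard in detection/seg benchmarks	Very common in medical segmentation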

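Dice loss: 1 - Dice, computed on predicted probabilities rather than hard masks (a “soft” Dice) so that it is differentiable. Its gradients push directly for overlap. It is often mixed with CE: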

text

loss = CE + λ * (1 - Dice)
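The sketch below implements a minimal soft Dice plus CE loss in PyTorch. It assumes `(N, C, H, W)` logits, `(N, H, W)` int64 targets and void label 255. The names `soft_dice_loss` and `ce_dice_loss`, the `eps` smoothing term, and λ = 0.5 are illustrative choices, not the course's implementation.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, targets, eps=1e-6, ignore_index=255):
    """1 - mean per-class soft Dice, computed on softmax probabilities."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)                                   # (N, C, H, W)
    valid = (targets != ignore_index).unsqueeze(1).float()          # (N, 1, H, W)
    safe_targets = targets.masked_fill(targets == ignore_index, 0)  # placeholder for one_hot
    onehot = F.one_hot(safe_targets, num_classes).permute(0, 3, 1, 2).float()
    probs, onehot = probs * valid, onehot * valid                   # zero out void pixels
    dims = (0, 2, 3)                                                # sum over batch and space
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)                        # per-class soft Dice
    return 1 - dice.mean()

def ce_dice_loss(logits, targets, lam=0.5, ignore_index=255):
    """loss = CE + lam * (1 - Dice); lam = 0.5 is an arbitrary starting value."""
    ce = F.cross_entropy(logits, targets, ignore_index=ignore_index)
    return ce + lam * soft_dice_loss(logits, targets, ignore_index=ignore_index)

logits = torch.randn(2, 3, 8, 8, requires_grad=True)
targets = torch.randint(0, 3, (2, 8, 8))
loss = ce_dice_loss(logits, targets)
loss.backward()        # both terms are differentiable w.r.t. the logits
```

Dice here is computed per class over the whole batch and then averaged over classes. Per-image Dice is another common choice, and it behaves differently on images where a class is absent.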

For the pet project, CE alone is enough to start; add Dice if foreground IoU plateaus.

Oxford-IIIT Pet — three classes in the project

Trimap labels (after remapping in the project):

ID	Meaning
0	Pet (foreground)
1	Background
2	Border / trimap transition
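The remapping itself is a one-liner. This sketch assumes the raw trimap PNGs use the dataset's original encoding (1 = pet, 2 = background, 3 = border / not classified), so subtracting 1 gives the IDs in the table. `remap_trimap` is a hypothetical name, and the project's own code may do this differently.

```python
import numpy as np

def remap_trimap(trimap: np.ndarray) -> np.ndarray:
    """Raw trimap values {1, 2, 3} -> project class IDs {0, 1, 2} as int64."""
    mask = np.asarray(trimap, dtype=np.int64) - 1   # cast first so uint8 cannot wrap around
    if mask.min() < 0 or mask.max() > 2:
        raise ValueError(f"unexpected trimap values: {np.unique(trimap)}")
    return mask

raw = np.array([[2, 2, 3],
                [2, 3, 1],
                [3, 1, 1]], dtype=np.uint8)   # toy trimap in the raw 1/2/3 encoding
print(remap_trimap(raw))
```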

Border pixels are thin and easy to miss. A model can score high IoU on background and pet while border IoU stays low, so a per-class IoU table in the README tells the full story.

What to log every epoch

Log	Why
`train_loss`	Optimization progress
`val_mIoU`	Generalization — pick best checkpoint
Per-class val IoU	Spot weak classes (often border)
3–5 overlay PNGs	Human eyes catch failures metrics miss

text

Overlay = 0.6 * RGB image + 0.4 * colorized pred mask
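Here is a sketch of the overlay formula with NumPy and Pillow. The palette colors, the `make_overlay` name and the output file name are hypothetical. The blend is exactly 0.6 × image + 0.4 × colorized mask.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Hypothetical palette: one RGB color per class ID (0 pet, 1 background, 2 border).
PALETTE = np.array([[255, 0, 0], [0, 0, 0], [255, 255, 0]], dtype=np.float32)

def make_overlay(rgb: np.ndarray, pred_mask: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend (1 - alpha) * RGB image + alpha * colorized mask; returns uint8 (H, W, 3)."""
    color = PALETTE[pred_mask]                        # (H, W) class IDs -> (H, W, 3) colors
    blended = (1 - alpha) * rgb.astype(np.float32) + alpha * color
    return np.clip(blended, 0, 255).round().astype(np.uint8)

rgb = np.full((4, 4, 3), 100, dtype=np.uint8)         # dummy gray image
pred = np.zeros((4, 4), dtype=np.int64)               # "pet" predicted everywhere
overlay = make_overlay(rgb, pred)
out_path = os.path.join(tempfile.gettempdir(), "overlay_epoch01_sample0.png")
Image.fromarray(overlay).save(out_path)
```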

If the loss drops but overlays look worse, suspect a bug, overfitting, or a wrongly computed metric.

Train vs val discipline

Tune thresholds / early stopping on validation mIoU.
Touch test set once at the end for honest reporting.
Same rule as Module 2 spam project — segmentation is no different.
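Picking the best checkpoint and stopping early both depend on validation mIoU alone. Below is a hypothetical tracker, sketched in plain Python. The patience value and the mIoU sequence are made up for illustration, and the test set is never consulted.

```python
import math

class BestCheckpointTracker:
    """Hypothetical helper: remember the epoch with the highest validation mIoU and
    signal early stopping after `patience` epochs without improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best_miou = -math.inf
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch: int, val_miou: float) -> bool:
        """Return True if this epoch is the new best (the moment to save a checkpoint)."""
        if val_miou > self.best_miou:
            self.best_miou, self.best_epoch, self.bad_epochs = val_miou, epoch, 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience

tracker = BestCheckpointTracker(patience=2)
for epoch, miou in enumerate([0.41, 0.55, 0.58, 0.57, 0.56, 0.60]):   # made-up val mIoUs
    if tracker.update(epoch, miou):
        pass  # in a real loop: torch.save(model.state_dict(), "best.pt")
    if tracker.should_stop:
        break
print(tracker.best_epoch, tracker.best_miou)   # 2 0.58
```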

Failure modes checklist

Observation	Likely cause
mIoU ≈ 0 always	Wrong mask encoding; CE on wrong value range
mIoU high, overlays wrong	Metric on subset; color map bug
All-background predictions	Class imbalance — check foreground IoU
Striped masks	Augmentation misalignment
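For the first row, a quick check on one batch before training catches most encoding bugs. `mask_problems` is a hypothetical helper that assumes PyTorch, `(N, H, W)` int64 masks and void label 255. It returns a list of problems, and an empty list means the batch looks sane.

```python
import torch

def mask_problems(masks: torch.Tensor, num_classes: int, ignore_index: int = 255) -> list:
    """Hypothetical check: wrong shape, wrong dtype, or out-of-range class IDs."""
    problems = []
    if masks.dim() != 3:
        problems.append(f"expected (N, H, W) masks, got shape {tuple(masks.shape)}")
    if masks.dtype != torch.int64:
        problems.append(f"expected int64 class IDs, got {masks.dtype}")
    values = torch.unique(masks)
    bad = values[((values < 0) | (values >= num_classes)) & (values != ignore_index)]
    if bad.numel() > 0:
        problems.append(f"values outside 0..{num_classes - 1}: {bad.tolist()}")
    return problems

good = torch.randint(0, 3, (2, 4, 4))
raw_trimap = torch.tensor([[[1, 2, 3]]])     # forgot the -1 remap
print(mask_problems(good, 3))          # []
print(mask_problems(raw_trimap, 3))    # ['values outside 0..2: [3]']
```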

Checkpoint

Logits shape (4, 3, 128, 128) — what is 3?
1000 GT foreground pixels, model predicts 0 — IoU?
Why log overlays if loss already decreases?

Answers: (1) num_classes channel dimension. (2) 0 — intersection empty. (3) Loss can improve on easy background pixels while borders stay wrong; overlays reveal that.

What's next

Module 5 quiz — then the U-Net project.