Module 5 — Segmentation & instance masks

Project: U-Net pet segmentation

Train a U-Net on Oxford-IIIT Pet masks with PyTorch, visualize predictions, compute mIoU, and compare with a pretrained DeepLab baseline.

~300 min read + exercises

Before we begin

Train a U-Net (written by hand or built with segmentation_models_pytorch) on the Oxford-IIIT Pet trimap masks. Visualize predictions and report validation mIoU.

Time: ~5 hours (GPU recommended).


Setup

bash
pip install torch torchvision segmentation-models-pytorch matplotlib numpy

Data

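Load the data with torchvision.datasets.OxfordIIITPet and target_types="segmentation". Each mask is a trimap with pixel values 1 (pet), 2 (background), and 3 (border / not classified). Either subtract 1 to get labels 0, 1, 2 for a 3-class setup, or collapse the trimap to binary pet vs. background (for example, by merging the border into the pet class).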

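Resize to 256×256 for faster iteration. Apply the same resize to the image and the mask, and use NEAREST interpolation for the mask so that class labels are never blended into invalid in-between values.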
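The paired resize and label remapping can be sketched as below. This is a minimal illustration, not the course's reference code: the function name `preprocess_pair` is hypothetical, it assumes the 3-class setup (trimap values 1, 2, 3 shifted to 0, 1, 2), and it leaves out ImageNet mean/std normalization, which you may want when using a pretrained encoder.

```python
import numpy as np
import torch
from PIL import Image


def preprocess_pair(image, trimap, size=256):
    """Resize a PIL image and its trimap together.

    Returns a float image tensor (3, size, size) in [0, 1] and a
    long mask tensor (size, size) with labels 0, 1, 2.
    """
    # Bilinear is fine for the image; the mask must use NEAREST so that
    # no new label values are created by interpolation.
    image = image.convert("RGB").resize((size, size), Image.BILINEAR)
    trimap = trimap.resize((size, size), Image.NEAREST)

    img = torch.from_numpy(np.array(image, dtype=np.float32) / 255.0)
    img = img.permute(2, 0, 1)  # HWC -> CHW

    # Assumption: 3-class setup, trimap values 1..3 become labels 0..2.
    mask = torch.from_numpy(np.array(trimap, dtype=np.int64)) - 1
    return img, mask
```

Since the function takes an image and a target and returns both, it should be usable as the joint `transforms=` argument of `OxfordIIITPet(root, split="trainval", target_types="segmentation", transforms=preprocess_pair, download=True)`; check this against your torchvision version.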


Model

python
import segmentation_models_pytorch as smp
 
model = smp.Unet(
    encoder_name="resnet18",
    encoder_weights="imagenet",
    in_channels=3,
    classes=3,
)
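One training epoch around such a model might look like the sketch below. The helper name `train_one_epoch` and its arguments are illustrative assumptions; it expects a loader that yields `(images, masks)` batches with masks as long tensors of shape (N, H, W), and any loss that accepts logits and integer masks.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, loss_fn, device="cpu"):
    """Run one pass over the loader and return the mean per-sample loss."""
    model.train()
    total, count = 0.0, 0
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)         # (N, C, H, W) raw class scores
        loss = loss_fn(logits, masks)  # masks: (N, H, W), dtype long
        loss.backward()
        optimizer.step()
        total += loss.item() * images.size(0)
        count += images.size(0)
    return total / max(count, 1)
```

A typical call would pass `torch.optim.Adam(model.parameters(), lr=1e-3)` as the optimizer and `nn.CrossEntropyLoss()` as `loss_fn`; the learning rate here is only a common starting point, not a tuned value.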

Train with CrossEntropyLoss, optionally adding a Dice term such as smp.losses.DiceLoss with mode="multiclass".
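To show what the combination does, here is a self-contained sketch that pairs cross-entropy with a hand-written soft Dice term. The names `soft_dice_loss` and `CEDiceLoss` and the default `dice_weight=0.5` are assumptions made for illustration; the Dice function is a simplified stand-in, not smp's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_dice_loss(logits, target, eps=1e-6):
    """Multiclass soft Dice loss; logits (N, C, H, W), target (N, H, W) long."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial axes, keep the class axis
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()


class CEDiceLoss(nn.Module):
    """Cross-entropy plus an optional, weighted Dice term."""

    def __init__(self, dice_weight=0.5):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.dice_weight = dice_weight

    def forward(self, logits, target):
        loss = self.ce(logits, target)
        if self.dice_weight > 0:
            loss = loss + self.dice_weight * soft_dice_loss(logits, target)
        return loss
```

Setting `dice_weight=0` reduces this to plain cross-entropy, which makes it easy to compare runs with and without the Dice term.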


Metrics

After each epoch, compute mIoU on the validation set:

python
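def iou_per_class(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are integer label tensors of the same shape.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        if union == 0:
            continue  # class absent from both: skip it instead of scoring 0
        ious.append(inter / union)
    # NaN signals that none of the classes appeared at all
    return sum(ious) / len(ious) if ious else float("nan")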
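Averaging per-batch mIoU values can differ from the mIoU of the whole validation set, because small batches weight classes unevenly. One common alternative, sketched here under the assumption that the loader yields `(images, masks)` batches and that the model is already on `device`, is to accumulate a confusion matrix over all batches and compute IoU once at the end. The name `evaluate_miou` is hypothetical.

```python
import torch


@torch.no_grad()
def evaluate_miou(model, loader, num_classes, device="cpu"):
    """Dataset-level mIoU from an accumulated confusion matrix."""
    # conf[t, p] counts pixels with true class t predicted as class p
    conf = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    for images, masks in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        idx = masks.cpu().reshape(-1) * num_classes + preds.reshape(-1)
        counts = torch.bincount(idx, minlength=num_classes ** 2)
        conf += counts.reshape(num_classes, num_classes)
    inter = conf.diag().float()
    union = (conf.sum(0) + conf.sum(1)).float() - inter
    valid = union > 0  # ignore classes that never appear
    return (inter[valid] / union[valid]).mean().item()
```

Call it once per epoch, for example `val_miou = evaluate_miou(model, val_loader, num_classes=3, device=device)`, and log the value alongside the training loss.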

Deliverables

  1. Validation mIoU reported in the README.
  2. Side-by-side figure: image | ground truth | prediction (6 examples).
  3. A paragraph on failure cases (for example, object boundaries or backgrounds that resemble the pet).
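For the side-by-side figure, a small matplotlib helper along these lines may be enough. It is a sketch with a hypothetical name (`plot_triplets`) and assumes images are (3, H, W) arrays in [0, 1] and masks are (H, W) integer label arrays; if you normalized the inputs for the encoder, undo that before plotting.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_triplets(images, targets, preds, num_classes=3, path=None):
    """Draw one row per example: image | ground truth | prediction."""
    n = len(images)
    fig, axes = plt.subplots(n, 3, figsize=(9, 3 * n), squeeze=False)
    for row in range(n):
        # CHW -> HWC for imshow, clipped to the valid display range
        rgb = np.clip(np.transpose(np.asarray(images[row]), (1, 2, 0)), 0, 1)
        axes[row, 0].imshow(rgb)
        for col, mask in ((1, targets[row]), (2, preds[row])):
            # Fixed vmin/vmax keeps class colors consistent across panels
            axes[row, col].imshow(np.asarray(mask), vmin=0,
                                  vmax=num_classes - 1,
                                  interpolation="nearest")
        for col, title in enumerate(("image", "ground truth", "prediction")):
            axes[row, col].set_title(title)
            axes[row, col].axis("off")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)
    return fig
```

Passing six validation examples and a `path` such as `"predictions.png"` produces the figure for the README.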

Extension

Compare with a DeepLabV3+ model that uses a pretrained encoder (for example, smp.DeepLabV3Plus), and report both models in the same metric table.


What's next

Welcome to Module 6 — video & motion.