Project: U-Net pet segmentation

Before we begin

Train a U-Net (written by hand or taken from segmentation_models_pytorch) on the Oxford-IIIT Pet trimap masks. Visualize predictions and report mean intersection-over-union (mIoU).

Time: ~5 hours (GPU recommended).

Setup

bash

pip install torch torchvision segmentation-models-pytorch matplotlib numpy

Data

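Load the data with torchvision.datasets.OxfordIIITPet and target_types="segmentation". Each trimap pixel is 1 (pet), 2 (background), or 3 (border / not classified). Either subtract 1 to get 3-class labels 0-2, or collapse the trimap to binary pet vs. background. The dataset ships only "trainval" and "test" splits, so hold out part of "trainval" for validation.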

Resize to 256×256 for faster iteration. Apply the same resize to the image and the mask, using NEAREST interpolation for the mask so that no new label values are created.
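
The two points above (label remapping and a paired resize) can be sketched with PIL and NumPy. This is a minimal sketch, not the required pipeline: the helper names preprocess_pair and to_binary are hypothetical, and treating border pixels as pet in the binary variant is an assumption you may want to change.

```python
import numpy as np
from PIL import Image


def preprocess_pair(image, trimap, size=256):
    """Resize an image and its trimap together.

    Returns (x, y): x is a float32 array of shape (3, size, size) in [0, 1],
    y is an int64 array of shape (size, size) with labels 0..2.
    """
    image = image.convert("RGB").resize((size, size), Image.BILINEAR)
    # NEAREST keeps mask values in {1, 2, 3}; other filters would blend labels.
    trimap = trimap.resize((size, size), Image.NEAREST)
    x = np.asarray(image, dtype=np.float32).transpose(2, 0, 1) / 255.0
    y = np.asarray(trimap, dtype=np.int64) - 1  # 1..3 -> 0..2
    return x, y


def to_binary(mask):
    """Collapse labels 0..2 (pet, background, border) to pet=1, background=0.

    Assumption: border pixels are counted as pet.
    """
    return (mask != 1).astype(np.int64)
```

Because it takes and returns an (image, target) pair, a function like preprocess_pair should be usable as the joint transforms argument of OxfordIIITPet. ImageNet mean/std normalization for the pretrained encoder is left out here.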

Model

python

import segmentation_models_pytorch as smp
 
model = smp.Unet(
    encoder_name="resnet18",
    encoder_weights="imagenet",
    in_channels=3,
    classes=3,
)

Train with torch.nn.CrossEntropyLoss, optionally adding smp.losses.DiceLoss(mode="multiclass") as a second term.
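
To make the combination concrete, here is a minimal sketch that writes the Dice term by hand in plain PyTorch instead of calling smp, so the arithmetic is visible. The class name CEDiceLoss and the dice_weight default of 0.5 are illustrative assumptions, not values prescribed by the project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CEDiceLoss(nn.Module):
    """Cross-entropy plus a weighted soft Dice term (hypothetical helper)."""

    def __init__(self, num_classes=3, dice_weight=0.5, eps=1e-6):
        super().__init__()
        self.num_classes = num_classes
        self.dice_weight = dice_weight  # assumption: tune on validation
        self.eps = eps
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) int64 in [0, C)
        ce = self.ce(logits, target)
        probs = logits.softmax(dim=1)
        onehot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)  # sum over batch and pixels, keep the class axis
        inter = (probs * onehot).sum(dims)
        denom = probs.sum(dims) + onehot.sum(dims)
        dice = (2 * inter + self.eps) / (denom + self.eps)  # one score per class
        return ce + self.dice_weight * (1 - dice.mean())
```

Setting dice_weight=0 recovers plain cross-entropy, which makes it easy to run both variants and compare their validation mIoU.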

Metrics

After each epoch, compute mIoU on the validation set:

python

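def iou_per_class(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target (integer label tensors)."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        if union == 0:
            continue  # class absent from both tensors: skip it rather than score it as 0
        ious.append(inter / union)
    # NaN signals that none of the classes appeared at all
    return sum(ious) / len(ious) if ious else float("nan")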
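
The function above scores one pair of tensors. Averaging such per-batch scores over an epoch gives every batch the same weight regardless of which classes it contains. One common alternative, sketched here as an assumption rather than the prescribed method, accumulates a confusion matrix over the whole validation set and computes mIoU once at the end. The names update_confusion, miou_from_confusion, and evaluate are hypothetical.

```python
import torch


def update_confusion(conf, pred, target, num_classes):
    """Add one batch of int64 label tensors to a (C, C) matrix (rows = target)."""
    idx = target.reshape(-1) * num_classes + pred.reshape(-1)
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    conf += counts.reshape(num_classes, num_classes)
    return conf


def miou_from_confusion(conf):
    """Mean IoU over classes that appear in predictions or targets."""
    inter = conf.diag().float()
    union = (conf.sum(0) + conf.sum(1)).float() - inter
    present = union > 0
    return (inter[present] / union[present]).mean().item()


@torch.no_grad()
def evaluate(model, loader, num_classes, device="cpu"):
    """Run the model over a loader of (images, masks) batches; return mIoU."""
    model.eval()
    conf = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for images, masks in loader:
        logits = model(images.to(device))   # (N, C, H, W)
        pred = logits.argmax(dim=1).cpu()   # (N, H, W) predicted labels
        update_confusion(conf, pred, masks.long().cpu(), num_classes)
    return miou_from_confusion(conf)
```

Call evaluate(model, val_loader, num_classes=3, device=...) at the end of each epoch and log the returned value; val_loader here stands for whatever validation DataLoader you build.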

Deliverables

Val mIoU in README.
Side-by-side: image | ground truth | prediction (6 examples).
Paragraph on failure cases (boundaries, similar background).
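
For the side-by-side deliverable, a small matplotlib helper is enough. The sketch below assumes images are H×W×3 float arrays in [0, 1] (transpose CHW tensors first) and that masks and predictions are H×W integer label arrays. The function name plot_examples and its parameters are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_examples(images, masks, preds, num_classes=3, path=None):
    """Draw one row per example: image | ground truth | prediction."""
    n = len(images)
    fig, axes = plt.subplots(n, 3, figsize=(9, 3 * n), squeeze=False)
    titles = ("image", "ground truth", "prediction")
    for row, panels in enumerate(zip(images, masks, preds)):
        for col, panel in enumerate(panels):
            ax = axes[row, col]
            if col == 0:
                ax.imshow(panel)
            else:
                # Fixed vmin/vmax so a label has the same color in every panel.
                ax.imshow(panel, vmin=0, vmax=num_classes - 1,
                          cmap="viridis", interpolation="nearest")
            ax.set_axis_off()
            if row == 0:
                ax.set_title(titles[col])
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)  # e.g. a PNG to embed in the README
    return fig
```

Picking the six examples by hand, including a few with the lowest per-image IoU, gives the failure-case paragraph something concrete to point at.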

Extension

Compare against DeepLabV3+ (for example smp.DeepLabV3Plus with the same pretrained encoder) and report both models in the same metric table.

What's next

Welcome to Module 6 — video & motion.