Project: U-Net pet segmentation
Before we begin
Train a U-Net (written by hand or taken from segmentation_models_pytorch) on the Oxford-IIIT Pet trimap masks, visualize the predictions, and report mIoU.
Time: ~5 hours (GPU recommended).
Setup
pip install torch torchvision segmentation-models-pytorch matplotlib numpy
Data
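Load the dataset with torchvision.datasets.OxfordIIITPet and target_types="segmentation". Each trimap pixel is 1 (pet), 2 (background), or 3 (unclassified border); either subtract 1 to train on all three classes, or collapse the labels to binary pet vs. background.
Resize to 256×256 for faster iteration, applying the same resize to image and mask (NEAREST interpolation for the mask, so label values are never blended).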
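As a minimal sketch of the paired preprocessing described above, the function below resizes an image and its trimap together and shifts the trimap labels from 1..3 to 0..2. The name preprocess_pair and the constant SIZE are illustrative, not part of torchvision, and the 3-class mapping is only one of the two options mentioned.

```python
import numpy as np
import torch
from PIL import Image

SIZE = (256, 256)  # (width, height); illustrative constant


def preprocess_pair(image, trimap, size=SIZE):
    """Resize a PIL image and its trimap together and convert both to tensors."""
    # Bilinear interpolation is fine for the RGB image.
    image = image.convert("RGB").resize(size, Image.BILINEAR)
    # NEAREST keeps mask values in {1, 2, 3}; other modes would blend labels.
    trimap = trimap.resize(size, Image.NEAREST)

    # Image: HxWx3 uint8 -> 3xHxW float32 in [0, 1].
    x = torch.from_numpy(np.array(image, dtype=np.float32) / 255.0).permute(2, 0, 1)
    # Mask: labels 1..3 -> class indices 0..2, as expected by CrossEntropyLoss.
    y = torch.from_numpy(np.array(trimap, dtype=np.int64)) - 1
    return x, y


# Possible usage (downloads the dataset, so it is left commented out):
# from torchvision.datasets import OxfordIIITPet
# train_ds = OxfordIIITPet("data", split="trainval", target_types="segmentation",
#                          download=True, transforms=preprocess_pair)
```

Passing the function through the dataset's joint transforms argument keeps image and mask in sync. ImageNet mean/std normalization of the image is omitted here and could be added when using a pretrained encoder.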
Model
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet18",
    encoder_weights="imagenet",
    in_channels=3,
    classes=3,
)
Combine CrossEntropyLoss with an optional DiceLoss from smp.
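One way to combine the two losses is a weighted sum. The sketch below uses a hand-written soft Dice term in plain PyTorch as a stand-in for the DiceLoss that smp provides, so that it runs without the library. The names soft_dice_loss and combined_loss and the dice_weight of 0.5 are assumptions for illustration, not recommended values.

```python
import torch
import torch.nn.functional as F


def soft_dice_loss(logits, target, eps=1e-6):
    """Multiclass soft Dice loss. logits: (N, C, H, W); target: (N, H, W) int64."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dimensions, keep classes
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)  # per-class Dice score
    return 1 - dice.mean()


def combined_loss(logits, target, dice_weight=0.5):
    """Cross-entropy plus a weighted soft Dice term."""
    ce = F.cross_entropy(logits, target)
    return ce + dice_weight * soft_dice_loss(logits, target)
```

In a training loop this would be called as combined_loss(model(images), masks), with masks holding class indices 0..2.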
Metrics
After each epoch, compute mIoU on the validation set:
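def iou_per_class(pred, target, num_classes):
    # pred and target are integer class-index tensors of the same shape.
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        # The epsilon avoids division by zero when class c is absent from both.
        ious.append(inter / (union + 1e-6))
    # Mean over classes, i.e. mIoU.
    return sum(ious) / num_classes
Deliverables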
- Val mIoU in README.
- Side-by-side: image | ground truth | prediction (6 examples).
- Paragraph on failure cases (boundaries, similar background).
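The side-by-side figure could be produced along these lines. This is a sketch that assumes images are H×W×3 float arrays in [0, 1] and that masks and predictions are H×W arrays of class indices 0..2. The function name show_triplets is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_triplets(images, masks, preds, n=6):
    """Plot n rows of: image | ground truth | prediction. Returns the figure."""
    n = min(n, len(images))
    fig, axes = plt.subplots(n, 3, figsize=(9, 3 * n), squeeze=False)
    titles = ("image", "ground truth", "prediction")
    for row in range(n):
        panels = (images[row], masks[row], preds[row])
        for col, (panel, title) in enumerate(zip(panels, titles)):
            ax = axes[row, col]
            if col == 0:
                ax.imshow(np.clip(panel, 0, 1))
            else:
                # Fixed color range so the same class has the same color everywhere.
                ax.imshow(panel, vmin=0, vmax=2, cmap="viridis", interpolation="nearest")
            ax.set_title(title)
            ax.axis("off")
    fig.tight_layout()
    return fig
```

The returned figure can be saved with fig.savefig("predictions.png") and linked from the README.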
Extension
Compare with DeepLabV3+ (for example smp.DeepLabV3Plus with a pretrained encoder), reporting both models in the same metric table.
What's next
Welcome to Module 6 — video & motion.