Instance segmentation & Mask R-CNN
Before we begin
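Instance segmentation asks which pixels belong to object instance #1 and which to instance #2. Each mask carries both a class label and an individual object identity, so two people standing side by side get two separate masks.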
Learning objectives
- Distinguish semantic vs instance vs panoptic segmentation.
- Describe the heads Mask R-CNN adds on top of Faster R-CNN.
- Explain ROI Align vs ROI Pool.
- Outline COCO-style mask AP evaluation.
Mask R-CNN
Builds on Faster R-CNN:
- Backbone + FPN: extracts multi-scale feature maps.
- RPN: proposes candidate object regions (ROIs).
- Box head: classifies each ROI and refines its box.
- Mask head: a small FCN applied to each ROI that predicts a mask for that instance.
Each detection gets its own binary mask, so touching or overlapping objects of the same class stay separate instances.
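As a rough illustration of how one ROI's mask-head output becomes a binary mask, the sketch below assumes the head emits one low-resolution logit map per class, that the channel for the class chosen by the box head is kept, and that a per-pixel sigmoid is thresholded at 0.5. The function name `select_instance_mask`, the toy shapes, and the logit values are hypothetical. Resizing the result to the box and pasting it into the image is left out.

```python
import numpy as np


def select_instance_mask(mask_logits, class_id, threshold=0.5):
    """Pick the predicted class's mask channel and binarize it.

    mask_logits: float array of shape (num_classes, m, m) for one ROI.
    class_id: class index chosen by the box head for this ROI.
    """
    logits = mask_logits[class_id]          # (m, m) logits for that class
    probs = 1.0 / (1.0 + np.exp(-logits))   # per-pixel sigmoid
    return probs >= threshold               # boolean (m, m) mask


# Toy ROI (hypothetical values): 3 classes, 4x4 mask resolution.
mask_logits = np.full((3, 4, 4), -5.0)
mask_logits[1, 1:3, 1:3] = 5.0  # class 1 is confident about the centre pixels

mask = select_instance_mask(mask_logits, class_id=1)
print(mask.astype(int))
```

Only the channel of the predicted class is read, so the other class channels have no effect on the final mask.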
ROI Align
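ROI Pool rounds ROI and bin boundaries to the integer feature grid, so the pooled features, and the masks predicted from them, end up slightly misaligned with the input. ROI Align skips the rounding and samples the feature map with bilinear interpolation at continuous coordinates, which gives better-aligned, more accurate masks.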
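The difference can be seen on a single sampling point. The sketch below is a simplified illustration, not a full implementation of either operator. `bilinear_sample` reads a feature map at a continuous coordinate, as ROI Align does for each of its sampling points. `quantized_sample` is a hypothetical stand-in for ROI Pool's rounding step; real ROI Pool also max-pools within each bin.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a continuous (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so the four neighbours always exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feature, y, x):
    """Snap the coordinate to the integer grid first, then look it up."""
    return feature[int(math.floor(y))][int(math.floor(x))]


# Horizontal ramp: the feature value equals the x coordinate.
feature = [[float(x) for x in range(4)] for _ in range(4)]

aligned = bilinear_sample(feature, 1.5, 1.75)   # interpolated value
snapped = quantized_sample(feature, 1.5, 1.75)  # value after rounding down
print(aligned, snapped)
```

On this ramp the interpolated value tracks the true coordinate (1.75), while the snapped lookup returns the value at the grid cell (1.0). That sub-cell offset is the kind of error that shows up as shifted mask boundaries.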
Panoptic segmentation
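Stuff classes (sky, grass): semantic label only; all pixels of a class form one region with no separate instances.
Things (people, cars): one mask per instance.
Every pixel gets exactly one panoptic ID (its class, plus an instance ID for things), so segments never overlap.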
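One way to picture the single-ID-per-pixel rule is to merge a semantic map with a set of instance masks. The sketch below is an assumption-laden illustration: the function name `merge_panoptic`, the ID encoding `class_id * 1000 + instance_index`, and the rule that earlier masks win overlaps are all choices made for this example, not a fixed standard.

```python
import numpy as np


def merge_panoptic(semantic, instance_masks, instance_classes, divisor=1000):
    """Combine a semantic map (used for stuff) with thing instance masks.

    semantic: (H, W) int array of class ids.
    instance_masks: list of (H, W) boolean masks, highest priority first.
    instance_classes: class id for each instance mask.
    Returns an (H, W) int array where id = class_id * divisor + index,
    with index 0 for stuff and 1, 2, ... for thing instances.
    """
    panoptic = semantic.astype(np.int64) * divisor
    taken = np.zeros(semantic.shape, dtype=bool)
    pairs = zip(instance_masks, instance_classes)
    for index, (mask, cls) in enumerate(pairs, start=1):
        free = mask & ~taken  # earlier instances keep contested pixels
        panoptic[free] = cls * divisor + index
        taken |= free
    return panoptic


# Toy scene (hypothetical): class 0 = sky, 1 = grass, 2 = person.
semantic = np.array([[0, 0, 0, 0],
                     [1, 1, 1, 1]])
person_a = np.zeros((2, 4), dtype=bool)
person_a[:, 0:2] = True
person_b = np.zeros((2, 4), dtype=bool)
person_b[:, 1:3] = True  # overlaps person_a in column 1

panoptic = merge_panoptic(semantic, [person_a, person_b], [2, 2])
print(panoptic)
```

The overlapping column is assigned to the first person only, so no pixel ends up with two IDs, and the remaining sky and grass pixels keep their stuff labels.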
When to use which
| Task | Output |
|---|---|
| Semantic | Per-pixel class, no instance IDs |
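| Instance | One mask per object instance, each with a class label |
| Panoptic | Per-pixel class for stuff, plus instance IDs for things |
| Detection only | Boxes only: faster, but no pixel-level detail |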
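To make the relationship between these outputs concrete, the sketch below derives a semantic map and bounding boxes from a set of instance masks. The helper names `instances_to_semantic` and `mask_to_box`, the class ids, and the inclusive pixel-index box convention are assumptions for illustration only.

```python
import numpy as np


def instances_to_semantic(masks, classes, shape, background=0):
    """Collapse instance masks into a per-pixel class map (identity is lost)."""
    semantic = np.full(shape, background, dtype=np.int64)
    for mask, cls in zip(masks, classes):
        semantic[mask] = cls  # later instances overwrite earlier ones on overlap
    return semantic


def mask_to_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around a non-empty boolean mask.

    Coordinates are inclusive pixel indices.
    """
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy scene (hypothetical): two cars, both class 3, on a 4x6 image.
car_a = np.zeros((4, 6), dtype=bool)
car_a[1:3, 0:2] = True
car_b = np.zeros((4, 6), dtype=bool)
car_b[1:3, 3:6] = True

semantic = instances_to_semantic([car_a, car_b], [3, 3], shape=(4, 6))
boxes = [mask_to_box(m) for m in (car_a, car_b)]
print(semantic)
print(boxes)
```

Both cars share the same value in the semantic map, which is exactly the information instance segmentation keeps and semantic segmentation drops. The boxes keep the instances apart but discard the object shape.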