Instance segmentation & Mask R-CNN

Before we begin

Instance segmentation asks which pixels belong to object instance #1 versus instance #2: each mask carries both a class label and an individual object identity.

Mask R-CNN builds on Faster R-CNN by adding a mask branch alongside the class and box heads:

Each detection gets its own binary mask — instances separated.
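To make "one binary mask per detection" concrete, here is a minimal NumPy sketch that paints per-detection masks into a single instance ID map. The function name `masks_to_instance_map` and the rule that overlapping pixels go to the higher-scoring detection are assumptions for illustration, not part of Mask R-CNN itself.

```python
import numpy as np

def masks_to_instance_map(masks, scores):
    """Paint per-detection binary masks into one integer ID map.

    masks:  (N, H, W) boolean array, one mask per detection
    scores: (N,) detection confidences
    Returns an (H, W) int array: 0 is background, k marks detection k-1.
    Overlaps go to the higher-scoring detection (an assumed tie-break rule).
    """
    masks = np.asarray(masks, dtype=bool)
    instance_map = np.zeros(masks.shape[1:], dtype=np.int32)
    # Paint low scores first so higher scores overwrite them.
    for idx in np.argsort(scores):
        instance_map[masks[idx]] = idx + 1
    return instance_map

# Two detections of the same class on a 4x6 image.
masks = np.zeros((2, 4, 6), dtype=bool)
masks[0, 1:3, 0:3] = True
masks[1, 1:3, 2:5] = True  # overlaps detection 0 in column 2
scores = np.array([0.9, 0.6])

instance_map = masks_to_instance_map(masks, scores)
# Dropping the IDs collapses both objects into one semantic region.
semantic_map = (instance_map > 0).astype(np.int32)
```

In `instance_map` the two objects keep distinct IDs even where they touch, while `semantic_map` merges them into one blob of the same class.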

ROI Pool quantizes each region to integer grid cells, which misaligns the masks. ROI Align instead samples at continuous coordinates with bilinear interpolation, giving sharper masks.
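The sampling step can be sketched in plain NumPy. This is a simplified single-channel version under stated assumptions: `feature[y, x]` is treated as the value at the point (y, x), each output bin averages a small grid of sample points, and the names `bilinear_sample` and `roi_align` are hypothetical helpers rather than any library's API.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at continuous (y, x) by bilinear interpolation."""
    h, w = feature.shape
    # Clamp to the valid range so the four neighbours always exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature, box, out_size=2, samples=2):
    """Pool a box into an out_size x out_size grid without rounding the box.

    box = (y_min, x_min, y_max, x_max) as floats in feature-map coordinates.
    Each bin averages samples x samples regularly spaced points.
    """
    y_min, x_min, y_max, x_max = box
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    out = np.zeros((out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    y = y_min + (i + (sy + 0.5) / samples) * bin_h
                    x = x_min + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feature, y, x))
            out[i, j] = np.mean(vals)
    return out

# A horizontal ramp: the feature value equals the x coordinate.
ramp = np.tile(np.arange(4.0), (4, 1))
pooled = roi_align(ramp, box=(0.5, 0.5, 2.5, 2.5))
```

On the ramp, each pooled value equals the x coordinate of its bin centre, so the fractional box edges are respected. Rounding the box to integer cells first, as ROI Pool does, would shift those values.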

Panoptic segmentation combines the two:
Stuff classes (sky, grass): semantic label only, one region per class.
Things (people, cars): instance masks.
Every pixel gets exactly one panoptic ID.
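One way to give every pixel a single panoptic ID is to pack the class and the instance number into one integer. The sketch below assumes the encoding `class_id * label_divisor + instance_id`, which is one common convention rather than the only one. The function name `merge_panoptic` and the class IDs are hypothetical.

```python
import numpy as np

def merge_panoptic(semantic_map, instance_map, thing_classes, label_divisor=1000):
    """Combine a semantic map and an instance ID map into one panoptic ID map.

    panoptic ID = class_id * label_divisor + instance_id  (assumed encoding)
    Stuff pixels get instance_id 0, so each stuff class forms one region.
    """
    semantic_map = np.asarray(semantic_map, dtype=np.int64)
    instance_map = np.asarray(instance_map, dtype=np.int64)
    is_thing = np.isin(semantic_map, list(thing_classes))
    # Ignore any instance IDs that fall on stuff pixels.
    instance_part = np.where(is_thing, instance_map, 0)
    return semantic_map * label_divisor + instance_part

# Hypothetical class IDs: 0 = sky (stuff), 1 = grass (stuff), 2 = person (thing).
semantic = np.array([[0, 0, 0, 0],
                     [2, 2, 0, 2],
                     [1, 1, 1, 1]])
instances = np.array([[0, 0, 0, 0],
                      [1, 1, 0, 2],
                      [0, 0, 0, 0]])

panoptic = merge_panoptic(semantic, instances, thing_classes={2})
```

The two people come out as 2001 and 2002, while all grass pixels share the single ID 1000 and all sky pixels share 0.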

Task	Output
Semantic	Per-pixel class, no instance IDs
Instance	Mask per object instance
Detection only	Boxes — faster, less pixel detail
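The "less pixel detail" trade-off in the last row can be seen by reducing a mask to its bounding box. This is a small illustrative sketch: `mask_to_box` is a hypothetical helper, and the `(x_min, y_min, x_max, y_max)` order with exclusive maxima is an assumed convention.

```python
import numpy as np

def mask_to_box(mask):
    """Return the tight box (x_min, y_min, x_max, y_max) around a binary mask.

    Maxima are exclusive; returns None for an empty mask.
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# An L-shaped object on a 4x4 image.
mask = np.zeros((4, 4), dtype=bool)
mask[0:3, 0] = True
mask[2, 0:3] = True

box = mask_to_box(mask)
box_area = (box[2] - box[0]) * (box[3] - box[1])
mask_area = int(mask.sum())
```

Here the box covers 9 pixels but the object covers only 5, so the box alone cannot say which pixels inside it belong to the object.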