Module 5 — Segmentation & instance masks

Instance segmentation & Mask R-CNN

Object masks vs semantic labels, Mask R-CNN heads, ROI Align, an overview of panoptic segmentation, and COCO-style evaluation.

~85 min read + exercises

Before we begin

Instance segmentation asks which pixels belong to object instance #1 and which to instance #2. Each mask therefore carries both a class label and an individual object identity.


Learning objectives

  • Distinguish semantic vs instance vs panoptic segmentation.
  • Describe Mask R-CNN heads on Faster R-CNN.
  • Explain ROI Align vs ROI Pool.
  • Outline COCO-style mask AP evaluation.

Mask R-CNN

Builds on Faster R-CNN:

  1. Backbone + FPN: extracts multi-scale features.
  2. RPN: proposes candidate object regions (ROIs).
  3. Box head: classifies each ROI and refines its box.
  4. Mask head: a small fully convolutional network (FCN) applied to each ROI, producing a K×K mask per instance.

Each detection gets its own binary mask, so instances of the same class stay separated.
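To make the "one mask per detection" idea concrete, here is a minimal sketch of how a K×K mask of probabilities might be pasted back into the image inside its box and binarized. The function name `paste_mask`, the 0.5 threshold, and the nearest-neighbour resize are assumptions chosen for brevity; they are not taken from a specific library.

```python
def paste_mask(mask_probs, box, image_h, image_w, threshold=0.5):
    """Resize a K x K grid of mask probabilities into `box` and binarize it.

    box is (x0, y0, x1, y1) in integer pixels, with x1 and y1 exclusive.
    Nearest-neighbour resizing is an assumption made to keep the sketch short.
    """
    k = len(mask_probs)
    x0, y0, x1, y1 = box
    full = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y0, 0), min(y1, image_h)):
        # Map the pixel centre back to a row of the K x K grid.
        gy = min(int((y + 0.5 - y0) * k / (y1 - y0)), k - 1)
        for x in range(max(x0, 0), min(x1, image_w)):
            gx = min(int((x + 0.5 - x0) * k / (x1 - x0)), k - 1)
            full[y][x] = 1 if mask_probs[gy][gx] >= threshold else 0
    return full


# Two detections of the same class, each with its own 2 x 2 mask.
mask_a = [[0.9, 0.2], [0.8, 0.1]]  # left half of the box is foreground
mask_b = [[0.1, 0.7], [0.3, 0.9]]  # right half of the box is foreground
inst_a = paste_mask(mask_a, (0, 0, 4, 4), image_h=4, image_w=8)
inst_b = paste_mask(mask_b, (4, 0, 8, 4), image_h=4, image_w=8)
```

The two results are separate full-image binary masks, which is what keeps the instances apart even when they share a class.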


ROI Align

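ROI Pool rounds ROI boundaries and bin edges to the integer feature-map grid, so the pooled features are shifted relative to the object and the resulting masks are misaligned. ROI Align skips the rounding and samples the feature map with bilinear interpolation at continuous coordinates, which yields sharper masks.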
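The difference can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation. It uses a single-channel feature map, one sample per bin centre (implementations commonly average several samples per bin), and a convention where integer coordinates coincide with cell indices. The helper names are hypothetical.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at continuous (y, x) by bilinear interpolation."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size):
    """Pool roi = (y0, x0, y1, x1), in continuous feature-map coordinates,
    to an out_size x out_size grid using one sample per bin centre."""
    y0, x0, y1, x1 = roi
    bin_h = (y1 - y0) / out_size
    bin_w = (x1 - x0) / out_size
    return [
        [bilinear_sample(feature, y0 + (i + 0.5) * bin_h, x0 + (j + 0.5) * bin_w)
         for j in range(out_size)]
        for i in range(out_size)
    ]


def quantized_sample(feature, y, x):
    """ROI Pool-style lookup: snap the coordinate to an integer cell first."""
    return feature[int(math.floor(y))][int(math.floor(x))]


# A horizontal ramp: each value equals its column index.
feature = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
pooled = roi_align(feature, (0.25, 0.25, 2.25, 2.25), out_size=2)
snapped = quantized_sample(feature, 0.75, 1.75)
```

On the ramp, the aligned samples track the fractional column positions (0.75 and 1.75), while the snapped lookup at column 1.75 returns the value of cell 1. That lost fraction of a cell is the misalignment that shows up at mask boundaries.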


Panoptic segmentation

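Stuff classes (sky, grass): semantic label only; all pixels of a class form one region.
Thing classes (people, cars): one instance mask per object.
Every pixel gets exactly one panoptic ID, so the output has no overlaps and no unlabeled pixels.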
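One way to picture the "exactly one ID per pixel" rule is to merge a semantic map with a list of instance masks. The sketch below assumes an encoding of class_id * 1000 + instance_index, with index 0 for stuff. That encoding, the class ids, and the first-come-first-served overlap rule are illustrative assumptions rather than a fixed standard.

```python
def merge_panoptic(semantic, instances, label_divisor=1000):
    """Combine a semantic map and instance masks into one panoptic ID map.

    semantic:  2-D list of class ids covering every pixel.
    instances: list of (class_id, binary_mask), highest priority first.
    Returns a 2-D list of ids = class_id * label_divisor + instance_index.
    Pixels not claimed by any instance keep instance_index 0.
    """
    h, w = len(semantic), len(semantic[0])
    panoptic = [[semantic[y][x] * label_divisor for x in range(w)] for y in range(h)]
    claimed = [[False] * w for _ in range(h)]
    counts = {}
    for class_id, mask in instances:
        counts[class_id] = counts.get(class_id, 0) + 1
        pid = class_id * label_divisor + counts[class_id]
        for y in range(h):
            for x in range(w):
                # Overlaps go to the earlier instance, so each pixel has one ID.
                if mask[y][x] and not claimed[y][x]:
                    panoptic[y][x] = pid
                    claimed[y][x] = True
    return panoptic


SKY, GRASS, PERSON = 1, 2, 3  # hypothetical class ids
semantic = [[PERSON, PERSON, PERSON, SKY],
            [PERSON, PERSON, PERSON, GRASS]]
person_a = [[1, 1, 0, 0], [1, 1, 0, 0]]
person_b = [[0, 1, 1, 0], [0, 1, 1, 0]]  # overlaps person_a in column 1
panoptic = merge_panoptic(semantic, [(PERSON, person_a), (PERSON, person_b)])
```

Here the two people receive distinct IDs (3001 and 3002), while the sky and grass pixels each keep a single class-level ID.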


When to use which

Task            Output
Semantic        Per-pixel class, no instance IDs
Instance        Mask per object instance
Detection only  Boxes; faster, less pixel detail
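The three outputs are related: given instance masks, the other two can be derived, but not the reverse. The sketch below shows that direction in plain Python. The helper names and the class id are hypothetical, and boxes use an exclusive right and bottom edge by assumption.

```python
def mask_to_box(mask):
    """Tight (x0, y0, x1, y1) box around a binary mask, x1/y1 exclusive.

    Returns None for an empty mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def instances_to_semantic(instances, h, w, background=0):
    """Collapse (class_id, mask) pairs into a per-pixel class map.

    Instance identity is discarded: both cars end up with the same label.
    """
    semantic = [[background] * w for _ in range(h)]
    for class_id, mask in instances:
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic


CAR = 1  # hypothetical class id
car_1 = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
car_2 = [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
instances = [(CAR, car_1), (CAR, car_2)]

boxes = [mask_to_box(mask) for _, mask in instances]
semantic = instances_to_semantic(instances, h=2, w=5)
```

The boxes keep the two cars apart but lose their shape, and the semantic map keeps the shape but merges the cars into one label. Which loss is acceptable is what determines the choice of task.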

What's next

Lesson 3 — Segmentation losses & metrics