
Module 5 — Image segmentation

Welcome to Module 5

Why dense prediction matters, full lesson path (U-Net plus other models), study pacing, and how this module connects to CNNs.

~35 min read + exercises

Welcome to Module 5 — image segmentation

Before we begin

In Module 3 you trained a network on MNIST that predicts one label for the whole image. In Module 4 you learned how CNNs work: filters scan local patches, and stacked layers build up edges, textures, and parts. Module 5 is a full vision module, not a quick detour. It covers what segmentation is, how encoder–decoder networks work, U-Net, other major model families (FCN, DeepLab, SegFormer), instance segmentation with Mask R-CNN, evaluation metrics, and a hands-on project.

Classification: “What is in this photo?” → cat
Segmentation: “Which pixels belong to the cat?” → a mask the same size as the image

Masks like this power portrait-mode background blur, background removal, outlining structures in medical scans, and perception for autonomous driving. The module is paced to be thorough rather than rushed: budget 12–15 hours for the lessons, quiz, and project.
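Here is a minimal sketch of that difference in plain Python (no PyTorch). It assumes a toy two-class problem where the model outputs one score per class. The scores are made-up numbers, not real model output. A classifier emits one score vector for the whole image, so a single argmax gives one label. A segmentation model emits a score vector at every pixel, so taking the argmax at each pixel yields an H × W mask.

```python
# Toy illustration (not a real model): all class scores are hand-written numbers.
CLASSES = ["background", "cat"]

def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Classification: one score vector for the whole image -> one label.
image_scores = [0.2, 1.7]
image_label = CLASSES[argmax(image_scores)]

# Segmentation: one score vector per pixel, laid out as [H][W][num_classes].
pixel_scores = [
    [[2.0, 0.1], [1.5, 0.3], [1.8, 0.2]],
    [[0.4, 1.9], [0.2, 2.2], [1.1, 0.9]],
]
# Per-pixel argmax gives a mask with the same H x W as the input grid.
mask = [[argmax(s) for s in row] for row in pixel_scores]

print(image_label)  # cat
for row in mask:
    print(row)      # [0, 0, 0] then [1, 1, 0]
```

In PyTorch, segmentation logits are usually shaped (N, C, H, W), and the same per-pixel step is typically written `logits.argmax(dim=1)`, which returns an (N, H, W) tensor of class indices.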

Figure: Module 5 at a glance (full path): 0 Welcome (overview), 1 Types (segmentation flavors), 2 Encoder (dense prediction), 3 U-Net (skips), 4 Beyond (FCN, DeepLab), 5 Instance (Mask R-CNN), 6 Metrics (IoU, Dice), 7 Quiz (check), 8 Project (train U-Net).
Eight lessons: foundations → U-Net → other models → instance segmentation → metrics → quiz → project.

What you will learn (by the end of this module)

| Skill | You will be able to… |
|---|---|
| Vocabulary | Distinguish semantic, instance, and panoptic segmentation |
| Encoder–decoder | Trace spatial sizes through the network and explain the bottleneck problem |
| U-Net | Implement skip connections and train on real mask labels |
| Model landscape | Compare FCN, DeepLab/ASPP, SegFormer, and Mask R-CNN, and say when each fits |
| Metrics | Use cross-entropy (CE), IoU, and Dice; avoid the accuracy trap |
| Project | Train U-Net on pet images; optionally compare against a pretrained DeepLab |
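Tracing spatial sizes through an encoder–decoder comes down to the standard output-size formula for convolution and pooling layers, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below assumes a hypothetical U-Net-style layout, not the exact architecture used in the lessons: four levels, each with a size-preserving 3×3 convolution (padding 1) followed by a 2×2 max-pool with stride 2, and a decoder that upsamples 2× per level. The helper names `conv_out` and `trace_unet_sizes` are illustrative only. For a 256×256 input the bottleneck is a 16×16 grid, which shows the bottleneck problem: fine boundary detail has to be recovered from a much coarser map, and skip connections are what help with that.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_unet_sizes(size, depth=4):
    """Trace H (= W for square inputs) through a hypothetical U-Net-style net.

    Assumption: each encoder level applies a 3x3 conv with padding=1
    (keeps the size), then a 2x2 max-pool with stride 2 (halves it).
    The decoder doubles the size per level and concatenates a skip.
    """
    encoder = []
    for _ in range(depth):
        size = conv_out(size, kernel=3, padding=1)  # size-preserving conv
        encoder.append(size)                        # kept for the skip connection
        size = conv_out(size, kernel=2, stride=2)   # pooling floors odd sizes
    bottleneck = size

    decoder = []
    for skip in reversed(encoder):
        size *= 2                                   # 2x upsampling
        # Concatenating the skip only works if spatial sizes match exactly.
        if size != skip:
            raise ValueError(f"skip mismatch: upsampled {size} vs skip {skip}")
        decoder.append(size)
    return encoder, bottleneck, decoder

enc, neck, dec = trace_unet_sizes(256)
print("encoder:", enc)      # [256, 128, 64, 32]
print("bottleneck:", neck)  # 16
print("decoder:", dec)      # [32, 64, 128, 256]
```

Calling `trace_unet_sizes(250)` instead raises `ValueError`: pooling floors odd sizes (125 → 62, 31 → 15), so the first upsampled map is 30 pixels while its skip is 31. That is the kind of alignment issue the encoder–decoder lesson covers; common workarounds are choosing input sizes divisible by 2^depth, or cropping or padding before concatenation.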

Lesson path (read in order)

| # | Lesson | Focus |
|---|---|---|
| 1 | What is segmentation? | Task ladder, segmentation types, portrait walkthrough |
| 2 | Encoder–decoder | Dense prediction, upsampling, alignment |
| 3 | U-Net | Skip connections, shapes, implementation map |
| 4 | Beyond U-Net | FCN, DeepLab, ASPP, SegFormer |
| 5 | Instance segmentation & Mask R-CNN | Two-stage instance masks |
| 6 | Losses & metrics | CE, IoU, Dice, logging |
| 7 | Quiz | 25 questions; pass mark 19/25 |
| 8 | Project | U-Net from scratch, plus an optional DeepLab comparison |

Why this module is harder (and worth it)

| Task | Output per image |
|---|---|
| MNIST (earlier project) | 1 digit label |
| Segmentation | H × W labels |

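A single 256×256 image means 65,536 per-pixel predictions. Because most of those pixels are usually background, a model can score high pixel accuracy while badly missing the object itself. That is why this module teaches IoU rather than raw accuracy, and why you should inspect mask overlays every epoch.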
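A made-up 10×10 example shows the accuracy trap. Assume a binary mask where the object covers only 4 of 100 pixels, and a lazy model that predicts background everywhere: pixel accuracy is 0.96, yet IoU and Dice for the object are both 0. The functions below are a minimal pure-Python sketch of the usual single-class definitions, IoU = TP / (TP + FP + FN) and Dice = 2·TP / (2·TP + FP + FN). Returning 1.0 when both masks are empty is an assumed convention; libraries differ on it.

```python
def confusion(pred, target):
    """Count TP, FP, FN, TN for binary masks given as nested lists of 0/1."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def pixel_accuracy(pred, target):
    tp, fp, fn, tn = confusion(pred, target)
    return (tp + tn) / (tp + fp + fn + tn)

def iou(pred, target):
    tp, fp, fn, _ = confusion(pred, target)
    union = tp + fp + fn
    return tp / union if union else 1.0  # assumed convention: both empty = perfect

def dice(pred, target):
    tp, fp, fn, _ = confusion(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # same assumed convention

H = W = 10
target = [[0] * W for _ in range(H)]
for r, c in [(4, 4), (4, 5), (5, 4), (5, 5)]:  # a 2x2 "object"
    target[r][c] = 1

all_background = [[0] * W for _ in range(H)]
print(pixel_accuracy(all_background, target))  # 0.96
print(iou(all_background, target))             # 0.0
print(dice(all_background, target))            # 0.0

partial = [row[:] for row in all_background]   # finds half of the object
partial[4][4] = partial[4][5] = 1
print(pixel_accuracy(partial, target), iou(partial, target))  # 0.98 0.5
```

For a single mask pair, Dice = 2·IoU / (1 + IoU), so the two metrics order predictions the same way; Dice simply reads higher for partial overlap (IoU 0.5 corresponds to Dice of about 0.67 here). For multiclass masks, the usual practice is to compute the score per class and average it (mean IoU).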


How Module 5 connects to prior work

| Prior lesson | Carries forward here |
|---|---|
| Module 1: Image patches | Images as grids; now every cell gets a label |
| Module 3: Training loop | The same forward → loss → backward → step cycle |
| Module 4: CNNs | Conv stacks in encoders; pretrained backbones in DeepLab |

You do not need to have finished the sentiment LSTM project. You do need to be comfortable with CNNs and PyTorch.


Before you start

Required: install these packages before the project.

```bash
pip install torch torchvision matplotlib numpy
# optional stretch goal: pip install segmentation-models-pytorch
```
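Here is a quick stdlib-only sketch for checking the install before the project. It only verifies that Python can locate each package (via `importlib.util.find_spec`). It does not check versions or GPU support. `segmentation_models_pytorch` is the import name of the optional `segmentation-models-pytorch` package.

```python
import importlib.util

REQUIRED = ["torch", "torchvision", "matplotlib", "numpy"]
OPTIONAL = ["segmentation_models_pytorch"]  # import name of segmentation-models-pytorch

def check(modules):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, ok in check(REQUIRED).items():
        print(f"{name:30s} {'ok' if ok else 'MISSING (required)'}")
    for name, ok in check(OPTIONAL).items():
        print(f"{name:30s} {'ok' if ok else 'not installed (optional)'}")
```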

Optional: the CV track's object detection lessons, for more depth on mAP, non-maximum suppression (NMS), and detector training.


How to study (avoid rushing)

  • Block 2–3 evenings for Lessons 1–5 before touching code.
  • After Lesson 3, sketch U-Net on paper with skip arrows.
  • After Lesson 4, write one sentence: “I would pick DeepLab over U-Net when ___.”
  • In the project, start saving mask overlays by epoch 3; do not wait until training ends.
