Welcome to Module 4 — object detection
Before we begin
If you completed Module 3, you can train a model to answer: "Is there a dog in this photo?"
Production vision systems usually need more:
- How many dogs?
- Where are they (pixel coordinates)?
- How confident is each prediction?
That is object detection, and it is one of the most widely deployed computer vision tasks (autonomous driving, retail analytics, security, robotics, locating text regions for document OCR).
This module goes into depth: not just model names, but box math, target assignment during training, evaluation (mAP), failure modes, and a full fine-tuning project.
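To make the three questions above concrete, here is a minimal sketch of what a detector's output for one image might look like as plain Python data. The field names (`box`, `label`, `score`), the sample values, and the 0.5 score threshold are illustrative assumptions, not the output format of any particular library.

```python
# Hypothetical detections for one image. Each entry has a box in pixel
# coordinates (x1, y1, x2, y2), a class label, and a confidence score.
# All values are made up for illustration.
detections = [
    {"box": (34, 50, 210, 300), "label": "dog", "score": 0.92},
    {"box": (250, 80, 400, 310), "label": "dog", "score": 0.81},
    {"box": (10, 10, 60, 40), "label": "cat", "score": 0.30},
    {"box": (300, 200, 380, 290), "label": "dog", "score": 0.12},
]


def confident(dets, label, min_score=0.5):
    """Keep detections of one class whose score is at least min_score."""
    return [d for d in dets if d["label"] == label and d["score"] >= min_score]


dogs = confident(detections, "dog")
print("How many dogs?", len(dogs))
for d in dogs:
    print("Where?", d["box"], "How confident?", d["score"])
```

Unlike a classifier, the result is a variable-length set: the number of entries changes from image to image, and the count depends on the score threshold you choose.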
Figure: Module 4 at a glance
Key concepts (plain English)
Bounding box — Rectangle that tightly encloses one object, stored either as corner coordinates or as center plus size.
Confidence score — Model's estimated probability that the box contains an object of the predicted class (from a softmax or sigmoid output, depending on the design).
Anchor — Template box at a grid cell; the network predicts offsets from the anchor (Faster R-CNN / SSD / older YOLO).
Proposal — Candidate region that might contain an object (RPN output in two-stage detectors).
NMS (non-maximum suppression) — Post-processing that removes duplicate boxes on the same object.
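mAP (mean Average Precision) — Standard detection benchmark metric: for each class, average precision summarizes the precision–recall curve obtained by sweeping the score threshold; mAP is the mean of those values over classes.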
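Several of the concepts above reduce to a few lines of arithmetic. The following is a minimal pure-Python sketch, assuming boxes are tuples in either corner format `(x1, y1, x2, y2)` or center format `(cx, cy, w, h)`. The function names and the 0.5 IoU threshold are illustrative choices, and the NMS shown is the simple greedy, class-agnostic variant rather than what any specific detector ships.

```python
def xyxy_to_cxcywh(box):
    """Corner format (x1, y1, x2, y2) -> center format (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(box):
    """Center format (cx, cy, w, h) -> corner format (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most iou_threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(xyxy_to_cxcywh(boxes[0]))  # (5.0, 5.0, 10, 10)
print(nms(boxes, scores))        # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed. In practice you would likely use the batched equivalents in `torchvision.ops` (`box_convert`, `box_iou`, `nms`), and apply NMS per class.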
What detection adds over classification
Figure: Four vision output types
| Task | Output | Example question |
|---|---|---|
| Classification | 1 label | "Is this a cat photo?" |
| Detection | N boxes + labels | "Where are all the pedestrians?" |
| Semantic segmentation | H×W class map | "Which pixels are road?" |
| Instance segmentation | Mask per object | "Which pixels belong to person #2?" |
Module 5 covers masks. Module 4 makes you fluent in boxes.
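One way to see how the last row of the table relates to the detection row: a per-object mask can always be reduced to a box, but not the other way around. The sketch below assumes a mask is a 2D list of 0/1 values indexed as `mask[row][col]` and that the box's `x2` and `y2` are exclusive; both conventions, and the function name `mask_to_box`, are illustrative assumptions.

```python
def mask_to_box(mask):
    """Return the tight box (x1, y1, x2, y2) around the nonzero pixels of a
    2D mask, with x2 and y2 exclusive, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# A tiny 4x5 instance mask for one hypothetical object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 4, 3)
```

The box discards the object's shape: the pixel at row 1, column 3 lies inside the box but is not part of the object.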
What Module 4 covers
| # | Lesson | You will be able to… |
|---|---|---|
| 1 | Classification → detection | Convert between box formats; explain set outputs |
| 2 | Architectures | Trace Faster R-CNN and YOLO data flow |
| 3 | Training | Interpret loss dicts; load COCO/YOLO labels |
| 4 | IoU, NMS, mAP | Compute metrics; tune thresholds honestly |
| 5 | On-device | Budget latency; export quantized models |
| Quiz | 25 MCQs | Self-check with review links |
| Project | Faster R-CNN | Fine-tune, mAP, failure analysis, ONNX |
Estimated module time: ~18–22 hours (reading + project).
Before you start
Required:
- Module 3 project — PyTorch, transfer learning, train/val discipline.
Install before the project:
pip install torch torchvision matplotlib numpy
# optional for extensions:
pip install torchmetrics pycocotools onnx onnxruntime
GPU: strongly recommended for the project (CPU works but is slow).
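After installing, a quick check like the following can confirm that the packages are importable and whether PyTorch sees a CUDA GPU. This is a minimal sketch: the helper names `installed` and `cuda_available` are hypothetical, and it assumes each package's import name matches the pip name listed above.

```python
import importlib.util

REQUIRED = ["torch", "torchvision", "matplotlib", "numpy"]
OPTIONAL = ["torchmetrics", "pycocotools", "onnx", "onnxruntime"]


def installed(names):
    """Map each import name to True if Python can locate it (no import is run)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Return True or False if torch is installed, or None if it is not."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also runs without torch
    return torch.cuda.is_available()


for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    for name, ok in installed(names).items():
        print(f"{group:8s} {name:12s} {'ok' if ok else 'MISSING'}")
print("CUDA GPU available:", cuda_available())
```

A result of `False` for the GPU line means the project will still run on CPU, only more slowly.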
How to read these lessons
- Do checkpoint questions before reading answer sketches.
- Run short code snippets in a notebook when suggested.
- Sketch boxes on paper for IoU exercises — muscle memory matters.
- After the project, keep a gallery of failure cases; reviewing them is one of the best ways to learn detection.
Progress saves in this browser when you open each lesson.