Welcome to Module 4 — object detection

Before we begin

If you completed Module 3, you can train a model to answer: "Is there a dog in this photo?"
Production vision systems usually need more:

How many dogs?
Where are they (pixel coordinates)?
How confident is each prediction?

That is object detection, one of the most widely deployed computer vision tasks (autonomous driving, retail analytics, security, robotics, locating text regions for document OCR).

This module goes into depth: beyond model names, it covers box math, target assignment during training, evaluation (mAP), failure modes, and a full fine-tuning project.

Figure: Module 4 at a glance. Seven lessons, quiz, then a hands-on detector project with mAP and threshold tuning.

Key concepts (plain English)

Bounding box: Rectangle that tightly contains one object, given either as corners $(x_1, y_1, x_2, y_2)$ or as center plus size $(c_x, c_y, w, h)$.

Confidence score: Model's estimated probability that the box contains class $c$, taken from a softmax or sigmoid output depending on the architecture.

Anchor: Template box at a grid cell; the network predicts offsets from the anchor (Faster R-CNN / SSD / older YOLO).

Proposal: Candidate region that might contain an object (RPN output in two-stage detectors).

NMS (non-maximum suppression): Post-processing that removes duplicate boxes on the same object.

mAP (mean Average Precision): Standard detection benchmark. For each class, Average Precision (AP) is the area under the precision–recall curve traced by sweeping the score threshold; mAP is the mean of AP over classes. A prediction counts as correct only if it overlaps a ground-truth box closely enough.
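
To make the terms above concrete, here is a minimal sketch in plain Python. It is not the course's project code: the function names, the 0.5 overlap threshold, and the toy boxes are assumptions chosen for illustration. It uses IoU (intersection over union) as the overlap measure for NMS, which is the usual choice, and it assumes each detection has already been marked as a true or false positive before AP is computed.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_cxcywh(box: Box) -> Box:
    """Corner format (x1, y1, x2, y2) -> center format (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(box: Box) -> Box:
    """Center format (cx, cy, w, h) -> corner format (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def average_precision(hits: Sequence[Tuple[float, bool]], num_gt: int) -> float:
    """AP for one class from (score, is_true_positive) pairs.
    Area under the precision-recall curve, with precision made
    non-increasing from right to left (all-point interpolation)."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    precisions, recalls, tp = [], [], 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy data (hypothetical): two near-duplicate boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)

# Three ranked detections for one class, two ground-truth objects.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(kept, round(ap, 3))
```

On this toy data the second box is suppressed because its IoU with the first is about 0.82, so `kept` is `[0, 2]`, and `ap` comes out at roughly 0.833. mAP would then be the mean of such AP values over all classes.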

What detection adds over classification

Figure: Four vision output types. Detection sits between whole-image labels and per-pixel masks.

Task	Output	Example question
Classification	1 label	"Is this a cat photo?"
Detection	N boxes + labels	"Where are all the pedestrians?"
Semantic segmentation	H×W class map	"Which pixels are road?"
Instance segmentation	Mask per object	"Which pixels belong to person #2?"

Module 5 covers masks. Module 4 makes you fluent in boxes.
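
One way to see the difference between the four tasks in the table is to look at the shape of each output. The sketch below uses plain Python containers on a made-up 2×4 image; the `Detection` class, the labels, and all values are hypothetical and only illustrate the structure, not any particular library's return type.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    score: float  # confidence in [0, 1]


H, W = 2, 4  # toy image size (hypothetical)

# Classification: one label for the whole image.
classification: str = "cat"

# Detection: a variable-length set of boxes, each with a label and a score.
detections: List[Detection] = [
    Detection((0.0, 0.0, 2.0, 2.0), "pedestrian", 0.92),
    Detection((2.0, 0.0, 4.0, 2.0), "pedestrian", 0.81),
]

# Semantic segmentation: one class per pixel, shape H x W.
semantic_map: List[List[str]] = [
    ["road", "road", "sky", "sky"],
    ["road", "road", "road", "sky"],
]

# Instance segmentation: one H x W boolean mask per object.
instance_masks: Dict[str, List[List[bool]]] = {
    "person #1": [[True, False, False, False], [True, False, False, False]],
    "person #2": [[False, False, True, False], [False, False, True, False]],
}
```

Note that only the detection output has a length that depends on the image content: the number of boxes is not known in advance.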

What Module 4 covers

#	Lesson	You will be able to…
1	Classification → detection	Convert between box formats; explain set outputs
2	Architectures	Trace Faster R-CNN and YOLO data flow
3	Training	Interpret loss dicts; load COCO/YOLO labels
4	IoU, NMS, mAP	Compute metrics; tune thresholds honestly
5	On-device	Budget latency; export quantized models
Quiz	25 MCQs	Self-check with review links
Project	Faster R-CNN	Fine-tune, mAP, failure analysis, ONNX

Estimated module time: ~18–22 hours (reading + project).

Before you start

Required:

Module 3 project — PyTorch, transfer learning, train/val discipline.

Install before the project:

```bash
pip install torch torchvision matplotlib numpy
# optional for extensions:
pip install torchmetrics pycocotools onnx onnxruntime
```

GPU: strongly recommended for the project (CPU works but is slow).
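
If you want to confirm the setup before starting, a quick check along these lines may help. It is a sketch, not an official course script: the helper name `check_environment` is made up, and it only tests whether each package can be found and whether PyTorch reports a CUDA GPU.

```python
import importlib.util

REQUIRED = ["torch", "torchvision", "matplotlib", "numpy"]
OPTIONAL = ["torchmetrics", "pycocotools", "onnx", "onnxruntime"]


def check_environment() -> dict:
    """Return {package_name: found?} plus a 'cuda_gpu' flag."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED + OPTIONAL
    }
    gpu = False
    if report["torch"]:
        try:
            import torch
            gpu = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is reported as "no GPU" rather than crashing.
            gpu = False
    report["cuda_gpu"] = gpu
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name:12s} {'ok' if ok else 'missing'}")
```

A `missing` entry for any of the four required packages means the first `pip install` line above still needs to be run; a missing `cuda_gpu` only means training will fall back to the CPU.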

How to read these lessons

Do checkpoint questions before reading answer sketches.
Run short code snippets in a notebook when suggested.
Sketch boxes on paper for IoU exercises — muscle memory matters.
After the project, keep a failure gallery: it is the best way to learn detection.

Progress saves in this browser when you open each lesson.

What's next

Lesson 1 — From classification to detection