Learning path · Beginner · ~98 hours

Computer Vision Foundations

A complete, standalone path from light and sensors through geometry, classical features, CNNs, detection, segmentation, video, and production deployment — with worked examples, quizzes, and portfolio-ready projects at every stage.

What you'll learn

  • Imaging to deployment
  • Hands-on projects
  • Self-paced modules

Lessons in this path

Work top to bottom within each module, or jump in from the table of contents on each lesson page.

Module 1. Imaging & digital images

How cameras turn light into arrays of numbers — radiance, noise, convolution, edges, and color spaces — ending with a hands-on imaging pipeline lab.

  1. Lesson 1 · 30 min

    Welcome — start here

    Key CV vocabulary (pixel, radiance, convolution, ISP), how to read lessons, what Module 1 covers, and what to install before the project.

  2. Lesson 2 · 75 min

    Light, sensors, and the imaging pipeline

    Radiance vs irradiance, exposure and SNR, Bayer demosaicing, gamma/linear light, noise models, and the full ISP from RAW to JPEG.

  3. Lesson 3 · 80 min

    Pixels, convolution, and edges

    Discrete convolution with numeric examples, Gaussian scale-space, Sobel gradients, the full Canny pipeline, and separable filters.

  4. Lesson 4 · 70 min

    Color spaces & preprocessing

    RGB vs HSV vs Lab, white balance, histogram equalization, normalization for ML, and common preprocessing pitfalls.

  5. Lesson 5 · 45 min

    Module 1 quiz & review

    20 interactive multiple-choice questions with instant feedback, explanations, and lesson links for topics you miss.

  6. Lesson 6 · 150 min

    Project: build an imaging pipeline lab

    Load RAW-like data, demosaic, apply gamma, add synthetic noise, visualize SNR vs ISO, and compare edge detectors in Python.
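As a small preview of the convolution and separable-filter material in Lesson 3, here is a minimal sketch using NumPy. The helper name `convolve2d_valid` and the toy step-edge image are illustrative assumptions, not code taken from the lessons. It shows that the 3x3 Sobel x kernel factors into a smoothing column and a derivative row, and that filtering with the two 1-D kernels in sequence matches filtering with the full 2-D kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Direct 2-D convolution over the 'valid' region (no padding)."""
    kernel = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# The Sobel x kernel is the outer product of a smoothing column
# and a derivative row, which is what makes it separable.
smooth = np.array([[1.0], [2.0], [1.0]])   # shape (3, 1)
deriv = np.array([[-1.0, 0.0, 1.0]])       # shape (1, 3)
sobel_x = smooth @ deriv                   # shape (3, 3)

# Toy image: a vertical step edge, 0 on the left and 10 on the right.
image = np.zeros((5, 6))
image[:, 3:] = 10.0

full = convolve2d_valid(image, sobel_x)
separable = convolve2d_valid(convolve2d_valid(image, smooth), deriv)

# Each output row is [0, -40, -40, 0]. The sign is negative because
# convolution flips the kernel; correlation (no flip) would give +40.
print(full)
print(np.allclose(full, separable))
```

The separable version needs 3 + 3 multiplications per output pixel instead of 9, and the saving grows with kernel size.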

Module 2. Geometry & correspondence

Pinhole cameras, projection, features, matching, stereo depth, and robust estimation — ending with a panorama stitching project.

  1. Lesson 7 · 25 min

    Welcome to Module 2

    How Module 2 builds on Module 1, coordinate frames in vision, what you will learn, and prerequisites for the stitching project.

  2. Lesson 8 · 85 min

    Camera models and projection

    Homogeneous projection, K and [R|t], worked pixel examples, Brown–Conrady distortion, and Zhang calibration with reprojection error.

  3. Lesson 9 · 90 min

    Features, matching, and robust estimation

    Harris corners, SIFT/ORB, Lowe's ratio test, homography vs essential matrix, RANSAC iteration math, and epipolar geometry.

  4. Lesson 10 · 75 min

    Stereo depth & triangulation

    Disparity maps, rectified stereo, triangulation geometry, depth from disparity, and common failure modes in real scenes.

  5. Lesson 11 · 45 min

    Module 2 quiz & review

    20 interactive multiple-choice questions covering projection, features, RANSAC, and stereo with lesson review links.

  6. Lesson 12 · 180 min

    Project: panorama stitching with OpenCV

    Detect ORB features, match them with Lowe's ratio test, estimate a homography with RANSAC, then warp and blend two photos into a panorama.
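Two formulas from this module are short enough to preview here: the standard RANSAC iteration count from Lesson 9 and depth from disparity for rectified stereo from Lesson 10. The sketch below is illustrative only. The function names and the example numbers (50% inliers, a 700 px focal length, a 12 cm baseline) are assumptions chosen for the arithmetic, not values from the lessons.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations N so that, with probability `confidence`, at least one
    random sample is all inliers: N = log(1 - p) / log(1 - w**s)."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    p_clean_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean_sample))

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (Z in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A homography needs 4 correspondences per sample.
print(ransac_iterations(inlier_ratio=0.5, sample_size=4))  # 72

# Hypothetical rig: 700 px focal length, 0.12 m baseline, 35 px disparity.
# The result is about 2.4 m.
print(depth_from_disparity(700.0, 0.12, 35.0))
```

Because the clean-sample probability is `w**s`, models that need larger samples require far more iterations at the same inlier ratio, and because depth is inversely proportional to disparity, a one-pixel disparity error matters much more for distant points.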

Module 3. Deep learning for vision

Convolutional networks, data augmentation, and transfer learning — ending with a fine-tuned image classifier project.

  1. Lesson 13 · 25 min

    Welcome to Module 3

    When classical CV ends and learning begins, PyTorch setup, GPU vs CPU, and how Module 3 connects to Modules 1–2.

  2. Lesson 14 · 90 min

    Convolutional networks for images

    Conv mechanics, receptive-field recurrence, batch norm and ResNet skips, backprop through conv, cross-entropy training, and augmentation.

  3. Lesson 15 · 70 min

    Data augmentation & training tricks

    Random crop, flip, color jitter, mixup preview, learning rate schedules, weight decay, and monitoring train vs val curves.

  4. Lesson 16 · 75 min

    Transfer learning & fine-tuning

    ImageNet pretraining, freezing vs unfreezing layers, learning rate schedules, domain shift, and when fine-tuning beats training from scratch.

  5. Lesson 17 · 45 min

    Module 3 quiz & review

    20 interactive multiple-choice questions on CNNs, augmentation, and transfer learning with review links.

  6. Lesson 18 · 240 min

    Project: fine-tuned image classifier

    Fine-tune a ResNet on a custom dataset with PyTorch, track train/val accuracy, visualize misclassifications, and export to ONNX.
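The receptive-field recurrence mentioned in Lesson 14 can be sketched in a few lines. This is a minimal illustration that assumes plain convolution or pooling layers described only by kernel size and stride, with no dilation; the function name and the layer stacks are hypothetical examples rather than material from the lesson.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    Recurrence: rf += (kernel_size - 1) * jump, then jump *= stride,
    where `jump` is the spacing in input pixels between adjacent outputs.
    Dilation is ignored in this sketch.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

# Three stacked 3x3 stride-1 convolutions see a 7x7 input window.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7

# A 7x7 stride-2 convolution followed by a 3x3 stride-2 pooling layer.
print(receptive_field([(7, 2), (3, 2)]))  # 11
```

Strides multiply the jump, so early downsampling makes every later layer grow the receptive field faster.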

Module 4. Object detection

A deep dive from classification to bounding boxes — architectures (Faster R-CNN, YOLO, DETR), training losses, IoU/NMS/mAP, deployment, and a full detector fine-tuning project with evaluation.

  1. Lesson 19 · 35 min

    Welcome to Module 4

    Why detection is harder than classification, COCO vs YOLO labels, module roadmap, and what to install before the project.

  2. Lesson 20 · 95 min

    From classification to detection

    Variable-size outputs, box formats, anchors, assignment, set matching, and annotation formats with worked numeric examples.

  3. Lesson 21 · 100 min

    Detector architectures — two-stage, one-stage & FPN

    Faster R-CNN stage-by-stage, YOLO grid intuition, RetinaNet focal loss, DETR matching, and how to pick a family for your product.

  4. Lesson 22 · 90 min

    Training detectors — losses, labels & data pipelines

    Classification + box regression losses, RPN and RoI training, hard negative mining, COCO/YOLO dataset loading, and common training bugs.

  5. Lesson 23 · 95 min

    IoU, NMS, mAP & evaluation

    Worked IoU examples, NMS step-by-step, precision-recall and AP, COCO mAP@0.5:0.95, threshold tuning, and qualitative failure analysis.

  6. Lesson 24 · 80 min

    On-device detection & deployment

    Latency vs resolution budgets, batching, INT8 quantization for detectors, ONNX/TFLite export, and profiling on CPU/GPU/mobile.

  7. Lesson 25 · 55 min

    Module 4 quiz & review

    25 interactive multiple-choice questions on detection theory, training, metrics, and deployment with lesson review links.

  8. Lesson 26 · 360 min

    Project: train and evaluate an object detector

    Fine-tune Faster R-CNN on Penn-Fudan, plot PR curves, compute mAP, tune thresholds, visualize failures, and export to ONNX.
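IoU and greedy NMS from Lesson 23 are compact enough to preview in plain Python. The sketch below assumes boxes in `(x1, y1, x2, y2)` corner format and uses made-up boxes and scores; it is an illustration of the standard technique, not the project's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two 10x10 boxes offset by 5 px: intersection 25, union 175, IoU = 1/7.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))

# Illustrative detections: boxes 0 and 1 overlap heavily (IoU = 81/119),
# so the lower-scoring box 1 is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

This version is quadratic in the number of boxes, which is fine for study purposes; production code would normally rely on a vectorized library routine and run NMS per class.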

Module 5. Segmentation & instance masks

Semantic, instance, and panoptic segmentation — U-Net, Mask R-CNN, losses and metrics — ending with a pet segmentation project.

  1. Lesson 27 · 25 min

    Welcome to Module 5

    Dense prediction vs detection, encoder–decoder intuition, what Module 5 covers, and dataset prep for segmentation.

  2. Lesson 28 · 80 min

    Semantic segmentation & U-Net

    Per-pixel classification, encoder–decoder architectures, skip connections, U-Net blocks, and when segmentation beats bounding boxes.

  3. Lesson 29 · 85 min

    Instance segmentation & Mask R-CNN

    Object masks vs semantic labels, Mask R-CNN heads, RoIAlign, panoptic segmentation overview, and COCO-style evaluation.

  4. Lesson 30 · 70 min

    Segmentation losses & metrics

    Cross-entropy vs Dice vs focal loss, IoU and mIoU, boundary-aware losses, class imbalance, and qualitative vs quantitative eval.

  5. Lesson 31 · 45 min

    Module 5 quiz & review

    20 interactive multiple-choice questions on U-Net, Mask R-CNN, losses, and mIoU with lesson review links.

  6. Lesson 32 · 300 min

    Project: U-Net pet segmentation

    Train a U-Net on Oxford-IIIT Pet masks with PyTorch, visualize predictions, compute mIoU, and compare with a pretrained DeepLab baseline.
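The mIoU and Dice metrics from Lesson 30 can be previewed on a tiny label map. This is a minimal NumPy sketch; the function names and the 2x4 toy prediction are assumptions for illustration, and classes that appear in neither map are skipped when averaging, which is one common convention among several.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """IoU for each class id; NaN when a class is absent from both maps."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            ious.append(float("nan"))
        else:
            ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return ious

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))

def dice(pred_mask, target_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two boolean masks."""
    total = pred_mask.sum() + target_mask.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as a perfect match here
    return float(2.0 * np.logical_and(pred_mask, target_mask).sum() / total)

# Toy 2x4 label maps with classes 0 (background) and 1 (foreground).
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 0]])

# Each class has intersection 3 and union 5, so IoU is 0.6 for both.
print(per_class_iou(pred, target, num_classes=2))
print(mean_iou(pred, target, num_classes=2))
# Foreground Dice is 2*3 / (4 + 4) = 0.75, i.e. 2*IoU / (1 + IoU).
print(dice(pred == 1, target == 1))
```

Dice is always at least as large as IoU for the same masks, so the two numbers should not be compared across papers or dashboards without checking which one is reported.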

Module 6. Video & motion

Optical flow, temporal consistency, object tracking, and data association — ending with a multi-object video tracker project.

  1. Lesson 33 · 25 min

    Welcome to Module 6

    Why video is harder than images, temporal redundancy, frame rates, and how Module 6 connects detection to tracking.

  2. Lesson 34 · 75 min

    Optical flow & motion

    Brightness constancy, Lucas–Kanade, Horn–Schunck, flow visualization, motion boundaries, and when flow fails.

  3. Lesson 35 · 80 min

    Object tracking & data association

    SORT and DeepSORT, Kalman filters for boxes, Hungarian matching, re-identification embeddings, and track lifecycle management.

  4. Lesson 36 · 40 min

    Module 6 quiz & review

    15 interactive multiple-choice questions on optical flow, tracking, and data association with review links.

  5. Lesson 37 · 240 min

    Project: multi-object video tracker

    Run a detector per frame, associate detections with a Kalman + IoU tracker, visualize tracks over a video, and measure ID switches.

Module 7. CV production & deployment

Model serving, edge optimization, monitoring, and data drift for vision systems — ending with a deployed inference API project.

  1. Lesson 38 · 25 min

    Welcome to Module 7

    From notebook to production, latency vs accuracy trade-offs, and what a deployable vision pipeline needs beyond mAP.

  2. Lesson 39 · 70 min

    Model serving for vision

    REST vs gRPC inference, batching, TensorRT and ONNX Runtime, warm-up, preprocessing on server vs client, and container basics.

  3. Lesson 40 · 75 min

    Edge deployment & optimization

    INT8 quantization, pruning, knowledge distillation, TFLite and CoreML, NPU delegates, and profiling latency on device.

  4. Lesson 41 · 65 min

    Monitoring, drift & retraining

    Input distribution shift, concept drift, shadow deployments, human-in-the-loop labeling, and when to retrain vs fine-tune.

  5. Lesson 42 · 40 min

    Module 7 quiz & review

    15 interactive multiple-choice questions on serving, quantization, and production monitoring with review links.

  6. Lesson 43 · 300 min

    Project: deploy a vision inference API

    Export a classifier to ONNX, serve it with FastAPI + ONNX Runtime, and add health checks, a batch endpoint, and a simple latency dashboard.