
Module 3 — Deep learning for vision

Transfer learning & fine-tuning

ImageNet pretraining, freezing vs unfreezing layers, learning rate schedules, domain shift, and when fine-tuning beats training from scratch.

~75 min read + exercises


Before we begin

Training a ResNet from scratch on ImageNet takes more than a million labeled images and substantial GPU time. In practice you start from pretrained weights and adapt them to your task. This is transfer learning.


Learning objectives

  • Explain pretraining and fine-tuning.
  • Choose freeze vs unfreeze strategies by dataset size.
  • Set discriminative learning rates.
  • Recognize domain shift and apply mitigations (augmentation, more data, longer fine-tuning).

Pretrained backbone

ImageNet pretraining teaches a network low-level features (edges, textures) in its early layers and mid-level features (object parts) in its later layers. Both tend to be useful across domains.

Typical pipeline:

  1. Replace the final classifier head with one sized for your number of classes.
  2. Train the head only (backbone frozen) for a few epochs.
  3. Unfreeze the last blocks (or, with enough data, the whole backbone) and continue training with a small LR.
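
The three steps above can be sketched in PyTorch as follows. To stay self-contained, the sketch uses a tiny stand-in network instead of a real pretrained model: `TinyNet`, its layer names, and the class count of 5 are hypothetical and chosen only for illustration. With a torchvision ResNet the head attribute is `fc`, and you would load pretrained weights before step 1.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained backbone plus classifier head."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.block2(self.block1(x))
        return self.fc(self.pool(x).flatten(1))


model = TinyNet()  # in practice: a model with pretrained weights loaded

# Step 1: replace the classifier head for the new number of classes.
num_classes = 5  # assumed value for illustration
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Step 2: freeze everything, then re-enable gradients for the new head only.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
head_only = [n for n, p in model.named_parameters() if p.requires_grad]

# Step 3: also unfreeze the last block; it would be trained with a small LR.
for p in model.block2.parameters():
    p.requires_grad = True
head_and_last_block = [n for n, p in model.named_parameters() if p.requires_grad]

# Sanity check: the model now outputs one logit per new class.
logits = model(torch.randn(2, 3, 32, 32))
```

Between steps 2 and 3 you would run a few epochs of training. The two name lists make it easy to confirm which parameters each phase actually updates.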

Freezing layers

python
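# Freeze every parameter so the pretrained backbone is not updated.
for param in model.parameters():
    param.requires_grad = False
# Re-enable gradients for the classifier head only.
for param in model.fc.parameters():  # torchvision ResNets use .fc; other models may use .head
    param.requires_grad = True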

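Rule of thumb: with a small dataset in a domain similar to ImageNet, keep more layers frozen. With a large dataset or very different imagery, unfreeze more of the backbone, starting from the blocks nearest the head.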
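
A quick way to check that a freezing strategy did what you intended is to count trainable versus total parameters. The sketch below assumes a hypothetical toy model (two "backbone" layers and a 3-class head); `count_parameters` is an illustrative helper, not a PyTorch function.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# Hypothetical toy model: the last Linear layer plays the role of the head.
model = nn.Sequential(
    nn.Linear(10, 20), nn.ReLU(),
    nn.Linear(20, 20), nn.ReLU(),
    nn.Linear(20, 3),
)

# Freeze everything, then unfreeze only the head.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

trainable, total = count_parameters(model)
print(f"trainable: {trainable} / {total}")
```

If the trainable count is unexpectedly equal to the total, the freeze loop probably ran on the wrong module or ran before the head was replaced.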


Learning rates

Use a lower LR for the pretrained backbone (e.g. 1e-4) and a higher one for the new, randomly initialized head (e.g. 1e-3). In PyTorch, set this with optimizer param groups:

python
import torch

# One param group per learning rate: the new head learns faster than the backbone.
optimizer = torch.optim.Adam([
    {"params": head.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 1e-4},
])
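
Param groups also combine with a learning rate schedule: the scheduler scales each group from its own starting rate, so the head-to-backbone ratio is preserved. The sketch below is one possible setup, not a prescribed recipe. The `backbone` and `head` modules are hypothetical stand-ins, the data is random, and cosine annealing over 10 epochs is an assumed choice of schedule.

```python
import torch
from torch import nn

# Hypothetical stand-ins for a pretrained backbone and a new head.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 4)

optimizer = torch.optim.Adam([
    {"params": head.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 1e-4},
])

num_epochs = 10  # assumed value for illustration
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

# Random data, only to make the loop runnable.
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))

lr_history = []
for epoch in range(num_epochs):
    # Record [head_lr, backbone_lr] in effect for this epoch.
    lr_history.append(list(scheduler.get_last_lr()))
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay both groups once per epoch
```

Inspecting `lr_history` shows both rates decaying together while the head rate stays ten times the backbone rate.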

Data augmentation

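torchvision.transforms provides RandomResizedCrop, RandomHorizontalFlip, and ColorJitter. Augmentation is critical when data is limited.

At validation and inference, use deterministic transforms instead (resize, center crop, no jitter), with the same normalization as in training.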


Domain shift

Medical X-ray, satellite, or factory-camera images differ substantially from ImageNet photos. Transfer still helps, but expect to:

  • Fine-tune longer
  • Use stronger augmentation
  • Monitor validation closely for overfitting
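
One simple way to act on validation monitoring is patience-based early stopping: keep the checkpoint from the best epoch and stop once validation loss has not improved for a few epochs. The sketch below is an illustration under stated assumptions, not part of the lesson's pipeline. `best_epoch_with_patience` is a hypothetical helper and the loss values are made up.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses.

    Training stops at the first epoch where the loss has gone `patience`
    epochs without improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Never ran out of patience: stop at the final epoch.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: improving at first, then overfitting.
history = [0.90, 0.70, 0.62, 0.60, 0.63, 0.66, 0.71, 0.80]
best, stopped = best_epoch_with_patience(history, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stopped}")
```

In a real loop you would save a checkpoint whenever the validation loss improves and reload it after stopping.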

Checkpoint

When is training only the classifier head insufficient?

Answer sketch: When the domain gap is large, or when the classes depend on fine spatial detail that generic pretrained features do not capture. In those cases, unfreeze more of the backbone.


What's next

Module 3 quiz — then classifier project.