Overfitting and underfitting

Before we begin

You touched on overfitting in the Module 1 welcome. Now it becomes a core skill: recognizing it, measuring it, and fixing it.

Overfitting = great on training data, weak on new data.
Underfitting = weak everywhere — too simple to capture the pattern.

The goal is generalization: doing well on data the model has never seen.

Figure: The overfitting gap

Training accuracy can keep rising while test performance drops — a red flag.
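One way to make this red flag concrete is to compare the two curves epoch by epoch. The sketch below is illustrative only: the accuracy values are made up, and the helper names `accuracy_gap` and `first_divergence` are hypothetical, not part of any library.

```python
def accuracy_gap(train_acc, test_acc):
    """Return the per-epoch gap: training accuracy minus test accuracy."""
    if len(train_acc) != len(test_acc):
        raise ValueError("both curves need one value per epoch")
    return [tr - te for tr, te in zip(train_acc, test_acc)]


def first_divergence(train_acc, test_acc):
    """Return the first epoch where training accuracy rose while
    test accuracy fell, or None if that never happens."""
    for epoch in range(1, len(train_acc)):
        train_up = train_acc[epoch] > train_acc[epoch - 1]
        test_down = test_acc[epoch] < test_acc[epoch - 1]
        if train_up and test_down:
            return epoch
    return None


# Invented curves, one value per epoch (epochs are numbered from 0).
train_curve = [0.70, 0.80, 0.88, 0.94, 0.98]
test_curve = [0.68, 0.76, 0.80, 0.78, 0.74]

gaps = accuracy_gap(train_curve, test_curve)
red_flag_epoch = first_divergence(train_curve, test_curve)

print("gap per epoch:", [round(g, 2) for g in gaps])
print("curves diverge at epoch:", red_flag_epoch)
```

With these invented curves, the two lines part ways at epoch 3: training accuracy keeps climbing while test accuracy starts to fall, and the gap widens from there.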

Signs of underfitting: weak scores on both training data and new data.

Causes: model too simple, not enough training, wrong features.

Fixes: richer features, more complex model (carefully), train longer if training error is still high.

Analogy: Studying with a cheat sheet that only says “pick the middle answer” — fails both practice and real exam.
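A minimal sketch of one fix for underfitting, moving to a slightly more complex model, assuming a one-feature regression problem scored with mean squared error (lower is better). The data are invented (roughly y = 2x + 1), and `fit_constant`, `fit_line`, and `mse` are hypothetical helpers written for this illustration.

```python
def fit_constant(xs, ys):
    """Too-simple model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented data: y is roughly 2x + 1 with a little noise in training.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.1, 2.9, 5.1, 6.9, 9.1, 10.9]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2.0, 4.0, 6.0, 8.0, 10.0]

constant = fit_constant(train_x, train_y)
line = fit_line(train_x, train_y)

for name, model in [("constant", constant), ("line", line)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The constant model is weak on both sets, which is the underfitting pattern. The line is good on both, so the extra complexity was worth it here.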

Signs of overfitting: strong scores on training data, much weaker scores on new data.

Why it happens:

Too flexible a model for the amount of data (many parameters, few examples).
Training too long — keeps improving on training noise.
Duplicate or leaky data — test-like examples snuck into training.
Too few diverse examples — memorization is easier than learning a rule.
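The first cause above, a model that is too flexible for the amount of data, can be sketched with six noisy points. The example compares a straight line (two parameters) against a polynomial that passes through every training point exactly (six parameters for six points). The data are invented (true rule y = 2x, training noise of +1 or -1), and `fit_line`, `fit_interpolant`, and `mse` are hypothetical helpers for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line: a simple, low-flexibility model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: as many
    parameters as examples, so it reproduces the noise exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented data: true rule is y = 2x; training targets carry +1/-1 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

for name, model in [("line", line), ("interpolant", wiggly)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The interpolant has zero training error because it has memorized the noise, yet it does worse than the plain line on the unseen points. That is the overfitting pattern in miniature.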

Analogy: You memorized exact exam questions but cannot solve new ones with different numbers.

Model	Train accuracy	Test accuracy	Diagnosis
A	62%	60%	Underfitting
B	99%	61%	Overfitting
C	88%	85%	Reasonable generalization
D	95%	94%	Strong — verify test is honest
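The reasoning behind the table can be written as a rough rule of thumb. This is only a sketch: the function `diagnose` and its three thresholds are assumptions chosen so the four rows above come out as labeled, not standard values. Sensible cutoffs depend on the task.

```python
def diagnose(train_acc, test_acc, min_train=0.70, max_gap=0.10, strong_test=0.90):
    """Label a model from its train and test accuracy (fractions, 0 to 1).

    All thresholds are illustrative assumptions, not standard values.
    """
    if train_acc < min_train:
        # Weak even on data it has seen: too simple or undertrained.
        return "underfitting"
    if train_acc - test_acc > max_gap:
        # Large gap between seen and unseen data.
        return "overfitting"
    if test_acc >= strong_test:
        # Good on both; worth checking for leakage before celebrating.
        return "strong (verify test is honest)"
    return "reasonable generalization"


# The four models from the table, as (name, train accuracy, test accuracy).
models = [
    ("A", 0.62, 0.60),
    ("B", 0.99, 0.61),
    ("C", 0.88, 0.85),
    ("D", 0.95, 0.94),
]

for name, train_acc, test_acc in models:
    print(name, diagnose(train_acc, test_acc))
```

The order of the checks matters: a model with low training accuracy is called underfitting first, even if its train and test scores happen to be close.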

Checkpoint: You train a spam model on 12 emails until it reaches 100% training accuracy, but it performs poorly on real inbox mail. Why?

Answer sketch

Tiny dataset + perfect training score → likely overfitting (memorized those 12 emails).
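To see how a perfect training score can mean very little, here is a deliberately extreme sketch: a "model" that only memorizes. The class `MemorizingClassifier`, the helper `accuracy`, and all the emails are invented for illustration. A real spam model is not a lookup table, but a very flexible model trained on 12 examples can behave much like one.

```python
class MemorizingClassifier:
    """Stores every training email verbatim; guesses a default otherwise."""

    def __init__(self, default="ham"):
        self.default = default
        self.table = {}

    def fit(self, texts, labels):
        self.table = dict(zip(texts, labels))
        return self

    def predict(self, text):
        # Exact match or nothing: no rule has been learned.
        return self.table.get(text, self.default)


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


# Twelve invented training emails.
train = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("cheap meds online", "spam"),
    ("lunch tomorrow?", "ham"),
    ("claim your reward today", "spam"),
    ("quarterly report attached", "ham"),
    ("you have won the lottery", "spam"),
    ("can you review my draft", "ham"),
    ("limited offer act fast", "spam"),
    ("happy birthday!", "ham"),
    ("free gift card inside", "spam"),
    ("notes from today's call", "ham"),
]
# Four invented inbox emails the model has never seen.
inbox = [
    ("win a free cruise now", "spam"),
    ("free prize waiting for you", "spam"),
    ("meeting moved to 4pm", "ham"),
    ("draft review by friday?", "ham"),
]

train_texts, train_labels = zip(*train)
inbox_texts, inbox_labels = zip(*inbox)

model = MemorizingClassifier().fit(train_texts, train_labels)
train_acc = accuracy(model, train_texts, train_labels)
inbox_acc = accuracy(model, inbox_texts, inbox_labels)

print("training accuracy:", train_acc)
print("inbox accuracy:", inbox_acc)
```

Training accuracy is perfect, but on the unseen inbox the model is no better than always answering "ham", because even small changes in wording miss the lookup table.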

Strategy	What it does
More data	Harder to memorize; patterns must generalize
Simpler model	Fewer knobs to overfit noise
Regularization	Penalizes huge weights (L2, etc.)
Early stopping	Stop when validation error rises
Better features	Signal over noise
Cross-validation	More reliable estimate on small data
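As one example from the table, early stopping can be sketched as a rule applied to a list of validation errors. The function `early_stopping`, the `patience` parameter, and the error values are assumptions for illustration. Training frameworks offer their own versions of this idea with different options.

```python
def early_stopping(val_errors, patience=2):
    """Scan validation errors epoch by epoch.

    Stops once the error has failed to improve on its best value for
    `patience` epochs in a row. Returns (best_epoch, stop_epoch),
    both counted from 0.
    """
    if not val_errors:
        raise ValueError("need at least one validation error")
    best_epoch, best_error = 0, val_errors[0]
    stop_epoch = len(val_errors) - 1  # default: ran to the end
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best_error:
            best_epoch, best_error = epoch, val_errors[epoch]
        elif epoch - best_epoch >= patience:
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch


# Invented validation errors, one per epoch.
val_errors = [0.40, 0.31, 0.27, 0.29, 0.33, 0.25]
best_epoch, stop_epoch = early_stopping(val_errors, patience=2)

print("keep the model from epoch:", best_epoch)
print("training stopped at epoch:", stop_epoch)
```

Note the trade-off in these invented numbers: training stops at epoch 4 and never reaches the lower error at epoch 5. A larger `patience` tolerates more bumps but spends more training time.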

You do not need every technique in Module 2 — know the menu.
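Cross-validation, the last item on the menu, is easiest to understand from how it splits the data. Below is a minimal sketch that assumes examples are referred to by index. `k_fold_indices` is a hypothetical helper, and it does not shuffle, which real use usually needs.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k folds.

    Returns a list of (train_indices, validation_indices) pairs.
    Each example lands in the validation part exactly once.
    No shuffling is done here; shuffle the data first if it is ordered.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, validation))
        start += size
    return folds


# Example: 12 examples, 4 folds.
folds = k_fold_indices(12, 4)
for number, (training, validation) in enumerate(folds):
    print("fold", number, "validate on", validation, "train on", len(training), "examples")
```

You would train and score once per fold, then average the scores. On small data that average is steadier than the score from a single split.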

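Good models sit in the middle: flexible enough to capture the real pattern, but not so flexible that they memorize noise. Where that middle lies depends on your data size and noise level.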

Lesson 4 — Train, validation, and test splits — how to measure generalization honestly.