Regression vs classification
Before we begin
Supervised learning splits into two families by output type:
Regression predicts a number.
Classification picks a category.
Both follow the same training loop as Module 1. What changes is the shape of the output and the way you measure success.
Figure: Number vs category
What you will learn
- Tell regression and classification apart from a problem description.
- Connect Module 1 linear regression to regression problems.
- See why spam detection is classification, not regression.
Regression — predict a quantity
Output: a continuous number, one that can take any value within a range.
| Problem | Output |
|---|---|
| House pricing | $285,000 |
| Weather | 22.3°C tomorrow |
| Module 1 patch project | Pixel brightness 142.7 |
| Stock (harder!) | Expected return |
Error metrics: mean squared error (MSE), mean absolute error (MAE) — “how far off is the number?”
You already built linear regression in Module 1. That was regression in its purest form.
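The two error metrics named above can be computed by hand, which makes the "how far off is the number?" idea concrete. This is a minimal sketch in plain Python; the prices are made-up illustrative values (in thousands of dollars), not data from the course.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large misses are penalised more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house prices in thousands of dollars (illustrative only).
actual = [285.0, 310.0, 199.0, 420.0]
predicted = [280.0, 320.0, 205.0, 400.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"MSE: {mse:.2f}  MAE: {mae:.2f}")  # MSE: 140.25  MAE: 10.25
```

The single miss of 20 contributes 400 of the 561 total squared error, which is why MSE reacts more strongly to outliers than MAE does.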
Classification — pick a label
Output: discrete category (class).
| Problem | Output |
|---|---|
| Spam filter | spam or ham |
| Digit recognition | 0, 1, …, 9 |
| Medical screening | positive or negative |
| Sentiment | positive / neutral / negative |
Error metrics: accuracy, precision, recall, F1 — Lesson 5 covers these.
Multi-class: more than two labels (digit 0–9). Binary: two labels (spam/ham).
Logistic regression — name trap
Logistic regression sounds like regression but is mainly used for binary classification. It outputs a probability between 0 and 1 (e.g. “87% chance spam”), then applies a threshold (commonly 0.5) to turn that probability into a label.
Your Module 2 project can use logistic regression or Naive Bayes — both classic text classifiers.
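The probability-then-threshold step can be sketched in a few lines. This assumes the model has already produced a raw score for an email; the scores below and the helper name `predict_label` are hypothetical, chosen only for illustration. The sigmoid function is the standard way logistic regression maps a raw score to a probability.

```python
import math


def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(score, threshold=0.5):
    """Turn a raw score into a probability, then into a spam/ham label."""
    probability = sigmoid(score)
    label = "spam" if probability >= threshold else "ham"
    return probability, label


# Hypothetical raw scores for three emails (not real model output).
for score in (-2.0, 0.0, 1.9):
    probability, label = predict_label(score)
    print(f"score={score:+.1f}  P(spam)={probability:.2f}  label={label}")
```

A score of 1.9 gives a probability of about 0.87, matching the “87% chance spam” example. Raising the threshold, say to 0.9, would label that same email ham, which is one way to trade missed spam for fewer false alarms.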
Worked example — which type?
| Description | Type | Output |
|---|---|---|
| Predict rent from square feet | Regression | Dollars/month |
| Is this email spam? | Classification | spam / ham |
| Predict age from photo | Regression | Years (number) |
| Which animal is in the photo? | Classification | cat / dog / bird / … |
Checkpoint: Predict whether a transaction is fraud (yes/no). Regression or classification?
Answer sketch
Binary classification. Yes/no are categories, not a continuous dollar amount.
Same inputs, different tasks
The same features (word counts, metadata) can support different tasks:
- Predict spam score 0–100 → regression framing (uncommon for spam).
- Predict spam vs ham → classification (standard).
Product teams usually want a decision — classification.
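The two framings can sit on top of exactly the same features. In the sketch below, the word weights and the cutoff of 50 are hypothetical hand-picked values; a real model would learn them from data.

```python
# Hypothetical per-word weights; a trained model would learn these.
WEIGHTS = {"free": 30.0, "winner": 40.0, "meeting": -20.0}


def spam_score(word_counts):
    """Regression framing: return a number clipped to the range 0 to 100."""
    raw = sum(WEIGHTS.get(word, 0.0) * count for word, count in word_counts.items())
    return max(0.0, min(100.0, raw))


def spam_label(word_counts, cutoff=50.0):
    """Classification framing: return a category by thresholding the score."""
    return "spam" if spam_score(word_counts) >= cutoff else "ham"


email = {"free": 1, "winner": 1, "meeting": 0}
print(spam_score(email))  # 70.0
print(spam_label(email))  # spam
```

The inputs are identical in both functions; only the output type differs, and that is what decides whether the task is regression or classification.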
Common mistakes
- Treating category labels coded as 0/1 as ordinary numbers and minimizing MSE on them, without understanding that 0 and 1 stand for classes and call for a probability model.
- Calling multi-class digit recognition “regression” because outputs are numbers 0–9 — it is classification (each digit is a class).
- Ignoring class imbalance (99% ham, 1% spam) — accuracy alone misleads (Lesson 5).
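The class-imbalance point is easy to check with a toy example. The inbox below is hypothetical (990 ham, 10 spam), and the "model" is a baseline that always predicts ham. Recall, the share of actual spam that gets caught, is one of the metrics Lesson 5 covers.

```python
# Hypothetical inbox: 99% ham, 1% spam.
labels = ["ham"] * 990 + ["spam"] * 10

# A useless baseline that never flags anything as spam.
predictions = ["ham"] * len(labels)

correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)

spam_caught = sum(p == "spam" and t == "spam" for p, t in zip(predictions, labels))
recall = spam_caught / labels.count("spam")

print(f"accuracy={accuracy:.0%}  spam recall={recall:.0%}")  # accuracy=99%  spam recall=0%
```

The baseline scores 99% accuracy while catching no spam at all, so accuracy needs to be read alongside a metric that looks at the rare class.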
What's next
Lesson 3: Overfitting & underfitting, what happens when your model learns the training set too well, or not well enough.