Regression vs classification

Before we begin

Supervised learning splits into two families by output type:

Regression predicts a number.
Classification picks a category.

The training loop is the same in spirit as in Module 1; what changes is the shape of the output and the way success is measured.

Figure: Number vs category. Regression draws a line through numeric targets. Classification assigns labels.

What you will learn

Tell regression and classification apart from a problem description.
Connect Module 1 linear regression to regression problems.
See why spam detection is classification, not regression.

Before this lesson

Lesson 1 — Supervised vs unsupervised

Regression — predict a quantity

Output: a continuous number (often any value within a range).

Problem	Output
House pricing	$285,000
Weather	22.3°C tomorrow
Module 1 patch project	Pixel brightness 142.7
Stock (harder!)	Expected return

Error metrics: mean squared error (MSE), mean absolute error (MAE) — “how far off is the number?”
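To make "how far off is the number?" concrete, here is a minimal sketch of both metrics in plain Python. The function names and the house-price values are made up for illustration; they do not come from the Module 1 project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large misses are penalized heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house prices in thousands of dollars (illustrative values only).
actual = [285.0, 310.0, 199.0, 420.0]
predicted = [280.0, 330.0, 205.0, 400.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)

print(f"MSE: {mse:.2f}")  # MSE: 215.25
print(f"MAE: {mae:.2f}")  # MAE: 12.75
```

MAE stays in the same units as the target (thousands of dollars here), while MSE is in squared units and reacts more strongly to the two 20-unit misses.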

You already built linear regression in Module 1. That was regression in its purest form.
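As a reminder of what that looks like, the sketch below fits a straight line to one numeric feature with closed-form least squares. This is an assumption about the approach: Module 1 may have fit the line differently (for example with gradient descent), and the rent figures are invented so that the answer is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line rent = 1.6 * square_feet + 200.
square_feet = [500, 750, 1000, 1250]
rent = [1000, 1400, 1800, 2200]

slope, intercept = fit_line(square_feet, rent)
predicted_rent = slope * 900 + intercept  # prediction for a 900 sq ft unit

print(f"slope={slope:.2f}, intercept={intercept:.2f}, rent(900)={predicted_rent:.2f}")
```

The prediction is a number on a continuous scale, which is what makes this a regression problem.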

Classification — pick a label

Output: discrete category (class).

Problem	Output
Spam filter	spam or ham
Digit recognition	0, 1, …, 9
Medical screening	positive or negative
Sentiment	positive / neutral / negative

Error metrics: accuracy, precision, recall, F1 — Lesson 5 covers these.

Multi-class: more than two labels (digit 0–9). Binary: two labels (spam/ham).

Logistic regression — name trap

Logistic regression sounds like regression but is mainly used for binary classification. It outputs a probability between 0 and 1 (e.g. “87% chance spam”), and a threshold on that probability (0.5 is a common default) turns it into a label.
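The probability-then-threshold step can be sketched in a few lines. The weights, bias, and word-count features below are hypothetical values chosen for illustration, not a trained model, and `predict_label` is a name invented for this example.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability of spam, label) for one email."""
    # Linear score, exactly as in linear regression...
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # ...then mapped to a probability and compared to a threshold.
    probability = sigmoid(z)
    label = "spam" if probability >= threshold else "ham"
    return probability, label


# Hypothetical weights for counts of the words "free", "winner", "meeting".
weights = [1.2, 0.8, -0.5]
bias = -1.0
email_features = [2, 1, 1]

probability, label = predict_label(email_features, weights, bias)
print(f"P(spam) = {probability:.3f} -> {label}")

# A stricter threshold can flip the decision for the same probability.
_, strict_label = predict_label(email_features, weights, bias, threshold=0.9)
print(f"with threshold 0.9 -> {strict_label}")
```

The model itself produces a continuous value; it is the threshold that turns the output into a category, which is why the task counts as classification.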

Your Module 2 project can use logistic regression or Naive Bayes — both classic text classifiers.

Worked example — which type?

Description	Type	Output
Predict rent from square feet	Regression	Dollars/month
Is this email spam?	Classification	spam / ham
Predict age from photo	Regression	Years (number)
Which animal is in the photo?	Classification	cat / dog / bird / …

Checkpoint: Predict whether a transaction is fraud (yes/no). Regression or classification?

Answer sketch

Binary classification. Yes/no are categories, not a continuous dollar amount.

Same inputs, different tasks

The same features (word counts, metadata) can support different tasks:

Predict spam score 0–100 → regression framing (uncommon for spam).
Predict spam vs ham → classification (standard).

Product teams usually want a decision — classification.
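The difference between the two framings shows up only in the target column. In the sketch below the feature rows, scores, and labels are invented, and the cutoff of 50 is an arbitrary assumption used to turn a score into a decision.

```python
# Each hypothetical email has the same features but two possible targets.
emails = [
    {"features": [3, 0, 1], "spam_score": 92.0, "label": "spam"},
    {"features": [0, 2, 0], "spam_score": 8.0, "label": "ham"},
    {"features": [1, 1, 1], "spam_score": 61.0, "label": "spam"},
    {"features": [0, 0, 4], "spam_score": 3.0, "label": "ham"},
]

# Inputs are identical for both tasks.
X = [email["features"] for email in emails]

# Regression framing: the target is a number on a 0-100 scale.
y_regression = [email["spam_score"] for email in emails]

# Classification framing: the target is one of two categories.
y_classification = [email["label"] for email in emails]


def score_to_decision(score, cutoff=50.0):
    """Turn a numeric score into the decision a product would act on."""
    return "spam" if score >= cutoff else "ham"


decisions = [score_to_decision(score) for score in y_regression]
print(decisions)  # ['spam', 'ham', 'spam', 'ham']
```

Even with a regression framing, a cutoff is needed before an email can be moved to the spam folder, so the product still ends up with a classification.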

Common mistakes

Using MSE on category labels coded as 0/1 without realizing that classifiers are usually trained as probability models, where log loss (cross-entropy) is the standard choice.
Calling multi-class digit recognition “regression” because outputs are numbers 0–9 — it is classification (each digit is a class).
Ignoring class imbalance (99% ham, 1% spam) — accuracy alone misleads (Lesson 5).
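The class-imbalance mistake is easy to reproduce. The sketch below uses a made-up dataset with the 99% ham, 1% spam split mentioned above and a deliberately useless "model" that always predicts ham.

```python
# Hypothetical inbox: 10 spam emails and 990 ham emails (1% spam).
labels = ["spam"] * 10 + ["ham"] * 990

# A "model" that never flags anything.
predictions = ["ham"] * len(labels)

correct = sum(1 for truth, pred in zip(labels, predictions) if truth == pred)
accuracy = correct / len(labels)

# Recall for spam: what fraction of the real spam was caught?
spam_total = sum(1 for truth in labels if truth == "spam")
spam_caught = sum(
    1 for truth, pred in zip(labels, predictions)
    if truth == "spam" and pred == "spam"
)
spam_recall = spam_caught / spam_total

print(f"Accuracy: {accuracy:.2f}")        # Accuracy: 0.99
print(f"Spam recall: {spam_recall:.2f}")  # Spam recall: 0.00
```

An accuracy of 99% looks excellent, yet not a single spam email is caught, which is why accuracy is read alongside the other metrics on imbalanced data.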

What's next

Lesson 3: Overfitting & underfitting, when your model learns the training set too well (overfitting) or not well enough (underfitting).