Module 2 — Core machine learning

Regression vs classification

Numeric outputs vs categories, logistic regression naming, multi-class vs binary, and connecting to the Module 1 project.

~65 min read + exercises

Before we begin

Supervised learning splits into two families by output type:

Regression predicts a number.
Classification picks a category.

The training loop works in the same spirit as Module 1. What changes is the shape of the output and the way you measure success.

Figure

Number vs category

Output type decides the problem type
Regression → number: price, temperature, brightness
Classification → category: spam/ham, cat/dog, yes/no
Regression draws a line through numeric targets. Classification assigns labels.

What you will learn

  • Tell regression and classification apart from a problem description.
  • Connect Module 1 linear regression to regression problems.
  • See why spam detection is classification, not regression.


Regression — predict a quantity

Output: a continuous number, typically any value within a range.

Problem | Output
House pricing | $285,000
Weather | 22.3°C tomorrow
Module 1 patch project | Pixel brightness 142.7
Stock (harder!) | Expected return

Error metrics: mean squared error (MSE), mean absolute error (MAE) — “how far off is the number?”
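To make "how far off is the number?" concrete, here is a minimal sketch of both metrics in plain Python. The house prices (in thousands of dollars) are made-up illustration values, and the function names are our own, not from any library.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large misses are penalised heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical prices in thousands of dollars (illustration only).
actual = [285.0, 310.0, 250.0]
predicted = [280.0, 330.0, 250.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"MSE: {mse:.2f}  MAE: {mae:.2f}")
```

The single miss of 20 dominates MSE because it is squared, while MAE treats it as just a larger step on the same scale as the targets.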

You already built linear regression in Module 1. That was regression in its purest form.


Classification — pick a label

Output: discrete category (class).

Problem | Output
Spam filter | spam or ham
Digit recognition | 0, 1, …, 9
Medical screening | positive or negative
Sentiment | positive / neutral / negative

Error metrics: accuracy, precision, recall, F1 — Lesson 5 covers these.

Multi-class: more than two labels (digit 0–9). Binary: two labels (spam/ham).
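One way to picture the difference: a classifier produces a score per label and you pick the highest. The sketch below assumes the scores already exist (the numbers are invented for illustration), and `pick_class` is a hypothetical helper, not a library function.

```python
def pick_class(scores):
    """Return the label whose score is highest."""
    return max(scores, key=scores.get)


# Binary: exactly two labels to choose from.
binary_scores = {"spam": 0.87, "ham": 0.13}

# Multi-class: ten labels, one per digit. Labels are names, not quantities.
digit_scores = {str(d): 0.02 for d in range(10)}
digit_scores["7"] = 0.82

binary_choice = pick_class(binary_scores)
digit_choice = pick_class(digit_scores)
print(binary_choice, digit_choice)
```

The selection step is identical in both cases. Only the number of candidate labels changes.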


Logistic regression — name trap

Logistic regression sounds like regression but is mainly used for binary classification. It outputs a probability between 0 and 1 (e.g. “87% chance spam”). You then apply a threshold (0.5 is a common default) to turn that probability into a label.
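The probability-then-threshold idea can be sketched in a few lines. This assumes the model has already computed a raw score `z` (a weighted sum of features). The value 1.9 and the helper names are illustrative assumptions, not part of the course project.

```python
import math


def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Turn a raw score into (label, probability) using a chosen threshold."""
    probability = sigmoid(z)
    label = "spam" if probability >= threshold else "ham"
    return label, probability


# A hypothetical raw score of 1.9 maps to a probability of about 0.87.
label, prob = predict_label(1.9)
print(label, round(prob, 2))

# Same probability, stricter threshold: the decision flips.
strict_label, _ = predict_label(1.9, threshold=0.9)
print(strict_label)
```

The threshold is a choice you make, not something the model learns. Raising it trades missed spam for fewer false alarms.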

Your Module 2 project can use logistic regression or Naive Bayes — both classic text classifiers.


Worked example — which type?

Description | Type | Output
Predict rent from square feet | Regression | Dollars/month
Is this email spam? | Classification | spam / ham
Predict age from photo | Regression | Years (number)
Which animal is in the photo? | Classification | cat / dog / bird / …

Checkpoint: Predict whether a transaction is fraud (yes/no). Regression or classification?

Answer sketch

Binary classification. Yes/no are categories, not a continuous dollar amount.


Same inputs, different tasks

The same features (word counts, metadata) can support different tasks:

  • Predict spam score 0–100 → regression framing (uncommon for spam).
  • Predict spam vs ham → classification (standard).

Product teams usually want a decision — classification.
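As a rough illustration, the same feature rows can be paired with either kind of target. The feature names, scores, and the cutoff of 50 below are all invented for this sketch.

```python
# Shared inputs: hypothetical word counts per email.
emails = [
    {"count_free": 4, "count_meeting": 0},
    {"count_free": 0, "count_meeting": 2},
]

# Regression framing: one number per email (spam score 0-100).
spam_scores = [91.0, 6.0]

# Classification framing: one category per email.
spam_labels = ["spam", "ham"]


def score_to_decision(score, cutoff=50.0):
    """Collapse a numeric score into a decision (the cutoff is an assumption)."""
    return "spam" if score >= cutoff else "ham"


decisions = [score_to_decision(s) for s in spam_scores]
print(decisions)
```

Even under the regression framing, someone eventually has to pick a cutoff to act on the score, which is why the classification framing is the usual starting point.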


Common mistakes

  • Using MSE on category labels coded as 0/1 without understanding probability models. A classifier that outputs probabilities is usually trained with log loss (cross-entropy) instead.
  • Calling multi-class digit recognition “regression” because outputs are numbers 0–9 — it is classification (each digit is a class).
  • Ignoring class imbalance (99% ham, 1% spam) — accuracy alone misleads (Lesson 5).
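The class imbalance point is easy to check by hand. The sketch below builds a made-up dataset with the 99% ham / 1% spam split mentioned above and scores a "model" that never predicts spam.

```python
# Hypothetical dataset: 10 spam emails and 990 ham emails.
labels = ["spam"] * 10 + ["ham"] * 990

# A useless baseline that labels everything as ham.
predictions = ["ham"] * len(labels)

# Accuracy: fraction of all emails labelled correctly.
correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall for spam: fraction of actual spam that was caught.
spam_caught = sum(p == "spam" and t == "spam" for p, t in zip(predictions, labels))
recall = spam_caught / labels.count("spam")

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```

An accuracy of 99% looks excellent, yet the baseline catches no spam at all, which is the reason to look at more than accuracy.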

What's next

Lesson 3 — Overfitting & underfitting — when your model learns the training set too well.