Supervised vs unsupervised learning

Before we begin

Every ML tutorial mentions supervised learning. Here are the two core ideas, one sentence each:

Supervised learning = learning from examples that include the correct answer.

Unsupervised learning = no correct answers provided — the algorithm finds structure on its own (clusters, patterns, anomalies).

Knowing which type you have shapes the rest of the project: how much data labeling costs, which metrics you can use, and how you design the work.

Figure: Labels vs no labels. Supervised needs known answers. Unsupervised discovers groups or patterns.

What you will learn

Define supervised and unsupervised learning in plain language.
Name real examples of each.
Recognize which type a new problem belongs to.

Before this lesson

Module 2 welcome
Module 1 welcome — key concepts (model, training, labels)

Supervised learning — learning with a teacher

Imagine flashcards:

Front: email text
Back: label says spam or not spam

The model sees thousands of fronts with known backs. Training adjusts weights so predictions match the backs.

Other supervised examples:

Input	Label (correct answer)
House size, bedrooms	Price in dollars
Photo of a digit	Digit 0–9
Review text	Positive or negative
Medical scan	Disease present yes/no

What you need: a labeled dataset. Labels often come from humans, and that costs time and money.

Training goal: minimize the prediction error against the labels (same spirit as gradient descent in Module 1, often with a different loss function).
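The training goal above can be sketched with the house-price example. This is a minimal illustration, not a production method: the sizes and prices are made-up numbers, the model is a straight line (price = w * size + b), and the loss is mean squared error. The names `train`, `sizes`, and `prices` are assumptions for this sketch.

```python
# Toy supervised learning: fit price = w * size + b by gradient descent.
# All data is invented for illustration.

sizes = [1.0, 2.0, 3.0, 4.0]            # inputs (size in 1000 sq ft)
prices = [150.0, 250.0, 350.0, 450.0]   # labels (price in $1000s)


def mean_squared_error(w, b):
    """Average squared gap between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)


def train(steps=5000, lr=0.05):
    """Adjust w and b step by step so predictions move toward the labels."""
    w, b = 0.0, 0.0
    n = len(sizes)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(sizes, prices)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(sizes, prices)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()
print(f"learned w={w:.2f}, b={b:.2f}, error={mean_squared_error(w, b):.6f}")
```

The labels are what make this supervised: without `prices`, there would be no error to minimize.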

Unsupervised learning — finding structure alone

No flashcard backs. You only get inputs.

Examples:

Clustering — group customers by behavior without predefined segments.
Topic modeling — discover themes in thousands of documents.
Anomaly detection — flag transactions unlike typical ones.
Dimensionality reduction — compress high-dimensional data for visualization.

The algorithm might output cluster IDs or scores — but nobody told it the “right” groups in advance.
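To make the clustering case concrete, here is a minimal k-means sketch. K-means is one common clustering algorithm, chosen here only as an illustration; the points and the starting centroids are made up, and the function names are assumptions for this example. Note that no labels appear anywhere in the code.

```python
# Minimal k-means sketch on invented 2D points (inputs only, no labels).

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
    return assignments, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
cluster_ids, centers = kmeans(points, centroids=[points[0], points[3]])
print(cluster_ids)
```

The output is a list of cluster IDs. The IDs themselves carry no meaning; a different starting point could swap which group is called 0 and which is called 1.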

Semi-supervised (quick note)

Real projects sometimes have a few labels and many unlabeled examples. Semi-supervised techniques use both. You will see a related pattern with modern LLMs (pre-train on unlabeled text, then fine-tune on labeled examples). For now, know the name exists.
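For the curious, one way to mix both kinds of data is self-training (also called pseudo-labeling): fit on the few labels, label the unlabeled examples the model is confident about, then refit. The sketch below is an assumption-heavy toy, using a nearest-centroid rule on made-up 1D data with two hypothetical labels, "low" and "high". It is optional reading.

```python
# Self-training sketch with a nearest-centroid classifier on invented 1D data.

labeled = [(1.0, "low"), (9.0, "high")]   # few labeled examples
unlabeled = [1.5, 2.0, 8.0, 8.5, 5.2]     # many unlabeled examples


def centroids(examples):
    """Mean input value for each label."""
    groups = {}
    for x, label in examples:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict_with_margin(x, cents):
    """Return the nearest label and how much closer it is than the runner-up."""
    ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - cents[runner_up]) - abs(x - cents[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=2.0):
    cents = centroids(labeled)
    pseudo = []
    for x in unlabeled:
        label, margin = predict_with_margin(x, cents)
        if margin >= min_margin:          # keep only confident guesses
            pseudo.append((x, label))
    # Refit using the real labels plus the confident pseudo-labels.
    return centroids(labeled + pseudo), pseudo


final_centroids, pseudo_labeled = self_train(labeled, unlabeled)
print(pseudo_labeled)
```

The ambiguous point 5.2 sits almost midway between the two groups, so it is left unlabeled instead of being guessed.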

Worked example — classify the problem

Problem	Supervised or unsupervised?	Why
Predict house price from listings with sold prices	Supervised	Sold price is the label
Group news articles by topic without tags	Unsupervised	No topic labels given
Detect spam with 10,000 labeled emails	Supervised	spam/ham labels exist
Find unusual login patterns without fraud labels	Often unsupervised (anomaly detection)	No fraud labels needed up front
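The last row of the table can be sketched with a simple z-score rule: flag any value that is far from the average, measured in standard deviations. This is only one basic approach to anomaly detection. The login counts, the threshold of 2.0, and the function name `flag_anomalies` are all assumptions made for this example.

```python
from statistics import mean, stdev

# Hypothetical logins-per-hour counts for one account (invented numbers).
logins_per_hour = [3, 4, 2, 5, 3, 4, 3, 40, 4, 2]


def flag_anomalies(values, threshold=2.0):
    """Return the positions of values more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


anomalous_hours = flag_anomalies(logins_per_hour)
print(anomalous_hours)
```

No fraud labels are used: the rule only compares each hour with what is typical for the data. Whether a flagged hour is actually fraud still needs a human or a labeled follow-up check.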

Checkpoint: You have 1M product reviews without star ratings. You want to discover common complaint themes. Supervised or unsupervised?

Answer sketch

Unsupervised (for example, topic modeling). You are discovering structure, not predicting a provided label.

Common mistakes

Calling a problem “supervised” when it has no labels.
Using test labels during training (that is cheating — covered in Lesson 4).
Assuming unsupervised outputs are “true” clusters — they are useful views, not ground truth.
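The second mistake in the list above is easier to avoid if the test examples are set aside before any training happens. Here is a minimal sketch of a random hold-out split. The helper `hold_out_split` is a hypothetical function written for this example (not a library call), and the emails are placeholders.

```python
import random


def hold_out_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and set aside a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder labeled emails: (text, label).
emails = [(f"email {i}", "spam" if i % 2 else "ham") for i in range(8)]

train_set, test_set = hold_out_split(emails)

# Train only on train_set; look at test_set labels only when evaluating.
print(len(train_set), len(test_set))
```

Because no example appears in both sets, a score measured on `test_set` reflects emails the model has not seen.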

Why it matters for your project

The spam classifier is supervised: each training email has a spam or ham label. The Module 2 project is classic supervised classification.

What's next

Lesson 2 — Regression vs classification — once you have labels, is the answer a number or a category?