
Module 2 — Core machine learning

Supervised vs unsupervised learning

Labels vs no labels, real-world examples of each, semi-supervised preview, and how to classify a new problem.

~60 min read + exercises


Before we begin

Every ML tutorial mentions supervised learning. Here is the core idea, one sentence for each type:

Supervised learning = learning from examples that include the correct answer.

Unsupervised learning = no correct answers provided — the algorithm finds structure on its own (clusters, patterns, anomalies).

Knowing which type you have shapes the whole project: how much labeling will cost, which metrics you can use, and how you design the pipeline.

Figure: Labels vs no labels (two learning styles). Supervised, input + known label: email → spam / ham; photo → cat / dog; house features → price ($). Unsupervised, input only, no labels: cluster similar customers; find topics in documents; detect unusual transactions.
Supervised needs known answers. Unsupervised discovers groups or patterns.

What you will learn

  • Define supervised and unsupervised learning in plain language.
  • Name real examples of each.
  • Recognize which type a new problem belongs to.

Supervised learning — learning with a teacher

Imagine flashcards:

  • Front: the email text
  • Back: the label, spam or not spam

The model sees thousands of fronts with their known backs. Training adjusts the model's weights so that its predictions match the backs as closely as possible.
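To make "training adjusts weights" concrete, here is a minimal sketch: a logistic-regression model trained with stochastic gradient descent on a handful of flashcards. The six emails, their labels, and the two features (how many times the word "free" appears, how many "!" characters appear) are all hypothetical, chosen only for illustration; real spam filters use far more data and features.

```python
import math

# Hypothetical flashcards: (front = email text, back = label), 1 = spam, 0 = not spam.
FLASHCARDS = [
    ("free money now!!!", 1),
    ("claim your free prize!", 1),
    ("free free free!!", 1),
    ("meeting moved to 3pm", 0),
    ("lunch tomorrow?", 0),
    ("see attached report", 0),
]

def features(text):
    """Hypothetical features for the front of a card: [# of 'free', # of '!']."""
    words = [w.strip("!?.,") for w in text.lower().split()]
    return [words.count("free"), text.count("!")]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Model's probability that the back of the card says spam."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(cards, lr=0.1, epochs=500):
    """Stochastic gradient descent on log loss, one card at a time."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in cards:
            x = features(text)
            error = predict(weights, bias, x) - label  # prediction minus correct answer
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

weights, bias = train(FLASHCARDS)
for text, label in FLASHCARDS:
    print(f"{text!r}: p(spam)={predict(weights, bias, features(text)):.2f}, label={label}")
```

Every update uses the label: without the backs of the cards, there would be no error to push the weights against.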

Other supervised examples:

Input | Label (correct answer)
House size, bedrooms | Price in dollars
Photo of a digit | Digit 0–9
Review text | Positive or negative
Medical scan | Disease present: yes/no

What you need: a labeled dataset. Labels often come from human annotators, which costs time and money.

Training goal: minimize the prediction error against the labels (the same idea as gradient descent in Module 1, often with a different loss function).
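As a rough illustration of "different loss functions", this sketch scores predictions against labels in two common ways: mean squared error, a typical choice when the label is a number such as a price, and binary log loss (cross-entropy), a typical choice when the label is yes/no. The prices and probabilities are made-up values.

```python
import math

def mean_squared_error(predictions, labels):
    """Average squared gap between predicted and true numbers (e.g. prices)."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

def log_loss(probabilities, labels):
    """Binary cross-entropy: punishes confident wrong yes/no predictions heavily."""
    eps = 1e-12  # keep log() away from exactly 0
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical house prices: model predictions vs sold prices (the labels), in dollars.
print(mean_squared_error([310_000, 250_000], [300_000, 260_000]))  # 100000000.0

# Hypothetical "disease present?" labels vs predicted probabilities.
print(log_loss([0.9, 0.2], [1, 0]))  # ≈ 0.164 (mostly right, modest loss)
print(log_loss([0.1], [1]))          # ≈ 2.303 (confident and wrong, large loss)
```

Either way, the number only exists because a label is available to compare against; gradient descent then lowers it.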


Unsupervised learning — finding structure alone

No flashcard backs. You only get inputs.

Examples:

  • Clustering — group customers by behavior without predefined segments.
  • Topic modeling — discover themes in thousands of documents.
  • Anomaly detection — flag transactions unlike typical ones.
  • Dimensionality reduction — compress high-dimensional data for visualization.

The algorithm might output cluster IDs or scores — but nobody told it the “right” groups in advance.
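To get a feel for what "cluster IDs" means, here is a minimal k-means sketch in plain Python (k-means is one common clustering algorithm, not the only one). The customer data (visits per month, average spend) is invented, and k = 2 is an assumption picked by hand; the algorithm is never told which customers belong together.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Assign each point a cluster ID 0..k-1 using Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical customers: (visits per month, average spend in dollars). No labels.
customers = [(2, 15), (3, 18), (2, 20), (20, 95), (22, 110), (19, 100)]
cluster_ids, centers = kmeans(customers, k=2)
print(cluster_ids)
print(centers)
```

The IDs 0 and 1 are arbitrary names. Calling one group "frequent big spenders" is a human interpretation added afterward.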


Semi-supervised (quick note)

Real projects sometimes have a few labeled examples and many unlabeled ones. Semi-supervised techniques use both. You will see a related pattern with modern LLMs: pre-train on large amounts of unlabeled text, then fine-tune on labeled examples. For now, just know the name exists.
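One simple semi-supervised idea is self-training (pseudo-labeling): fit a model on the few labeled examples, let it label the unlabeled points it is most confident about, then refit on the enlarged set. The sketch below assumes a nearest-centroid classifier on one-dimensional made-up data, with a hand-picked confidence margin; the classifier, the margin, and the data are illustrative assumptions, not a method this course prescribes.

```python
def class_means(labeled):
    """Mean x value of each class in a list of (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest(means, x):
    """Return (nearest class, margin over the second-nearest class). Needs 2+ classes."""
    ranked = sorted(means, key=lambda y: abs(x - means[y]))
    margin = abs(x - means[ranked[1]]) - abs(x - means[ranked[0]])
    return ranked[0], margin

def self_train(labeled, unlabeled, min_margin=2.0, rounds=5):
    """Repeatedly add confident pseudo-labels; leave ambiguous points unlabeled."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        means = class_means(labeled)
        scored = [(x, *nearest(means, x)) for x in unlabeled]
        newly = [(x, y) for x, y, margin in scored if margin >= min_margin]
        if not newly:
            break
        labeled += newly  # pseudo-labels are treated like real labels from now on
        added = {x for x, _ in newly}
        unlabeled = [x for x in unlabeled if x not in added]
    return labeled, unlabeled

labeled, leftover = self_train(
    [(1.0, "low"), (9.0, "high")],        # only two real labels
    [1.5, 2.0, 3.0, 7.0, 8.0, 8.5, 5.2],  # many unlabeled points
)
print(labeled)
print(leftover)  # [5.2] (too close to call, so it stays unlabeled)
```

The margin threshold is the key design choice: too low and early mistakes get baked in as "labels", too high and the unlabeled data is never used.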


Worked example — classify the problem

Problem | Supervised or unsupervised? | Why
Predict house price from listings with sold prices | Supervised | Sold price is the label
Group news articles by topic without tags | Unsupervised | No topic labels given
Detect spam with 10,000 labeled emails | Supervised | Spam/ham labels exist
Find unusual login patterns without fraud labels | Often unsupervised (anomaly detection) | No fraud labels needed up front
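The last row can be sketched with one of the simplest anomaly scores, the z-score: how many standard deviations a value sits from the mean. The daily failed-login counts are invented and the threshold of 2 is an arbitrary assumption; real fraud systems use much richer features. Note that no day is labeled as fraud.

```python
import statistics

def z_scores(values):
    """How far each value is from the mean, in standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:  # all values identical: nothing stands out
        return [0.0] * len(values)
    return [(v - mean) / spread for v in values]

# Hypothetical failed-login attempts per day for one account (no fraud labels).
failed_logins = [1, 0, 2, 1, 0, 1, 2, 1, 0, 25]

threshold = 2.0  # assumption: flag anything more than 2 standard deviations out
flagged = [day for day, z in enumerate(z_scores(failed_logins)) if abs(z) > threshold]
print(flagged)  # [9]
```

Day 9 is flagged as unusual, not as fraud; a human still has to decide what the outlier means.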

Checkpoint: You have 1M product reviews without star ratings. You want to discover common complaint themes. Supervised or unsupervised?

Answer sketch

Unsupervised (specifically, topic modeling). You are discovering structure, not predicting a provided label.


Common mistakes

  • Calling a problem “supervised” when no labels are available.
  • Using test labels during training (that is cheating — covered in Lesson 4).
  • Assuming unsupervised outputs are “true” clusters — they are useful views, not ground truth.
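In code, avoiding the test-label mistake means splitting the labeled data once, fitting only on the training part, and touching test labels only when scoring. A minimal sketch, assuming a hypothetical one-feature threshold model and invented emails; Lesson 4 covers evaluation properly.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows, then hold out the last test_fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_threshold(train_rows):
    """Hypothetical model: threshold halfway between the two class means."""
    spam = [x for x, y in train_rows if y == 1]
    ham = [x for x, y in train_rows if y == 0]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' agrees with 'label is spam'."""
    return sum((x > threshold) == (y == 1) for x, y in rows) / len(rows)

# Hypothetical (feature, label) pairs: feature = number of "!" in an email, 1 = spam.
emails = [(0, 0), (1, 0), (0, 0), (2, 0), (1, 0), (0, 0),
          (5, 1), (7, 1), (4, 1), (6, 1), (8, 1), (3, 1)]

train, test = train_test_split(emails)
threshold = fit_threshold(train)  # test labels are never used here
print(f"threshold={threshold:.2f}, test accuracy={accuracy(threshold, test):.2f}")
```

If `fit_threshold` were called on `emails` instead of `train`, the test score would no longer measure performance on unseen data.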

Why it matters for your project

The spam classifier is supervised: each training email has a spam or ham label. Module 2 project = classic supervised classification.


What's next

Lesson 2 — Regression vs classification — once you have labels, is the answer a number or a category?