Module 1 — Math & intuition

Dot products — measuring similarity

Multiply-and-add similarity, template matching intuition, brightness bias, cosine similarity, and how classifiers score inputs.

~70 min read + exercises

Before we begin

In the last lesson, you learned that a patch of an image can become a list of numbers — a vector. The next question is one of the most important in all of machine learning:

“How similar are these two lists?”

If two patches look alike, their lists should score high on some similarity measure. If they look different, the score should be low. The dot product is the simplest such measure — and despite its simplicity, it appears everywhere: template matching, linear classifiers, image filters, and (much later) attention in large language models.

This lesson works through three layers:

  1. How to compute it — multiply matching pairs, add the results.
  2. What the result means — alignment and similarity in plain language.
  3. Where it is used — finding patterns in images and making yes/no decisions.

Figure: Dot product = how much two lists line up. The diagram shows list a (template), list b (patch), and the projection between them, labeled "Dot product = signed length of projection" and "Multiply pairs and add: larger when lists align."
When two lists point in a similar direction, the dot product is larger. Think of it as a similarity score.

What you will learn

  • Compute a dot product by hand on small lists.
  • Explain the result as “how much two patterns rise and fall together.”
  • Understand brightness bias — why raw dot products can mislead you on images.
  • Use cosine similarity as a fairer comparison when brightness changes.
  • See how dot products become classifier scores and filter outputs.


The recipe: multiply pairs, then add

Suppose two lists have the same length (same number of entries):

  • List a: [a1, a2, a3, …]
  • List b: [b1, b2, b3, …]

The dot product combines them like this:

a1×b1 + a2×b2 + a3×b3 + …

That is the whole mechanical recipe. No magic — just multiply each aligned pair and sum.

Worked example 1

[1, 2, 3] dot [4, 5, 6]

  • 1×4 = 4
  • 2×5 = 10
  • 3×6 = 18
  • Sum: 4 + 10 + 18 = 32

Worked example 2 (try this yourself)

[2, 0, 1] dot [3, 4, 5]

  • 2×3 = 6
  • 0×4 = 0
  • 1×5 = 5
  • Sum: 6 + 0 + 5 = 11
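
If you would like to check the two worked examples by machine, here is a minimal Python sketch of the multiply-and-add recipe. The function name dot is only an illustrative choice, not something defined elsewhere in this course.

```python
def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


example_1 = dot([1, 2, 3], [4, 5, 6])  # 4 + 10 + 18
example_2 = dot([2, 0, 1], [3, 4, 5])  # 6 + 0 + 5

print(example_1, example_2)  # 32 11
```

Both results agree with the hand calculations above.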

Why length must match

You cannot dot a 6-number list with a 48-number list: most entries of the longer list would have no partner to multiply with. A length mismatch in code is the same class of bug as flattening with the wrong shape. The operation is simply undefined.
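
One way to catch this bug early is to check the lengths before multiplying. The sketch below is an assumption about how you might guard your own code; checked_dot is a hypothetical helper name. The check matters in Python because zip stops at the end of the shorter list, so an unchecked version would quietly return a number instead of failing.

```python
def checked_dot(a, b):
    """Dot product that refuses lists with different numbers of entries."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


try:
    checked_dot([0] * 6, [0] * 48)
except ValueError as err:
    print(err)  # length mismatch: 6 vs 48
```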


What the number actually tells you

Imagine two flattened patches from an image. Each list encodes a pattern of brighter and darker values.

Dot product result | Plain English
Large positive | Where one list is high, the other tends to be high too; the patterns move together
Near zero | No simple linear relationship; the patterns do not line up
Negative | Where one is high, the other tends to be low; opposite trends

Intuition: If two patches show the same edge or gradient direction, their lists often produce a larger dot product than two random patches.

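Checkpoint: Take the dot product of a list with itself. Is the result large or small?

Answer: Positive, and typically large. Every entry is multiplied by itself, so the result is the sum of the squares. It is zero only when every value is zero (an all-black patch).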
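
To see the three cases from the table with actual numbers, here is a small Python sketch. The lists are made up for illustration, and they include negative values, which you can think of as brightness measured relative to mid-gray.

```python
def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


pattern = [1, 2, 3]       # values rise from left to right
same = [1, 2, 3]          # identical pattern
unrelated = [1, -2, 1]    # no shared trend with `pattern`
opposite = [-1, -2, -3]   # values fall where `pattern` rises

print(dot(pattern, same))       # 14  (the sum of squares: 1 + 4 + 9)
print(dot(pattern, unrelated))  # 0
print(dot(pattern, opposite))   # -14
```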


Finding a pattern in an image (template matching)

Suppose you save a small template — a corner, a logo, a fingerprint pattern — as a list t. You slide a window across a larger image. At each position, you flatten the window into list p and compute t dot p.

  • High score → “this region looks like the template.”
  • Low score → “probably not a match.”

Face unlock, document scanners, and quality-control cameras use variations of this idea. They rarely show you the dot product directly, but the math underneath is the same family.
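
The sliding-window idea can be sketched in a few lines of Python. To keep it short, this version slides along a single row of pixel values instead of a full 2D image. The names match_scores, signal, and template, and the numbers themselves, are hypothetical.

```python
def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


def match_scores(signal, template):
    """Slide the template along the signal and score every window."""
    width = len(template)
    return [
        dot(template, signal[start:start + width])
        for start in range(len(signal) - width + 1)
    ]


signal = [0, 0, 1, 3, 1, 0, 0]  # made-up row of pixel values
template = [1, 3, 1]            # the pattern we are looking for

scores = match_scores(signal, template)
best_position = scores.index(max(scores))

print(scores)         # [1, 6, 11, 6, 1]
print(best_position)  # 2
```

The highest score lands at position 2, which is exactly where the window equals the template.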

The brightness problem (important)

Raw dot products are sensitive to overall brightness. If patch p is twice as bright as patch q but has the same pattern, every number in p is roughly doubled — and t dot p can be roughly twice t dot q even though the shapes match.

That is unfair if you care about pattern, not brightness. The fix is cosine similarity (next section).


Cosine similarity: compare shape, not brightness

Cosine similarity adjusts for how “long” each list is:

cosine similarity = dot product ÷ (length of a × length of b)

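Here "length" means the magnitude of the list as a vector, not how many entries it has. For example, for [3, 4]:

  • Square each number and add: 3² + 4² = 9 + 16 = 25
  • Take the square root: √25 = 5

So [3, 4] has length 5.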
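
The same two steps in Python, as a minimal sketch. The function name length is an illustrative choice.

```python
import math


def length(a):
    """Square each number, add the squares, then take the square root."""
    return math.sqrt(sum(x * x for x in a))


print(length([3, 4]))  # 5.0
```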

Results always fall between -1 and 1 (the score is undefined if either list is all zeros, because its length is 0):

  • Close to 1 → lists point the same direction (similar pattern)
  • Close to 0 → unrelated
  • Close to -1 → opposite patterns

Worked comparison

Patch A = [1, 1, 1, 1]
Patch B = [2, 2, 2, 2] (same pattern, twice as bright)
Template T = [1, 0, 1, 0]

  • Raw dot product: T dot A = 2 and T dot B = 4. B’s score is twice A’s, which is brightness bias.
  • Cosine similarity: T has length √2, A has length 2, and B has length 4, so both scores equal 1/√2 ≈ 0.71. The shared pattern shape wins.

When comparing image patches, cosine similarity (or a related normalized score) is often more honest than a raw dot product.
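
The worked comparison can be reproduced with a short Python sketch. The function names are illustrative, and raising an error for an all-zero list is one possible design choice, made here because the division would otherwise be by zero.

```python
import math


def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    length_a = math.sqrt(dot(a, a))
    length_b = math.sqrt(dot(b, b))
    if length_a == 0 or length_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero list")
    return dot(a, b) / (length_a * length_b)


patch_a = [1, 1, 1, 1]
patch_b = [2, 2, 2, 2]   # same pattern, twice as bright
template = [1, 0, 1, 0]

print(dot(template, patch_a), dot(template, patch_b))  # 2 4
print(cosine_similarity(template, patch_a))            # about 0.707
print(cosine_similarity(template, patch_b))            # about 0.707
```

The raw scores differ by a factor of two, while the cosine scores agree.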


Dot products in classification

A linear classifier often decides using:

score = (weights dot features) + bias

  • features — your input list (flattened patch, pixel values, measurements)
  • weights — learned importance of each feature (positive weight = “this feature pushes toward yes”)
  • bias — a constant nudge up or down

If score > 0 → predict class A; if score < 0 → predict class B (simplified story). Training (later phases) finds good weights from labeled examples. The decision rule is still multiply-and-add — a dot product.

Example in words: if dark corners in a patch push toward “indoor” and bright sky pushes toward “outdoor,” weights encode those tendencies. New patch → compute score → pick a label.
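
Here is a minimal Python sketch of that decision rule. The weights, bias, and feature values are invented for illustration; in practice they would come from training. The sketch also assumes that a score of exactly 0 falls to class B, a case the simplified rule above leaves open.

```python
def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


def linear_score(weights, features, bias):
    """score = (weights dot features) + bias"""
    return dot(weights, features) + bias


def predict(weights, features, bias):
    """Positive score gives class A; anything else gives class B."""
    return "class A" if linear_score(weights, features, bias) > 0 else "class B"


weights = [2.0, -1.0, 0.5]   # hypothetical learned weights
bias = -1.0                  # hypothetical constant nudge
features = [1.0, 0.5, 2.0]   # hypothetical input measurements

print(linear_score(weights, features, bias))  # 1.5
print(predict(weights, features, bias))       # class A
```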


Link to image filters

When you blur or sharpen a photo, each output pixel is typically computed from a small neighborhood of input pixels. That computation is frequently a dot product between:

  • a fixed kernel (list of weights like a small matrix), and
  • the neighborhood’s pixel values (as a list)

Convolutional neural networks learn those weight lists from data instead of relying on hand-designed ones, but each operation is still, at its core, a weighted sum: a dot product.
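
To make the kernel-and-neighborhood picture concrete, here is a small Python sketch that computes one output pixel. The image values are made up, and the sharpening kernel is a commonly used 3×3 example rather than one taken from this lesson. The sketch lines the kernel up with the neighborhood as-is, without flipping it, which makes no difference for symmetric kernels like these two.

```python
def dot(a, b):
    """Multiply aligned pairs and add the products."""
    return sum(x * y for x, y in zip(a, b))


def filter_pixel(image, kernel, row, col):
    """Output value at (row, col): kernel dotted with the 3x3 neighborhood."""
    neighborhood = [
        image[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
    ]
    flat_kernel = [w for kernel_row in kernel for w in kernel_row]
    return dot(flat_kernel, neighborhood)


image = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
sharpen = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]
box_blur = [[1 / 9] * 3 for _ in range(3)]

print(filter_pixel(image, sharpen, 1, 1))   # 6
print(filter_pixel(image, box_blur, 1, 1))  # about 1.11
```

Sharpening pushes the slightly brighter center pixel from 2 up to 6, while the blur pulls it down toward the average of its neighborhood.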


Summary

Idea | Remember
Dot product | Multiply aligned pairs, add all products
Meaning | Similarity / co-movement of two lists
Raw dot | Fast but biased by overall scale (brightness)
Cosine similarity | Dot product normalized by list lengths
Classifiers | score = weights dot features + bias

What's next

Probability: when measurements lie a little. Real pixel values are noisy; training averages over many samples so models learn stable patterns instead of random wobble.