Dot products — measuring similarity
Before we begin
In the last lesson, you learned that a patch of an image can become a list of numbers — a vector. The next question is one of the most important in all of machine learning:
“How similar are these two lists?”
If two patches look alike, their lists should score high on some similarity measure. If they look different, the score should be low. The dot product is the simplest such measure — and despite its simplicity, it appears everywhere: template matching, linear classifiers, image filters, and (much later) attention in large language models.
This lesson works through three layers:
- How to compute it — multiply matching pairs, add the results.
- What the result means — alignment and similarity in plain language.
- Where it is used — finding patterns in images and making yes/no decisions.
Figure: Dot product = how much two lists line up
What you will learn
- Compute a dot product by hand on small lists.
- Explain the result as “how much two patterns rise and fall together.”
- Understand brightness bias — why raw dot products can mislead you on images.
- Use cosine similarity as a fairer comparison when brightness changes.
- See how dot products become classifier scores and filter outputs.
The recipe: multiply pairs, then add
Suppose two lists have the same length (same number of entries):
- List a: [a1, a2, a3, …]
- List b: [b1, b2, b3, …]
The dot product combines them like this:
a1×b1 + a2×b2 + a3×b3 + …
That is the whole mechanical recipe. No magic — just multiply each aligned pair and sum.
Worked example 1
[1, 2, 3] dot [4, 5, 6]
- 1×4 = 4
- 2×5 = 10
- 3×6 = 18
- Sum: 4 + 10 + 18 = 32
Worked example 2 (try this yourself)
[2, 0, 1] dot [3, 4, 5]
- 2×3 = 6
- 0×4 = 0
- 1×5 = 5
- Sum: 6 + 0 + 5 = 11
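The recipe and both worked examples can be checked with a few lines of Python. This is a minimal sketch using plain lists; the function name `dot` is chosen here for illustration and is not taken from the lesson.

```python
def dot(a, b):
    """Multiply aligned pairs, then add all the products."""
    return sum(x * y for x, y in zip(a, b))


print(dot([1, 2, 3], [4, 5, 6]))  # 32 (worked example 1)
print(dot([2, 0, 1], [3, 4, 5]))  # 11 (worked example 2)
```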
Why length must match
You cannot dot a 6-number list with a 48-number list — there are no aligned pairs. Length mismatch in code is the same class of bug as flattening with the wrong shape: the operation is undefined.
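In Python this bug can hide, because `zip` quietly stops at the end of the shorter list instead of complaining. A hedged sketch of a guard follows; `checked_dot` is a hypothetical helper name, and raising `ValueError` is one reasonable choice among several.

```python
def checked_dot(a, b):
    """Dot product that refuses lists of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


# Without the check, zip drops the unmatched entry and returns a number anyway.
silent = sum(x * y for x, y in zip([1, 2, 3], [4, 5]))  # 1*4 + 2*5 = 14

try:
    checked_dot([1, 2, 3], [4, 5])
except ValueError as err:
    print(err)  # length mismatch: 3 vs 2
```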
What the number actually tells you
Imagine two flattened patches from an image. Each list encodes a pattern of brighter and darker values.
| Dot product result | Plain English |
|---|---|
| Large positive | Where one list is high, the other tends to be high too — patterns move together |
| Near zero | No simple linear relationship — patterns do not line up |
| Negative | Where one is high, the other tends to be low — opposite trends |
Intuition: If two patches show the same edge or gradient direction, their lists often produce a **larger** dot product than two random patches.
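Checkpoint: Take two identical lists, that is, the dot product of a list with itself. Is the result large or small?
Answer: Large positive, unless every value is zero (black patch). The result is the sum of the squared entries, so it can never be negative.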
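The three rows of the table, and the checkpoint, can be seen on tiny lists. The numbers below are made up for illustration. They include negative values, which is an assumption: raw pixel lists with only non-negative values can never produce a negative dot product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


same = dot([1, 2, 3], [1, 2, 3])   # 1 + 4 + 9 = 14, a list with itself
unrelated = dot([1, -1], [1, 1])   # 1 - 1 = 0, no shared trend
opposite = dot([1, 2], [-1, -2])   # -1 - 4 = -5, opposite trends

print(same, unrelated, opposite)  # 14 0 -5
```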
Finding a pattern in an image (template matching)
Suppose you save a small template — a corner, a logo, a fingerprint pattern — as a list t. You slide a window across a larger image. At each position, you flatten the window into list p and compute t dot p.
- High score → “this region looks like the template.”
- Low score → “probably not a match.”
Face unlock, document scanners, and quality-control cameras use variations of this idea. They rarely show you the dot product directly, but the math underneath is the same family.
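The sliding-window idea can be sketched in plain Python. This is an illustrative toy, not how production systems are implemented: the image and template are small lists of rows, the helper names (`flatten`, `match_scores`) are hypothetical, and the score is the raw dot product with no normalization.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def flatten(rows):
    """Turn a list of rows into one flat list."""
    return [value for row in rows for value in row]


def match_scores(image, template):
    """Slide the template over the image; return a grid of t dot p scores."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t = flatten(template)
    scores = []
    for top in range(ih - th + 1):
        row_scores = []
        for left in range(iw - tw + 1):
            # Cut out the window at this position and flatten it into list p.
            window = [image[top + r][left:left + tw] for r in range(th)]
            row_scores.append(dot(t, flatten(window)))
        scores.append(row_scores)
    return scores


image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
template = [
    [1, 0],
    [0, 1],
]
scores = match_scores(image, template)
print(scores)  # [[1, 0, 0], [0, 2, 0], [0, 0, 1]]
```

The highest score, 2, sits at window position (row 1, column 1), which is exactly where the diagonal pattern of the template appears in the image.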
The brightness problem (important)
Raw dot products are sensitive to overall brightness. If patch p is twice as bright as patch q but has the same pattern, every number in p is roughly doubled — and t dot p can be roughly twice t dot q even though the shapes match.
That is unfair if you care about pattern, not brightness. The fix is cosine similarity (next section).
Cosine similarity: compare shape, not brightness
Cosine similarity adjusts for how “long” each list is:
cosine similarity = dot product ÷ (length of a × length of b)
Length of a list (for example [3, 4]):
- Square each number and add the squares: 3² + 4² = 9 + 16 = 25
- Take the square root: √25 = 5
So [3, 4] has length 5.
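The same two steps in Python, as a minimal sketch (the function name `length` is chosen here for illustration):

```python
import math


def length(a):
    """Square each number, add the squares, take the square root."""
    return math.sqrt(sum(x * x for x in a))


print(length([3, 4]))  # 5.0
```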
Results always fall between -1 and 1 (the score is undefined if either list is all zeros, because its length is 0):
- Close to 1 → lists point the same direction (similar pattern)
- Close to 0 → unrelated
- Close to -1 → opposite patterns
Worked comparison
Patch A = [1, 1, 1, 1]
Patch B = [2, 2, 2, 2] (same pattern, twice as bright)
Template T = [1, 0, 1, 0]
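- Raw dot T with A vs T with B: T dot A = 2 and T dot B = 4. B’s score is twice A’s, which is brightness bias.
- Cosine similarity with T: 2 ÷ (√2 × 2) ≈ 0.71 for A and 4 ÷ (√2 × 4) ≈ 0.71 for B. The scores are the same, so pattern shape wins.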
When comparing image patches, cosine similarity (or a related normalized score) is often more honest than a raw dot product.
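The worked comparison can be reproduced with a short sketch. The helper names are illustrative, and the code assumes neither list is all zeros; an all-zero list has length 0 and would cause a division by zero.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def length(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (length(a) * length(b))


A = [1, 1, 1, 1]
B = [2, 2, 2, 2]  # same pattern as A, twice as bright
T = [1, 0, 1, 0]

raw_a, raw_b = dot(T, A), dot(T, B)
cos_a, cos_b = cosine_similarity(T, A), cosine_similarity(T, B)

print(raw_a, raw_b)  # 2 4
print(round(cos_a, 3), round(cos_b, 3))  # 0.707 0.707
```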
Dot products in classification
A linear classifier often decides using:
score = (weights dot features) + bias
- features — your input list (flattened patch, pixel values, measurements)
- weights — learned importance of each feature (positive weight = “this feature pushes toward yes”)
- bias — a constant nudge up or down
If score > 0 → predict class A; if score < 0 → predict class B (simplified story). Training (later phases) finds good weights from labeled examples. The decision rule is still multiply-and-add — a dot product.
Example in words: if dark corners in a patch push toward “indoor” and bright sky pushes toward “outdoor,” weights encode those tendencies. New patch → compute score → pick a label.
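A minimal sketch of that decision rule follows. The weights, bias, and feature values are made-up numbers for illustration, not learned ones, and sending a score of exactly 0 to class B is an assumption, since the simplified rule above does not cover that case.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_score(weights, features, bias):
    """score = (weights dot features) + bias"""
    return dot(weights, features) + bias


def predict(weights, features, bias):
    # Assumption: a score of exactly 0 falls to class B.
    return "A" if linear_score(weights, features, bias) > 0 else "B"


weights = [0.5, -1.0, 2.0]  # hypothetical values; training would find these
bias = -0.5

print(linear_score(weights, [1.0, 0.5, 0.5], bias))  # 0.5
print(predict(weights, [1.0, 0.5, 0.5], bias))       # A
print(predict(weights, [0.0, 1.0, 0.0], bias))       # B (score is -1.5)
```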
Link to image filters
When you blur or sharpen a photo, each output pixel is usually computed from a small neighborhood of surrounding pixels. That computation is frequently a dot product between:
- a fixed kernel (list of weights like a small matrix), and
- the neighborhood’s pixel values (as a list)
Convolutional neural networks learn those weight lists from data instead of relying on hand-designed ones, but each operation is still, at its core, a weighted sum: a dot product.
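To make the kernel idea concrete, here is a hedged sketch that computes a single output pixel. It uses a common 3×3 sharpening kernel chosen for illustration; the helper name `filter_pixel` is hypothetical. The sketch ignores image borders and does not flip the kernel, which makes no difference for a symmetric kernel like this one.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def flatten(rows):
    return [value for row in rows for value in row]


def filter_pixel(image, kernel, row, col):
    """Output at (row, col): kernel dot the neighborhood centred on that pixel."""
    k = len(kernel) // 2
    neighborhood = [
        image[r][col - k:col + k + 1] for r in range(row - k, row + k + 1)
    ]
    return dot(flatten(kernel), flatten(neighborhood))


sharpen = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]
flat_patch = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
bright_center = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]

print(filter_pixel(flat_patch, sharpen, 1, 1))     # 1 (flat area is unchanged)
print(filter_pixel(bright_center, sharpen, 1, 1))  # 6 (the bright spot is exaggerated)
```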
Summary
| Idea | Remember |
|---|---|
| Dot product | Multiply aligned pairs, add all products |
| Meaning | Similarity / co-movement of two lists |
| Raw dot | Fast but biased by overall scale (brightness) |
| Cosine similarity | Dot product normalized by list lengths |
| Classifiers | score = weights dot features + bias |
What's next
Probability: when measurements lie a little. Real pixel values are noisy; training averages over many samples so models learn stable patterns instead of random wobble.