Pixels, convolution, and edges
Here you will treat an image as a function on a grid and build intuition for linear filtering — the same family of operations that generalizes into the first layers of convolutional neural networks.
Figure: A 3×3 kernel sweeping across pixels
Learning objectives
- Represent grayscale and multi-channel images as functions $I(x, y)$ (or $I_c(x, y)$ per channel $c$) on a discrete lattice.
- Apply 2D convolution with small kernels by hand on a numeric patch.
- Derive gradients, Sobel operators, and the Canny pipeline step by step.
- Explain separable kernels and count operations saved.
- Connect linear filters to frequency intuition (low-pass vs high-pass).
- Recognize when intensity edges do not imply geometric edges.
Prerequisites
- Lesson: Light, sensors, and the imaging pipeline (pixel values, linear vs sRGB).
- Basic idea of averaging and weighted sums.
Step 1 — Images as discrete signals
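Let $I(x, y)$ be the intensity at integer coordinates $(x, y)$.
- Domain: $x \in \{0, \dots, W - 1\}$, $y \in \{0, \dots, H - 1\}$ for a $W \times H$ image.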
- Channels: color images stack several planes, such as $I_R, I_G, I_B$, or luma–chroma planes (e.g. Y in YCbCr). Many filters run per channel; others (color edge detectors) mix channels deliberately.
- Boundary handling: zero padding, reflect, replicate, or wrap. The policy changes the responses in a border band as wide as the kernel radius (one pixel for a 3×3 kernel, more for larger or cascaded filters).
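Here is a minimal NumPy sketch of the four policies. It assumes NumPy's `np.pad` modes: `"constant"` (zero padding), `"reflect"` (mirror without repeating the edge pixel), `"edge"` (replicate) and `"wrap"`. The helper `box3_1d` and the example row are hypothetical. A 3-tap moving average is enough to show that only the outermost output on each side depends on the policy.

```python
import numpy as np

def box3_1d(row, mode):
    """3-tap moving average; pads 1 sample per side using the given np.pad mode."""
    padded = np.pad(row.astype(float), 1, mode=mode)
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

row = np.array([9, 2, 3, 4, 5, 6])   # hypothetical 1D "image row"
results = {m: box3_1d(row, m) for m in ["constant", "reflect", "edge", "wrap"]}
for mode, out in results.items():
    print(f"{mode:9s}", np.round(out, 3))
```

The interior outputs agree across all four modes. At the left end the values are 11/3, 13/3, 20/3 and 17/3 for constant, reflect, edge and wrap. NumPy also provides `"symmetric"`, which mirrors the row including the edge pixel.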
Checkpoint: Why do boundary pixels behave differently under convolution almost no matter what you do?
The kernel window extends outside the image; synthetic border values invent content that is not in the scene.
Step 2 — Convolution vs correlation
In vision libraries, conv2d often implements cross-correlation:
$$(I \star K)(x, y) = \sum_{i}\sum_{j} K(i, j)\, I(x + i,\, y + j)$$
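True convolution flips the kernel: $(I * K)(x, y) = \sum_{i}\sum_{j} K(i, j)\, I(x - i,\, y - j)$, which is correlation with $K$ rotated by 180°. For symmetric kernels (Gaussian, Laplacian) the distinction vanishes. Sobel-x is antisymmetric in $x$, so rotating it by 180° only negates it. Correlation and convolution therefore differ by an overall sign, which you can absorb into downstream logic.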
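A small NumPy-only sketch makes the sign relationship concrete. The function names `correlate2d` and `convolve2d` are illustrative, not a library API. The first computes out(x, y) = Σ K(i, j) I(x + i, y + j) with zero padding. The second gets true convolution by rotating the kernel 180° before correlating.

```python
import numpy as np

def correlate2d(img, k):
    """Cross-correlation: out(x, y) = sum_ij K(i, j) * I(x + i, y + j).
    'Same'-size output, zero padding, odd kernel dimensions assumed."""
    ry, rx = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img.astype(float), ((ry, ry), (rx, rx)))
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(k.shape[0]):        # dy - ry is the vertical offset j
        for dx in range(k.shape[1]):    # dx - rx is the horizontal offset i
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def convolve2d(img, k):
    """True convolution: correlation with the kernel rotated by 180 degrees."""
    return correlate2d(img, k[::-1, ::-1])

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
box = np.full((3, 3), 1.0 / 9.0)

img = np.zeros((5, 5))
img[:, 3:] = 10.0                 # hypothetical step: dark left, bright right

corr = correlate2d(img, sobel_x)
conv = convolve2d(img, sobel_x)
print(corr[2, 2], conv[2, 2])     # 40.0 -40.0: same magnitude, opposite sign
print(np.allclose(correlate2d(img, box), convolve2d(img, box)))   # True: symmetric kernel
```

SciPy's `scipy.ndimage.correlate` and `scipy.ndimage.convolve` make the same distinction. Their default border mode is `"reflect"`, not the zero padding assumed here.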
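Worked example (3×3 patch): suppose a neighborhood $P$ has entries $p_{ij}$, $i, j \in \{1, 2, 3\}$ (row-major), and the kernel is the box filter $K = \tfrac{1}{9}\mathbf{1}_{3\times 3}$. The output at the center is $\sum_{i,j} K_{ij}\, p_{ij} = \tfrac{1}{9}\sum_{i,j} p_{ij}$, the mean of the nine values.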
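With illustrative numbers (chosen for this sketch, not taken from the lesson), one output pixel is the elementwise product of kernel and patch, summed. A 3×3 binomial kernel appears alongside the box for comparison. Using it here is an assumption: it is a common small approximation of a Gaussian.

```python
import numpy as np

# Illustrative 3x3 neighborhood, row-major (values chosen for this sketch).
patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 180]], dtype=float)

box = np.full((3, 3), 1.0 / 9.0)                      # box filter: every weight 1/9
binomial = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16.0  # small Gaussian approximation

# One output pixel = sum of elementwise products (kernel centered on the patch).
box_out = float(np.sum(box * patch))         # mean of the nine values: 540 / 9 = 60
binom_out = float(np.sum(binomial * patch))  # 890 / 16 = 55.625
print(box_out, binom_out)
```

The bright corner pixel (180) pulls the box mean up to 60. The binomial kernel gives that corner weight 1/16 instead of 1/9, so its output is lower.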
Exercise: With zero padding on a 4×4 image, how many output positions does a 3×3 kernel produce? (Answer: with padding $p = 1$ on each side, $4 + 2 \cdot 1 - 3 + 1 = 4$, so the output is still 4×4. Without padding it is only 2×2.)
Step 3 — Gaussian smoothing and scale
Continuous 2D Gaussian:
$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
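Discrete kernels are truncated at a radius of about $3\sigma$; beyond that the Gaussian is roughly 1% of its peak. Larger $\sigma$ → more blur → scale space: structures smaller than about $\sigma$ disappear from the smoothed image.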
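Here is a sketch of building a discrete Gaussian kernel. It assumes truncation at $\lceil 3\sigma \rceil$ and renormalization so the weights sum to 1, which replaces the $1/(2\pi\sigma^2)$ constant. The function names are hypothetical.

```python
import math
import numpy as np

def gaussian_kernel_1d(sigma, truncate=3.0):
    """Sampled 1D Gaussian on integers in [-r, r], r = ceil(truncate * sigma), normalized to sum 1."""
    radius = int(math.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def gaussian_kernel_2d(sigma, truncate=3.0):
    """Isotropic 2D Gaussian as the outer product of two 1D kernels."""
    g = gaussian_kernel_1d(sigma, truncate)
    return np.outer(g, g)

k1 = gaussian_kernel_2d(1.0)
k2 = gaussian_kernel_2d(2.0)
print(k1.shape, k2.shape)   # (7, 7) (13, 13)
```

The kernel grows from 7×7 at $\sigma = 1$ to 13×13 at $\sigma = 2$, so the cost per pixel rises quickly with scale. The separable form in Step 7 addresses that.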
| Kernel | Role |
|---|---|
| Box 3×3 | Fast; sinc-like frequency response with ripples, so some high frequencies leak through |
| Gaussian | Smooth; its spectrum is also a Gaussian, with no ripples or sharp nulls |
| Median 3×3 | Nonlinear — removes salt-and-pepper, preserves step edges better than mean |
Checkpoint: If you smooth before edge detection, what happens to edge maps visually and why?
Noise spikes shrink, but true edges also widen and their gradient peaks weaken, so the threshold that separates edges from noise has to be retuned.
Step 4 — Gradients and discrete derivatives
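Forward difference:
$$I_x(x, y) \approx I(x + 1, y) - I(x, y)$$
Central difference (symmetric about the pixel, so no half-pixel shift):
$$I_x(x, y) \approx \tfrac{1}{2}\big[I(x + 1, y) - I(x - 1, y)\big]$$
Gradient magnitude and orientation, with $I_y$ defined analogously along $y$:
$$\|\nabla I\| = \sqrt{I_x^2 + I_y^2}, \qquad \theta = \operatorname{atan2}(I_y, I_x)$$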
Figure: Step edge → peaked gradient
Exercise: For a row $I(x)$ with a step at index 3, compute the central difference $I_x$ at the step. Where does $|I_x|$ peak?
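A quick check of the exercise uses a hypothetical row `[0, 0, 0, 10, 10, 10]`. These values are assumed, with the first bright sample at index 3. The sketch also computes gradient magnitude and orientation for one illustrative pixel with $I_x = 3$, $I_y = 4$.

```python
import numpy as np

# Hypothetical row: dark (0) then bright (10); the first bright sample is at index 3.
row = np.array([0, 0, 0, 10, 10, 10], dtype=float)

forward = row[1:] - row[:-1]            # I(x+1) - I(x) for x = 0..4
central = (row[2:] - row[:-2]) / 2.0    # (I(x+1) - I(x-1)) / 2 for x = 1..4
print(forward)   # values 0, 0, 10, 0, 0 -> peak at x = 2
print(central)   # values 0, 5, 5, 0 for x = 1..4 -> peak at x = 2 and x = 3

# Magnitude and orientation for one illustrative pixel with I_x = 3, I_y = 4.
ix, iy = 3.0, 4.0
mag = np.hypot(ix, iy)                     # sqrt(3^2 + 4^2) = 5
theta = np.degrees(np.arctan2(iy, ix))     # about 53.13 degrees
print(mag, round(float(theta), 2))
```

The forward difference reports its peak at index 2, although it actually measures the slope at 2.5 (a half-pixel shift). The central difference spreads the one-pixel step over indices 2 and 3 at half height.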
Step 5 — Sobel and structured derivatives
Sobel-x (unnormalized classic form):
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$
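Equivalently, $S_x = [1, 2, 1]^\top [-1, 0, 1]$. This is a central difference along $x$ combined with binomial $[1, 2, 1]$ smoothing along $y$, with the center row weighted most. Together they approximate a Gaussian-smoothed derivative, which is less sensitive to isolated noise than a bare difference.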
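This NumPy sketch compares a bare central difference with Sobel-x scaled by 1/8. The scaling is an assumption, chosen so both filters return 0.5 on a unit step. `correlate_same` is a hypothetical zero-padded helper. The test images are synthetic: an isolated unit spike and a vertical unit step.

```python
import numpy as np

def correlate_same(img, k):
    """Zero-padded 'same'-size cross-correlation for odd kernel dimensions (hypothetical helper)."""
    ry, rx = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img.astype(float), ((ry, ry), (rx, rx)))
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

sobel_x = np.outer([1, 2, 1], [-1, 0, 1]) / 8.0   # [1,2,1]^T [-1,0,1], scaled by 1/8
central_x = np.array([[-0.5, 0.0, 0.5]])           # bare central difference as a 1x3 kernel

spike = np.zeros((7, 7))
spike[3, 3] = 1.0                                  # isolated noise pixel
step = np.zeros((7, 7))
step[:, 4:] = 1.0                                  # vertical unit step

peaks = {}
for name, k in [("central", central_x), ("sobel/8", sobel_x)]:
    step_peak = np.abs(correlate_same(step, k))[3, 1:-1].max()   # middle row, border columns skipped
    spike_peak = np.abs(correlate_same(spike, k)).max()
    peaks[name] = (step_peak, spike_peak)
    print(name, step_peak, spike_peak)
```

Both filters give 0.5 on the step. The spike, however, produces 0.5 under the central difference but only 0.25 under Sobel, because the $[1, 2, 1]$ weights spread its contribution over three rows.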
Checkpoint: Why is “differentiate then smooth” equivalent to “smooth then differentiate” for linear operators?
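Convolution is associative and commutative, so for a derivative kernel $D$ and a Gaussian $G$: $D * (G * I) = (D * G) * I = (G * D) * I = G * (D * I)$. On finite images this holds exactly only away from the borders.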
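Here is a 1D numerical check with `np.convolve` in its default full mode, which avoids border effects. The sketch assumes a 5-tap binomial kernel as the Gaussian stand-in. It also uses `[1, 0, -1] / 2` as the central difference written as a convolution kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=20)            # arbitrary 1D signal
g = np.array([1, 4, 6, 4, 1]) / 16.0    # binomial smoothing kernel (Gaussian-like)
d = np.array([1, 0, -1]) / 2.0          # central difference as a convolution kernel

smooth_then_diff = np.convolve(d, np.convolve(g, signal))
diff_then_smooth = np.convolve(g, np.convolve(d, signal))
combined = np.convolve(np.convolve(d, g), signal)   # single derivative-of-Gaussian kernel

print(np.allclose(smooth_then_diff, diff_then_smooth), np.allclose(smooth_then_diff, combined))
```

The combined kernel `np.convolve(d, g)` is a sampled derivative of a Gaussian, so one pass replaces two.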
Step 6 — Canny edge detector (full pipeline)
Canny (1986) is still the reference classical edge detector:
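- Gaussian smoothing, with scale controlled by $\sigma$.
- Gradient magnitude and angle, often computed with Sobel.
- Non-maximum suppression (NMS): keep a pixel only if it is a local maximum along the gradient direction, which thins edges to about one pixel.
- Double threshold: strong edges have $\|\nabla I\| \ge T_\text{high}$, weak ones $T_\text{low} \le \|\nabla I\| < T_\text{high}$. Strong pixels are edges; weak pixels are kept only if they connect to a strong pixel (hysteresis).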
- Optional morphological cleanup.
| Parameter | Too low | Too high |
|---|---|---|
| $\sigma$ | Noisy, cluttered edges | Miss thin structures |
| $T_\text{high}$ | Everything is an edge | Broken contours |
| $T_\text{low}$ | Streaks of weak edges | Gaps in boundaries |
Exercise: Why does NMS require knowing edge orientation, not just magnitude?
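Without orientation you cannot tell which two neighbors lie across the edge, along the gradient and perpendicular to the ridge. Those are the two pixels the center must exceed to count as a local maximum.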
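The NMS and hysteresis stages can be sketched in plain NumPy. This is not OpenCV's implementation. The sketch makes several assumptions of its own: 4-direction angle quantization, a tie-breaking rule, 8-connectivity for hysteresis, skipped border pixels, and the demo thresholds. It also omits Gaussian smoothing because the synthetic step image is noise-free. Angles come from `np.arctan2(gy, gx)`, with $y$ pointing down the rows.

```python
import numpy as np
from collections import deque

def sobel(img):
    """Sobel gx, gy via slicing with replicate ('edge') padding; y grows downward."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def non_max_suppression(mag, angle):
    """Keep a pixel only if it is a local maximum along the gradient direction.
    Directions are quantized to 0/45/90/135 degrees; border pixels are dropped."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    deg = np.rad2deg(angle) % 180.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = deg[y, x]
            if a < 22.5 or a >= 157.5:   # gradient ~ along x: compare left / right
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:               # ~45 deg (down-right, since y grows downward)
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            elif a < 112.5:              # gradient ~ along y: compare up / down
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                        # ~135 deg (down-left)
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            # Strict '>' on one side breaks ties, so a two-pixel plateau keeps one pixel.
            if mag[y, x] > n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out

def hysteresis(nms, t_low, t_high):
    """Strong pixels (>= t_high) seed edges; weak pixels (>= t_low) join only if
    8-connected, directly or through other weak pixels, to a strong one."""
    strong = nms >= t_high
    weak = (nms >= t_low) & ~strong
    edges = strong.copy()
    h, w = nms.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges

# Demo on a hypothetical 8x8 image with a vertical step from 0 to 100.
img = np.zeros((8, 8))
img[:, 4:] = 100.0
gx, gy = sobel(img)
mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
edges = hysteresis(non_max_suppression(mag, ang), t_low=100.0, t_high=300.0)
print(edges.astype(int))
```

The Sobel response on the step is two pixels wide (columns 3 and 4). NMS thins it to column 3, rows 1 to 6, because this sketch skips the border rows.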
Step 7 — Separable kernels
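If the kernel is an outer product $K = v\,h^\top$ of a $k \times 1$ column $v$ and a $1 \times k$ row $h^\top$, then
$$I * K = (I * v) * h^\top,$$
so you can filter the columns with $v$ and then the rows with $h^\top$.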
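Cost per pixel: $2k$ multiplies for the two 1D passes vs $k^2$ for the full 2D kernel, a saving factor of $k/2$. For $k$ around 12 that is ~6× fewer multiplies.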
Figure: Separable kernel: 2D = row ⊗ column
Exercise: A 1D Gaussian has 11 taps. How many multiplies per pixel for separable 2D vs naive 11×11?
Separable: $2 \times 11 = 22$; full: $11 \times 11 = 121$.
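This is a numerical check of separability. It assumes an 11-tap sampled Gaussian with $\sigma = 2$, so truncation here is at 2.5σ. It also uses a hypothetical zero-padded helper `correlate_same`. The full 11×11 pass and the two 1D passes agree to floating-point precision.

```python
import numpy as np

def correlate_same(img, k):
    """Zero-padded 'same'-size cross-correlation for odd kernel dimensions (hypothetical helper)."""
    ry, rx = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img.astype(float), ((ry, ry), (rx, rx)))
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

x = np.arange(-5, 6, dtype=float)
g = np.exp(-x**2 / (2 * 2.0**2))
g /= g.sum()                      # 11-tap Gaussian, sigma = 2
k2d = np.outer(g, g)              # full 11x11 kernel

img = np.random.default_rng(1).random((32, 32))

full = correlate_same(img, k2d)                                     # 121 multiplies per pixel
sep = correlate_same(correlate_same(img, g[:, None]), g[None, :])   # 11 + 11 = 22 per pixel
print(np.allclose(full, sep), k2d.size, 2 * g.size)   # True 121 22
```

Zero padding gives an exact match here: outside the image, the vertical pass over all-zero columns is zero, which is exactly what the horizontal pass pads with.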
Step 8 — Frequency intuition (short)
Convolution in space is multiplication in frequency, $\mathcal{F}\{I * K\} = \mathcal{F}\{I\}\,\mathcal{F}\{K\}$. Low-pass kernels attenuate high frequencies (noise, fine texture), while high-pass kernels (Laplacian, other derivative filters) emphasize edges.
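To see low-pass vs high-pass behavior, this sketch evaluates the magnitude response of three 3-tap kernels with `np.fft.rfft`. The kernels are zero-padded to 48 samples so that bins 16 and 24 fall exactly on $\omega = 2\pi/3$ and $\omega = \pi$. The 1D kernels are illustrative stand-ins for their 2D counterparts.

```python
import numpy as np

n = 48  # FFT length: bin m corresponds to angular frequency w = 2*pi*m/n
kernels = {
    "box": np.array([1, 1, 1]) / 3.0,                # [1, 1, 1] / 3
    "binomial": np.array([1, 2, 1]) / 4.0,           # [1, 2, 1] / 4, Gaussian-like
    "laplacian": np.array([1, -2, 1], dtype=float),  # 1D second difference
}
responses = {}
for name, k in kernels.items():
    H = np.abs(np.fft.rfft(k, n=n))   # magnitude response at bins 0..n/2
    responses[name] = H
    print(f"{name:9s} DC={H[0]:.3f}  w=2pi/3: {H[16]:.3f}  w=pi: {H[24]:.3f}")
```

The box response is $|1 + 2\cos\omega|/3$. It reaches a null at $2\pi/3$ and rebounds to 1/3 at $\pi$, so some high frequencies pass. The binomial falls smoothly to 0 at $\pi$. The Laplacian is 0 at DC and 4 at $\pi$.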
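Laplacian of Gaussian (LoG): $\nabla^2 G_\sigma = \dfrac{\partial^2 G_\sigma}{\partial x^2} + \dfrac{\partial^2 G_\sigma}{\partial y^2} = \dfrac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G_\sigma(x, y)$, a blob detector at scale $\sigma$. The zero-crossings of $\nabla^2 G_\sigma * I$ locate edges. LoG was used historically; today learned filters subsume much of this.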
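Here is a 1D analogue of LoG zero-crossing detection. It assumes a sampled second derivative of a Gaussian, $\frac{x^2 - \sigma^2}{\sigma^4} G_\sigma(x)$, with $\sigma = 2$ and truncation at 3σ. Replicate padding keeps the signal border from creating a false edge. The helper name `log_kernel_1d` is hypothetical.

```python
import numpy as np

def log_kernel_1d(sigma):
    """Sampled d^2/dx^2 of a Gaussian: (x^2 - sigma^2) / sigma^4 * G(x), radius ceil(3*sigma)."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    return (x**2 - sigma**2) / sigma**4 * g

signal = np.concatenate([np.zeros(20), np.ones(20)])   # hypothetical step: 0 -> 1 at index 20
k = log_kernel_1d(2.0)
r = len(k) // 2
# Replicate padding keeps the ends flat, so no spurious edge appears at the border.
resp = np.convolve(np.pad(signal, r, mode="edge"), k, mode="valid")

# Zero crossing: response goes from positive to negative between samples i and i + 1.
crossings = np.nonzero((resp[:-1] > 0) & (resp[1:] < 0))[0]
print(crossings)   # [19]
```

The only positive-to-negative crossing lies between samples 19 and 20, exactly where the step is.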
Deep dive — edges are not always “object boundaries”
| Edge cause | Geometric boundary? |
|---|---|
| Depth discontinuity | Often yes |
| Cast shadow | No — same surface |
| Texture (stripes) | No |
| Specular highlight | No |
| Albedo change (paint) | Sometimes |
Intensity-only edge detectors cannot disambiguate these without depth, motion, stereo, or learning.
Bridge to the next lessons
- Corners (next module) need variation in two directions — built from gradient structure tensors.
- CNNs replace hand-designed kernels $K$ with learned filters but keep locality and translation equivariance.
Check your understanding
- What is the difference between an edge due to depth discontinuity vs cast shadow — can intensity alone always tell them apart?
- Why do CNNs use small kernels repeatedly rather than one giant kernel?
- Name two boundary policies and one artifact each can introduce.
- In Canny, what problem does hysteresis solve?
- Why is median filtering not equivalent to a single convolution kernel?
Lab-style stretch goals
Implement Sobel magnitude + NMS + hysteresis on grayscale (or use OpenCV Canny and compare your thresholds).
Color: Convert to LAB, run edges on L channel only vs each RGB channel — when does chroma create spurious edges?