Pixels, convolution, and edges
Here you will treat an image as a function on a grid and build intuition for linear filtering — the same family of operations that generalizes into the first layers of convolutional neural networks.
Figure: A 3×3 kernel sweeping across pixels
Learning objectives
- Represent a grayscale image as a function $I(x, y)$ on a discrete lattice.
- Apply a 2D convolution with small kernels by hand (at least conceptually).
- Connect gradients and edge strength to differences of neighboring pixels.
- Explain separable kernels and why they matter computationally.
Prerequisites
- Lesson: Light, sensors, and the imaging pipeline (you should be comfortable with “what is a pixel value”).
- Basic idea of averaging and weighted sums.
Step 1 — Images as discrete signals
Let $I(x, y)$ denote the intensity at integer coordinates $(x, y)$.
- Domain: a finite rectangle $0 \le x < W$, $0 \le y < H$.
- Boundary handling: when a filter asks for neighbors outside the image, common choices are zero padding, reflect, or clamp. Different choices change edges slightly.
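To see what each policy does to the border, here is a minimal NumPy sketch using `np.pad`; the 2×2 array values are made up purely for illustration:

```python
import numpy as np

img = np.array([[10., 20.],
                [30., 40.]])

print(np.pad(img, 1, mode="constant"))  # zero padding: missing neighbors treated as 0
print(np.pad(img, 1, mode="reflect"))   # reflect: mirror interior pixels across the border
print(np.pad(img, 1, mode="edge"))      # clamp: repeat the nearest border pixel
```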
Checkpoint: Why do boundary pixels behave differently under convolution, almost regardless of which boundary policy you choose?
Step 2 — What convolution means here (cross-correlation note)
In vision libraries, “conv2d” often implements cross-correlation:
$$(I \star K)(x, y) = \sum_{i}\sum_{j} K(i, j)\, I(x + i,\, y + j).$$
For learning, the key idea is the local linear combination of neighbors, not the formal kernel flip (index reversal) that distinguishes convolution from correlation.
Exercise: With a 3×3 kernel and zero padding, write the formula for the center pixel update as a nested sum in words: “multiply each neighbor by weight and add.”
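A minimal sketch of that nested sum, assuming zero padding and a kernel with odd height and width; `cross_correlate_2d` is a name chosen here for illustration, not a library function:

```python
import numpy as np

def cross_correlate_2d(image, kernel):
    """'Multiply each neighbor by its weight and add', with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2                     # half-widths of the kernel
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # zero padding outside the image
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + kh, x:x + kw]   # neighborhood centered on (y, x)
            out[y, x] = np.sum(window * kernel)   # weighted sum of neighbors
    return out
```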
Step 3 — Mean smoothing and its trade-offs
A uniform 3×3 box filter averages the 3×3 neighborhood.
- Pros: suppresses random noise (to a degree).
- Cons: blurs edges — exactly where much semantic information lives.
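As a concrete sketch, a 3×3 box filter is just uniform weights of 1/9, applied with the illustrative `cross_correlate_2d` from Step 2; `image` is assumed to be a 2D grayscale array:

```python
import numpy as np

box = np.ones((3, 3)) / 9.0                # every neighbor contributes equally
smoothed = cross_correlate_2d(image, box)  # noise is averaged down, but edges blur too
```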
Checkpoint: If you smooth before edge detection, what happens to edge maps visually and why?
Step 4 — Gradients: discrete derivatives
A simple derivative along $x$ can be approximated by differences, e.g. the forward difference
$$I_x(x, y) \approx I(x + 1, y) - I(x, y).$$
Similarly for $y$. Gradient magnitude is often summarized as
$$\|\nabla I\| = \sqrt{I_x^2 + I_y^2},$$
and orientation as $\theta = \operatorname{atan2}(I_y, I_x)$.
Exercise (small numeric example): Invent a 1×5 row of pixel intensities that steps up once. Compute $I_x$ by forward differences and mark where the edge response peaks.
Figure: Step edge → peaked gradient
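One possible answer to the exercise above, with a made-up 1×5 row; check it against your own:

```python
import numpy as np

row = np.array([10., 10., 10., 50., 50.])  # intensities step up once, between index 2 and 3
dx = np.diff(row)                          # forward differences: row[x + 1] - row[x]
print(dx)                                  # [ 0.  0. 40.  0.]  -> response peaks at the step
print(int(np.argmax(np.abs(dx))))          # 2: the peak sits exactly where the step occurs
```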
Step 5 — Sobel and Prewitt kernels
Sobel operators combine smoothing with differentiation — less sensitive to isolated noise spikes than a bare difference.
You should memorize the idea more than the exact numbers:
- Estimate derivative in one axis while lightly averaging perpendicular to that axis.
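For reference, the usual Sobel kernels and how the two gradient images combine into magnitude and orientation; this sketch reuses the illustrative `cross_correlate_2d` from Step 2, with `image` again assumed to be a 2D grayscale array:

```python
import numpy as np

# Derivative along one axis, light [1, 2, 1] averaging along the other.
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
sobel_y = sobel_x.T

gx = cross_correlate_2d(image, sobel_x)  # horizontal derivative estimate
gy = cross_correlate_2d(image, sobel_y)  # vertical derivative estimate
magnitude = np.sqrt(gx**2 + gy**2)       # edge strength
orientation = np.arctan2(gy, gx)         # edge orientation in radians
```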
Checkpoint: Why is “differentiate then smooth” often similar in spirit to “smooth then differentiate” for linear operators?
Step 6 — Separable kernels
A 2D kernel $K$ is separable if it can be written as an outer product of 1D kernels:
$$K(i, j) = k_{\text{col}}(i)\, k_{\text{row}}(j), \qquad \text{i.e. } K = k_{\text{col}}\, k_{\text{row}}^{\top}.$$
Then 2D convolution can be done as two 1D passes — fewer operations for large kernels.
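For example, a direct 11×11 convolution costs 121 multiply-adds per output pixel, while the separable version (an 11-tap row pass followed by an 11-tap column pass) costs only 22.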
Figure: Separable kernel: 2D = row ⊗ column
Exercise: Explain in one paragraph why a 2D Gaussian blur is commonly implemented as two 1D convolutions.
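A numerical sanity check of that claim, assuming the illustrative `cross_correlate_2d` from Step 2 and a 2D grayscale `image`; the 1D kernel here is a small binomial (Gaussian-like) smoother:

```python
import numpy as np

k1d = np.array([1., 2., 1.]) / 4.0  # 1D smoothing kernel
k2d = np.outer(k1d, k1d)            # separable 2D kernel: outer product of two 1D kernels

direct = cross_correlate_2d(image, k2d)                 # one 2D pass
rows = cross_correlate_2d(image, k1d.reshape(1, 3))     # 1D pass along each row
two_pass = cross_correlate_2d(rows, k1d.reshape(3, 1))  # then a 1D pass along each column

print(np.allclose(direct, two_pass))  # True: with zero padding, both routes agree
```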
Step 7 — From edges to “features” (preview)
Edges are local. Later modules use them as building blocks for corners, blobs, and learned feature maps. The deep-learning story replaces hand-designed filters with learned ones, but the locality and translation structure remain.
Check your understanding
- What is the difference between an edge caused by a depth discontinuity and one caused by a cast shadow — can intensity alone always tell them apart?
- Why do convolutional neural networks use small kernels repeatedly rather than one giant kernel?
- Name two boundary policies and one artifact each can introduce.
Lab-style stretch goal (optional)
Implement Sobel magnitude on a grayscale image and threshold the magnitude map. Tune the threshold: what disappears first — texture or object boundaries?