Module 1 — Imaging & digital images

Pixels, convolution, and edges

Discrete convolution with numeric examples, Gaussian scale-space, Sobel gradients, the full Canny pipeline, and separable filters.

~80 min read + exercises

Here you will treat an image as a function on a grid and build intuition for linear filtering — the same family of operations that generalizes into the first layers of convolutional neural networks.

Figure: A 3×3 kernel sweeping across pixels. At every location of the image I[x, y] on its discrete grid, the kernel multiplies neighbors by its weights and sums them to give the output pixel o[x, y] = Σ I[x+i, y+j] · K[i, j], here visualized with Sobel-x weights (rows −1 0 1, −2 0 2, −1 0 1).

Learning objectives

  • Represent grayscale and multi-channel images as $\mathrm{I}[x,y,c]$ on a discrete lattice.
  • Apply 2D convolution with small kernels by hand on a numeric patch.
  • Derive gradients, Sobel operators, and the Canny pipeline step by step.
  • Explain separable kernels and count operations saved.
  • Connect linear filters to frequency intuition (low-pass vs high-pass).
  • Recognize when intensity edges do not imply geometric edges.

Prerequisites

  • Lesson: Light, sensors, and the imaging pipeline (pixel values, linear vs sRGB).
  • Basic idea of averaging and weighted sums.

Step 1 — Images as discrete signals

Let $\mathrm{I}[x,y]$ be the intensity at integer coordinates $(x,y)$.

  • Domain: $0 \le x < W$, $0 \le y < H$.
  • Channels: color images stack $c \in \{R,G,B\}$ or luma–chroma components (e.g. Y in YCbCr). Many filters run per-channel; others (color edge detectors) mix channels deliberately.
  • Boundary handling: zero padding, reflect, replicate, or wrap. The choice of policy changes the response within $\lfloor k/2 \rfloor$ pixels of the border for a $k \times k$ kernel.

Checkpoint: Why do boundary pixels behave differently under convolution almost no matter what you do?

The kernel window extends outside the image; synthetic border values invent content that is not in the scene.
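The four boundary policies can be compared on a tiny example. This is a minimal sketch using NumPy's `np.pad`; the row values and the 3-tap box filter are illustrative assumptions, not part of the lesson.

```python
import numpy as np

# A tiny 1D "image" row; the values are illustrative only.
row = np.array([10, 20, 30, 40])

# Pad by 1 on each side, which is what a 3-tap kernel needs.
padded = {
    "zero":      np.pad(row, 1, mode="constant", constant_values=0),
    "reflect":   np.pad(row, 1, mode="reflect"),   # mirror without repeating the edge
    "replicate": np.pad(row, 1, mode="edge"),      # repeat the edge value
    "wrap":      np.pad(row, 1, mode="wrap"),      # treat the row as periodic
}

# 3-tap box filter evaluated at the first pixel under each policy.
first_pixel = {name: float(p[0:3].mean()) for name, p in padded.items()}

for name in padded:
    print(name, padded[name], round(first_pixel[name], 2))
```

The same input pixel gets a different output under each policy, because each one invents a different neighbor outside the image.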


Step 2 — Convolution vs correlation

In vision libraries, conv2d often implements cross-correlation:

$$(I * K)[x,y] = \sum_{i}\sum_{j} I[x+i, y+j]\, K[i,j]$$

True convolution flips the kernel in both axes: $K'[i,j] = K[-i,-j]$. For symmetric kernels (Gaussian, Laplacian) the distinction vanishes. For Sobel-x the flipped kernel is simply the negated kernel, so the two outputs differ only in sign, which you can absorb into downstream logic.
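The difference between the two operations can be checked directly. The sketch below uses plain NumPy loops; the helper names `correlate2d_valid` and `convolve2d_valid` are hypothetical and written for this illustration, and the test image is an assumed vertical step edge. Note that NumPy indexes arrays as `[row, column]`, i.e. `[y, x]`.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Cross-correlation over the 'valid' region only (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """True convolution: flip the kernel in both axes, then correlate."""
    return correlate2d_valid(image, kernel[::-1, ::-1])

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
box = np.ones((3, 3)) / 9.0

# Vertical step edge: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

corr = correlate2d_valid(image, sobel_x)
conv = convolve2d_valid(image, sobel_x)

print(corr[0])  # positive response at the step
print(conv[0])  # same magnitudes, opposite sign
```

With the symmetric box kernel the two functions return identical results, which is why many libraries do not bother to flip.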

Worked example (3×3 patch): Suppose a neighborhood (row-major) is

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

and the kernel is $K = \frac{1}{9}\mathbf{1}_{3\times 3}$ (box filter). Output $= (1+2+1+0+0+0+1+2+1)/9 = 8/9 \approx 0.89$.

Exercise: With zero padding on a 4×4 image, how many output positions does a 3×3 kernel produce? (Answer: still 4×4 if you pad by 1 on each side.)
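Both the worked example and the exercise can be verified in a few lines. This is a sketch assuming stride 1; `output_size` is a hypothetical helper that applies the usual size rule $n + 2p - k + 1$ along one axis.

```python
import numpy as np

patch = np.array([[1, 2, 1],
                  [0, 0, 0],
                  [1, 2, 1]], dtype=float)
box = np.ones((3, 3)) / 9.0

# One output value: multiply elementwise, then sum.
center_output = float(np.sum(patch * box))

def output_size(n, k, pad):
    """Output length along one axis for stride 1: n + 2*pad - k + 1."""
    return n + 2 * pad - k + 1

same_size = output_size(4, 3, pad=1)    # padded by 1 on each side
valid_size = output_size(4, 3, pad=0)   # no padding at all

print(center_output, same_size, valid_size)
```

Without padding the same 4×4 image yields only a 2×2 output, which is the case the exercise contrasts against.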


Step 3 — Gaussian smoothing and scale

Continuous 2D Gaussian:

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/(2\sigma^2)}$$

Discrete kernels are usually truncated at a radius of about $3\sigma$ and renormalized so the weights sum to 1. Larger $\sigma$ means more blur. Sweeping $\sigma$ produces a scale space, in which structures smaller than about $\sigma$ fade from the smoothed image.
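A sampled Gaussian kernel can be built as follows. This is a minimal sketch: the function names and the `truncate` parameter are assumptions for illustration, and renormalizing after truncation is a common convention rather than something the formula above requires.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=3.0):
    """Sampled 1D Gaussian, cut off at about truncate*sigma and renormalized."""
    radius = int(np.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # weights sum to 1, so flat regions keep their brightness

def gaussian_kernel_2d(sigma, truncate=3.0):
    """2D kernel as the outer product of two 1D kernels."""
    k = gaussian_kernel_1d(sigma, truncate)
    return np.outer(k, k)

k1 = gaussian_kernel_1d(1.0)   # radius 3, so 7 taps
k2 = gaussian_kernel_2d(1.0)   # 7 x 7

print(k1.round(4))
print(k2.shape, k2.sum())
```

Doubling `sigma` roughly doubles the kernel radius, which is why large-scale smoothing gets expensive without the separable trick.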

| Kernel | Role |
| --- | --- |
| Box 3×3 | Fast, blocky frequency response |
| Gaussian | Smooth, no sharp nulls in spectrum |
| Median 3×3 | Nonlinear; removes salt-and-pepper noise and preserves step edges better than the mean |
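The contrast between mean and median filtering shows up clearly on a single row. The sketch below uses an assumed 1D row with one salt-noise spike and a step edge; `filter_1d` is a hypothetical helper with replicate padding.

```python
import numpy as np

def filter_1d(row, size, reducer):
    """Slide a window of `size` over `row` (replicate padding) and reduce it."""
    r = size // 2
    padded = np.pad(row, r, mode="edge")
    return np.array([reducer(padded[i:i + size]) for i in range(len(row))])

# Step edge from 10 to 50, plus one salt-noise spike at index 1.
row = np.array([10, 255, 10, 10, 50, 50, 50], dtype=float)

mean_out = filter_1d(row, 3, np.mean)      # spike is smeared, step is blurred
median_out = filter_1d(row, 3, np.median)  # spike is removed, step stays sharp

print(mean_out.round(1))
print(median_out)
```

On this row the median output is a clean step, while the mean spreads the spike over three pixels and softens the edge.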

Checkpoint: If you smooth before edge detection, what happens to edge maps visually and why?

Noise spikes shrink, but true edges also widen and weaken, so the threshold that separates edges from noise has to be re-tuned.


Step 4 — Gradients and discrete derivatives

Forward difference:

$$G_x[x,y] \approx I[x+1,y] - I[x,y]$$

Central difference (better symmetry):

$$G_x[x,y] \approx \frac{I[x+1,y] - I[x-1,y]}{2}$$

Gradient magnitude and orientation:

$$\|\nabla I\| = \sqrt{G_x^2 + G_y^2}, \quad \theta = \operatorname{atan2}(G_y, G_x)$$
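The central difference, magnitude, and orientation formulas translate almost directly into array slicing. This is a sketch under two assumptions: border pixels are simply left at zero, and the test image is a synthetic vertical step. NumPy indexes as `[y, x]`, so the x-derivative runs along the second axis.

```python
import numpy as np

def central_gradients(image):
    """Central differences on interior pixels; border pixels are left at 0."""
    gx = np.zeros(image.shape, dtype=float)
    gy = np.zeros(image.shape, dtype=float)
    gx[:, 1:-1] = (image[:, 2:] - image[:, :-2]) / 2.0   # d/dx runs along columns
    gy[1:-1, :] = (image[2:, :] - image[:-2, :]) / 2.0   # d/dy runs along rows
    return gx, gy

# Vertical step edge: intensity jumps from 0 to 100 between columns 2 and 3.
image = np.zeros((5, 6))
image[:, 3:] = 100.0

gx, gy = central_gradients(image)
magnitude = np.hypot(gx, gy)
theta = np.arctan2(gy, gx)  # radians; 0 means the gradient points along +x

print(magnitude[2])
print(theta[2])
```

The gradient is purely horizontal here, so `gy` is zero everywhere and `theta` is zero at the edge.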

Figure: Step edge → peaked gradient response. Forward differences Gx[i] = I[i+1] − I[i] on a 1D row of pixels with intensity I[x] = 40, 40, 40, 40, 200, 200, 200, 200, 200 give Gx[x] = 0, 0, 0, 160, 0, 0, 0, 0, 0. The gradient lights up exactly where intensity jumps, so the peak in |Gx| localizes the edge. Smoothing before differentiation broadens the peak (Sobel ≈ smooth + differentiate). The 2D story is the same in each direction.

Exercise: Row $[10, 10, 10, 50, 50, 50]$. Compute the central difference $G_x$ at the step (index 3). Where does $|G_x|$ peak?
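One way to check your answer to this exercise is to compute both difference schemes on the row. The sketch below evaluates only interior indices, which is an assumption about how the borders are treated.

```python
import numpy as np

row = np.array([10, 10, 10, 50, 50, 50], dtype=float)

forward = row[1:] - row[:-1]           # forward[i] = I[i+1] - I[i], for i = 0..4
central = (row[2:] - row[:-2]) / 2.0   # central[j] is G_x at index j + 1

g_at_3 = central[3 - 1]                # central difference at index 3
abs_central = np.abs(central)
peak_indices = np.flatnonzero(abs_central == abs_central.max()) + 1

print(forward)
print(central, g_at_3, peak_indices)
```

The central difference spreads the response over the two pixels that straddle the jump, while the forward difference responds at a single index.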


Step 5 — Sobel and structured derivatives

Sobel-x and Sobel-y (unnormalized classic form):

$$K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad K_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

The $\times 2$ center weights come from a $[1, 2, 1]$ smoothing pass perpendicular to the derivative direction. Each Sobel kernel therefore approximates a Gaussian-smoothed derivative and is less sensitive to isolated noise than a bare difference.
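The Sobel kernels can be built as outer products of a $[1, 2, 1]$ smoothing filter and a $[-1, 0, 1]$ difference filter, which makes the "smooth plus differentiate" structure explicit. The 3×3 patch below is an assumed example of a vertical edge.

```python
import numpy as np

smooth = np.array([1, 2, 1])      # binomial approximation to a Gaussian
diff = np.array([-1, 0, 1])       # central difference (unnormalized)

sobel_x = np.outer(smooth, diff)  # smooth along y, differentiate along x
sobel_y = np.outer(diff, smooth)  # differentiate along y, smooth along x

# Patch containing a vertical edge: bright right column.
patch = np.array([[0, 0, 100],
                  [0, 0, 100],
                  [0, 0, 100]], dtype=float)

gx = float(np.sum(patch * sobel_x))  # strong response across the edge
gy = float(np.sum(patch * sobel_y))  # no response along the edge

print(sobel_x)
print(sobel_y)
print(gx, gy)
```

Because both kernels are outer products, they are separable and can be applied as two 1D passes.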

Checkpoint: Why is “differentiate then smooth” equivalent to “smooth then differentiate” for linear operators?

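Differentiation is itself a convolution with a small derivative kernel $D$, and convolution is associative and commutative, so $G * (D * I) = D * (G * I)$, i.e. $G * (\partial I) = \partial (G * I)$.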
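The equivalence can be confirmed numerically in 1D with `np.convolve`. This is a sketch: the signal and the two small kernels are assumptions chosen for illustration, and the default "full" output mode is used so that no border handling interferes.

```python
import numpy as np

signal = np.array([10, 10, 10, 50, 50, 50], dtype=float)
smooth = np.array([1, 2, 1]) / 4.0
diff = np.array([1, 0, -1]) / 2.0   # central difference written as a convolution kernel

smooth_then_diff = np.convolve(np.convolve(signal, smooth), diff)
diff_then_smooth = np.convolve(np.convolve(signal, diff), smooth)

# The two small kernels can also be merged once and applied in a single pass.
combined_kernel = np.convolve(smooth, diff)
one_pass = np.convolve(signal, combined_kernel)

print(combined_kernel)
print(np.allclose(smooth_then_diff, diff_then_smooth))
```

Merging the kernels first is what Sobel does implicitly: one kernel that both smooths and differentiates.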


Step 6 — Canny edge detector (full pipeline)

Canny (1986) is still the reference classical edge detector:

  1. Gaussian smooth: control $\sigma$.
  2. Gradient magnitude and angle, often computed with Sobel.
  3. Non-maximum suppression (NMS): keep a pixel only if it is a local maximum along the gradient direction (thin edges).
  4. Double threshold: pixels at or above $T_h$ are strong edges; pixels between $T_l$ and $T_h$ are weak and are kept only if connected to a strong pixel (hysteresis).
  5. Optional morphological cleanup.

| Parameter | Too low | Too high |
| --- | --- | --- |
| $\sigma$ | Noisy, cluttered edges | Miss thin structures |
| $T_h$ | Everything is an edge | Broken contours |
| $T_l$ | Streaks of weak edges | Gaps in boundaries |
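Steps 3 and 4 of the pipeline can be sketched in plain NumPy, given a precomputed gradient magnitude and angle. This is an illustrative simplification, not OpenCV's implementation: `canny_sketch` is a hypothetical name, directions are quantized to four sectors without interpolation, border pixels are never marked, and the input field is synthetic.

```python
import numpy as np

def canny_sketch(magnitude, theta, t_low, t_high):
    """NMS + double threshold + hysteresis on precomputed gradients.

    theta is in radians, as returned by arctan2(gy, gx) with y pointing down.
    Returns a boolean edge map.
    """
    h, w = magnitude.shape
    # Quantize the gradient direction to 0, 45, 90, or 135 degrees.
    angle = np.rad2deg(theta) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4
    # (dy, dx) step along the gradient for each sector.
    steps = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}

    # Non-maximum suppression: compare with the two neighbors along the gradient.
    thin = np.zeros((h, w), dtype=float)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = steps[int(sector[y, x])]
            m = magnitude[y, x]
            if m >= magnitude[y + dy, x + dx] and m >= magnitude[y - dy, x - dx]:
                thin[y, x] = m

    strong = thin >= t_high
    weak = (thin >= t_low) & ~strong

    # Hysteresis: grow from strong pixels into 8-connected weak pixels.
    edges = strong.copy()
    stack = list(zip(*np.nonzero(strong)))
    while stack:
        y, x = stack.pop()
        for ny in range(max(y - 1, 0), min(y + 2, h)):
            for nx in range(max(x - 1, 0), min(x + 2, w)):
                if weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    stack.append((ny, nx))
    return edges

# Synthetic gradient field: a vertical ridge in column 3 that fades down the rows.
magnitude = np.zeros((7, 7))
magnitude[:, 2] = 2.0
magnitude[:, 4] = 2.0
magnitude[:, 3] = [9, 9, 9, 5, 5, 1, 1]
theta = np.zeros((7, 7))  # gradient points along +x everywhere

edges = canny_sketch(magnitude, theta, t_low=4.0, t_high=8.0)
print(edges[:, 3])
```

In this example the pixels with magnitude 5 are below `t_high` but survive because they connect to the strong part of the ridge, while the pixels with magnitude 1 fall below `t_low` and are dropped.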

Exercise: Why does NMS require knowing edge orientation, not just magnitude?

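Without orientation you cannot decide which two neighbors to compare: NMS tests each pixel against its neighbors along the gradient direction, that is, across the edge rather than along it.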


Step 7 — Separable kernels

If $K = \mathbf{k}\,\mathbf{k}^\top$, then

$$I * K = (I * \mathbf{k}) * \mathbf{k}^\top$$

Cost: $2WHk$ multiplies for two 1D passes vs $WHk^2$ for a full $k\times k$ 2D pass. For $k=11$ that is 22 vs 121 multiplies per pixel, or 5.5× fewer.
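The identity and the operation count can both be checked on a small random image. This is a sketch assuming zero padding and an odd-length kernel; all function names are hypothetical helpers written for this example.

```python
import numpy as np

def convolve_rows(image, k):
    """Convolve every row with 1D kernel k ('same' size, zero padding)."""
    return np.array([np.convolve(r, k, mode="same") for r in image])

def convolve_separable(image, k):
    """Row pass, then column pass (columns are rows of the transpose)."""
    return convolve_rows(convolve_rows(image, k).T, k).T

def convolve_2d_same(image, kernel):
    """Direct 2D convolution with zero padding, for comparison."""
    kh, kw = kernel.shape
    padded = np.pad(image, kh // 2)   # zero padding on every side
    flipped = kernel[::-1, ::-1]      # true convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * flipped)
    return out

def multiplies_per_pixel(taps):
    """Multiplies per output pixel for a taps x taps kernel."""
    return {"separable": 2 * taps, "full_2d": taps * taps}

k = np.array([1, 2, 1]) / 4.0
K = np.outer(k, k)

rng = np.random.default_rng(0)
image = rng.random((6, 8))

direct = convolve_2d_same(image, K)
separable = convolve_separable(image, k)

print(np.allclose(direct, separable))
print(multiplies_per_pixel(11))
```

The two results agree to floating-point precision, and the saving grows linearly with kernel width.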

Figure: Separable kernel, 2D = row ⊗ column. The 3×3 Gaussian-like kernel K with rows (1 2 1), (2 4 2), (1 2 1) is the outer product of the 1D kernel k = (1 2 1) with itself. A 2D Gaussian blur therefore runs as two 1D passes instead of one 2D pass, which saves more and more multiplications as kernels grow.

Exercise: A 1D Gaussian has 11 taps. How many multiplies per pixel for separable 2D vs naive 11×11?

Separable: $11+11=22$; full: $121$.


Step 8 — Frequency intuition (short)

Convolution in space is multiplication in frequency: low-pass kernels attenuate high frequencies (noise, texture); high-pass (Laplacian, second derivative) emphasize edges.
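Both claims can be illustrated in 1D with NumPy's FFT. In this sketch the 3-tap box filter and the second-difference kernel stand in for "low-pass" and "high-pass", the DFT length of 64 is arbitrary, and the convolution theorem is checked in its circular (periodic) form, which is the form that holds exactly for the DFT.

```python
import numpy as np

def dft_gain(kernel_1d, n=64):
    """Magnitude of the kernel's frequency response on an n-point DFT grid."""
    return np.abs(np.fft.rfft(kernel_1d, n=n))

box = np.array([1, 1, 1]) / 3.0          # low-pass
second_diff = np.array([1.0, -2.0, 1.0]) # 1D Laplacian, high-pass

box_gain = dft_gain(box)          # index 0 is DC, last index is the Nyquist frequency
lap_gain = dft_gain(second_diff)

# Convolution theorem: multiplying spectra equals circular convolution in space.
rng = np.random.default_rng(0)
signal = rng.random(64)
via_fft = np.fft.irfft(np.fft.rfft(signal) * np.fft.rfft(box, n=64), n=64)
direct = np.array([
    sum(box[j] * signal[(i - j) % 64] for j in range(3))
    for i in range(64)
])

print(box_gain[0], box_gain[-1])   # passes DC, attenuates the highest frequency
print(lap_gain[0], lap_gain[-1])   # blocks DC, amplifies the highest frequency
print(np.allclose(via_fft, direct))
```

The box filter keeps constant regions untouched and damps rapid oscillation, while the second difference does the opposite.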

Laplacian of Gaussian (LoG): $\nabla^2 (G_\sigma * I)$ is a blob detector at scale $\sigma$, and its zero-crossings locate edges. It was widely used historically; today learned filters subsume much of this.
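Since the Laplacian and the Gaussian are both linear, the LoG can be applied as a single kernel. Differentiating the 2D Gaussian twice gives $\nabla^2 G_\sigma = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G_\sigma$, which the sketch below samples on a grid. The function name, the `truncate` default, and the final mean subtraction (a common fix so that the sampled weights sum exactly to zero) are assumptions for illustration.

```python
import numpy as np

def log_kernel(sigma, truncate=4.0):
    """Sampled Laplacian of Gaussian: ((r^2 - 2 sigma^2) / sigma^4) * G_sigma."""
    radius = int(np.ceil(truncate * sigma))
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    r2 = x ** 2 + y ** 2
    g = np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    k = (r2 - 2.0 * sigma ** 2) / sigma ** 4 * g
    return k - k.mean()  # force zero sum so flat regions give zero response

kernel = log_kernel(1.5)
center = kernel.shape[0] // 2

print(kernel.shape, kernel[center, center], kernel.sum())
```

The kernel is negative at the center and changes sign near radius $\sqrt{2}\,\sigma$, which is what ties its strongest response to blobs of about that size.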


Deep dive — edges are not always “object boundaries”

| Edge cause | Geometric boundary? |
| --- | --- |
| Depth discontinuity | Often yes |
| Cast shadow | No (same surface) |
| Texture (stripes) | No |
| Specular highlight | No |
| Albedo change (paint) | Sometimes |

Intensity-only edge detectors cannot disambiguate these without depth, motion, stereo, or learning.


Bridge to the next lessons

  • Corners (next module) need variation in two directions and are built from gradient structure tensors.
  • CNNs replace hand-designed $K$ with learned filters but keep locality and translation equivariance.

Check your understanding

  1. What is the difference between an edge due to depth discontinuity vs cast shadow — can intensity alone always tell them apart?
  2. Why do CNNs use small kernels repeatedly rather than one giant kernel?
  3. Name two boundary policies and one artifact each can introduce.
  4. In Canny, what problem does hysteresis solve?
  5. Why is median filtering not equivalent to a single convolution kernel?

Lab-style stretch goals

Implement Sobel magnitude + NMS + hysteresis on grayscale (or use OpenCV Canny and compare your thresholds).

Color: Convert to LAB, run edges on L channel only vs each RGB channel — when does chroma create spurious edges?