
Imaging & digital images

Pixels, convolution, and edges

Grids, channels, linear filters, separable kernels, gradients, and building intuition before neural networks.

~60 min read + exercises


Here you will treat an image as a function on a grid and build intuition for linear filtering — the same family of operations that generalizes into the first layers of convolutional neural networks.

Figure

A 3×3 kernel sweeping across pixels

o[x, y] = Σᵢ Σⱼ I[x+i, y+j] · K[i, j], shown with the Sobel-x kernel (−1 0 1 / −2 0 2 / −1 0 1) over the image I[x, y] on a discrete grid.
At every location the kernel multiplies neighbors by its weights and sums them — here visualized with Sobel-x weights.

Learning objectives

  • Represent a grayscale image as I[x, y] on a discrete lattice.
  • Apply a 2D convolution with small kernels by hand (at least conceptually).
  • Connect gradients and edge strength to differences of neighboring pixels.
  • Explain separable kernels and why they matter computationally.

Prerequisites

  • Lesson: Light, sensors, and the imaging pipeline (you should be comfortable with “what is a pixel value”).
  • Basic idea of averaging and weighted sums.

Step 1 — Images as discrete signals

Let I[x, y] be intensity at integer coordinates (x, y).

  • Domain: finite rectangle 0 ≤ x < W, 0 ≤ y < H.
  • Boundary handling: when a filter asks for neighbors outside the image, common choices are zero padding, reflect, or clamp. Different choices change edges slightly.
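The three policies are easy to compare with NumPy's np.pad (a minimal 1D sketch; the toy row is made up, and mode "edge" is the clamp policy):

```python
import numpy as np

# A tiny 1D signal to show how each padding policy invents different neighbors.
row = np.array([10, 20, 30])

zero    = np.pad(row, 1, mode="constant")  # [ 0 10 20 30  0]
reflect = np.pad(row, 1, mode="reflect")   # [20 10 20 30 20]
clamp   = np.pad(row, 1, mode="edge")      # [10 10 20 30 30]
```

Zero padding invents a dark border, reflect mirrors the interior, and clamp repeats the outermost value; each changes what a filter "sees" at the boundary.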

Checkpoint: Why do boundary pixels behave differently under convolution, no matter which padding policy you choose?


Step 2 — What convolution means here (cross-correlation note)

In vision libraries, “conv2d” often implements cross-correlation:

(I ∗ K)[x, y] = Σᵢ Σⱼ I[x+i, y+j] · K[i, j]

For learning, the key idea is local linear combinations of neighbors, not the formal kernel flip that distinguishes true convolution from cross-correlation.

Exercise: With a 3×3 kernel and zero padding, write the formula for the center pixel update as a nested sum in words: “multiply each neighbor by weight and add.”
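In code, that nested sum looks like the following minimal NumPy sketch (the function name cross_correlate and the zero-padding choice are this sketch's own):

```python
import numpy as np

def cross_correlate(I, K):
    """Naive "conv2d" as vision libraries use it: no kernel flip.
    Zero padding keeps the output the same size as the input."""
    kh, kw = K.shape
    pad_y, pad_x = kh // 2, kw // 2
    P = np.pad(I, ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    out = np.zeros_like(I, dtype=float)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            # "Multiply each neighbor by its weight and add."
            out[y, x] = np.sum(P[y:y + kh, x:x + kw] * K)
    return out
```

Note the array is indexed [row, column], i.e. [y, x]; the formula above writes the same sum with coordinates in (x, y) order.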


Step 3 — Mean smoothing and its trade-offs

A uniform 3×3 box filter averages the 3×3 neighborhood.

  • Pros: suppresses random noise (to a degree).
  • Cons: blurs edges — exactly where much semantic information lives.
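Both effects are visible on a toy step edge (NumPy assumed; box_blur3 and the pixel values are this sketch's own):

```python
import numpy as np

def box_blur3(I):
    """3x3 mean filter with edge (clamp) padding."""
    P = np.pad(I, 1, mode="edge")
    out = np.zeros_like(I, dtype=float)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            out[y, x] = P[y:y + 3, x:x + 3].mean()
    return out

I = np.zeros((5, 6))
I[:, 3:] = 90.0          # a hard step edge
B = box_blur3(I)
# Each row before: [0, 0, 0, 90, 90, 90]
# After blurring:  [0, 0, 30, 60, 90, 90] -- the step becomes a ramp
```

The same averaging that cancels random noise also spreads the step over several pixels.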

Checkpoint: If you smooth before edge detection, what happens to edge maps visually and why?


Step 4 — Gradients: discrete derivatives

A simple derivative along x can be approximated by differences, e.g.

G_x[x, y] ≈ I[x+1, y] − I[x, y]

Similarly for G_y. Gradient magnitude is often summarized as

‖∇I‖ ≈ √(G_x² + G_y²)

and orientation as atan2(G_y, G_x).

Exercise (small numeric example): Invent a 1×5 row of pixel intensities that step up once. Compute G_x by forward differences and mark where the edge response peaks.
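One possible answer, sketched in NumPy (the chosen intensities are arbitrary):

```python
import numpy as np

# A 1x5 row of pixel intensities that steps up once.
I = np.array([40, 40, 200, 200, 200], dtype=float)

# Forward difference Gx[i] = I[i+1] - I[i] (one sample shorter than I).
Gx = I[1:] - I[:-1]                 # Gx = [0, 160, 0, 0]

peak = int(np.argmax(np.abs(Gx)))   # index 1: exactly where the jump is
```

The response is zero everywhere the signal is flat and spikes at the single jump, which is why |G_x| localizes the edge.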

Figure

Step edge → peaked gradient

Intensity I[x]: 40 40 40 40 200 200 200 200 200. Forward difference Gx[i] = I[i+1] − I[i]: 0 0 0 160 0 0 0 0. The peak in |Gx| localizes the edge; smoothing before differentiation broadens the peak (Sobel ≈ smooth + differentiate).
Forward differences on a 1D row of pixels: the gradient lights up exactly where intensity jumps. The 2D story is the same in each direction.

Step 5 — Sobel and Prewitt kernels

Sobel operators combine smoothing with differentiation — less sensitive to isolated noise spikes than a bare difference.

You should memorize the idea more than the exact numbers:

  • Estimate derivative in one axis while lightly averaging perpendicular to that axis.
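That idea can be written down directly: the Sobel-x kernel from the figure is the outer product of a small smoothing kernel and a central difference (NumPy sketch):

```python
import numpy as np

smooth = np.array([1, 2, 1])     # lightly average perpendicular to the axis
diff   = np.array([-1, 0, 1])    # central difference along the axis

sobel_x = np.outer(smooth, diff)
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]
sobel_y = sobel_x.T              # same estimate, rotated to the y axis
```

Replacing the [1 2 1] smoothing with [1 1 1] gives the Prewitt kernels; the structure, not the exact numbers, is what matters.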

Checkpoint: Why do “smooth then differentiate” and “differentiate then smooth” give the same result for linear, shift-invariant operators?


Step 6 — Separable kernels

A 2D kernel K[i, j] is separable if it can be written as an outer product of 1D kernels:

K = k kᵀ

Then 2D convolution can be done as two 1D passes — fewer operations for large kernels.

Figure

Separable kernel: 2D = row ⊗ column

K (3×3): 1 2 1 / 2 4 2 / 1 2 1, the outer product of the 1D kernel k = [1 2 1] with itself (normalization omitted).
A 3×3 Gaussian-like kernel as the outer product of two 1D kernels. Two 1D passes replace one 2D pass — much faster as kernels grow.

Exercise: Explain in one paragraph why a 2D Gaussian blur is commonly implemented as two 1D convolutions.
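The equivalence and the savings can both be checked numerically; a NumPy sketch under stated assumptions (zero padding on both paths, a symmetric kernel so convolution and correlation coincide, and an arbitrary 8×8 random image):

```python
import numpy as np

k = np.array([1, 2, 1]) / 4.0      # 1D binomial (Gaussian-like) kernel
K = np.outer(k, k)                 # the separable 3x3 kernel from the figure

rng = np.random.default_rng(0)
I = rng.random((8, 8))

# One full 2D pass (zero padding): 9 multiplies per pixel.
P = np.pad(I, 1)
full = np.zeros_like(I)
for y in range(8):
    for x in range(8):
        full[y, x] = np.sum(P[y:y + 3, x:x + 3] * K)

# Two 1D passes: filter every row, then every column (3 + 3 multiplies).
rows = np.array([np.convolve(r, k, mode="same") for r in I])
sep  = np.array([np.convolve(c, k, mode="same") for c in rows.T]).T

assert np.allclose(full, sep)

# Per-pixel multiply count for an m x m kernel: m*m vs 2*m.
m = 15
print(m * m, "vs", 2 * m)          # 225 vs 30
```

For a 3×3 kernel the gain is modest, but it grows linearly in m: large Gaussian blurs are almost always run as two 1D passes.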


Step 7 — From edges to “features” (preview)

Edges are local. Later modules use them as building blocks for corners, blobs, and learned feature maps. The deep-learning story replaces hand-designed K with learned filters — but the locality and translation structure remain.


Check your understanding

  1. What is the difference between an edge due to depth discontinuity vs an edge due to cast shadow — can intensity alone always tell them apart?
  2. Why do convolutional neural networks use small kernels repeatedly rather than one giant kernel?
  3. Name two boundary policies and one artifact each can introduce.

Lab-style stretch goal (optional)

Implement Sobel magnitude on a grayscale image and threshold the magnitude map. Tune the threshold: what disappears first — texture or object boundaries?
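One possible starting point, assuming NumPy and an explicit (slow) Python loop; sobel_magnitude, the toy image, and the threshold value are this sketch's own, and for a real experiment you would load a grayscale array and sweep the threshold:

```python
import numpy as np

def sobel_magnitude(I):
    """Gradient magnitude via Sobel kernels (zero padding, pure NumPy)."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T
    P = np.pad(I.astype(float), 1)
    Gx = np.zeros(I.shape)
    Gy = np.zeros(I.shape)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            patch = P[y:y + 3, x:x + 3]
            Gx[y, x] = np.sum(patch * sx)
            Gy[y, x] = np.sum(patch * sy)
    return np.sqrt(Gx**2 + Gy**2)

# Toy image with one vertical edge; threshold the magnitude map.
I = np.zeros((6, 6))
I[:, 3:] = 255.0
edges = sobel_magnitude(I) > 200   # boolean edge map
```

As you raise the threshold, low-contrast texture responses drop out before strong object boundaries do, which is the effect the stretch goal asks you to observe.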