
Imaging & digital images

Pixels, convolution, and edges

Grids, channels, linear filters, separable kernels, gradients, and building intuition before neural networks.

~60 min read + exercises


Here you will treat an image as a function on a grid and build intuition for linear filtering — the same family of operations that generalizes into the first layers of convolutional neural networks.

Figure

A 3×3 kernel sweeping across pixels

o[x, y] = Σᵢ Σⱼ I[x+i, y+j] · K[i, j], shown with the Sobel-x kernel (−1 0 1 / −2 0 2 / −1 0 1) over the image I[x, y] on a discrete grid.
At every location the kernel multiplies neighbors by its weights and sums them — here visualized with Sobel-x weights.

Learning objectives

  • Represent a grayscale image as I[x, y] on a discrete lattice.
  • Apply a 2D convolution with small kernels by hand (at least conceptually).
  • Connect gradients and edge strength to differences of neighboring pixels.
  • Explain separable kernels and why they matter computationally.

Prerequisites

  • Lesson: Light, sensors, and the imaging pipeline (you should be comfortable with “what is a pixel value”).
  • Basic idea of averaging and weighted sums.

Step 1 — Images as discrete signals

Let I[x, y] be intensity at integer coordinates (x, y).

  • Domain: finite rectangle 0 ≤ x < W, 0 ≤ y < H.
  • Boundary handling: when a filter asks for neighbors outside the image, common choices are zero padding, reflect, or clamp. Different choices change edges slightly.
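The three policies are easy to compare with NumPy's np.pad (a minimal 1D sketch; the toy row is made up, and mode "edge" is the clamp policy):

```python
import numpy as np

# A tiny 1D signal to show how each padding policy invents different neighbors.
row = np.array([10, 20, 30])

zero    = np.pad(row, 1, mode="constant")  # [ 0 10 20 30  0]
reflect = np.pad(row, 1, mode="reflect")   # [20 10 20 30 20]
clamp   = np.pad(row, 1, mode="edge")      # [10 10 20 30 30]
```

Zero padding invents a dark border, reflect mirrors the interior, and clamp repeats the outermost value; each changes what a filter "sees" at the boundary.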

Checkpoint: Why do boundary pixels behave differently under convolution, no matter which padding policy you choose?


Step 2 — What convolution means here (cross-correlation note)

In vision libraries, “conv2d” often implements cross-correlation:

(I ∗ K)[x, y] = Σᵢ Σⱼ I[x+i, y+j] · K[i, j]

For learning, the key idea is local linear combinations of neighbors, not the formal kernel flip that distinguishes true convolution from cross-correlation.

Exercise: With a 3×3 kernel and zero padding, write the formula for the center pixel update as a nested sum in words: “multiply each neighbor by weight and add.”
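In code, that nested sum looks like the following minimal NumPy sketch (the function name cross_correlate and the zero-padding choice are this sketch's own):

```python
import numpy as np

def cross_correlate(I, K):
    """Naive "conv2d" as vision libraries use it: no kernel flip.
    Zero padding keeps the output the same size as the input."""
    kh, kw = K.shape
    pad_y, pad_x = kh // 2, kw // 2
    P = np.pad(I, ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    out = np.zeros_like(I, dtype=float)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            # "Multiply each neighbor by its weight and add."
            out[y, x] = np.sum(P[y:y + kh, x:x + kw] * K)
    return out
```

Note the array is indexed [row, column], i.e. [y, x]; the formula above writes the same sum with coordinates in (x, y) order.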


Step 3 — Mean smoothing and its trade-offs

A uniform 3×3 box filter averages the 3×3 neighborhood.

  • Pros: suppresses random noise (to a degree).
  • Cons: blurs edges — exactly where much semantic information lives.
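Both effects are visible on a toy step edge (NumPy assumed; box_blur3 and the pixel values are this sketch's own):

```python
import numpy as np

def box_blur3(I):
    """3x3 mean filter with edge (clamp) padding."""
    P = np.pad(I, 1, mode="edge")
    out = np.zeros_like(I, dtype=float)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            out[y, x] = P[y:y + 3, x:x + 3].mean()
    return out

I = np.zeros((5, 6))
I[:, 3:] = 90.0          # a hard step edge
B = box_blur3(I)
# Each row before: [0, 0, 0, 90, 90, 90]
# After blurring:  [0, 0, 30, 60, 90, 90] -- the step becomes a ramp
```

The same averaging that cancels random noise also spreads the step over several pixels.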

Checkpoint: If you smooth before edge detection, what happens to edge maps visually and why?


Step 4 — Gradients: discrete derivatives

A simple derivative along x can be approximated by differences, e.g.

G_x[x, y] ≈ I[x+1, y] − I[x, y]

Similarly for G_y. Gradient magnitude is often summarized as

‖∇I‖ ≈ √(G_x² + G_y²)

and orientation as atan2(G_y, G_x).

Exercise (small numeric example): Invent a 1×5 row of pixel intensities that step up once. Compute G_x by forward differences and mark where the edge response peaks.
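One possible answer, sketched in NumPy (the chosen intensities are arbitrary):

```python
import numpy as np

# A 1x5 row of pixel intensities that steps up once.
I = np.array([40, 40, 200, 200, 200], dtype=float)

# Forward difference Gx[i] = I[i+1] - I[i] (one sample shorter than I).
Gx = I[1:] - I[:-1]                 # Gx = [0, 160, 0, 0]

peak = int(np.argmax(np.abs(Gx)))   # index 1: exactly where the jump is
```

The response is zero everywhere the signal is flat and spikes at the single jump, which is why |G_x| localizes the edge.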

Figure

Step edge → peaked gradient

Intensity I[x]: 40 40 40 40 200 200 200 200 200. Forward difference Gx[i] = I[i+1] − I[i]: 0 0 0 160 0 0 0 0. The peak in |Gx| localizes the edge; smoothing before differentiation broadens the peak (Sobel ≈ smooth + differentiate).
Forward differences on a 1D row of pixels: the gradient lights up exactly where intensity jumps. The 2D story is the same in each direction.

Step 5 — Sobel and Prewitt kernels

Sobel operators combine smoothing with differentiation — less sensitive to isolated noise spikes than a bare difference.

You should memorize the idea more than the exact numbers:

  • Estimate derivative in one axis while lightly averaging perpendicular to that axis.
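That idea can be written down directly: the Sobel-x kernel from the figure is the outer product of a small smoothing kernel and a central difference (NumPy sketch):

```python
import numpy as np

smooth = np.array([1, 2, 1])     # lightly average perpendicular to the axis
diff   = np.array([-1, 0, 1])    # central difference along the axis

sobel_x = np.outer(smooth, diff)
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]
sobel_y = sobel_x.T              # same estimate, rotated to the y axis
```

Replacing the [1 2 1] smoothing with [1 1 1] gives the Prewitt kernels; the structure, not the exact numbers, is what matters.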

Checkpoint: Why do “smooth then differentiate” and “differentiate then smooth” give the same result for linear, shift-invariant operators?


Step 6 — Separable kernels

A 2D kernel K[i, j] is separable if it can be written as an outer product of 1D kernels:

K = k kᵀ

Then 2D convolution can be done as two 1D passes — fewer operations for large kernels.

Figure

Separable kernel: 2D = row ⊗ column

K (3×3): 1 2 1 / 2 4 2 / 1 2 1, the outer product of the 1D kernel k = [1 2 1] with itself (normalization omitted).
A 3×3 Gaussian-like kernel as the outer product of two 1D kernels. Two 1D passes replace one 2D pass — much faster as kernels grow.

Exercise: Explain in one paragraph why a 2D Gaussian blur is commonly implemented as two 1D convolutions.
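The equivalence and the savings can both be checked numerically; a NumPy sketch under stated assumptions (zero padding on both paths, a symmetric kernel so convolution and correlation coincide, and an arbitrary 8×8 random image):

```python
import numpy as np

k = np.array([1, 2, 1]) / 4.0      # 1D binomial (Gaussian-like) kernel
K = np.outer(k, k)                 # the separable 3x3 kernel from the figure

rng = np.random.default_rng(0)
I = rng.random((8, 8))

# One full 2D pass (zero padding): 9 multiplies per pixel.
P = np.pad(I, 1)
full = np.zeros_like(I)
for y in range(8):
    for x in range(8):
        full[y, x] = np.sum(P[y:y + 3, x:x + 3] * K)

# Two 1D passes: filter every row, then every column (3 + 3 multiplies).
rows = np.array([np.convolve(r, k, mode="same") for r in I])
sep  = np.array([np.convolve(c, k, mode="same") for c in rows.T]).T

assert np.allclose(full, sep)

# Per-pixel multiply count for an m x m kernel: m*m vs 2*m.
m = 15
print(m * m, "vs", 2 * m)          # 225 vs 30
```

For a 3×3 kernel the gain is modest, but it grows linearly in m: large Gaussian blurs are almost always run as two 1D passes.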


Step 7 — From edges to “features” (preview)

Edges are local. Later modules use them as building blocks for corners, blobs, and learned feature maps. The deep-learning story replaces hand-designed K with learned filters — but the locality and translation structure remain.


Check your understanding

  1. What is the difference between an edge due to depth discontinuity vs an edge due to cast shadow — can intensity alone always tell them apart?
  2. Why do convolutional neural networks use small kernels repeatedly rather than one giant kernel?
  3. Name two boundary policies and one artifact each can introduce.

Lab-style stretch goal (optional)

Implement Sobel magnitude on a grayscale image and threshold the magnitude map. Tune the threshold: what disappears first — texture or object boundaries?
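One possible starting point, assuming NumPy and an explicit (slow) Python loop; sobel_magnitude, the toy image, and the threshold value are this sketch's own, and for a real experiment you would load a grayscale array and sweep the threshold:

```python
import numpy as np

def sobel_magnitude(I):
    """Gradient magnitude via Sobel kernels (zero padding, pure NumPy)."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T
    P = np.pad(I.astype(float), 1)
    Gx = np.zeros(I.shape)
    Gy = np.zeros(I.shape)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            patch = P[y:y + 3, x:x + 3]
            Gx[y, x] = np.sum(patch * sx)
            Gy[y, x] = np.sum(patch * sy)
    return np.sqrt(Gx**2 + Gy**2)

# Toy image with one vertical edge; threshold the magnitude map.
I = np.zeros((6, 6))
I[:, 3:] = 255.0
edges = sobel_magnitude(I) > 200   # boolean edge map
```

As you raise the threshold, low-contrast texture responses drop out before strong object boundaries do, which is the effect the stretch goal asks you to observe.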