← Back to curriculum

Module 3 — Function approximation

Why tabular methods break

State-space explosion, generalization, and partial observability preview.

~55 min read + exercises

Why tabular methods break

Before we begin

Tabular Q-learning works when you can store one number per state or per (state, action) pair. Real problems — Atari frames, robot joint angles, language observations — have enormous or continuous state spaces. Visiting every state infinitely often is impossible.

Function approximation replaces tables with parameterized functions: linear weights, neural networks, decision trees. Generalization helps; wrong generalization hurts.


What you will learn

  • Quantify curse of dimensionality for tabular methods.
  • Explain why generalization across similar states is necessary.
  • Identify when tabular methods still win (small discrete MDPs).
  • Map observations to feature vectors for approximators.
  • Preview failure modes that motivate Module 3–4 stabilizers.

State space growth

ProblemRough state count
4×4 FrozenLake16
Chess board~10⁴⁷ legal positions
Atari 84×84 grayscale256⁷⁰⁵⁶
Continuous pendulum angle ωUncountably infinite

Actions: discrete 18 on Atari vs continuous torque on Pendulum.

Storage: |S| × |A| floats for Q-table. 10⁶ states × 10 actions × 4 bytes ≈ 40 MB — manageable. 10¹² is not.


Generalization idea

Instead of Q(s, a), learn Q̂(s, a; w) shared across states.

ApproachParametersGeneralizes how
TabularO(S
Linear FAO(d) featuresSimilar features → similar Q
Neural netO(weights)Representation learning

Similar states should have similar values — smoothness assumption.


Feature design examples

CartPole (4-dim observation): raw obs as features, or hand-crafted polynomials.

Atari: convolutional net on stacked frames — features learned, not hand-written.

python
# Linear Q: Q_hat(s,a) = w_a · phi(s)
import numpy as np
 
def phi_cartpole(obs):
    x, x_dot, theta, theta_dot = obs
    return np.array([1.0, x, x_dot, theta, theta_dot, x**2, theta**2])
 
# One weight vector per action
W = np.zeros((2, 7))  # 2 actions, 7 features

When tabular still wins

  • Tiny grids for teaching and debugging
  • Exact DP / value iteration baselines
  • Discrete bandits with few arms
  • Verification against FA implementations

Always solve a small tabular version before scaling.


What breaks with naive FA

IssueSymptom
ExtrapolationWild Q values off distribution
Moving targetNetwork chases its own predictions
Correlated samplesOverfits recent trajectory

These motivate linear methods first, then deadly triad analysis and DQN tricks.

Checkpoint: 1000 states, 10 actions, visit each 100 times — 10⁶ samples. 10⁶ states with FA and 10⁴ parameters — need 10⁶ samples still?

Answer

Not necessarily — generalization shares learning across states. You might need far fewer if features capture structure. Wrong features → need as many as tabular effectively.


Observation vs state revisited

High-dimensional pixels are compressed into latent features. If compression loses Markov information, FA cannot fix POMDP — fix representation (frame stack, RNN, belief state).


Common mistakes

  • Jumping to deep nets on a problem solvable tabular — hides bugs.
  • No normalization of continuous inputs — training unstable.
  • Measuring “state space size” from raw pixels instead of learned feature dimension.
  • Expecting FA to converge like tabular Q-learning without stabilizers.

Tabular methods do not scale; we need functions. Linear function approximation is the cleanest next step — convex updates, clear theory, and a bridge to neural nets.


Before this lesson


What's next