Why tabular methods break
Before we begin
Tabular Q-learning works when you can store one number per state or per (state, action) pair. Real problems — Atari frames, robot joint angles, language observations — have enormous or continuous state spaces. Visiting every state infinitely often is impossible.
Function approximation replaces tables with parameterized functions: linear weights, neural networks, decision trees. Generalization helps; wrong generalization hurts.
What you will learn
- Quantify curse of dimensionality for tabular methods.
- Explain why generalization across similar states is necessary.
- Identify when tabular methods still win (small discrete MDPs).
- Map observations to feature vectors for approximators.
- Preview failure modes that motivate Module 3–4 stabilizers.
State space growth
| Problem | Rough state count |
|---|---|
| 4×4 FrozenLake | 16 |
| Chess board | ~10⁴⁷ legal positions |
| Atari 84×84 grayscale | 256⁷⁰⁵⁶ |
| Continuous pendulum angle ω | Uncountably infinite |
Actions: discrete 18 on Atari vs continuous torque on Pendulum.
Storage: |S| × |A| floats for Q-table. 10⁶ states × 10 actions × 4 bytes ≈ 40 MB — manageable. 10¹² is not.
Generalization idea
Instead of Q(s, a), learn Q̂(s, a; w) shared across states.
| Approach | Parameters | Generalizes how |
|---|---|---|
| Tabular | O( | S |
| Linear FA | O(d) features | Similar features → similar Q |
| Neural net | O(weights) | Representation learning |
Similar states should have similar values — smoothness assumption.
Feature design examples
CartPole (4-dim observation): raw obs as features, or hand-crafted polynomials.
Atari: convolutional net on stacked frames — features learned, not hand-written.
# Linear Q: Q_hat(s,a) = w_a · phi(s)
import numpy as np
def phi_cartpole(obs):
x, x_dot, theta, theta_dot = obs
return np.array([1.0, x, x_dot, theta, theta_dot, x**2, theta**2])
# One weight vector per action
W = np.zeros((2, 7)) # 2 actions, 7 featuresWhen tabular still wins
- Tiny grids for teaching and debugging
- Exact DP / value iteration baselines
- Discrete bandits with few arms
- Verification against FA implementations
Always solve a small tabular version before scaling.
What breaks with naive FA
| Issue | Symptom |
|---|---|
| Extrapolation | Wild Q values off distribution |
| Moving target | Network chases its own predictions |
| Correlated samples | Overfits recent trajectory |
These motivate linear methods first, then deadly triad analysis and DQN tricks.
Checkpoint: 1000 states, 10 actions, visit each 100 times — 10⁶ samples. 10⁶ states with FA and 10⁴ parameters — need 10⁶ samples still?
Answer
Not necessarily — generalization shares learning across states. You might need far fewer if features capture structure. Wrong features → need as many as tabular effectively.
Observation vs state revisited
High-dimensional pixels are compressed into latent features. If compression loses Markov information, FA cannot fix POMDP — fix representation (frame stack, RNN, belief state).
Common mistakes
- Jumping to deep nets on a problem solvable tabular — hides bugs.
- No normalization of continuous inputs — training unstable.
- Measuring “state space size” from raw pixels instead of learned feature dimension.
- Expecting FA to converge like tabular Q-learning without stabilizers.
Tabular methods do not scale; we need functions. Linear function approximation is the cleanest next step — convex updates, clear theory, and a bridge to neural nets.