Why tabular methods break

Before we begin

Tabular Q-learning works when you can store one number per state or per (state, action) pair. Real problems — Atari frames, robot joint angles, language observations — have enormous or continuous state spaces. Visiting every state infinitely often is impossible.

Function approximation replaces tables with parameterized functions: linear weights, neural networks, decision trees. Generalization helps; wrong generalization hurts.

What you will learn

Quantify curse of dimensionality for tabular methods.
Explain why generalization across similar states is necessary.
Identify when tabular methods still win (small discrete MDPs).
Map observations to feature vectors for approximators.
Preview failure modes that motivate Module 3–4 stabilizers.

State space growth

Problem	Rough state count
4×4 FrozenLake	16
Chess board	~10⁴⁷ legal positions
Atari 84×84 grayscale	256⁷⁰⁵⁶
Continuous pendulum angle ω	Uncountably infinite

Actions: discrete 18 on Atari vs continuous torque on Pendulum.

Storage: |S| × |A| floats for Q-table. 10⁶ states × 10 actions × 4 bytes ≈ 40 MB — manageable. 10¹² is not.

Generalization idea

Instead of Q(s, a), learn Q̂(s, a; w) shared across states.

Approach	Parameters	Generalizes how
Tabular	O(	S
Linear FA	O(d) features	Similar features → similar Q
Neural net	O(weights)	Representation learning

Similar states should have similar values — smoothness assumption.

Feature design examples

CartPole (4-dim observation): raw obs as features, or hand-crafted polynomials.

Atari: convolutional net on stacked frames — features learned, not hand-written.

python

# Linear Q: Q_hat(s,a) = w_a · phi(s)
import numpy as np
 
def phi_cartpole(obs):
    x, x_dot, theta, theta_dot = obs
    return np.array([1.0, x, x_dot, theta, theta_dot, x**2, theta**2])
 
# One weight vector per action
W = np.zeros((2, 7))  # 2 actions, 7 features

When tabular still wins

Tiny grids for teaching and debugging
Exact DP / value iteration baselines
Discrete bandits with few arms
Verification against FA implementations

Always solve a small tabular version before scaling.

What breaks with naive FA

Issue	Symptom
Extrapolation	Wild Q values off distribution
Moving target	Network chases its own predictions
Correlated samples	Overfits recent trajectory

These motivate linear methods first, then deadly triad analysis and DQN tricks.

Checkpoint: 1000 states, 10 actions, visit each 100 times — 10⁶ samples. 10⁶ states with FA and 10⁴ parameters — need 10⁶ samples still?

Answer

Not necessarily — generalization shares learning across states. You might need far fewer if features capture structure. Wrong features → need as many as tabular effectively.

Observation vs state revisited

High-dimensional pixels are compressed into latent features. If compression loses Markov information, FA cannot fix POMDP — fix representation (frame stack, RNN, belief state).

Common mistakes

Jumping to deep nets on a problem solvable tabular — hides bugs.
No normalization of continuous inputs — training unstable.
Measuring “state space size” from raw pixels instead of learned feature dimension.
Expecting FA to converge like tabular Q-learning without stabilizers.

Tabular methods do not scale; we need functions. Linear function approximation is the cleanest next step — convex updates, clear theory, and a bridge to neural nets.

Before this lesson

Previous lesson

What's next

Next lesson — Linear function approximation