← Back to curriculum

Module 3 — Function approximation

Project: CartPole with linear FA

Tile coding or polynomial features + semi-gradient TD on CartPole-v1.

~110 min read + exercises

Project: CartPole with linear FA

Before we begin

Solve CartPole-v1 using linear function approximation — hand-crafted features and a linear model for action values or state values. This bridges tabular Q-learning (Module 2) and neural DQN (Module 4) without jumping straight to deep nets.


How this connects to Module 3

LessonWhere you use it
Why tabular breaksCartPole has continuous-ish observations
Linear FAQ̂(s,a) = wᵀ φ(s,a) or V̂(s) = wᵀ φ(s)
Neural nets as approximatorsPreview: same structure, nonlinear φ
Deadly triadWhy you still use target-style updates carefully

What you will build

PiecePurpose
Feature map φ(s)Polynomial / tile features over (x, ẋ, θ, θ̇)
Linear Q or SARSA(λ) with FAWeight vector w updated each step
Learning curveEpisode return vs episode

Folder layout:

text
cartpole-linear-fa/
  features.py        # φ(s) and φ(s,a)
  linear_agent.py
  train.py
  outputs/returns.png

Estimated time: 3–5 hours.


Before you start

  • Finish the Module 3 quiz.
  • pip install gymnasium numpy matplotlib

Step 1 — Features for CartPole

CartPole observation: [cart_pos, cart_vel, pole_angle, pole_vel]. Build expanded features:

python
import numpy as np
 
def phi(state: np.ndarray) -> np.ndarray:
    x, x_dot, theta, theta_dot = state
    return np.array([
        1.0, x, x_dot, theta, theta_dot,
        x**2, theta**2, x * theta,
        np.sin(theta), np.cos(theta),
    ])

For semi-gradient Q-learning with 2 actions, use phi(s, a): concatenate φ(s) with a one-hot action block, or maintain separate weight vectors per action.


Step 2 — Linear Q-learning update

python
class LinearQLearning:
    def __init__(self, n_features, n_actions, alpha=1e-4, gamma=0.99, epsilon=0.1):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions
 
    def q_values(self, state):
        f = phi(state)
        return self.w @ f
 
    def act(self, state):
        if np.random.random() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_values(state)))
 
    def learn(self, s, a, r, s_next, done):
        f = phi(s)
        q_sa = self.w[a] @ f
        q_next = 0.0 if done else np.max(self.q_values(s_next))
        td_error = r + self.gamma * q_next - q_sa
        self.w[a] += self.alpha * td_error * f

Tune alpha — linear FA is sensitive (try 1e-5 … 1e-3).


Step 3 — Training

python
import gymnasium as gym
 
env = gym.make("CartPole-v1")
agent = LinearQLearning(n_features=10, n_actions=2)
returns = []
 
for ep in range(1000):
    s, _ = env.reset(seed=ep)
    total = 0.0
    done = False
    while not done:
        a = agent.act(s)
        s_next, r, term, trunc, _ = env.step(a)
        done = term or trunc
        agent.learn(s, a, r, s_next, done)
        s = s_next
        total += r
    returns.append(total)
    agent.epsilon = max(0.01, agent.epsilon * 0.995)

Success criteria

CriterionTarget
Mean return over last 100 episodes ≥ 195CartPole "solved" threshold
README notes which features mattered mostRequired
Compare random baseline vs trainedRequired

Extension ideas

  • Tile coding instead of polynomials (coarse bins per dimension).
  • Replace linear model with a 1-hidden-layer net (preview Module 4).

What's next

Return to the course curriculum and continue to the next module when your project runs end-to-end.