
Module 9 — Production & advanced topics

Project: production RL serving

Serve a trained policy via FastAPI; health checks, batch inference, latency logging.

~180 min read + exercises


Before we begin

This capstone is not another training loop. You serve a trained policy behind an HTTP API with health checks, batch inference, and latency logging, which mirrors the deployment path covered in the Module 9 lessons.


How this connects to Module 9

| Lesson | Where you use it |
|---|---|
| Offline RL | Serve a frozen policy trained from logged data |
| Safety & deployment | Validate inputs and actions; fail closed on bad inputs |
| Monitoring | Log p50/p95 latency, request counts, errors |
| Evaluation | Shadow mode / A-B hooks (a stub is acceptable) |
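
A minimal sketch of the fail-closed rule from the Safety & deployment row, independent of FastAPI and assuming CartPole (4 observation values, actions 0 and 1). `RejectedInput`, `validate_observation`, `validate_action`, `OBS_DIM` and `VALID_ACTIONS` are hypothetical names; a serving layer would map `RejectedInput` to an error response instead of acting.

```python
import math

OBS_DIM = 4             # CartPole observation size (assumption for this sketch)
VALID_ACTIONS = {0, 1}  # CartPole's discrete actions: push left / push right


class RejectedInput(ValueError):
    """Raised whenever a request should be refused rather than acted on."""


def validate_observation(obs):
    """Return the observation as a list of floats, or raise RejectedInput (fail closed)."""
    if not isinstance(obs, (list, tuple)) or len(obs) != OBS_DIM:
        raise RejectedInput(f"expected {OBS_DIM} values")
    values = []
    for x in obs:
        # JSON numbers arrive as int or float; bool is an int subclass, so exclude it
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise RejectedInput(f"non-numeric value: {x!r}")
        try:
            v = float(x)
        except OverflowError:  # e.g. a huge JSON integer
            raise RejectedInput(f"value out of range: {x!r}") from None
        if not math.isfinite(v):
            raise RejectedInput(f"non-finite value: {x!r}")
        values.append(v)
    return values


def validate_action(action):
    """Return the action as an int, or raise RejectedInput if the policy misbehaved."""
    try:
        a = int(action)
    except (TypeError, ValueError, OverflowError):
        raise RejectedInput(f"uncastable action: {action!r}") from None
    # reject fractional values (int() would silently truncate them) and out-of-range ids
    if a != action or a not in VALID_ACTIONS:
        raise RejectedInput(f"action {action!r} outside {sorted(VALID_ACTIONS)}")
    return a
```

Refusing the request, rather than substituting a default action, is the conservative choice here because CartPole has no action that is safe in every state.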

What you will build

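| Piece | Tech | Purpose |
|---|---|---|
| Trained policy | SB3 .zip or PyTorch .pt | CartPole from earlier modules (Pendulum also works, but serve.py then needs a 3-float observation schema and a continuous action output) |
| serve.py | FastAPI | POST /act, GET /health |
| client.py | requests | Load test + latency stats |
| logs/requests.jsonl | JSON lines | Timestamp, obs hash, action, latency_ms |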
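
One way to produce the log line described in the table, sketched with the standard library only. `obs_hash` and `make_log_record` are hypothetical helpers, and SHA-256 over float64 bytes truncated to 16 hex characters is an assumption rather than a requirement; the hash lets you group repeated inputs without storing every raw observation.

```python
import hashlib
import json
import struct
import time


def obs_hash(observation):
    """Short, stable fingerprint of an observation."""
    # pack as little-endian float64 so any finite Python float is accepted
    packed = struct.pack(f"<{len(observation)}d", *observation)
    return hashlib.sha256(packed).hexdigest()[:16]


def make_log_record(observation, action, latency_ms, model_name="cartpole_ppo"):
    """Build one JSON-lines entry with the fields listed in the table."""
    return json.dumps({
        "ts": time.time(),                  # Unix timestamp in seconds
        "obs_hash": obs_hash(observation),
        "action": int(action),
        "latency_ms": round(latency_ms, 3),
        "model": model_name,                # extra field: which policy answered
    })
```

Append the returned string plus `"\n"` to logs/requests.jsonl once per request.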

Folder layout:

text
rl-serving/
  policies/cartpole_ppo.zip   # export from Module 5/6 or quick SB3 train
  serve.py
  client.py
  logs/requests.jsonl
  README.md                   # architecture diagram in words + SLO notes

Estimated time: 4–6 hours.


Before you start

  • Finish the Module 9 quiz.
  • pip install fastapi uvicorn stable-baselines3 gymnasium "pydantic>=2" requests
  • Have any trained discrete-action policy (CartPole PPO/DQN is fine).

Quick policy if needed:

python
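from pathlib import Path

import gymnasium as gym
from stable_baselines3 import PPO

Path("policies").mkdir(exist_ok=True)      # make sure the output folder exists
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)
model.save("policies/cartpole_ppo")        # SB3 appends .zip -> policies/cartpole_ppo.zip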

Step 1 — FastAPI service

python
# serve.py
import hashlib
import json
import threading
import time
from pathlib import Path

import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from stable_baselines3 import PPO

MODEL_NAME = "cartpole_ppo"
app = FastAPI(title="RL Policy Server")
model = PPO.load(f"policies/{MODEL_NAME}")  # load once at startup, not per request
LOG = Path("logs/requests.jsonl")
LOG.parent.mkdir(exist_ok=True)
LOG_LOCK = threading.Lock()  # plain `def` endpoints run in a thread pool; serialize log writes

class ObsRequest(BaseModel):
    # CartPole observations are exactly 4 floats (pydantic v2 list constraints)
    observation: list[float] = Field(..., min_length=4, max_length=4)

@app.get("/health")
def health():
    return {"status": "ok", "model": MODEL_NAME}

@app.post("/act")
def act(req: ObsRequest):
    t0 = time.perf_counter()
    obs = np.array(req.observation, dtype=np.float32)
    if not np.all(np.isfinite(obs)):  # fail closed: NaN/inf never reach the policy
        raise HTTPException(status_code=400, detail="Invalid observation")
    action, _ = model.predict(obs, deterministic=True)
    latency_ms = (time.perf_counter() - t0) * 1000
    record = {
        "ts": time.time(),
        "obs_hash": hashlib.sha256(obs.tobytes()).hexdigest()[:16],
        "action": int(action),
        "latency_ms": latency_ms,
    }
    with LOG_LOCK, LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return {"action": int(action), "latency_ms": round(latency_ms, 3)}

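Run from the rl-serving/ directory, since serve.py uses relative paths for the policy and the log: uvicorn serve:app --host 0.0.0.0 --port 8000 (use --host 127.0.0.1 to accept local connections only).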
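
Once the server is up, a standard-library probe such as the hypothetical `is_healthy` below can back a cron job, a Docker `HEALTHCHECK`, or a CI smoke test. It assumes the host and port from the command above and treats anything other than a 200 response with `{"status": "ok"}` as unhealthy.

```python
import json
import urllib.error
import urllib.request


def is_healthy(base_url="http://127.0.0.1:8000", timeout=2.0):
    """Return True only if GET /health answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # refused connection, timeout, non-2xx (HTTPError) or malformed JSON: unhealthy
        return False
```

Health checkers usually read an exit code, so a small wrapper script could end with `sys.exit(0 if is_healthy() else 1)`.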


Step 2 — Batch endpoint (stretch)

python
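class BatchRequest(BaseModel):  # append to serve.py; reuses its imports and model
    observations: list[list[float]] = Field(..., min_length=1, max_length=256)

@app.post("/act/batch")
def act_batch(req: BatchRequest):
    t0 = time.perf_counter()
    if any(len(row) != 4 for row in req.observations):  # reject ragged rows (fail closed)
        raise HTTPException(status_code=400, detail="Each observation needs 4 floats")
    obs = np.array(req.observations, dtype=np.float32)  # shape (batch, 4)
    if not np.all(np.isfinite(obs)):
        raise HTTPException(status_code=400, detail="Invalid observation")
    actions, _ = model.predict(obs, deterministic=True)  # one vectorized forward pass
    latency_ms = (time.perf_counter() - t0) * 1000
    return {"actions": actions.astype(int).tolist(), "latency_ms": round(latency_ms, 3)}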
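
Why batch at all: stacking N observations into one array lets the network run one matrix multiply instead of N small ones, which amortizes per-call overhead. The sketch assumes a random 4-64-2 NumPy MLP as a stand-in for the policy network (it is not SB3's network) and checks that batched and row-by-row inference choose the same actions.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a policy network: 4 -> 64 -> 2 MLP with random weights
W1, b1 = rng.normal(size=(4, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 2)), np.zeros(2)


def greedy_actions(obs_batch):
    """Deterministic action per row: argmax over the two action logits."""
    hidden = np.tanh(obs_batch @ W1 + b1)  # (batch, 64)
    logits = hidden @ W2 + b2              # (batch, 2)
    return logits.argmax(axis=1)           # (batch,)


obs = rng.normal(size=(256, 4)).astype(np.float32)

t0 = time.perf_counter()
batched = greedy_actions(obs)  # one call for the whole batch
t_batch = time.perf_counter() - t0

t0 = time.perf_counter()
looped = np.array([greedy_actions(row[None, :])[0] for row in obs])  # 256 calls
t_loop = time.perf_counter() - t0

print(f"same actions: {np.array_equal(batched, looped)}")
print(f"batched: {t_batch * 1e3:.3f} ms, looped: {t_loop * 1e3:.3f} ms")
```

The exact speedup depends on hardware and batch size, so measure it on your own machine rather than assuming a number.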

Step 3 — Client load test

python
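# client.py
import time

import gymnasium as gym
import numpy as np
import requests

URL = "http://127.0.0.1:8000/act"
N_REQUESTS = 200

env = gym.make("CartPole-v1")
obs, _ = env.reset(seed=0)
latencies, errors = [], 0
session = requests.Session()  # keep-alive: reuses one connection across requests
for _ in range(N_REQUESTS):
    t0 = time.perf_counter()
    r = session.post(URL, json={"observation": obs.tolist()}, timeout=5)
    latencies.append((time.perf_counter() - t0) * 1000)
    if r.status_code != 200:
        errors += 1
        obs, _ = env.reset()
        continue
    # step the real env with the served action so observations stay realistic
    obs, _, terminated, truncated, _ = env.step(r.json()["action"])
    if terminated or truncated:
        obs, _ = env.reset()

print(f"p50: {np.percentile(latencies, 50):.2f} ms")
print(f"p95: {np.percentile(latencies, 95):.2f} ms")
print(f"errors: {errors}/{N_REQUESTS}")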

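Random Gaussian vectors can fall outside the states the policy saw during training, so for meaningful tests use realistic observations, e.g. from env.reset() and subsequent env.step() calls.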
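
Percentile definitions differ slightly (NumPy's `np.percentile` interpolates linearly by default), so it helps to state which one your SLO uses. A sketch of the nearest-rank definition, with hypothetical helper names `percentile` and `latency_summary`:

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))  # 1-based rank
    return sorted_values[rank - 1]


def latency_summary(latencies_ms):
    """p50/p95/p99/max of request latencies, all in milliseconds."""
    s = sorted(latencies_ms)
    return {
        "n": len(s),
        "p50": percentile(s, 50),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "max": s[-1],
    }
```

With 200 samples, the nearest-rank p95 is the 190th smallest latency.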


Step 4 — Monitoring checklist

| Signal | How |
|---|---|
| Availability | GET /health returns 200 |
| Latency | p50/p95 from client or logs |
| Errors | 400 on bad obs; 500 logged |
| Version | Include model name in health JSON |
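
The "p50/p95 from logs" signal can be computed without the client by reading logs/requests.jsonl directly. This sketch assumes each line is a JSON object with a `latency_ms` field, as written by serve.py; `summarize_log_lines` and `summarize_log` are hypothetical names. The `requests` count also tells you whether the log meets the ≥ 50-entry success criterion.

```python
import json
import math
from pathlib import Path


def summarize_log_lines(lines):
    """Request count and nearest-rank p50/p95 latency from JSON-lines text."""
    latencies, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            latencies.append(float(json.loads(line)["latency_ms"]))
        except (ValueError, KeyError, TypeError):
            bad += 1  # truncated write or unexpected schema: count it, don't crash
    latencies.sort()

    def pct(p):
        if not latencies:
            return None
        return latencies[max(1, math.ceil(p * len(latencies) / 100)) - 1]

    return {"requests": len(latencies), "unparseable": bad, "p50_ms": pct(50), "p95_ms": pct(95)}


def summarize_log(path="logs/requests.jsonl"):
    with Path(path).open() as f:
        return summarize_log_lines(f)
```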

Success criteria

| Criterion | Target |
|---|---|
| /health and /act work locally | Required |
| 200 requests without crash | Required |
| p95 latency < 50 ms on CPU (CartPole MLP) | Typical |
| JSONL log with ≥ 50 entries | Required |
| README describes rollback if policy regresses | Recommended |
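
One possible rollback mechanism, sketched under the assumption that serve.py reads the active policy name from a small pointer file (hypothetically policies/ACTIVE.json) instead of a hard-coded path. `promote`, `rollback`, `load_state` and `save_state` are illustrative names; the server still has to be restarted or told to reload after the pointer changes.

```python
import json
import os
from pathlib import Path


def promote(state, new_policy):
    """Make new_policy active and remember the previous one for rollback."""
    previous = [state["active"]] if state.get("active") else []
    return {"active": new_policy, "history": state.get("history", []) + previous}


def rollback(state):
    """Reactivate the most recent previous policy; refuse if there is none."""
    if not state.get("history"):
        raise RuntimeError("no previous policy to roll back to")
    *rest, previous = state["history"]
    return {"active": previous, "history": rest}


def load_state(path="policies/ACTIVE.json"):
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(state, path="policies/ACTIVE.json"):
    """Write to a temp file, then rename, so readers never see a half-written pointer."""
    path = Path(path)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)  # atomic rename on POSIX
```

Keeping old policy files on disk and only moving a pointer makes rollback a metadata change rather than a redeploy.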

Extension ideas

  • Dockerize with Dockerfile + HEALTHCHECK.
  • Prometheus /metrics counter for requests and errors.
  • Shadow mode: log what a new policy would do without serving it.
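
A minimal shadow-mode sketch: the primary policy's action is returned, and the candidate's action is only recorded next to it. `shadow_act` is a hypothetical helper and the policies are plain callables returning JSON-serializable actions (for example wrappers like `lambda o: int(model.predict(o, deterministic=True)[0])`). Running the candidate inline adds its latency to every request; a production setup might run it asynchronously instead.

```python
import json
import time


def shadow_act(obs, primary, candidate, log_lines):
    """Serve primary's action; record what candidate would have done (never served)."""
    action = primary(obs)
    record = {"ts": time.time(), "served": action}
    try:
        shadow = candidate(obs)
        record.update(shadow=shadow, agree=bool(shadow == action))
    except Exception as exc:  # a broken candidate must never affect the live response
        record.update(shadow=None, agree=None, shadow_error=repr(exc))
    log_lines.append(json.dumps(record))
    return action
```

In serve.py, `log_lines` would be replaced by an append to a separate JSONL file; the agreement rate over many requests is a cheap first signal before an A-B test.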

What's next

Return to the course curriculum and continue to the next module when your project runs end-to-end.