Monitoring, logging & error handling
Before we begin
When the LLM times out at 2 a.m., you need logs and metrics — not guesswork. Production GenAI requires the same observability discipline as any backend service.
Figure: Observability pipeline
What you will learn
- Log each request with correlation IDs and token counts.
- Track latency, errors, cache hit rate, and cost proxies.
- Handle failures gracefully — visible to users, actionable for ops.
Structured logging
Emit one JSON line per request so it is easy to query in Datadog, Axiom, CloudWatch, etc.

    console.log(JSON.stringify({
      level: "info",
      requestId: crypto.randomUUID(), // correlation ID for this request
      route: "/api/chat",
      model: "gpt-4o-mini",
      latencyMs: 1840,
      inputTokens: 2100,
      outputTokens: 340,
      cacheHit: false,
      userId: "user_abc", // no PII you cannot retain legally
    }));

Never log: API keys, raw passwords, full credit card numbers.
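One way to enforce the "never log" rule is to mask sensitive fields before the line is serialized. The sketch below is only an illustration: `SENSITIVE_KEYS`, `redact`, and `formatLogLine` are hypothetical names, the key list is an assumption you would adapt to your own payloads, and the masking is shallow (nested objects are not inspected).

```javascript
// Hypothetical list of field names that must never reach the logs.
const SENSITIVE_KEYS = new Set(["apiKey", "password", "cardNumber", "authorization"]);

// Return a shallow copy of `fields` with sensitive values masked.
function redact(fields) {
  const safe = {};
  for (const [key, value] of Object.entries(fields)) {
    safe[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return safe;
}

// Build one JSON log line; the caller decides where to write it.
function formatLogLine(level, fields) {
  return JSON.stringify({ level, ts: new Date().toISOString(), ...redact(fields) });
}

console.log(formatLogLine("info", {
  requestId: "req_123", // placeholder; real code would generate a UUID per request
  route: "/api/chat",
  latencyMs: 1840,
  apiKey: "sk-not-a-real-key", // masked before it is written
}));
```

Routing every log call through one formatter keeps the redaction rule in a single place instead of relying on each call site to remember it.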
Metrics worth tracking
| Metric | Why |
|---|---|
| llm_latency_ms p50/p95 | SLA and regressions |
| llm_errors_total | Provider outages |
| tokens_input / tokens_output | Cost forecasting |
| cache_hit_ratio | Cache effectiveness |
| rag_chunks_retrieved | Retrieval tuning |
| agent_steps_count | Runaway loop detection |
Start simple — a /admin/metrics JSON endpoint or hosted dashboard.
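As a rough sketch of the "start simple" option, the snippet below keeps a few of the metrics from the table in memory and produces the object a /admin/metrics handler could return. The function names (`recordRequest`, `percentile`, `snapshot`) are hypothetical, the nearest-rank percentile is one of several common definitions, and the store resets on restart and grows without bound, so treat it as a starting point only.

```javascript
// Minimal in-memory metrics store (hypothetical; resets when the process restarts).
const metrics = { latenciesMs: [], errors: 0, cacheHits: 0, cacheMisses: 0 };

// Call once per LLM request.
function recordRequest({ latencyMs, cacheHit, error = false }) {
  metrics.latenciesMs.push(latencyMs);
  if (error) metrics.errors += 1;
  if (cacheHit) metrics.cacheHits += 1;
  else metrics.cacheMisses += 1;
}

// Nearest-rank percentile: smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// The JSON body a metrics endpoint could serve.
function snapshot() {
  const lookups = metrics.cacheHits + metrics.cacheMisses;
  return {
    llm_latency_ms_p50: percentile(metrics.latenciesMs, 50),
    llm_latency_ms_p95: percentile(metrics.latenciesMs, 95),
    llm_errors_total: metrics.errors,
    cache_hit_ratio: lookups === 0 ? null : metrics.cacheHits / lookups,
  };
}

recordRequest({ latencyMs: 1840, cacheHit: false });
recordRequest({ latencyMs: 120, cacheHit: true });
console.log(snapshot());
```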
Traces (optional but valuable)
Wrap sub-steps in spans: embed, retrieve, llm_call, tool_weather. See which span dominates p95.
OpenTelemetry is the vendor-neutral standard for this, and most hosts and observability backends can ingest its traces.
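To show the idea of spans without pulling in a tracing SDK, here is a hand-rolled timing wrapper. It is not the OpenTelemetry API; `createTrace`, `withSpan`, and `slowest` are hypothetical helpers, and the sleeps stand in for real retrieval and LLM calls.

```javascript
// Collect spans for one request; a real tracer would export them instead.
function createTrace(requestId) {
  const spans = [];

  // Run `fn`, record its duration under `name`, and let any error propagate.
  async function withSpan(name, fn) {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      spans.push({ requestId, name, durationMs: performance.now() - start });
    }
  }

  // The slowest span is usually the first place to look (undefined if no spans).
  const slowest = () =>
    spans.reduce((a, b) => (b.durationMs > a.durationMs ? b : a), spans[0]);

  return { spans, withSpan, slowest };
}

// Stand-in for real work such as an embedding or LLM call.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const trace = createTrace("req_demo");
  await trace.withSpan("retrieve", () => sleep(10));
  await trace.withSpan("llm_call", () => sleep(50));
  return trace.slowest().name;
}

demo().then((name) => console.log("slowest span:", name));
```

Recording the span in `finally` means failed steps still show up in the trace, which is often where the interesting latency is.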
Error handling patterns
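Upstream timeout

    try {
      // ... call the LLM with an AbortSignal here ...
    } catch (e) {
      // AbortController.abort() rejects with "AbortError";
      // AbortSignal.timeout() rejects with "TimeoutError".
      if (e.name === "AbortError" || e.name === "TimeoutError") {
        return Response.json(
          { error: "The AI service timed out. Please try a shorter question." },
          { status: 504 }
        );
      }
      logger.error({ requestId, err: String(e) });
      return Response.json({ error: "Something went wrong." }, { status: 500 });
    }

Graceful degradation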
- Tool API down → answer from RAG only, show banner “live data unavailable”.
- Redis down → skip cache, log warning, continue (higher cost OK briefly).
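The "Redis down" case above can be sketched as a wrapper that treats the cache as optional. Here `cache` is assumed to be any object with async `get` and `set` methods (a hypothetical interface, not a specific Redis client), and `compute` is the expensive LLM call.

```javascript
// Try the cache first; if the cache itself fails, log a warning and continue.
async function getWithCacheFallback(cache, key, compute, warn = console.warn) {
  try {
    const cached = await cache.get(key);
    if (cached !== undefined && cached !== null) {
      return { value: cached, cacheHit: true };
    }
  } catch (err) {
    warn(JSON.stringify({ level: "warn", msg: "cache unavailable, skipping", err: String(err) }));
  }

  // Cache miss or cache outage: pay for the real call.
  const value = await compute();

  // Best-effort write: a failing cache must not fail the request.
  try {
    await cache.set(key, value);
  } catch {
    // Already degraded; nothing more to do here.
  }
  return { value, cacheHit: false };
}

// Usage with a cache that is down (every call throws).
const brokenCache = {
  get: async () => { throw new Error("ECONNREFUSED"); },
  set: async () => { throw new Error("ECONNREFUSED"); },
};
getWithCacheFallback(brokenCache, "q:hello", async () => "fresh answer")
  .then((result) => console.log(result));
```

Returning `cacheHit` alongside the value lets the request log and the cache_hit_ratio metric reflect the outage.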
Retries
- Idempotent LLM read calls: retry 429/503 with backoff.
- Tool writes: no automatic retry without idempotency keys.
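A minimal sketch of the retry rule for idempotent reads, assuming `call` is any function that returns a promise of an object with a numeric `status` (for example a fetch response). `withRetry` and `backoffDelayMs` are hypothetical names, the defaults are arbitrary, and this version uses exponential backoff with full jitter while ignoring any Retry-After header the provider may send.

```javascript
const RETRYABLE_STATUS = new Set([429, 503]);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff with full jitter: random delay in [0, baseMs * 2^attempt).
function backoffDelayMs(attempt, baseMs) {
  return Math.random() * baseMs * 2 ** attempt;
}

// Retry only 429/503, and only up to maxAttempts calls in total.
async function withRetry(call, { maxAttempts = 3, baseMs = 250 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await call();
    const isLastAttempt = attempt === maxAttempts - 1;
    if (!RETRYABLE_STATUS.has(res.status) || isLastAttempt) return res;
    await sleep(backoffDelayMs(attempt, baseMs));
  }
}

// Usage with a fake provider that is rate limited twice, then succeeds.
const fakeStatuses = [429, 503, 200];
withRetry(async () => ({ status: fakeStatuses.shift() }), { baseMs: 10 })
  .then((res) => console.log("final status:", res.status));
```

Do not wrap tool writes in this helper unless the tool accepts an idempotency key; a retried write could otherwise be applied twice.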
User trust on failure
Honest messages beat silent hangs. Show the request ID in user-facing error messages so that users can quote it when they report an issue and you can grep the logs for it.
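One possible shape for such an error response is sketched below. `errorResponse` is a hypothetical helper, and the `x-request-id` header name is a common convention rather than a requirement; it assumes a runtime with the standard `Response` global.

```javascript
// Build a JSON error response that carries the request ID in body and header.
function errorResponse(status, message, requestId) {
  return Response.json(
    { error: message, requestId },
    { status, headers: { "x-request-id": requestId } }
  );
}

const res = errorResponse(504, "The AI service timed out. Please try a shorter question.", "req_123");
console.log(res.status, res.headers.get("x-request-id"));
```

The same ID should appear in the structured log line for the request, so a support ticket can be matched to its logs with a single search.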
Checkpoint
Describe what you would log for one RAG chat request and one alert you'd set on llm_errors_total.