Module 10 — Production & scaling

Monitoring, logging & errors

Structured logs, metrics, traces, graceful degradation, retries with backoff, and user-visible failure messages.

~75 min read + exercises

Before we begin

When the LLM times out at 2 a.m., you need logs and metrics — not guesswork. Production GenAI requires the same observability discipline as any backend service.

Figure: Observability pipeline. An API request emits logs (JSON), metrics (p95 ms), and traces (spans). Structured logs, metrics, and traces feed dashboards and alerts.

What you will learn

  • Log each request with correlation IDs and token counts.
  • Track latency, errors, cache hit rate, and cost proxies.
  • Handle failures gracefully — visible to users, actionable for ops.


Structured logging

Emit one JSON line per request so that every field can be filtered and aggregated in Datadog, Axiom, CloudWatch, or a similar log backend.

typescript
console.log(JSON.stringify({
  level: "info",
  requestId: crypto.randomUUID(),
  route: "/api/chat",
  model: "gpt-4o-mini",
  latencyMs: 1840,
  inputTokens: 2100,
  outputTokens: 340,
  cacheHit: false,
  userId: "user_abc", // opaque ID only; omit any PII you are not legally allowed to retain
}));

Never log: API keys, raw passwords, full credit card numbers.
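One way to enforce the "never log" rule is to redact sensitive fields before serializing. The following is a minimal sketch, not a complete solution: the helper names `redact` and `formatLogLine` and the key list in `SENSITIVE_KEYS` are assumptions made for illustration, and a real deployment would probably need a longer list.

```typescript
// Keys whose values must never reach the log stream (illustrative, not exhaustive).
const SENSITIVE_KEYS = new Set(["apikey", "authorization", "password", "cardnumber"]);

type LogFields = Record<string, unknown>;

// Replace the value of any sensitive key, at any nesting depth, with a marker.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: LogFields = {};
    for (const [key, inner] of Object.entries(value as LogFields)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Serialize one log event as a single JSON line with a timestamp.
function formatLogLine(level: "info" | "warn" | "error", fields: LogFields): string {
  const safe = redact(fields) as LogFields;
  return JSON.stringify({ level, time: new Date().toISOString(), ...safe });
}

console.log(
  formatLogLine("info", {
    requestId: "req_123",
    route: "/api/chat",
    headers: { Authorization: "Bearer placeholder-token" },
  })
);
```

Key-based redaction only catches secrets stored under known field names. It does not detect a key or card number pasted into free text such as the user's prompt.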


Metrics worth tracking

Metric                          Why
llm_latency_ms (p50/p95)        SLA and regressions
llm_errors_total                Provider outages
tokens_input / tokens_output    Cost forecasting
cache_hit_ratio                 Cache effectiveness
rag_chunks_retrieved            Retrieval tuning
agent_steps_count               Runaway loop detection

Start simple — a /admin/metrics JSON endpoint or hosted dashboard.
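As a rough illustration of the "start simple" approach, the sketch below keeps a few of the metrics above in process memory and produces the JSON object a metrics endpoint could return. The class name `LlmMetrics` and its methods are hypothetical, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```typescript
// In-memory metrics for a single process; values reset on restart.
class LlmMetrics {
  private latenciesMs: number[] = [];
  private errorsTotal = 0;
  private cacheHits = 0;
  private cacheLookups = 0;

  recordLatency(ms: number): void {
    this.latenciesMs.push(ms);
  }

  recordError(): void {
    this.errorsTotal += 1;
  }

  recordCacheLookup(hit: boolean): void {
    this.cacheLookups += 1;
    if (hit) this.cacheHits += 1;
  }

  // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
  percentile(p: number): number | null {
    if (this.latenciesMs.length === 0) return null;
    const sorted = [...this.latenciesMs].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }

  // Plain object suitable for returning as JSON from a metrics endpoint.
  snapshot() {
    return {
      llm_latency_ms_p50: this.percentile(50),
      llm_latency_ms_p95: this.percentile(95),
      llm_errors_total: this.errorsTotal,
      cache_hit_ratio: this.cacheLookups === 0 ? null : this.cacheHits / this.cacheLookups,
    };
  }
}

const metrics = new LlmMetrics();
metrics.recordLatency(1840);
metrics.recordCacheLookup(false);
console.log(JSON.stringify(metrics.snapshot()));
```

This version stores every latency sample, so memory grows with traffic. A longer-running service would likely cap the window or switch to histogram buckets in a hosted metrics system.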


Traces (optional but valuable)

Wrap sub-steps in spans: embed, retrieve, llm_call, tool_weather. See which span dominates p95.

OpenTelemetry is a vendor-neutral standard for traces, and most hosting platforms and observability backends can ingest its data.
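To make the idea of spans concrete without pulling in a tracing library, here is a hand-rolled sketch that times each sub-step of a request. It is a stand-in for illustration only, not the OpenTelemetry API; `RequestTrace`, `SpanRecord`, and `slowest` are hypothetical names.

```typescript
interface SpanRecord {
  name: string;
  durationMs: number;
  ok: boolean;
}

class RequestTrace {
  readonly spans: SpanRecord[] = [];

  constructor(readonly requestId: string) {}

  // Run one sub-step and record how long it took, even if it throws.
  async span<T>(name: string, step: () => Promise<T>): Promise<T> {
    const start = performance.now();
    let ok = false;
    try {
      const result = await step();
      ok = true;
      return result;
    } finally {
      this.spans.push({ name, durationMs: performance.now() - start, ok });
    }
  }

  // The span that took the longest in this request, if any were recorded.
  slowest(): SpanRecord | undefined {
    return this.spans.reduce<SpanRecord | undefined>(
      (worst, s) => (!worst || s.durationMs > worst.durationMs ? s : worst),
      undefined
    );
  }
}
```

A request handler would wrap each stage, for example `await trace.span("retrieve", () => searchIndex(query))`, where `searchIndex` is whatever retrieval function the application already has. Logging `trace.spans` alongside the request ID shows which stage dominates slow requests.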


Error handling patterns

Upstream timeout

typescript
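catch (e) {
  // Caught values are typed `unknown` in strict TypeScript, so narrow before reading `.name`.
  const name = e instanceof Error ? e.name : "";
  // AbortController.abort() rejects with "AbortError"; AbortSignal.timeout() rejects with "TimeoutError".
  if (name === "AbortError" || name === "TimeoutError") {
    return Response.json(
      { error: "The AI service timed out. Please try a shorter question." },
      { status: 504 }
    );
  }
  // Anything else is unexpected: log details for ops, return a generic message to the user.
  logger.error({ requestId, err: String(e) });
  return Response.json({ error: "Something went wrong." }, { status: 500 });
}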

Graceful degradation

  • Tool API down → answer from RAG only, show banner “live data unavailable”.
  • Redis down → skip cache, log warning, continue (higher cost OK briefly).
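Both degradation cases above follow the same shape: try the primary source, and on failure log a warning and continue with a fallback. A minimal sketch of that shape follows, assuming a hypothetical helper named `withFallback`; the `degraded` flag in its result is what a UI could use to decide whether to show the "live data unavailable" banner.

```typescript
interface Degraded<T> {
  value: T;
  degraded: boolean; // true when the fallback was used
}

// Try the primary source; on failure, log a warning and use the fallback instead.
async function withFallback<T>(
  label: string,
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<Degraded<T>> {
  try {
    return { value: await primary(), degraded: false };
  } catch (err) {
    console.warn(
      JSON.stringify({ level: "warn", msg: `${label} unavailable, degrading`, err: String(err) })
    );
    return { value: await fallback(), degraded: true };
  }
}
```

For the cache case, the fallback can simply resolve to `null` so that a failed cache read is treated as a miss and the request proceeds to the model. If the fallback itself throws, the error propagates to the caller, which is usually the right outcome because nothing is left to degrade to.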

Retries

  • Idempotent LLM read calls: retry on HTTP 429 (rate limited) and 503 (service unavailable) with exponential backoff.
  • Tool writes: no automatic retry without idempotency keys.
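The retry rule for read calls can be sketched as a small wrapper. This is one possible shape, not a definitive implementation: `retryWithBackoff`, `statusOf`, and the default attempt count and delays are assumptions, and the wrapper expects the failing call to throw an object with a numeric `status` property. It uses exponential backoff with full jitter (a random delay up to the exponential ceiling).

```typescript
const RETRYABLE_STATUSES = new Set([429, 503]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Read a numeric `status` off a thrown value, if it has one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retry an idempotent call on 429/503; rethrow anything else immediately.
async function retryWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  maxDelayMs = 8000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && RETRYABLE_STATUSES.has(status);
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential ceiling (base, 2x, 4x, ...) capped at maxDelayMs, with full jitter.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Only wrap calls that are safe to repeat. A tool call that writes data should go through this wrapper only if the downstream API accepts an idempotency key, as the second bullet above notes.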

User trust on failure

Honest messages beat silent hangs. Show the request ID in support-facing errors so users can quote it when they report a problem and you can grep the logs for it.
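A small helper can make the request ID part of every error response. The sketch below is illustrative: `errorResponse` is a hypothetical name, and returning the ID in an `X-Request-Id` header as well as in the body is a common convention rather than anything this lesson prescribes.

```typescript
// Build a JSON error response that carries the request ID in the body and a header.
function errorResponse(requestId: string, status: number, message: string): Response {
  return new Response(JSON.stringify({ error: message, requestId }), {
    status,
    headers: {
      "Content-Type": "application/json",
      "X-Request-Id": requestId,
    },
  });
}

const timeoutResponse = errorResponse(
  "req_123",
  504,
  "The AI service timed out. Please try a shorter question."
);
console.log(timeoutResponse.status, timeoutResponse.headers.get("X-Request-Id"));
```

The same `requestId` should appear in the structured log line for the request, so a user-reported ID leads directly to the matching log entry.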


Checkpoint

Describe what you would log for one RAG chat request and one alert you'd set on llm_errors_total.


What's next

Module 10 quiz