Monitoring, logging & error handling

Before we begin

When the LLM times out at 2 a.m., you need logs and metrics — not guesswork. Production GenAI requires the same observability discipline as any backend service.

Figure: Observability pipeline. Structured logs, metrics, and traces feed dashboards and alerts.

What you will learn

Log each request with correlation IDs and token counts.
Track latency, errors, cache hit rate, and cost proxies.
Handle failures gracefully — visible to users, actionable for ops.

Structured logging

One JSON line per request — easy to query in Datadog, Axiom, CloudWatch, etc.

typescript

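// Generate the ID once per request and reuse it in every log line for that request.
const requestId = crypto.randomUUID();

console.log(JSON.stringify({
  level: "info",
  requestId,
  route: "/api/chat",
  model: "gpt-4o-mini",
  latencyMs: 1840, // example values; measure these per request
  inputTokens: 2100,
  outputTokens: 340,
  cacheHit: false,
  userId: "user_abc", // log only identifiers you are legally allowed to retain
}));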

Never log: API keys, raw passwords, full credit card numbers.
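
One way to enforce this rule is to pass every log record through a redaction step before it is serialized. The sketch below is illustrative only: the `SENSITIVE_KEYS` list and the `redact` and `logLine` helpers are hypothetical names, not part of any logging library, and a real deployment would tune the key list to its own payloads.

```typescript
// Hypothetical deny-list of field names (compared in lowercase); adjust to your payloads.
const SENSITIVE_KEYS = new Set(["apikey", "authorization", "password", "cardnumber"]);

type LogValue = string | number | boolean | null | LogValue[] | { [key: string]: LogValue };

// Returns a copy of `value` with sensitive fields replaced by "[REDACTED]".
function redact(value: LogValue): LogValue {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: LogValue } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Serializes one request record as a single JSON line, after redaction.
function logLine(record: { [key: string]: LogValue }): string {
  return JSON.stringify(redact(record));
}

console.log(logLine({
  level: "info",
  requestId: "req_123",
  route: "/api/chat",
  headers: { authorization: "Bearer sk-example", "content-type": "application/json" },
}));
```

A key-based deny-list only catches secrets stored under known field names. It will not catch a secret pasted into free text such as a prompt, so treat it as a safety net rather than a guarantee.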

Metrics worth tracking

Metric	Why
`llm_latency_ms` p50/p95	SLA and regressions
`llm_errors_total`	Provider outages
`tokens_input` / `tokens_output`	Cost forecasting
`cache_hit_ratio`	Cache effectiveness
`rag_chunks_retrieved`	Retrieval tuning
`agent_steps_count`	Runaway loop detection

Start simple — a /admin/metrics JSON endpoint or hosted dashboard.
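
As a starting point, the counters behind such an endpoint can live in memory. This is a minimal sketch under stated assumptions: the `LlmMetrics` class is a hypothetical helper, it covers only three of the metrics in the table, and it uses the nearest-rank method for percentiles, which is one of several common definitions.

```typescript
// In-memory metrics for a single process; values reset on restart.
class LlmMetrics {
  private latenciesMs: number[] = [];
  private errorsTotal = 0;
  private cacheHits = 0;
  private cacheLookups = 0;

  recordLatency(ms: number): void {
    this.latenciesMs.push(ms);
  }

  recordError(): void {
    this.errorsTotal += 1;
  }

  recordCacheLookup(hit: boolean): void {
    this.cacheLookups += 1;
    if (hit) this.cacheHits += 1;
  }

  // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
  percentile(p: number): number | null {
    if (this.latenciesMs.length === 0) return null;
    const sorted = [...this.latenciesMs].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }

  // Plain object suitable for returning from a JSON endpoint.
  snapshot() {
    return {
      llm_latency_ms: { p50: this.percentile(50), p95: this.percentile(95) },
      llm_errors_total: this.errorsTotal,
      cache_hit_ratio: this.cacheLookups === 0 ? null : this.cacheHits / this.cacheLookups,
    };
  }
}

const metrics = new LlmMetrics();
[120, 300, 450, 1840].forEach((ms) => metrics.recordLatency(ms));
metrics.recordCacheLookup(true);
metrics.recordCacheLookup(false);
console.log(JSON.stringify(metrics.snapshot()));
```

The latency array grows without bound, so a long-running service would cap it (for example with a sliding window) or hand the samples to a metrics backend that computes percentiles itself.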

Traces (optional but valuable)

Wrap sub-steps in spans: embed, retrieve, llm_call, tool_weather. See which span dominates p95.

OpenTelemetry is a vendor-neutral standard for traces, metrics, and logs, and most hosting and observability platforms can ingest its data.
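
To make the idea of a span concrete, here is a hand-rolled timing wrapper. It is not the OpenTelemetry API: `withSpan`, `spans`, and `handleChat` are hypothetical names, and the sleeps stand in for real embed, retrieve, and LLM calls. A tracing SDK plays the same role but also handles span nesting and export.

```typescript
type SpanRecord = { name: string; durationMs: number; ok: boolean };

// Collects finished spans; a tracing SDK would export these instead.
const spans: SpanRecord[] = [];

// Runs `fn`, recording how long it took and whether it threw.
async function withSpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  let ok = true;
  try {
    return await fn();
  } catch (err) {
    ok = false;
    throw err; // the caller still sees the original error
  } finally {
    spans.push({ name, durationMs: Date.now() - start, ok });
  }
}

// Placeholder delay standing in for real work.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function handleChat(): Promise<string> {
  await withSpan("embed", () => sleep(5));
  const chunks = await withSpan("retrieve", async () => {
    await sleep(10);
    return ["chunk-1"];
  });
  return withSpan("llm_call", async () => {
    await sleep(20);
    return `answer using ${chunks.length} chunk(s)`;
  });
}

handleChat().then((answer) => console.log(answer, spans));
```

Comparing `durationMs` across span names for your slowest requests shows which step to optimize first.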

Error handling patterns

Upstream timeout

typescript

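// Inside the route handler; `requestId` and `logger` are created earlier in the request.
try {
  // ... call the model with an AbortSignal that enforces your timeout ...
} catch (e) {
  // A manual controller.abort() rejects with "AbortError"; AbortSignal.timeout() rejects with "TimeoutError".
  const name = e instanceof Error ? e.name : "";
  if (name === "AbortError" || name === "TimeoutError") {
    return Response.json(
      { error: "The AI service timed out. Please try a shorter question." },
      { status: 504 }
    );
  }
  logger.error({ requestId, err: String(e) });
  return Response.json({ error: "Something went wrong." }, { status: 500 });
}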
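
The handler above only sees an abort error if something actually cancels the upstream call. A minimal sketch of that wiring follows, assuming the provider call accepts an `AbortSignal` the way `fetch` does. `withTimeout` and `slowModelCall` are hypothetical names, and `slowModelCall` merely simulates a slow provider.

```typescript
// Runs `task` with an AbortSignal that fires after `timeoutMs`.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // avoid a stray timer once the task settles
  }
}

// Hypothetical stand-in for a provider call that honors the signal.
function slowModelCall(signal: AbortSignal, durationMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("model reply"), durationMs);
    signal.addEventListener("abort", () => {
      clearTimeout(timer);
      const err = new Error("The operation was aborted");
      err.name = "AbortError";
      reject(err);
    });
  });
}

withTimeout((signal) => slowModelCall(signal, 50), 10).catch((e) => {
  console.log(e instanceof Error ? e.name : "unknown error");
});
```

The timeout only takes effect if the task listens to the signal. With `fetch`, that means passing it along, as in `withTimeout((signal) => fetch(url, { signal }), 30_000)`, where `url` is your provider endpoint.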

Graceful degradation

Tool API down → answer from RAG only, show banner “live data unavailable”.
Redis down → skip cache, log warning, continue (higher cost OK briefly).
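
The cache case can be sketched as a wrapper that treats any cache failure as a miss. The `Cache` interface and `answerWithCache` function are assumptions made for illustration. A thin wrapper around a Redis client could satisfy the interface, but no specific client API is implied.

```typescript
// Minimal cache shape (assumed interface, not a specific client library).
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Returns a cached answer when possible; any cache failure is logged and treated as a miss.
async function answerWithCache(
  cache: Cache,
  key: string,
  generate: () => Promise<string>,
  warn: (msg: string) => void = (msg) => console.warn(msg)
): Promise<{ answer: string; cacheHit: boolean }> {
  try {
    const cached = await cache.get(key);
    if (cached !== null) return { answer: cached, cacheHit: true };
  } catch (err) {
    warn(`cache read failed, continuing without cache: ${String(err)}`);
  }

  // Cache miss or cache outage: pay for a fresh generation.
  const answer = await generate();

  try {
    await cache.set(key, answer);
  } catch (err) {
    warn(`cache write failed: ${String(err)}`);
  }
  return { answer, cacheHit: false };
}

// Simulated outage: every cache operation rejects.
const brokenCache: Cache = {
  get: async () => { throw new Error("ECONNREFUSED"); },
  set: async () => { throw new Error("ECONNREFUSED"); },
};

answerWithCache(brokenCache, "q:hello", async () => "fresh answer").then((result) =>
  console.log(result)
);
```

Because the warnings go through the logger, a sustained outage shows up both in logs and as a drop in `cache_hit_ratio`.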

Retries

Idempotent LLM read calls: retry 429/503 with backoff.
Tool writes: no automatic retry without idempotency keys.
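
A minimal sketch of the read-call retry policy follows, using exponential backoff with full jitter. It assumes the provider client throws errors carrying a numeric `status` property, which varies by SDK. `retryWithBackoff` and `statusOf` are hypothetical helpers, and the default attempt count and base delay are placeholders rather than recommendations.

```typescript
const RETRYABLE_STATUSES = new Set([429, 503]);

// Reads a numeric `status` from an unknown error, if present (assumed error shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retries `call` on 429/503, waiting a random delay below a ceiling that doubles each attempt.
async function retryWithBackoff<T>(
  call: () => Promise<T>,
  opts: { maxAttempts?: number; baseDelayMs?: number; sleep?: (ms: number) => Promise<void> } = {}
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && RETRYABLE_STATUSES.has(status);
      if (!retryable || attempt === maxAttempts) throw err;
      // Ceiling doubles each attempt: base, 2x base, 4x base, ...
      const ceilingMs = baseDelayMs * 2 ** (attempt - 1);
      await sleep(Math.random() * ceilingMs);
    }
  }
  // Only reached if maxAttempts is less than 1.
  throw new Error("maxAttempts must be at least 1");
}

// Simulated provider: rate-limited twice, then succeeds.
let attemptsSeen = 0;
retryWithBackoff(
  async () => {
    attemptsSeen += 1;
    if (attemptsSeen < 3) throw Object.assign(new Error("rate limited"), { status: 429 });
    return "model reply";
  },
  { baseDelayMs: 10 }
).then((reply) => console.log(reply, `after ${attemptsSeen} attempts`));
```

Do not wrap tool writes in this helper unless the tool accepts an idempotency key. A write that succeeded but whose response was lost would otherwise be applied twice.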

User trust on failure

Honest messages beat silent hangs. Include the request ID in the error a user sees so they can quote it in a support report and you can grep your logs for the matching line.

Checkpoint

Describe what you would log for one RAG chat request and one alert you'd set on llm_errors_total.

What's next

Module 10 quiz