Project: production-ready GenAI app
Before we begin
This is the course capstone. Combine your Module 7 RAG chatbot and Module 8 agent patterns into one Next.js app — then add what separates demos from portfolio pieces: caching, rate limits, logging, and error handling.
Figure: What you are building
How this connects to Module 10
| Lesson | Where you use it |
|---|---|
| Caching | Redis keyed by query + mode + INDEX_VERSION |
| Rate limiting | Per-user fixed-window counter on /api/chat |
| Cost / tokens | Log inputTokens + outputTokens per request |
| Monitoring | JSON logs + /api/admin/metrics counters |
| Error handling | Timeouts (504), graceful 502, cache bypass if Redis is down |
Folder layout:
production-genai/
  app/
    api/chat/route.ts
    api/admin/metrics/route.ts
    chat-lab/page.tsx
  lib/
    redis.ts
    rateLimit.ts
    logger.ts
    rag.ts          # from Module 7
    agent.ts        # from Module 8
  .env.local        # OPENAI_API_KEY, REDIS_URL, INDEX_VERSION, RATE_LIMIT_PER_MIN
What you will build
- Unified chat UI: streaming answers with citations (RAG mode).
- Agent mode toggle: e.g. “research this topic” runs the tool loop with visible steps.
- Redis cache: completion cache keyed by query + index version.
- Rate limiting: per-user or per-IP on /api/chat.
- Structured logging: one JSON line per request with tokens and latency.
- Metrics page: a simple /api/admin/metrics endpoint or dashboard card (request count, cache hit %, avg latency).
- Graceful errors: timeouts and upstream failures show clear UI messages.
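Both modes share one request shape. A minimal sketch of that contract, assuming the body carries a messages array and an optional mode; the type names and the parseChatRequest helper are hypothetical, not part of the course code:

```typescript
// Hypothetical request contract for /api/chat; names are illustrative.
const MODES = ["rag", "agent"] as const;
type Mode = (typeof MODES)[number];

interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  mode: Mode;
}

function isMode(x: unknown): x is Mode {
  return typeof x === "string" && (MODES as readonly string[]).includes(x);
}

function isMessage(x: unknown): x is ChatMessage {
  if (typeof x !== "object" || x === null) return false;
  const { role, content } = x as { role?: unknown; content?: unknown };
  return (role === "user" || role === "assistant" || role === "system") && typeof content === "string";
}

// The route receives untyped JSON, so validate before use.
// A missing mode defaults to "rag"; an unknown mode is rejected.
function parseChatRequest(body: unknown): ChatRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { messages, mode } = body as { messages?: unknown; mode?: unknown };
  if (!Array.isArray(messages) || !messages.every(isMessage)) return null;
  const resolved: Mode | null = mode === undefined ? "rag" : isMode(mode) ? mode : null;
  if (resolved === null) return null;
  return { messages, mode: resolved };
}
```

Rejecting unknown modes up front keeps arbitrary strings out of the cache-key namespace.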
Estimated time: 6–10 hours.
Before you start
- Finish the Module 10 quiz.
- Have Module 7 RAG index code and Module 8 agent/tool code available to merge or copy.
- Redis running locally (docker run -p 6379:6379 redis) or an Upstash URL in .env.local.
Suggested folder: extend your existing Next.js app or create production-genai/.
Step 1 — Architecture (document in README)
User → POST /api/chat
  → rate limit check (Redis INCR + EXPIRE)
  → cache lookup (hash of last user message + mode + INDEX_VERSION)
  → hit: return cached answer
  → miss:
      mode=rag   → retrieve chunks → LLM stream
      mode=agent → tool loop (max 8 steps)
      → store in cache (TTL 1h)
  → structured log + metrics counters
  → JSON response (or SSE stream)
Env vars:
| Variable | Purpose |
|---|---|
| OPENAI_API_KEY | Chat + embeddings |
| REDIS_URL | redis://127.0.0.1:6379 or your Upstash URL |
| INDEX_VERSION | Bump whenever you rebuild the FAISS index |
| RATE_LIMIT_PER_MIN | Requests allowed per user per minute (e.g. 30) |
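Reading these variables in one place lets the app fail fast at startup instead of on the first request. A sketch, assuming a hypothetical loadConfig helper whose defaults mirror the values used elsewhere in this project (v1, 30, the local Redis URL):

```typescript
// Hypothetical config loader for the env vars in the table above.
interface AppConfig {
  openaiApiKey: string;
  redisUrl: string;
  indexVersion: string;
  rateLimitPerMin: number;
}

// Takes the env object as a parameter so it is easy to test;
// in the app you would call loadConfig(process.env).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is not set");

  const rateLimitPerMin = Number(env.RATE_LIMIT_PER_MIN ?? 30);
  if (!Number.isInteger(rateLimitPerMin) || rateLimitPerMin <= 0) {
    throw new Error(`RATE_LIMIT_PER_MIN must be a positive integer, got ${env.RATE_LIMIT_PER_MIN}`);
  }

  return {
    openaiApiKey,
    redisUrl: env.REDIS_URL ?? "redis://127.0.0.1:6379",
    indexVersion: env.INDEX_VERSION ?? "v1",
    rateLimitPerMin,
  };
}
```

Throwing on a malformed RATE_LIMIT_PER_MIN matters because Number("abc") is NaN, and every count <= NaN comparison is false, which would reject every request.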
Step 2 — Redis + cache helpers
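// lib/redis.ts
import { createHash } from "node:crypto";
import { createClient } from "redis";

type RedisClient = ReturnType<typeof createClient>;

// Cache the connection promise on globalThis so concurrent requests and
// Next.js hot reloads share one client instead of racing to open several.
const globalForRedis = globalThis as unknown as { redisPromise?: Promise<RedisClient> };

export function getRedis(): Promise<RedisClient> {
  if (!globalForRedis.redisPromise) {
    const client = createClient({
      url: process.env.REDIS_URL,
      // Reject commands immediately while disconnected instead of queueing them.
      disableOfflineQueue: true,
      // Stop retrying after a few attempts so connect() rejects when Redis is down.
      socket: { reconnectStrategy: (retries) => (retries > 3 ? new Error("Redis unavailable") : Math.min(retries * 100, 1000)) },
    });
    client.on("error", (e) => console.error("redis", e));
    globalForRedis.redisPromise = client
      .connect()
      .then(() => client)
      .catch((err) => {
        globalForRedis.redisPromise = undefined; // let the next request retry
        throw err;
      });
  }
  return globalForRedis.redisPromise;
}

export function buildCacheKey(messages: { role: string; content: string }[], mode: string) {
  const lastUser = [...messages].reverse().find((m) => m.role === "user")?.content ?? "";
  const normalized = lastUser.trim().toLowerCase().replace(/\s+/g, " ");
  const version = process.env.INDEX_VERSION ?? "v1";
  // SHA-256 instead of a 32-bit rolling hash: a collision would serve one
  // question's cached answer for a different question.
  const digest = createHash("sha256").update(normalized).digest("hex");
  return `chat:${mode}:${version}:${digest}`;
}
Why include INDEX_VERSION? Without it, answers cached before you re-index the blog posts would keep citing stale chunks after the rebuild.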
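To see why the version matters, here is a standalone variant of the key builder with the version passed in as an argument rather than read from process.env. It is a sketch: cacheKeyFor is a hypothetical name, and it hashes with SHA-256 from node:crypto.

```typescript
import { createHash } from "node:crypto";

// Hypothetical standalone key builder: same normalization as buildCacheKey,
// with the index version as an explicit parameter.
function cacheKeyFor(question: string, mode: string, indexVersion: string): string {
  const normalized = question.trim().toLowerCase().replace(/\s+/g, " ");
  const digest = createHash("sha256").update(normalized).digest("hex");
  return `chat:${mode}:${indexVersion}:${digest}`;
}

const before = cacheKeyFor("What is  on-device AI?", "rag", "v1");
const same = cacheKeyFor("what is on-device ai? ", "rag", "v1");
const after = cacheKeyFor("What is on-device AI?", "rag", "v2");

console.log(before === same); // true: whitespace and case are normalized away
console.log(before === after); // false: bumping INDEX_VERSION changes the key
```

Old keys are not deleted when the version changes; they simply stop being read and expire on their own TTL.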
Step 3 — Rate limiting
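// lib/rateLimit.ts
import { getRedis } from "./redis";
// Fixed-window counter: one Redis key per user per window, expiring with the window.
export async function rateLimit(userId: string, limit = 30, windowSec = 60) {
  const nowSec = Math.floor(Date.now() / 1000);
  const retryAfterSec = windowSec - (nowSec % windowSec); // seconds until this window resets
  try {
    const redis = await getRedis();
    const key = `rl:${userId}:${Math.floor(nowSec / windowSec)}`;
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, windowSec); // first hit in the window sets the TTL
    return { ok: count <= limit, remaining: Math.max(0, limit - count), retryAfterSec };
  } catch {
    return { ok: true, remaining: limit, retryAfterSec: 0 }; // Redis down: fail open (see Step 8)
  }
}
export function getUserId(req: Request) {
  // Trust x-user-id only if your auth layer sets it; otherwise key by client IP, not one shared "anon" bucket.
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim();
  return req.headers.get("x-user-id") ?? (ip ? `ip:${ip}` : "anon");
}
When the limit is exceeded, return 429 with { error: "Rate limit exceeded", retryAfterSec } plus a matching Retry-After header, and show a friendly banner in the UI.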
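The fixed-window behavior is easier to reason about with a fake clock. This hypothetical in-memory FixedWindowLimiter mirrors the Redis logic (one counter per user per window) so you can check the 30-allowed, 31st-rejected rule without a server:

```typescript
// Hypothetical in-memory version of the Redis fixed-window limiter: a Map
// stands in for Redis and the clock is injectable so tests can control time.
// Single process only, and old keys are never evicted; not for production.
class FixedWindowLimiter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowSec: number;
  private now: () => number;

  constructor(limit: number, windowSec: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowSec = windowSec;
    this.now = now;
  }

  hit(userId: string): { ok: boolean; remaining: number } {
    const windowIndex = Math.floor(this.now() / 1000 / this.windowSec);
    const key = `rl:${userId}:${windowIndex}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return { ok: count <= this.limit, remaining: Math.max(0, this.limit - count) };
  }
}

let fakeMs = 10_000; // 10 s into the first 60 s window
const limiter = new FixedWindowLimiter(30, 60, () => fakeMs);
const allowed = Array.from({ length: 31 }, () => limiter.hit("alice").ok).filter(Boolean).length;
console.log(allowed); // 30: the 31st request in the same window is rejected

fakeMs = 60_000; // the next window starts and the counter resets
console.log(limiter.hit("alice").ok); // true
```

Because windows are fixed, a client can send 30 requests at 0:59 and 30 more at 1:01. A sliding window or token bucket smooths that burst at the cost of more Redis work per request.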
Step 4 — Structured logging
// lib/logger.ts
import { getRedis } from "./redis";
export function logChatEvent(event: Record<string, unknown>) {
  // One JSON object per line, so log tools can parse and filter fields.
  console.log(JSON.stringify({ ts: new Date().toISOString(), ...event }));
}
export async function bumpMetric(name: string) {
  try {
    const redis = await getRedis();
    await redis.incr(`metrics:${name}`);
  } catch {
    // Redis down: don't fail the request over a metric
  }
}
Every chat request should log requestId, userId, mode, cacheHit, latencyMs, inputTokens, and outputTokens. Never log the API key or raw request headers.
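A typed wrapper makes the required fields explicit. In this sketch, formatChatLog and ChatLogEvent are hypothetical names; the function copies only the listed fields, so an accidental extra property such as an API key never reaches the log:

```typescript
// Hypothetical typed log line builder. Only whitelisted fields are copied,
// so extra properties (headers, prompts, keys) cannot leak into the log.
interface ChatLogEvent {
  requestId: string;
  userId: string;
  mode: "rag" | "agent";
  cacheHit?: boolean;
  latencyMs: number;
  inputTokens?: number;
  outputTokens?: number;
  error?: "timeout" | "upstream";
}

function formatChatLog(event: ChatLogEvent, now: Date = new Date()): string {
  const { requestId, userId, mode, cacheHit, latencyMs, inputTokens, outputTokens, error } = event;
  // JSON.stringify drops undefined values, so optional fields simply disappear.
  return JSON.stringify({
    ts: now.toISOString(),
    requestId, userId, mode, cacheHit, latencyMs, inputTokens, outputTokens, error,
  });
}

const line = formatChatLog(
  {
    requestId: "req-1", userId: "user-42", mode: "rag", cacheHit: false,
    latencyMs: 842, inputTokens: 1200, outputTokens: 310,
    apiKey: "sk-secret", // simulates a field that should never be logged
  } as ChatLogEvent,
  new Date("2024-01-01T00:00:00Z"),
);
console.log(line);
```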
Step 5 — Unified /api/chat route
// app/api/chat/route.ts
import { randomUUID } from "node:crypto";
import { getRedis, buildCacheKey } from "@/lib/redis";
import { rateLimit, getUserId } from "@/lib/rateLimit";
import { logChatEvent, bumpMetric } from "@/lib/logger";
import { runRagChat } from "@/lib/rag";
import { runAgentLoop } from "@/lib/agent";

export async function POST(req: Request) {
  const requestId = randomUUID();
  const start = Date.now();
  const userId = getUserId(req);

  const limited = await rateLimit(userId, Number(process.env.RATE_LIMIT_PER_MIN ?? 30));
  if (!limited.ok) {
    return Response.json(
      { error: "Rate limit exceeded", retryAfterSec: limited.retryAfterSec },
      { status: 429, headers: { "Retry-After": String(limited.retryAfterSec) } },
    );
  }

  // Malformed JSON or a missing messages array is a 400, not a crash.
  const body = await req.json().catch(() => null);
  const messages = body?.messages;
  const mode = body?.mode === "agent" ? "agent" : "rag";
  if (!Array.isArray(messages)) {
    return Response.json({ error: "Body must include a messages array." }, { status: 400 });
  }
  const cacheKey = buildCacheKey(messages, mode);

  // The cache is best-effort: if Redis is unreachable, skip it and still answer.
  const redis = await getRedis().catch(() => null);
  const cached = redis ? await redis.get(cacheKey).catch(() => null) : null;
  if (cached) {
    await bumpMetric("cache_hits");
    await bumpMetric("requests");
    logChatEvent({ requestId, userId, mode, cacheHit: true, latencyMs: Date.now() - start });
    return Response.json({ ...JSON.parse(cached), cached: true });
  }

  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 30_000);
  try {
    const result =
      mode === "agent"
        ? await runAgentLoop(messages, { maxSteps: 8, signal: controller.signal })
        : await runRagChat(messages, { signal: controller.signal });

    await redis?.set(cacheKey, JSON.stringify(result), { EX: 3600 }).catch(() => undefined);
    await bumpMetric("requests");
    logChatEvent({
      requestId,
      userId,
      mode,
      cacheHit: false,
      latencyMs: Date.now() - start,
      inputTokens: result.usage?.inputTokens,
      outputTokens: result.usage?.outputTokens,
    });
    return Response.json({ ...result, cached: false });
  } catch {
    // Check the signal, not err.name: SDKs use different error types for aborted requests.
    const isTimeout = controller.signal.aborted;
    logChatEvent({ requestId, userId, mode, error: isTimeout ? "timeout" : "upstream", latencyMs: Date.now() - start });
    return Response.json(
      {
        error: isTimeout ? "Request timed out. Try a shorter question." : "Service temporarily unavailable.",
        requestId, // lets users quote an id you can find in the logs
      },
      { status: isTimeout ? 504 : 502 },
    );
  } finally {
    clearTimeout(timeout); // runs on success and failure alike
  }
}
Copy runRagChat from Module 7 into lib/rag.ts and runAgentLoop from Module 8 into lib/agent.ts. Both must accept a signal option and pass it through to their OpenAI calls, or the 30-second timer can report a timeout but cannot cancel the upstream request. The route also expects each to return usage.inputTokens and usage.outputTokens for the token log.
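The timeout only works if the work function listens to the signal. This sketch isolates the pattern from the route; withTimeout and fakeLlm are hypothetical helpers, with fakeLlm standing in for runRagChat or runAgentLoop:

```typescript
// Hypothetical helper isolating the route's timeout pattern.
type TimeoutResult<T> = { ok: true; value: T } | { ok: false; reason: "timeout" | "upstream" };

async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<TimeoutResult<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return { ok: true, value: await work(controller.signal) };
  } catch {
    // Check the signal rather than the error name: SDKs raise different
    // error types when a request is aborted.
    return { ok: false, reason: controller.signal.aborted ? "timeout" : "upstream" };
  } finally {
    clearTimeout(timer); // runs on success and failure alike
  }
}

// Fake LLM call that honors the signal: it stops its own timer on abort.
function fakeLlm(delayMs: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve("answer"), delayMs);
    signal.addEventListener("abort", () => {
      clearTimeout(t);
      reject(new Error("aborted"));
    });
  });
}

// A 500 ms call with a 50 ms budget resolves to a timeout result.
withTimeout((signal) => fakeLlm(500, signal), 50).then((r) => console.log(r));
```

Returning a result object instead of rethrowing keeps the 504-versus-502 decision in one place.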
Step 6 — Metrics endpoint
// app/api/admin/metrics/route.ts
import { getRedis } from "@/lib/redis";
export async function GET() {
  const redis = await getRedis();
  const [requests, cacheHits] = await Promise.all([
    redis.get("metrics:requests"),
    redis.get("metrics:cache_hits"),
  ]);
  const reqN = Number(requests ?? 0);
  const hitN = Number(cacheHits ?? 0);
  return Response.json({
    requests: reqN,
    cacheHits: hitN,
    cacheHitRate: reqN ? hitN / reqN : 0,
  });
}
Protect this endpoint with auth in a real deployment; for the course, keeping it localhost-only is fine.
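The metrics list in this project also promises an average latency, which the endpoint above does not compute. One way, assuming the route additionally calls redis.incrBy("metrics:latency_ms_total", latencyMs) on every request it counts in metrics:requests, is to derive it from the raw counter strings. computeMetrics and that counter name are hypothetical:

```typescript
// Hypothetical metrics derivation from raw counter strings
// (what redis.get returns, or null when a key does not exist yet).
interface Metrics {
  requests: number;
  cacheHits: number;
  cacheHitRate: number;
  avgLatencyMs: number;
}

function computeMetrics(raw: {
  requests: string | null;
  cacheHits: string | null;
  latencyMsTotal: string | null; // assumed counter: sum of latencyMs over all counted requests
}): Metrics {
  const requests = Number(raw.requests ?? 0);
  const cacheHits = Number(raw.cacheHits ?? 0);
  const latencyMsTotal = Number(raw.latencyMsTotal ?? 0);
  return {
    requests,
    cacheHits,
    // Guard against division by zero before the first request.
    cacheHitRate: requests ? cacheHits / requests : 0,
    avgLatencyMs: requests ? Math.round(latencyMsTotal / requests) : 0,
  };
}

console.log(computeMetrics({ requests: "40", cacheHits: "10", latencyMsTotal: "48000" }));
```

A running sum plus a count is cheap to maintain, but it only gives the mean; percentiles would need a histogram or a time-series store.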
Step 7 — UI polish
app/chat-lab/page.tsx:
- Mode toggle: RAG vs Agent.
- Streaming tokens (optional SSE) for RAG answers.
- Citation chips from Module 7.
- Agent trace panel from Module 8.
- Badge “cached” when response.cached === true.
- Error banners for 429, 502, and 504: never a blank screen.
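Centralizing the status-to-message mapping makes it hard to miss a failure path. A sketch with a hypothetical bannerFor helper, assuming the error bodies returned by /api/chat (an error string plus, for 429, retryAfterSec):

```typescript
// Hypothetical mapping from /api/chat responses to UI banners.
type Banner = { kind: "warning" | "error"; text: string } | null;

function bannerFor(status: number, body: { error?: string; retryAfterSec?: number }): Banner {
  if (status >= 200 && status < 300) return null; // success: no banner
  if (status === 429) {
    const wait = body.retryAfterSec ?? 60;
    return { kind: "warning", text: `You're sending messages too fast. Try again in ${wait}s.` };
  }
  if (status === 504) {
    return { kind: "error", text: body.error ?? "Request timed out. Try a shorter question." };
  }
  // 502 and anything unexpected still get a visible message.
  return { kind: "error", text: body.error ?? "Service temporarily unavailable." };
}
```

The final fallback is deliberate: an unanticipated status (say, a 500 from a crash) still produces text instead of an empty chat pane.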
Prove cache works:
- Ask "What is on-device AI?"
- Ask the same question again: the second response should be faster and show the “cached” badge.
- Check the server log for cacheHit: true.
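Prove the rate limit works:
With RATE_LIMIT_PER_MIN=30, send 31 rapid requests (script or browser devtools) within the same minute; the 31st returns 429. The window is fixed, so if the burst happens to straddle a minute boundary, run it again.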
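A small script makes the rate-limit check repeatable. This hypothetical burst helper sends requests sequentially and tallies status codes; fetchFn is injectable so it can run against a fake in tests, or against the dev server with the global fetch (Node 18+):

```typescript
// Hypothetical burst tester for /api/chat.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number }>;

async function burst(url: string, n: number, fetchFn: FetchLike): Promise<Record<number, number>> {
  const tally: Record<number, number> = {};
  for (let i = 0; i < n; i++) {
    // Sequential on purpose, so "the 31st request" is unambiguous.
    const res = await fetchFn(url, {
      method: "POST",
      headers: { "content-type": "application/json", "x-user-id": "burst-test" },
      // Same question every time: after the first answer these are cache hits,
      // yet they still count, because the rate limit runs before the cache.
      body: JSON.stringify({ messages: [{ role: "user", content: "What is on-device AI?" }], mode: "rag" }),
    });
    tally[res.status] = (tally[res.status] ?? 0) + 1;
  }
  return tally;
}

// Against the dev server:
// burst("http://localhost:3000/api/chat", 31, fetch).then(console.log);
```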
Step 8 — Error handling checklist
| Failure | Behavior |
|---|---|
| LLM timeout (30s) | 504 + user message |
| Wrong API key | 502 + log requestId (no key in log) |
| Redis down | Skip cache; log once; still answer |
| Agent tool API fail | Partial answer + warning in UI |
| RAG empty retrieval | "I don't have that in the docs" — no hallucination |
Test it deliberately: set an invalid API key and confirm that the user sees an error message and that the log line contains the requestId for debugging.
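For the "RAG empty retrieval" row, the safest refusal is one the LLM never gets to improvise. A sketch, assuming a hypothetical Chunk shape with a similarity score where higher is better and an illustrative 0.75 threshold; if your FAISS index returns L2 distances (lower is better), flip the comparison. answerFromChunks and generate are hypothetical names:

```typescript
// Hypothetical guard: refuse without calling the LLM when nothing relevant is retrieved.
interface Chunk {
  text: string;
  source: string;
  score: number; // assumed similarity score, higher is better
}

const NO_ANSWER = "I don't have that in the docs.";

async function answerFromChunks(
  question: string,
  chunks: Chunk[],
  generate: (question: string, context: Chunk[]) => Promise<string>,
  minScore = 0.75, // illustrative threshold; tune it against your own index
): Promise<{ answer: string; citations: string[] }> {
  const relevant = chunks.filter((c) => c.score >= minScore);
  if (relevant.length === 0) {
    // Skipping the LLM here also saves tokens.
    return { answer: NO_ANSWER, citations: [] };
  }
  const answer = await generate(question, relevant);
  // One citation per distinct source document.
  return { answer, citations: Array.from(new Set(relevant.map((c) => c.source))) };
}
```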
Acceptance criteria
- RAG answers include at least one citation
- Agent mode completes a tool call with visible steps
- Repeat question hits cache (prove via log or UI badge)
- 31st rapid request returns 429
- Timeout shows user-friendly message, not blank screen
- README with architecture diagram + env setup + demo GIF
Deliverables
- Git repo (or folder) with a working /api/chat
- README explaining your cost and rate-limit choices
- Optional: deploy to Vercel with Redis env vars
What's next
Congratulations — you completed the full AI: From Basics to GenAI track.
You moved from math intuition → ML → neural nets → transformers → RAG → agents → production. That arc is what strong junior AI engineer portfolios demonstrate.
Return to the AI course curriculum anytime, or revisit any module project to extend features.