Question 1 of 25
Model serving in a GenAI product usually means:
Question 2 of 25
Why do many teams start with a hosted LLM API before self-hosting?
Question 3 of 25
Why is caching important in LLM apps?
Question 4 of 25
A good Redis cache key for a RAG answer might include:
Question 5 of 25
Rate limiting an LLM API route primarily protects against:
Question 6 of 25
A token bucket rate limiter allows:
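For reference, the mechanism named in this question can be sketched as follows. This is a minimal illustration, not any particular library's implementation: the `TokenBucket` class, its parameter names, and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = capacity  # start full, so an initial burst is permitted
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3` and `refill_rate=1.0`, three back-to-back calls to `allow()` succeed and the fourth is rejected until time passes and the bucket refills.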
Question 7 of 25
How can you reduce token cost in production?
Question 8 of 25
Prompt caching (provider-side) helps when:
Question 9 of 25
What is often the latency bottleneck in GenAI apps?
Question 10 of 25
Streaming responses to the UI mainly improves:
Question 11 of 25
Production monitoring for LLM apps should track:
Question 12 of 25
Structured logging for each chat request should include:
Question 13 of 25
When an upstream LLM API times out, good error handling:
Question 14 of 25
Graceful degradation might mean:
Question 15 of 25
The Module 10 capstone combines RAG + agents + UI with production concerns. The main goal is:
Question 16 of 25
LLM API keys should live:
Question 17 of 25
You re-index blog posts for RAG. Cache keys must:
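One possible way to picture the scenario in this question is a key that carries an index version, so that entries written before a re-index can no longer be hit. The function name `rag_cache_key`, the `rag:` prefix, and the `index_version` and `model` parameters are hypothetical and chosen only for illustration.

```python
import hashlib


def rag_cache_key(question: str, index_version: str, model: str) -> str:
    """Build a cache key that changes whenever the index version or model changes."""
    # Normalise case and whitespace so trivially different phrasings share a key.
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"rag:{model}:{index_version}:{digest}"
```

Under this assumption, bumping `index_version` after a re-index makes every old key unreachable without deleting anything; the old entries simply age out through whatever expiry the cache already applies.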
Question 18 of 25
HTTP 429 Too Many Requests tells the client:
Question 19 of 25
Output tokens are often more expensive per token because:
Question 20 of 25
Model routing in production might:
Question 21 of 25
Time to first token (TTFT) measures:
Question 22 of 25
A sudden drop in cache hit ratio after deploy might indicate:
Question 23 of 25
Exponential backoff on retries means:
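As a reference for the term used in this question, the retry schedule can be sketched as a list of wait times that double on each attempt up to a cap. The function name `backoff_delays` and its defaults are assumptions for the example, and the optional "full jitter" randomisation is a common companion technique rather than part of the definition.

```python
import random


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0,
                   jitter: bool = True, rng=random.random) -> list[float]:
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        # Double the delay on each attempt, but never exceed the cap.
        delay = min(cap, base * 2 ** attempt)
        if jitter:
            # Full jitter: pick a random wait between 0 and the computed delay
            # so that many clients do not retry at the same instant.
            delay = rng() * delay
        delays.append(delay)
    return delays
```

Calling `backoff_delays(5, base=1.0, cap=10.0, jitter=False)` yields waits of 1, 2, 4, 8, and 10 seconds.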
Question 24 of 25
Streaming LLM responses to the browser requires:
Question 25 of 25
Including requestId in logs and user-facing errors helps: