Module 10 quiz and review

Before we begin

Test your knowledge of model serving, caching, rate limits, cost, latency, monitoring, and production trade-offs before the capstone project. Aim for at least 19 out of 25.

Multiple choice quiz

Pick one answer per question. Feedback appears immediately, so take your time before clicking.

Question 1 of 25
Model serving in a GenAI product usually means:
Question 2 of 25
Why do many teams start with a hosted LLM API before self-hosting?
Question 3 of 25
Why is caching important in LLM apps?
Question 4 of 25
A good Redis cache key for a RAG answer might include:
Question 5 of 25
Rate limiting an LLM API route primarily protects against:
Question 6 of 25
A token bucket rate limiter allows:
Question 7 of 25
How can you reduce token cost in production?
Question 8 of 25
Prompt caching (provider-side) helps when:
Question 9 of 25
What is often the latency bottleneck in GenAI apps?
Question 10 of 25
Streaming responses to the UI mainly improves:
Question 11 of 25
Production monitoring for LLM apps should track:
Question 12 of 25
Structured logging for each chat request should include:
Question 13 of 25
When an upstream LLM API times out, good error handling:
Question 14 of 25
Graceful degradation might mean:
Question 15 of 25
The Module 10 capstone combines RAG + agents + UI with production concerns. The main goal is:
Question 16 of 25
LLM API keys should live:
Question 17 of 25
You re-index blog posts for RAG. Cache keys must:
Question 18 of 25
HTTP 429 Too Many Requests tells the client:
Question 19 of 25
Output tokens are often more expensive per token because:
Question 20 of 25
Model routing in production might:
Question 21 of 25
Time to first token (TTFT) measures:
Question 22 of 25
A sudden drop in cache hit ratio after deploy might indicate:
Question 23 of 25
Exponential backoff on retries means:
Question 24 of 25
Streaming LLM responses to the browser requires:
Question 25 of 25
Including requestId in logs and user-facing errors helps:

After the quiz

19/25 or higher? Start the production capstone project.

Checklist:

I can explain why caching matters for LLM apps.
I know three ways to reduce token cost.
I can name the usual GenAI latency bottlenecks.
I know what to log (and not log) per request.
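
To make the caching and logging items above concrete, here is a minimal sketch, not the course's reference solution. The function names, the key layout, and the chosen log fields are all assumptions made for illustration. It assumes a cached RAG answer should be keyed on the normalized question, the model, and an index version (so a re-index produces new keys), and that a per-request log line should carry metadata such as requestId, token counts, and latency while leaving out raw prompt text and API keys.

```python
import hashlib
import json
import time


def build_cache_key(question: str, model: str, index_version: str) -> str:
    """Hypothetical helper: the key changes when the question, model, or index changes."""
    # Lowercase and collapse whitespace so trivially different inputs share a key.
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"rag:{model}:{index_version}:{digest}"


def build_log_record(request_id: str, model: str, input_tokens: int,
                     output_tokens: int, latency_ms: int, cache_hit: bool) -> str:
    """Hypothetical helper: one JSON log line with metadata only (no prompt text, no secrets)."""
    return json.dumps({
        "requestId": request_id,
        "model": model,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "latencyMs": latency_ms,
        "cacheHit": cache_hit,
        "ts": time.time(),
    })


if __name__ == "__main__":
    key = build_cache_key("What is RAG?", "model-a", "index-v1")
    print(key)
    print(build_log_record("req-123", "model-a", 850, 120, 940, False))
```

Putting the index version in the key is one way to keep stale answers from being served after a re-index; an alternative is to delete the affected keys explicitly at re-index time.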

What's next

Project: production-ready GenAI app