Module 10 — Production & scaling

Module 10 quiz & review

25 interactive questions on serving, caching, rate limits, cost, latency, monitoring, and production trade-offs.

~55 min read + exercises

Before we begin

Test serving, caching, rate limits, cost, latency, monitoring, and production trade-offs before the capstone project. Aim for at least 19 out of 25.


Multiple choice quiz

Pick one answer per question. Feedback appears immediately, so take your time before clicking.

  1. Model serving in a GenAI product usually means:
  2. Why do many teams start with a hosted LLM API before self-hosting?
  3. Why is caching important in LLM apps?
  4. A good Redis cache key for a RAG answer might include:
  5. Rate limiting an LLM API route primarily protects against:
  6. A token bucket rate limiter allows:
  7. How can you reduce token cost in production?
  8. Prompt caching (provider-side) helps when:
  9. What is often the latency bottleneck in GenAI apps?
  10. Streaming responses to the UI mainly improves:
  11. Production monitoring for LLM apps should track:
  12. Structured logging for each chat request should include:
  13. When an upstream LLM API times out, good error handling:
  14. Graceful degradation might mean:
  15. The Module 10 capstone combines RAG + agents + UI with production concerns. The main goal is:
  16. LLM API keys should live:
  17. You re-index blog posts for RAG. Cache keys must:
  18. HTTP 429 Too Many Requests tells the client:
  19. Output tokens are often more expensive per token because:
  20. Model routing in production might:
  21. Time to first token (TTFT) measures:
  22. A sudden drop in cache hit ratio after deploy might indicate:
  23. Exponential backoff on retries means:
  24. Streaming LLM responses to the browser requires:
  25. Including requestId in logs and user-facing errors helps:

After the quiz

19/25 or higher? Start the production capstone project.

Checklist:

  • I can explain why caching matters for LLM apps.
  • I know three ways to reduce token cost.
  • I can name the usual GenAI latency bottlenecks.
  • I know what to log (and not log) per request.
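
If any checklist item feels shaky, a small sketch can help you review three ideas from the quiz: version-aware cache keys, token bucket rate limiting, and exponential backoff. This is a minimal in-memory illustration, not the course's reference implementation. Every name here (`cache_key`, `TokenBucket`, `backoff_delays`, the `rag:answer:` prefix, and the choice of key fields) is a hypothetical made up for this review. A real deployment would likely keep bucket state in a shared store such as Redis and add jitter to retry delays.

```python
import hashlib
import json


def cache_key(question: str, model: str, index_version: str, prompt_version: str) -> str:
    """Build a deterministic cache key for a RAG answer (hypothetical scheme)."""
    # Normalise case and whitespace so trivially different inputs share a key.
    normalized = " ".join(question.lower().split())
    # Including the index and prompt versions means a re-index or a prompt
    # change produces new keys instead of serving stale answers.
    payload = json.dumps(
        {"q": normalized, "model": model, "index": index_version, "prompt": prompt_version},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"rag:answer:{digest}"


class TokenBucket:
    """In-memory token bucket: bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a route handler would typically answer with HTTP 429 here


def backoff_delays(base: float, retries: int, cap: float) -> list[float]:
    """Exponential backoff schedule without jitter: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]
```

For example, `TokenBucket(capacity=2, rate=1.0)` accepts two back-to-back requests at time 0, rejects a third, and accepts another one second later once a token has refilled. Time is passed in explicitly as `now` so the behaviour is easy to reason about and test.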

What's next

Project: production-ready GenAI app