What is an AI agent?
Before we begin
Module 7 taught you to build a RAG chatbot: one user question → retrieve docs → one LLM answer. That pattern is powerful — but many real tasks are not one shot.
“Plan a 4-day trip to Rome with kid-friendly activities under $2,000.”
A chatbot might guess an itinerary from training data. An agent will look up weather, check flight APIs, search events, notice rain on day 2, and revise the plan — across many LLM calls and tool executions.
An AI agent = an LLM inside a control loop that can act on the world (via tools), observe results, and continue until a goal is met or limits are hit.
Figure
The agent loop
What you will learn
- Define agent vs chatbot vs workflow vs plain LLM.
- Name every part of the observe–think–act loop.
- Walk through a worked example (travel planning) step by step.
- Decide when agents are worth the complexity — and when they are not.
- List failure modes you must design for in production.
Before this lesson
- Module 8 welcome — includes a key terms table (DAG, ReAct, MCP, etc.)
- Module 7 — LLM basics
- Module 7 — RAG chatbot project (helpful context)
Four ways to use an LLM (do not confuse them)
| Pattern | What happens | Example |
|---|---|---|
| Plain LLM | Prompt in → text out | “Explain what RAG means.” |
| RAG chatbot | Retrieve docs → prompt with context → answer | “What is our refund policy?” |
| Workflow | You fixed the steps; LLM may fill one node | Ingest PDF → chunk → embed → query (fixed pipeline — see DAG in welcome) |
| Agent | Model chooses next step within tool rules | “Book the cheapest dry-weather weekend for my family.” |
Key distinction: in a workflow, the engineer encoded the path. In an agent, the model discovers the path within constraints you define (available tools, max steps, safety rules).
Many production systems are hybrids: a workflow skeleton with agent nodes only where flexibility is needed.
Anatomy of an agent
Every agent system has these pieces:
| Component | Role |
|---|---|
| LLM | Reasons about the goal, picks tools, synthesizes final answer |
| Tools | Functions your code runs — APIs, DB queries, calculators, file reads |
| Control loop | Your app: call LLM → if tool request → execute → feed result back → repeat |
| Memory (optional) | Session history, user prefs, scratchpad state (Lesson 4) |
| Guardrails | Max iterations, timeouts, allowlist (only approved tools/paths), human approval (Lesson 8) |
Without the control loop, you only have a chatbot that talks about calling APIs. The loop is what makes it an agent.
The observe–think–act loop (detailed)
Each iteration of the loop:
- Observe — What does the model see? User message, prior tool results, memory snippets, system rules.
- Think — Model decides: answer now, or call a tool? (May emit “thought” text — see ReAct in Lesson 3.)
- Act — Your app runs the tool with validated arguments.
- Observe again — Tool result is appended to context. Loop continues.
Stop conditions:
- Model returns a final answer (no more tool calls).
- Max steps reached (e.g. 10 iterations).
- Timeout or token budget exhausted.
- Human approves or cancels (high-risk flows).
Worked example — “Weekend in Rome with kids”
User: “Plan Sat–Sun in Rome, kid-friendly, avoid rain if possible.”
| Step | Type | Content |
|---|---|---|
| 1 | Think | “I need weather for Rome this weekend.” |
| 2 | Act | get_weather(city="Rome", date="2025-06-28") |
| 3 | Observe | {"sat": "rain", "sun": "clear"} |
| 4 | Think | “Rain Saturday — indoor options; Sunday outdoor.” |
| 5 | Act | search_events(city="Rome", tags=["family"], day="sunday") |
| 6 | Observe | [{"name": "Explora Museum", ...}] |
| 7 | Think | “Enough data — draft itinerary.” |
| 8 | Final | Structured 2-day plan citing weather + events |
8 LLM-related steps vs 1 for a simple RAG question. That is the trade-off: flexibility and freshness vs latency and cost.
Agent vs workflow (exam style)
DAG = Directed Acyclic Graph — a fixed flowchart of steps (A → B → C) with no loops. A workflow is usually implemented as a DAG.
| Workflow | Agent | |
|---|---|---|
| Who picks the next step? | Engineer (fixed DAG) | Model (within tool set) |
| Predictability | High | Lower |
| Handles surprises | Poor — branch per edge case | Better — replan on failure |
| Latency | Often lower (fixed path) | Higher (variable steps) |
| Debugging | Easier — same path every time | Harder — traces vary |
| Best for | Compliance pipelines (e.g. verify identity step-by-step), data import jobs | Open-ended research, planning, coding |
Interview answer: “We use a workflow when steps are known and audited; we add an agent when the user goal is underspecified or depends on live tool results.”
When agents help
Use an agent when all or most of these are true:
- Task needs live external data (weather, inventory, prices, calendar).
- Intermediate results change the next step (rain → indoor activities).
- User goal is underspecified (“something fun near me”).
- Steps are hard to enumerate in advance (debugging code, research).
When not to use an agent
| Situation | Better approach |
|---|---|
| Single doc Q&A with good retrieval | Module 7 RAG — one retrieval + one LLM call |
| Strict regulatory sequence (every step audited) | Deterministic workflow + human sign-off |
| Sub-200ms latency requirement | Precomputed answers or tiny model, not multi-step agent |
| Tiny tool set, fixed order always | Workflow with one LLM summarization node |
| Cost-sensitive at huge scale | Cache tool results; agent only when cache misses |
Rule of thumb: if you can draw a fixed flowchart that covers 95% of cases, start with a workflow. Add agentic loops only where the chart would have dozens of “if API fails” branches.
How agents differ from “plain LLMs”
A plain LLM (Module 7) predicts text from context. It has no guaranteed access to:
- Today’s weather
- Your private database
- Whether a file exists on disk
It may hallucinate those facts. Agents reduce that risk by grounding decisions in tool observations — though the model can still misinterpret tool output.
| Plain LLM | Agent | |
|---|---|---|
| Facts | From training cutoff + prompt | From tool results when used correctly |
| Actions | Text only | Can trigger real side effects |
| Multi-step | Simulated in one reply | Actually executed step by step |
| Risk | Wrong facts | Wrong facts and wrong API calls |
Failure modes (design for these on day one)
Runaway loops
Model keeps calling tools without finishing — burns tokens and money.
Mitigation: max_iterations=10, cost cap per session, detect repeated identical tool calls.
Wrong tool arguments
get_weather(city="Pariss") → 404. Model may retry or give up.
Mitigation: validate args in your code before HTTP; return clear error strings to the model (Lesson 2).
Silent tool failures
API timeout returns empty; model invents data.
Mitigation: always return structured errors; never swallow exceptions without logging.
Over-trust
User assumes the agent “checked” flights when it only guessed.
Mitigation: show reasoning trace and which tools ran; cite tool timestamps in UI (your travel project).
Prompt injection via tools
Malicious webpage content in a fetch_url tool becomes part of context.
Mitigation: sanitize tool outputs; separate system rules from untrusted content.
Real-world agent examples
| Domain | Goal | Typical tools |
|---|---|---|
| Travel | Itinerary | Weather, maps, events, hotels |
| Coding | Fix bug | Read file, run tests, edit, git |
| Support | Resolve ticket | CRM lookup, refund API, knowledge base |
| Analytics | Answer data question | SQL, chart render, export CSV |
Your Module 8 project is travel — the same loop pattern applies everywhere.
Check yourself
Before Lesson 2, you should be able to answer:
- Name the four components of an agent (LLM, tools, loop, …).
- Why is RAG alone not enough for “plan my trip”?
- What stops an agent from looping forever?
- Give one case where a workflow beats an agent.
What's next
Lesson 2 — Tools & function calling — how the model actually requests actions and how your app runs them safely.