Multi-agent systems & orchestration
Before we begin
One agent with fifteen tools and a 3,000-token system prompt will tend to pick the wrong tools, ignore rules at the bottom of the prompt, and cost a lot on every turn.
A multi-agent design gives each agent a narrow role, with fewer tools, clearer instructions, and its own evals.
Figure: Travel planner, multi-agent pipeline
What you will learn
- Compare single-agent vs multi-agent architectures.
- Use LangChain building blocks and LangGraph for cycles and state.
- Implement handoffs, shared state, and conditional edges.
- Know when multi-agent is overkill.
- Instrument observability per agent node.
Why multiple agents?
| Problem with one mega-agent | Multi-agent fix |
|---|---|
| Tool selection errors | Executor only sees 3–5 tools |
| Prompt too long | Split rules per role |
| Hard to debug | Trace per node |
| Cannot A/B one role | Swap planner model only |
| Different model tiers | Cheap planner, strong executor |
Cost trade-off: a multi-agent design makes more LLM calls per request, so justify it with a higher success rate on your eval set (Lesson 7).
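One way to make this trade-off concrete is to compare cost per successful task rather than cost per run. The sketch below is only an illustration: the function name `cost_per_success` and every number in it are hypothetical, not measurements.

```python
def cost_per_success(calls_per_run: int, cost_per_call: float, success_rate: float) -> float:
    """Expected spend per successful task: cost of one run divided by success rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return calls_per_run * cost_per_call / success_rate

# Hypothetical numbers for illustration only.
# The multi-agent run makes more calls, but each call carries a shorter prompt.
single = cost_per_success(calls_per_run=4, cost_per_call=0.03, success_rate=0.60)
multi = cost_per_success(calls_per_run=7, cost_per_call=0.02, success_rate=0.90)

print(f"single-agent: ${single:.3f} per success")
print(f"multi-agent:  ${multi:.3f} per success")
```

With these made-up inputs the multi-agent design costs more per run but less per success. With a smaller gap in success rate the comparison flips, which is why the eval set has to drive the decision.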
Common multi-agent patterns
| Pattern | Flow | Example |
|---|---|---|
| Sequential | A → B → C | Research → Write → Edit |
| Planner / workers | Manager assigns subtasks | Plan → execute weather, execute places |
| Handoff / triage | Router picks specialist | Support → billing vs technical |
| Supervisor | Boss reviews worker output | Critic agent rejects weak itinerary |
| Debate | Two agents critique | High cost — rare in production |
Travel project (recommended):
- Planner — JSON steps, no API keys in prompt
- Executor — tools only
- Synthesizer (optional third) — final prose from scratchpad
- Memory service — sidecar read/write, not an LLM
Figure: Planner + executor split
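The split can be sketched in plain Python before any framework is involved. In this minimal sketch the planner is a stub standing in for an LLM call, and `get_weather`, `find_places`, `TOOLS`, `plan`, and `execute` are all hypothetical names chosen for illustration.

```python
import json

# Hypothetical tool registry: only the executor can see these functions.
def get_weather(city: str) -> str:
    return f"(stub) forecast for {city}"

def find_places(city: str) -> str:
    return f"(stub) top sights in {city}"

TOOLS = {"get_weather": get_weather, "find_places": find_places}

def plan(goal: str) -> str:
    """Stand-in for the planner LLM call: returns a JSON plan and never calls tools."""
    city = goal.rsplit(" ", 1)[-1]  # naive: treat the last word of the goal as the city
    steps = [
        {"id": 1, "tool": "get_weather", "args": {"city": city}},
        {"id": 2, "tool": "find_places", "args": {"city": city}},
    ]
    return json.dumps({"steps": steps})

def execute(plan_json: str) -> dict:
    """Run each planned step through the registry and collect results in a scratchpad."""
    scratchpad = {}
    for step in json.loads(plan_json)["steps"]:
        tool = TOOLS.get(step["tool"])
        if tool is None:
            scratchpad[step["id"]] = {"ok": False, "error": f"unknown tool {step['tool']}"}
            continue
        scratchpad[step["id"]] = {"ok": True, "result": tool(**step["args"])}
    return scratchpad

scratchpad = execute(plan("Plan a weekend in Lisbon"))
print(scratchpad)
```

The planner only ever produces JSON, and the executor only ever runs functions from its registry, so a plan that names an unknown tool fails visibly instead of being improvised.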
LangChain — building blocks
LangChain provides composable pieces:
| Piece | Use |
|---|---|
| Chat models | OpenAI, Anthropic, Ollama wrappers |
| Prompt templates | Versioned planner/executor prompts |
| Tools | @tool decorators wrapping your Python functions |
| Retrievers | Plug Module 7 RAG into an agent |
| Runnable sequences | `prompt \| …` |
Good for: linear pipelines, quick prototypes, simple tool loops.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Planner prompt: the system message constrains the output, the human message carries the goal.
planner_prompt = ChatPromptTemplate.from_messages([
    ("system", "Output JSON plan only. Tool hints: {tool_hints}"),
    ("human", "{goal}"),
])

# Pipe the prompt into the model to get a runnable planner chain.
planner = planner_prompt | ChatOpenAI(model="gpt-4.1-mini")
```

LangChain alone struggles with arbitrary cycles (retry → replan → execute again); that is where LangGraph helps.
LangGraph — control flow as a graph
LangGraph models agents as a state machine — a graph of steps where each step updates shared state (data):
- Nodes = functions (planner, executor, synthesize)
- Edges = fixed or conditional transitions
- State = shared dict (scratchpad, messages, plan, trace)
Why graphs matter: agents need cycles. Linear chains cannot say “if tool failed, go back to planner.”
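To see why a cycle needs more than a linear chain, here is a framework-free sketch of the same idea. It is not LangGraph: `NODES`, `run`, and the two stub nodes are hypothetical, and each node simply returns a state update plus the name of the next node.

```python
def plan(state):
    # Stand-in planner: the first attempt misspells the city, a replan corrects it.
    city = "Lisbon" if state["attempts"] else "Lisbonn"
    return {"city": city, "attempts": state["attempts"] + 1}, "execute"

def execute(state):
    # Stand-in tool call: only one city is known to the fake API.
    if state["city"] != "Lisbon":
        return {"error": "404 city"}, "plan"  # conditional edge back to the planner
    return {"error": None, "result": "sunny"}, "done"

NODES = {"plan": plan, "execute": execute}

def run(state, entry="plan", max_steps=10):
    """Walk the graph until a node routes to "done" or the step budget runs out."""
    node, path = entry, []
    for _ in range(max_steps):  # guard against infinite cycles
        path.append(node)
        update, node = NODES[node](state)
        state = {**state, **update}  # merge the partial update into shared state
        if node == "done":
            return state, path
    raise RuntimeError("step budget exhausted")

final_state, path = run({"attempts": 0})
print(path)  # ['plan', 'execute', 'plan', 'execute']
```

The `max_steps` guard is a design choice worth keeping in any real graph: once cycles are allowed, something has to bound them.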
Conceptual graph (travel)
```python
from langgraph.graph import StateGraph, END

# TravelState and the three node functions are defined elsewhere;
# see "Shared state (typed sketch)" for the state type.
graph = StateGraph(TravelState)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("synthesize", synthesize_node)

graph.set_entry_point("plan")
graph.add_edge("plan", "execute")

# route_after_execute returns one of the keys below; the value is the next node.
graph.add_conditional_edges(
    "execute",
    route_after_execute,
    {
        "continue": "execute",  # more steps in plan
        "replan": "plan",       # failure: ask for a new plan
        "done": "synthesize",
    },
)
graph.add_edge("synthesize", END)

app = graph.compile()
```

Install: `pip install langgraph langchain-openai`
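The graph above routes on `route_after_execute`, which the lesson does not show. A minimal sketch could look like the following; it assumes the state keys `plan_steps`, `current_step_idx`, and `scratchpad`, and it invents a `last_error` entry in the scratchpad as the failure signal.

```python
def route_after_execute(state: dict) -> str:
    """Return the label of the next edge: "continue", "replan", or "done"."""
    # Assumption: the execute node records a failure under scratchpad["last_error"].
    if state["scratchpad"].get("last_error"):
        return "replan"
    # Assumption: current_step_idx points at the next step still to run.
    if state["current_step_idx"] < len(state["plan_steps"]):
        return "continue"
    return "done"

state = {"plan_steps": ["weather", "places"], "current_step_idx": 1, "scratchpad": {}}
print(route_after_execute(state))  # continue
```

In a real graph it is probably worth adding a replan counter to the state as well, so a persistent failure ends the run instead of looping.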
Shared state (typed sketch)
```python
from typing import TypedDict, Annotated
import operator


class TravelState(TypedDict):
    user_goal: str
    plan_steps: list
    current_step_idx: int
    scratchpad: dict
    trace: Annotated[list, operator.add]  # append-only log
    final_answer: str
```

Each node returns a partial update to the state, and LangGraph merges it in: keys annotated with a reducer (here `trace`, via `operator.add`) are appended to, and other keys are overwritten. This is the scratchpad pattern from Lesson 4, formalized.
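The merge step can be imitated in a few lines of plain Python. This is a sketch of the behavior, not LangGraph's implementation; `REDUCERS` and `merge_update` are hypothetical names.

```python
import operator

# Hypothetical reducer table, mirroring `trace: Annotated[list, operator.add]`.
REDUCERS = {"trace": operator.add}

def merge_update(state: dict, update: dict) -> dict:
    """Return a new state with a node's partial update applied."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list appends
        else:
            merged[key] = value  # no reducer: last write wins
    return merged

state = {"user_goal": "Weekend in Lisbon", "current_step_idx": 0, "trace": ["plan: 2 steps"]}
state = merge_update(state, {"current_step_idx": 1, "trace": ["execute: get_weather ok"]})
print(state["current_step_idx"], state["trace"])
```

Keys the node did not mention, such as `user_goal`, pass through untouched, which is what lets each node return only what it changed.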
Handoffs between agents
A handoff passes structured state from one agent role to the next (see welcome glossary).
Handoff rules:
- Pass structured state, not full message history.
- Receiving agent gets role-specific system prompt only.
- Include `completed_step_ids` so work is not duplicated.
- Log each handoff as a trace event for debugging.
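The handoff rules can be sketched as two small helpers. Everything here is illustrative: `build_handoff`, `log_handoff`, and the payload field names other than `completed_step_ids` are assumptions, not part of any library.

```python
import json
import time

def build_handoff(state: dict, to_role: str) -> dict:
    """Build the structured payload the next role receives (no chat history)."""
    done = state["completed_step_ids"]
    return {
        "to": to_role,
        "user_goal": state["user_goal"],
        "remaining_steps": [s for s in state["plan_steps"] if s["id"] not in done],
        "completed_step_ids": list(done),
    }

def log_handoff(trace: list, from_role: str, payload: dict) -> None:
    """Append a trace event so the handoff is visible when debugging."""
    trace.append({
        "event": "handoff",
        "from": from_role,
        "to": payload["to"],
        "completed_step_ids": payload["completed_step_ids"],
        "ts": time.time(),
    })

state = {
    "user_goal": "Weekend in Lisbon",
    "plan_steps": [{"id": 1, "tool": "get_weather"}, {"id": 2, "tool": "find_places"}],
    "completed_step_ids": [1],
    "messages": ["...long chat history that should stay behind..."],
}
trace = []
payload = build_handoff(state, to_role="executor")
log_handoff(trace, "planner", payload)
print(json.dumps(payload, indent=2))
```

The payload deliberately leaves `messages` behind, and the receiving role sees only the steps that are still open.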
Triage example:
Router agent → {"specialist": "executor", "reason": "needs live weather"}
The router has no weather tools, so it has to hand the request off rather than attempt a forecast itself.
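On the receiving side, the router's JSON should be validated before anything is dispatched. A minimal sketch, assuming a hypothetical `SPECIALISTS` registry whose handlers stand in for real agents:

```python
import json

# Hypothetical specialist registry; each handler would wrap its own agent.
SPECIALISTS = {
    "executor": lambda goal: f"executor handles: {goal}",
    "billing": lambda goal: f"billing handles: {goal}",
}

def dispatch(router_output: str, goal: str) -> str:
    """Parse the router's JSON decision and call the chosen specialist."""
    decision = json.loads(router_output)
    name = decision.get("specialist")
    if name not in SPECIALISTS:
        # Reject anything outside the registry instead of guessing.
        raise ValueError(f"unknown specialist: {name!r}")
    return SPECIALISTS[name](goal)

router_output = '{"specialist": "executor", "reason": "needs live weather"}'
print(dispatch(router_output, "Will it rain in Lisbon on Saturday?"))
```

Failing loudly on an unknown specialist keeps a malformed router decision from silently reaching the wrong agent.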
When multi-agent beats one agent
Yes when:
- Tool catalog > ~8 tools or multiple domains (calendar + maps + email).
- Eval shows systematic tool misuse with single prompt.
- Teams maintain separate prompts per squad (content vs data).
No when:
- Simple RAG Q&A — Module 7 architecture wins.
- Fixed 3-step pipeline always — use workflow, not debate agents.
- Latency budget < 2s — multi-agent overhead hurts.
Observability per node
Log for each graph node execution:
```json
{
  "node": "execute",
  "step_idx": 2,
  "llm_tokens_in": 890,
  "llm_tokens_out": 120,
  "tool_calls": [{"name": "get_weather", "ok": true, "ms": 340}],
  "state_snapshot_hash": "a1b2c3"
}
```

Dashboards: per-node error rate, average steps to success, cost per successful trip plan.
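A record like this can be produced by wrapping each node function. The sketch below covers only timing and the state hash; token counts and tool calls would come from the model and tool clients and are left out. `snapshot_hash` and `logged` are hypothetical helpers, and the overwrite-style merge used for hashing is a simplification.

```python
import hashlib
import json
import time

def snapshot_hash(state: dict) -> str:
    """Short, stable hash of the state so two snapshots can be compared cheaply."""
    blob = json.dumps(state, sort_keys=True, default=str).encode()
    return hashlib.sha256(blob).hexdigest()[:6]

def logged(node_name: str, node_fn, sink: list):
    """Wrap a node function so every call appends one log record to `sink`."""
    def wrapper(state: dict) -> dict:
        start = time.perf_counter()
        update = node_fn(state)
        sink.append({
            "node": node_name,
            "step_idx": state.get("current_step_idx"),
            "ms": round((time.perf_counter() - start) * 1000),
            # Simplification: hash the state with the update overwritten on top.
            "state_snapshot_hash": snapshot_hash({**state, **update}),
        })
        return update
    return wrapper

def execute_node(state: dict) -> dict:  # stub node for the demo
    return {"current_step_idx": state["current_step_idx"] + 1}

records = []
node = logged("execute", execute_node, records)
update = node({"current_step_idx": 2, "scratchpad": {}})
print(records[0])
```

Because the wrapper returns the node's update unchanged, it can be applied to every node without altering graph behavior.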
“Agent went in circles” → inspect trace for execute → replan → execute loops without scratchpad updates.
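That inspection can be partly automated. The sketch below flags nodes that ran repeatedly on an identical state hash; it assumes each trace entry carries `node` and `state_snapshot_hash`, and that the hash excludes the trace itself (otherwise it would change on every step). `find_stuck_nodes` and its threshold are hypothetical.

```python
from collections import Counter

def find_stuck_nodes(trace: list, max_repeats: int = 2) -> list:
    """Return nodes that ran more than `max_repeats` times on an identical state hash."""
    seen = Counter((e["node"], e["state_snapshot_hash"]) for e in trace)
    return sorted({node for (node, _), n in seen.items() if n > max_repeats})

trace = [
    {"node": "plan", "state_snapshot_hash": "a1b2c3"},
    {"node": "execute", "state_snapshot_hash": "d4e5f6"},
    {"node": "plan", "state_snapshot_hash": "a1b2c3"},
    {"node": "execute", "state_snapshot_hash": "d4e5f6"},
    {"node": "plan", "state_snapshot_hash": "a1b2c3"},
    {"node": "execute", "state_snapshot_hash": "d4e5f6"},
]
print(find_stuck_nodes(trace))  # ['execute', 'plan']
```

A repeated hash means the node ran again without the scratchpad changing, which is the signature of a loop that is making no progress.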
Orchestration glossary (interview)
| Term | Meaning |
|---|---|
| Orchestration | Running multiple LLM calls, tools, and state updates in the right order |
| Agent | LLM + loop + tools |
| Multi-agent | Multiple roles with handoffs |
| LangGraph | Graph-based orchestration with cycles |
| State | Shared scratchpad across nodes |
Travel project mapping
| Project component | Lesson concept |
|---|---|
| Planner API route | plan node |
| Tool runner | execute node |
| MongoDB prefs | Memory sidecar |
| UI reasoning trace | trace list in state |
| Retry on 404 city | conditional edge to execute or plan |
Check yourself
- Draw a 3-node graph for plan → execute → synthesize.
- When would you add a `replan` edge?
- LangChain vs LangGraph: one sentence each.
- Why is shared state better than passing full chat between agents?