Multi-agent systems & orchestration

Before we begin

One agent with fifteen tools and a 3,000-token system prompt tends to pick the wrong tool, overlook rules buried at the bottom of the prompt, and cost a lot on every turn.

Multi-agent design splits the work into narrow roles, each with fewer tools, clearer instructions, and its own evals.

Figure: Travel planner multi-agent pipeline. User → planner → executor → memory → UI trace (your project architecture).

What you will learn

Compare single-agent vs multi-agent architectures.
Use LangChain building blocks and LangGraph for cycles and state.
Implement handoffs, shared state, and conditional edges.
Know when multi-agent is overkill.
Instrument observability per agent node.

Why multiple agents?

Problem with one mega-agent	Multi-agent fix
Tool selection errors	Executor only sees 3–5 tools
Prompt too long	Split rules per role
Hard to debug	Trace per node
Cannot A/B one role	Swap planner model only
Different model tiers	Cheap planner, strong executor

Cost trade-off: a multi-agent setup makes more LLM calls per request, so justify it with a measurably higher success rate on your eval set (Lesson 7).
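The trade-off is easiest to judge per successful task rather than per call. A minimal sketch, assuming failed runs are retried until one succeeds and each run succeeds independently with the same probability, so the expected number of runs per success is 1 / success_rate. The costs and rates below are made-up illustration values, not measurements:

```python
def cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected spend for one successful result; failed runs are paid for too."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate

# Hypothetical numbers for illustration only.
single_agent = cost_per_success(cost_per_run=0.010, success_rate=0.50)  # 0.020
multi_agent = cost_per_success(cost_per_run=0.015, success_rate=0.90)   # ~0.0167

print(f"single: ${single_agent:.4f} per success, multi: ${multi_agent:.4f} per success")
```

Under these assumed numbers the multi-agent version costs more per call but less per successful trip plan, which is the comparison your eval set should drive.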

Common multi-agent patterns

Pattern	Flow	Example
Sequential	A → B → C	Research → Write → Edit
Planner / workers	Manager assigns subtasks	Plan → execute weather, execute places
Handoff / triage	Router picks specialist	Support → billing vs technical
Supervisor	Boss reviews worker output	Critic agent rejects weak itinerary
Debate	Two agents critique each other	High cost, rare in production

Travel project (recommended):

Planner — JSON steps, no API keys in prompt
Executor — tools only
Synthesizer (optional third) — final prose from scratchpad
Memory service — sidecar read/write, not an LLM
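One way to enforce the planner/executor split is to treat the plan as data: the planner emits JSON steps, and the executor checks every step against its own small tool allowlist before running anything. A minimal stdlib sketch; the step fields (`id`, `tool`, `args`), the tool names, and `parse_plan` are hypothetical, not a fixed schema from this course:

```python
import json
from dataclasses import dataclass, field

# Hypothetical executor allowlist: the executor only knows a handful of tools.
EXECUTOR_TOOLS = {"get_weather", "search_places", "get_flights"}

@dataclass
class PlanStep:
    id: str
    tool: str
    args: dict = field(default_factory=dict)

def parse_plan(raw: str, allowed_tools: set) -> list:
    """Parse the planner's JSON output and reject steps that name unknown tools."""
    data = json.loads(raw)
    steps = []
    for item in data["steps"]:
        step = PlanStep(id=item["id"], tool=item["tool"], args=item.get("args", {}))
        if step.tool not in allowed_tools:
            raise ValueError(f"step {step.id}: unknown tool {step.tool!r}")
        steps.append(step)
    return steps

planner_output = (
    '{"steps": [{"id": "s1", "tool": "get_weather", "args": {"city": "Lisbon"}},'
    ' {"id": "s2", "tool": "search_places", "args": {"city": "Lisbon", "type": "museum"}}]}'
)
plan = parse_plan(planner_output, EXECUTOR_TOOLS)
print([s.tool for s in plan])  # ['get_weather', 'search_places']
```

Rejecting an unknown tool at parse time gives a clean, loggable failure that can trigger a replan instead of a confused tool call.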

Figure: Planner + executor split. Same split as Lesson 3, now deployed as an explicit multi-agent setup.

LangChain — building blocks

LangChain provides composable pieces:

Piece	Use
Chat models	OpenAI, Anthropic, Ollama wrappers
Prompt templates	Versioned planner/executor prompts
Tools	`@tool` decorators wrapping your Python functions
Retrievers	Plug Module 7 RAG into an agent
Runnable sequences	Chain pieces with the `|` operator (`prompt | model`), as in the planner example below

Good for: linear pipelines, quick prototypes, simple tool loops.

python

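from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# Planner prompt: role rules in the system message; {tool_hints} and {goal} are filled at invoke time
planner_prompt = ChatPromptTemplate.from_messages([
    ("system", "Output a JSON plan only. Tool hints: {tool_hints}"),
    ("human", "{goal}"),
])

# Pipe: prompt -> chat model -> parse the JSON reply into a Python dict
planner = planner_prompt | ChatOpenAI(model="gpt-4.1-mini") | JsonOutputParser()

# Example call (needs OPENAI_API_KEY; tool names are illustrative):
# plan = planner.invoke({"tool_hints": "get_weather, search_places", "goal": "3 days in Lisbon"})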

LangChain runnable chains flow in one direction, so arbitrary cycles (retry → replan → execute again) are awkward to express with them alone; that is where LangGraph helps.

LangGraph — control flow as a graph

LangGraph models agents as a state machine — a graph of steps where each step updates shared state (data):

Nodes = functions (planner, executor, synthesize)
Edges = fixed or conditional transitions
State = shared dict (scratchpad, messages, plan, trace)

Why graphs matter: agents need cycles. Linear chains cannot say “if tool failed, go back to planner.”
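The node/edge/state idea, including the replan cycle, can be sketched without any library: nodes are functions that return partial updates, a router picks the next node, and a loop runs until it reaches an end marker. This is a minimal stdlib illustration of the concept, not how LangGraph is implemented; the step names and the simulated tool failure are hypothetical:

```python
END = "__end__"
ALL_STEPS = ["get_weather", "search_places"]   # hypothetical tool steps

def plan(state):
    # Plan only the steps not already completed, so a replan does not redo work.
    done = state["trace"]
    return {"plan_steps": [s for s in ALL_STEPS if s not in done], "idx": 0,
            "plans_made": state.get("plans_made", 0) + 1}

def execute(state):
    step = state["plan_steps"][state["idx"]]
    # Simulated failure: search_places fails under the first plan only.
    if step == "search_places" and state["plans_made"] == 1:
        return {"error": f"{step} failed"}
    return {"idx": state["idx"] + 1, "error": None, "trace": state["trace"] + [step]}

def synthesize(state):
    return {"answer": "done: " + ", ".join(state["trace"])}

def route_after_execute(state):
    if state.get("error"):
        return "plan"          # cycle back to the planner after a failure
    if state["idx"] < len(state["plan_steps"]):
        return "execute"       # self-loop while plan steps remain
    return "synthesize"

NODES = {"plan": plan, "execute": execute, "synthesize": synthesize}
FIXED_EDGES = {"plan": "execute", "synthesize": END}
CONDITIONAL_EDGES = {"execute": route_after_execute}

def run(state, entry="plan", max_steps=20):
    node = entry
    for _ in range(max_steps):                    # hard cap so a bad router cannot loop forever
        if node == END:
            return state
        state = {**state, **NODES[node](state)}   # merge the node's partial update
        if node in CONDITIONAL_EDGES:
            node = CONDITIONAL_EDGES[node](state)
        else:
            node = FIXED_EDGES[node]
    raise RuntimeError("step limit reached")

final = run({"trace": []})
print(final["answer"])      # done: get_weather, search_places
print(final["plans_made"])  # 2
```

The run goes plan → execute → execute (fails) → plan → execute → synthesize: exactly the "if tool failed, go back to planner" path a linear chain cannot express.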

Conceptual graph (travel)

python

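from langgraph.graph import StateGraph, END

# TravelState is the TypedDict defined under "Shared state" below;
# plan_node, execute_node, synthesize_node and route_after_execute are your own functions.
graph = StateGraph(TravelState)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("synthesize", synthesize_node)

graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
# route_after_execute(state) returns one of the keys below; the graph follows the mapped edge
graph.add_conditional_edges(
    "execute",
    route_after_execute,
    {
        "continue": "execute",   # more steps in plan
        "replan": "plan",        # failure: new plan
        "done": "synthesize",
    },
)
graph.add_edge("synthesize", END)
app = graph.compile()  # compiled graph; run it with app.invoke(initial_state)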

Install: pip install langgraph langchain-openai
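The graph above leaves `route_after_execute` undefined. A minimal sketch of one, assuming the `TravelState` fields from the next section, that `current_step_idx` points at the next step still to run, and a hypothetical convention that `execute_node` writes `scratchpad["last_error"]` when a tool call fails. It must return one of the keys in the `add_conditional_edges` mapping:

```python
from typing import Literal

def route_after_execute(state: dict) -> Literal["continue", "replan", "done"]:
    """Pick the next edge after the executor has run one plan step."""
    # Hypothetical convention: execute_node records the last tool failure here.
    if state["scratchpad"].get("last_error"):
        return "replan"
    # Assumed: current_step_idx is the index of the next step still to run.
    if state["current_step_idx"] < len(state["plan_steps"]):
        return "continue"
    return "done"

print(route_after_execute({"scratchpad": {}, "current_step_idx": 1, "plan_steps": ["a", "b"]}))  # continue
```

A persistent failure would bounce between `execute` and `plan` indefinitely, so in practice you would also count replans in state; LangGraph's `recursion_limit` run config is a last-resort stop.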

Shared state (typed sketch)

python

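from typing import TypedDict, Annotated
import operator

class TravelState(TypedDict):
    user_goal: str                        # the user's request
    plan_steps: list                      # steps produced by the planner
    current_step_idx: int                 # next plan step for the executor
    scratchpad: dict                      # intermediate results, e.g. tool outputs
    trace: Annotated[list, operator.add]  # append-only log: updates are concatenated, not replaced
    final_answer: str                     # written by the synthesizer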

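Each node returns only the keys it changed, and LangGraph merges that partial update into the shared state: keys without a reducer are overwritten, while a key annotated with a reducer, like `trace` with `operator.add`, is combined with its previous value. This formalizes the scratchpad pattern from Lesson 4.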
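To make the merge concrete, here is a small stdlib imitation of that behavior: a key with no reducer takes the node's new value, and a key annotated with a reducer is combined with the old value. It re-declares a trimmed `TravelState` so it runs on its own; `merge_update` is a hypothetical helper, not a LangGraph function:

```python
import operator
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints

class TravelState(TypedDict):
    user_goal: str
    current_step_idx: int
    trace: Annotated[list, operator.add]  # reducer: combine old and new values with +

def merge_update(state_cls, state: dict, update: dict) -> dict:
    """Mimic reducer-based merging: plain keys are overwritten, reducer keys are combined."""
    hints = get_type_hints(state_cls, include_extras=True)  # keep the Annotated metadata
    merged = dict(state)                                     # leave the old state untouched
    for key, value in update.items():
        hint = hints[key]
        if get_origin(hint) is Annotated and key in merged:
            reducer = get_args(hint)[1]                      # here: operator.add
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged

state = {"user_goal": "3 days in Lisbon", "current_step_idx": 0, "trace": ["plan"]}
update = {"current_step_idx": 1, "trace": ["execute:get_weather"]}  # what one node returned
new_state = merge_update(TravelState, state, update)
print(new_state["current_step_idx"], new_state["trace"])  # 1 ['plan', 'execute:get_weather']
```

Because `trace` is concatenated rather than replaced, each node can return just its own new events without reading or copying the full log.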

Handoffs between agents

A handoff passes structured state from one agent role to the next (see welcome glossary).

Handoff rules:

Pass structured state, not full message history.
Receiving agent gets role-specific system prompt only.
Include completed_step_ids so work is not duplicated.
Log handoff as trace event for debugging.
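The four handoff rules can be sketched as a small payload builder plus a trace logger. A minimal stdlib sketch; `Handoff`, `make_handoff`, `log_handoff`, and the state keys are hypothetical names for illustration. The receiving role would pair this payload with its own role-specific system prompt:

```python
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class Handoff:
    from_role: str
    to_role: str
    reason: str
    user_goal: str
    completed_step_ids: list   # lets the receiver skip work already done
    scratchpad: dict           # structured results so far, not the chat history

def make_handoff(state: dict, from_role: str, to_role: str, reason: str) -> Handoff:
    """Copy only the structured fields the next role needs; message history stays behind."""
    return Handoff(
        from_role=from_role,
        to_role=to_role,
        reason=reason,
        user_goal=state["user_goal"],
        completed_step_ids=list(state["completed_step_ids"]),
        scratchpad=dict(state["scratchpad"]),
    )

def log_handoff(trace: list, handoff: Handoff) -> None:
    """Record the handoff as a trace event for debugging."""
    trace.append({"event": "handoff", "ts": time.time(),
                  "from": handoff.from_role, "to": handoff.to_role,
                  "reason": handoff.reason})

state = {
    "user_goal": "3 days in Lisbon",
    "messages": ["...long planner conversation..."],   # deliberately not forwarded
    "completed_step_ids": ["s1"],
    "scratchpad": {"s1": {"city": "Lisbon"}},
}
handoff = make_handoff(state, "planner", "executor", "needs live weather")
trace = []
log_handoff(trace, handoff)
print(json.dumps(asdict(handoff)))
```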

Triage example:

text

Router agent → {"specialist": "executor", "reason": "needs live weather"}

The router has no weather tools and only emits a routing decision, so it has no path to answering with a made-up forecast.
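On the receiving side, the router's JSON should be validated before anything is dispatched, because the model can return malformed output or name a specialist that does not exist. A minimal stdlib sketch; the `SPECIALISTS` registry, its tool sets, and the fallback to `"planner"` are hypothetical choices:

```python
import json

# Hypothetical registry: each specialist owns its own small tool set; the router owns none.
SPECIALISTS = {
    "executor": {"get_weather", "search_places"},
    "planner": set(),
}

def route(router_output: str, default: str = "planner") -> tuple:
    """Validate the router's JSON decision; fall back to a default on bad output."""
    try:
        decision = json.loads(router_output)
        specialist = decision["specialist"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return default, "unparseable router output"
    if specialist not in SPECIALISTS:
        return default, f"unknown specialist {specialist!r}"
    return specialist, decision.get("reason", "")

print(route('{"specialist": "executor", "reason": "needs live weather"}'))  # ('executor', 'needs live weather')
print(route("Sure, I'll check the weather!"))  # ('planner', 'unparseable router output')
```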

When multi-agent beats one agent

Yes when:

Tool catalog > ~8 tools or multiple domains (calendar + maps + email).
Eval shows systematic tool misuse with single prompt.
Teams maintain separate prompts per squad (content vs data).

No when:

Simple RAG Q&A — Module 7 architecture wins.
Fixed 3-step pipeline every time: use a plain workflow, not deliberating agents.
Latency budget under 2 s: every extra agent hop adds an LLM round trip.

Observability per node

For each graph node execution, log a record like:

json

{
  "node": "execute",
  "step_idx": 2,
  "llm_tokens_in": 890,
  "llm_tokens_out": 120,
  "tool_calls": [{"name": "get_weather", "ok": true, "ms": 340}],
  "state_snapshot_hash": "a1b2c3"
}
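Records like the one above can be produced by wrapping each node function. A minimal stdlib sketch; `instrumented` and `state_hash` are hypothetical helpers, token counts are left out because they come from your model client's usage metadata, and the hash treats non-JSON values via `str`:

```python
import hashlib
import json
import time

def state_hash(state: dict) -> str:
    """Short, stable fingerprint of a state dict (sorted keys; non-JSON values via str)."""
    blob = json.dumps(state, sort_keys=True, default=str).encode()
    return hashlib.sha256(blob).hexdigest()[:6]

def instrumented(name: str, node_fn, log: list):
    """Wrap a node function so every call appends one per-node log record."""
    def wrapper(state: dict) -> dict:
        start = time.perf_counter()
        record = {"node": name, "step_idx": state.get("current_step_idx")}
        update = None
        try:
            update = node_fn(state)
            record["ok"] = True
            return update
        except Exception as exc:
            record["ok"] = False
            record["error"] = repr(exc)
            raise
        finally:
            record["ms"] = round((time.perf_counter() - start) * 1000, 1)
            # Fingerprint the state as it looks after a plain overwrite-merge of the update.
            record["state_snapshot_hash"] = state_hash({**state, **(update or {})})
            log.append(record)
    return wrapper

log = []
execute = instrumented("execute", lambda s: {"current_step_idx": s["current_step_idx"] + 1}, log)
execute({"current_step_idx": 2, "scratchpad": {}})
print(log[0]["node"], log[0]["step_idx"], log[0]["ok"])  # execute 2 True
```

The `finally` block guarantees a record even when the node raises, which is what makes a per-node error rate computable.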

Dashboards: per-node error rate, average steps to success, cost per successful trip plan.

“Agent went in circles”: inspect the trace for execute → plan → execute loops (via the replan edge) in which the scratchpad never changes.
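That check can be automated over a trace. A minimal sketch, assuming each trace event is a dict carrying the node name and a hypothetical `scratchpad_hash` field (a fingerprint of the scratchpad after that node ran); `find_stuck_loops` is an illustrative helper:

```python
def find_stuck_loops(trace: list) -> list:
    """Indices where an execute -> plan -> execute cycle left the scratchpad unchanged."""
    stuck = []
    for i in range(len(trace) - 2):
        a, b, c = trace[i], trace[i + 1], trace[i + 2]
        is_replan_loop = (a["node"], b["node"], c["node"]) == ("execute", "plan", "execute")
        if is_replan_loop and a["scratchpad_hash"] == c["scratchpad_hash"]:
            stuck.append(i)
    return stuck

trace = [
    {"node": "plan", "scratchpad_hash": "000000"},
    {"node": "execute", "scratchpad_hash": "a1b2c3"},
    {"node": "plan", "scratchpad_hash": "a1b2c3"},
    {"node": "execute", "scratchpad_hash": "a1b2c3"},  # replanned, but nothing new learned
]
print(find_stuck_loops(trace))  # [1]
```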

Orchestration glossary (interview)

Term	Meaning
Orchestration	Running multiple LLM calls, tools, and state updates in the right order
Agent	LLM + loop + tools
Multi-agent	Multiple roles with handoffs
LangGraph	Graph-based orchestration with cycles
State	Shared scratchpad across nodes

Travel project mapping

Project component	Lesson concept
Planner API route	`plan` node
Tool runner	`execute` node
MongoDB prefs	Memory sidecar
UI reasoning trace	`trace` list in state
Retry on 404 city	conditional edge to `execute` or `plan`

Check yourself

Draw a 3-node graph for plan → execute → synthesize.
When would you add a replan edge?
LangChain vs LangGraph — one sentence each.
Why is shared state better than passing full chat between agents?

What's next

Lesson 7 — Evals for AI apps