← Back to curriculum

Module 8 — Agentic AI

MCP & context engineering

Model Context Protocol servers and clients, context stack layers, token budgeting, handoffs, and security.

~90 min read + exercises

MCP & context engineering

Before we begin

Lesson 2 taught ad-hoc tools — JSON schemas you register per app. Lesson 4 taught memory. This lesson connects both: what the model sees each turn and how Model Context Protocol (MCP) standardizes access to files, APIs, and databases across tools.

Context engineering is the discipline of assembling the right information into the prompt — every iteration — without blowing the token budget or leaking secrets.

Figure

Context layers per agent turn

LLMtool JSONAppvalidateAPIweatherLLManswer
System + task + memory + retrieval + tool history — assembled deliberately.

What you will learn

  • Explain MCP servers, clients, resources, and tools.
  • Build a context stack for each agent iteration.
  • Apply token budgeting and rolling summaries.
  • Design multi-agent handoffs with structured state.
  • Secure MCP and tool access with least privilege.

Before this lesson


What is MCP?

Model Context Protocol (MCP) is an open standard for connecting AI applications to external systems through a common interface.

PieceRole
MCP serverExposes tools (actions) and resources (readable data)
MCP clientYour agent host — discovers capabilities, invokes tools
TransportOften stdio (local process pipes) or HTTP/SSE (remote server pushing events over the web)

Examples of MCP servers:

  • Filesystem — read/write project files (with path allowlists)
  • GitHub — issues, PRs, code search
  • Postgres — read-only SQL with guardrails
  • Slack — post messages (with channel allowlist — only approved channels)

Why MCP vs hand-rolled tools?

  • Reuse — same server works in Claude Desktop, Cursor, your Next.js API
  • Discovery — client lists tools at runtime
  • Ecosystem — community servers for common SaaS

You still validate every call in production — MCP does not replace your security layer.


MCP mental model

text
Your Agent App (MCP client)
    ├── connects to → Weather MCP server (custom or HTTP wrapper)
    ├── connects to → Filesystem MCP server
    └── connects to → Postgres MCP server (read-only role)
 
Each turn:
  1. Client lists available tools from connected servers
  2. LLM picks tool + server **namespace** (a prefix that groups tools, e.g. `weather/` vs `maps/`)
  3. Client routes invocation to correct MCP server
  4. Result returns as observation text

Namespacing: weather/get_forecast vs maps/geocode avoids collisions when merging many servers.


Context engineering — the full stack

Each LLM call should assemble layers in a consistent order:

LayerContentsTypical size
1. SystemRole, safety rules, output format, tool rules200–800 tokens
2. PoliciesCompany travel policy (RAG snippet) if needed0–1500 tokens
3. Long-term memoryUser profile fields50–300 tokens
4. ScratchpadStructured session state (JSON)100–500 tokens
5. PlanCurrent planner steps100–400 tokens
6. Recent messagesLast few user/assistant turnsvariable
7. Tool traceSummarized observations from this sessionvariable
8. Current task“Execute step 2: …”50–200 tokens

Anti-pattern: paste entire chat + all tool JSON since session start — quality drops as models attend to noise.


Token budgeting (worked example)

Assume 8k token context budget for a small model route:

ReserveTokens
Model output1,500
System + policies1,200
Memory + scratchpad400
Remaining for history + tools4,900

If tool trace exceeds 3k tokens → summarize older observations into 300 tokens before the next call.

text
[Tool history summary]
Weather: Sat rain 19°C, Sun clear 26°C.
Places: Explora Museum, Vatican (hours not verified).
 
[Current step only]
search_places(city=Rome, tags=family, indoor=true)

Module 10 covers cost dashboards; this lesson is per-call discipline.


Rolling summary pattern

Every N tool calls (e.g. N=5):

  1. Call a cheap model or heuristic: “Summarize these observations in ≤200 tokens.”
  2. Replace raw observations 1–5 with the summary block.
  3. Keep observations 6+ in full until next summary.

Users keep continuity; context stays bounded.


MCP + memory agent loop (end-to-end)

text
1. User: "Plan Rome weekend — remember I prefer trains"
2. Load long-term memory → transport_pref: train
3. MCP client: list tools (weather, maps, places, calendar)
4. Planner LLM → JSON plan (no tool execution)
5. For each plan step:
     a. Build context stack (system + memory + scratchpad + step)
     b. Executor LLM → MCP tool call
     c. MCP server runs → observation
     d. Update scratchpad + trim context
6. Final synthesizer LLM → user-facing itinerary
7. Optionally persist trip summary to long-term vector memory

This is the architecture your travel project implements — with or without formal MCP libraries (you can wrap APIs as MCP-style tools first).


Multi-agent context handoffs

A handoff is when one agent finishes its slice of work and passes a compact packet to the next agent.

Bad handoff: entire raw trace (50k tokens, conflicting instructions).

Good handoff — structured packet:

json
{
  "goal": "2-day Rome itinerary, kid-friendly",
  "constraints": ["train preferred", "vegetarian food"],
  "facts": {
    "weather": {"sat": "rain", "sun": "clear"},
    "venues": ["Explora Museum"]
  },
  "open_questions": ["budget not confirmed"],
  "completed_step_ids": [1, 2]
}

Agent B gets a fresh system prompt for writing + the packet — not A’s full chain-of-thought.

LangGraph implements handoffs as state updates on graph edges (Lesson 6).


Context engineering for RAG + agents

When the agent needs company policy:

  • Retrieve top-3 chunks (Module 7) only for this turn
  • Insert as ## Policy context between system and memory
  • Do not mix policy chunks with tool JSON in one unlabeled blob

Citation rule in system prompt: “If policy context conflicts with user request, cite policy and escalate.”


Security

RiskMitigation
MCP filesystem reads /etc/passwdPath allowlist — only files under an approved folder
SQL MCP writesRead-only DB user; no DDL
Tool output injectionWrap untrusted text in delimiters; tell model “untrusted”
Cross-user memoryEnforce user_id on every memory fetch
Secret exfiltrationBlock tools that POST to arbitrary URLs

Log: user_id, tool_name, server, latency_ms, success — not full PII (personal data) payloads.


MCP in development vs production

DevProduction
Local stdio MCP servers on laptopHosted MCP gateways with auth
Broad filesystem accessSandboxed per tenant
Console loggingStructured traces (Lesson 8)

Start with two tools and one MCP server; expand after evals pass (Lesson 7).


Check yourself

  1. List five context layers in order.
  2. Why rolling summaries?
  3. What belongs in a handoff packet vs full chat history?
  4. How does MCP differ from registering JSON tools directly?

What's next

Lesson 6 — Multi-agent systems & orchestration