MCP & context engineering

Before we begin

Lesson 2 taught ad-hoc tools — JSON schemas you register per app. Lesson 4 taught memory. This lesson connects both: what the model sees each turn and how Model Context Protocol (MCP) standardizes access to files, APIs, and databases across tools.

Context engineering is the discipline of assembling the right information into the prompt — every iteration — without blowing the token budget or leaking secrets.

Figure

Context layers per agent turn

System + task + memory + retrieval + tool history — assembled deliberately.

What you will learn

Explain MCP servers, clients, resources, and tools.
Build a context stack for each agent iteration.
Apply token budgeting and rolling summaries.
Design multi-agent handoffs with structured state.
Secure MCP and tool access with least privilege.

Before this lesson

What is MCP?

Model Context Protocol (MCP) is an open standard for connecting AI applications to external systems through a common interface.

Piece	Role
MCP server	Exposes tools (actions) and resources (readable data)
MCP client	Your agent host — discovers capabilities, invokes tools
Transport	Often stdio (local process pipes) or HTTP/SSE (remote server pushing events over the web)

Examples of MCP servers:

Filesystem — read/write project files (with path allowlists)
GitHub — issues, PRs, code search
Postgres — read-only SQL with guardrails
Slack — post messages (with channel allowlist — only approved channels)

Why MCP vs hand-rolled tools?

Reuse — same server works in Claude Desktop, Cursor, your Next.js API
Discovery — client lists tools at runtime
Ecosystem — community servers for common SaaS

You still validate every call in production — MCP does not replace your security layer.

MCP mental model

text

Your Agent App (MCP client)
    ├── connects to → Weather MCP server (custom or HTTP wrapper)
    ├── connects to → Filesystem MCP server
    └── connects to → Postgres MCP server (read-only role)
 
Each turn:
  1. Client lists available tools from connected servers
  2. LLM picks tool + server **namespace** (a prefix that groups tools, e.g. `weather/` vs `maps/`)
  3. Client routes invocation to correct MCP server
  4. Result returns as observation text

Namespacing: weather/get_forecast vs maps/geocode avoids collisions when merging many servers.

Context engineering — the full stack

Each LLM call should assemble layers in a consistent order:

Layer	Contents	Typical size
1. System	Role, safety rules, output format, tool rules	200–800 tokens
2. Policies	Company travel policy (RAG snippet) if needed	0–1500 tokens
3. Long-term memory	User profile fields	50–300 tokens
4. Scratchpad	Structured session state (JSON)	100–500 tokens
5. Plan	Current planner steps	100–400 tokens
6. Recent messages	Last few user/assistant turns	variable
7. Tool trace	Summarized observations from this session	variable
8. Current task	“Execute step 2: …”	50–200 tokens

Anti-pattern: paste entire chat + all tool JSON since session start — quality drops as models attend to noise.

Token budgeting (worked example)

Assume 8k token context budget for a small model route:

Reserve	Tokens
Model output	1,500
System + policies	1,200
Memory + scratchpad	400
Remaining for history + tools	4,900

If tool trace exceeds 3k tokens → summarize older observations into 300 tokens before the next call.

text

[Tool history summary]
Weather: Sat rain 19°C, Sun clear 26°C.
Places: Explora Museum, Vatican (hours not verified).
 
[Current step only]
search_places(city=Rome, tags=family, indoor=true)

Module 10 covers cost dashboards; this lesson is per-call discipline.

Rolling summary pattern

Every N tool calls (e.g. N=5):

Call a cheap model or heuristic: “Summarize these observations in ≤200 tokens.”
Replace raw observations 1–5 with the summary block.
Keep observations 6+ in full until next summary.

Users keep continuity; context stays bounded.

MCP + memory agent loop (end-to-end)

text

1. User: "Plan Rome weekend — remember I prefer trains"
2. Load long-term memory → transport_pref: train
3. MCP client: list tools (weather, maps, places, calendar)
4. Planner LLM → JSON plan (no tool execution)
5. For each plan step:
     a. Build context stack (system + memory + scratchpad + step)
     b. Executor LLM → MCP tool call
     c. MCP server runs → observation
     d. Update scratchpad + trim context
6. Final synthesizer LLM → user-facing itinerary
7. Optionally persist trip summary to long-term vector memory

This is the architecture your travel project implements — with or without formal MCP libraries (you can wrap APIs as MCP-style tools first).

Multi-agent context handoffs

A handoff is when one agent finishes its slice of work and passes a compact packet to the next agent.

Bad handoff: entire raw trace (50k tokens, conflicting instructions).

Good handoff — structured packet:

json

{
  "goal": "2-day Rome itinerary, kid-friendly",
  "constraints": ["train preferred", "vegetarian food"],
  "facts": {
    "weather": {"sat": "rain", "sun": "clear"},
    "venues": ["Explora Museum"]
  },
  "open_questions": ["budget not confirmed"],
  "completed_step_ids": [1, 2]
}

Agent B gets a fresh system prompt for writing + the packet — not A’s full chain-of-thought.

LangGraph implements handoffs as state updates on graph edges (Lesson 6).

Context engineering for RAG + agents

When the agent needs company policy:

Retrieve top-3 chunks (Module 7) only for this turn
Insert as ## Policy context between system and memory
Do not mix policy chunks with tool JSON in one unlabeled blob

Citation rule in system prompt: “If policy context conflicts with user request, cite policy and escalate.”

Security

Risk	Mitigation
MCP filesystem reads `/etc/passwd`	Path allowlist — only files under an approved folder
SQL MCP writes	Read-only DB user; no DDL
Tool output injection	Wrap untrusted text in delimiters; tell model “untrusted”
Cross-user memory	Enforce `user_id` on every memory fetch
Secret exfiltration	Block tools that POST to arbitrary URLs

Log: user_id, tool_name, server, latency_ms, success — not full PII (personal data) payloads.

MCP in development vs production

Dev	Production
Local stdio MCP servers on laptop	Hosted MCP gateways with auth
Broad filesystem access	Sandboxed per tenant
Console logging	Structured traces (Lesson 8)

Start with two tools and one MCP server; expand after evals pass (Lesson 7).

Check yourself

List five context layers in order.
Why rolling summaries?
What belongs in a handoff packet vs full chat history?
How does MCP differ from registering JSON tools directly?

What's next

Lesson 6 — Multi-agent systems & orchestration