MCP & context engineering
Before we begin
Lesson 2 taught ad-hoc tools — JSON schemas you register per app. Lesson 4 taught memory. This lesson connects both: what the model sees each turn and how Model Context Protocol (MCP) standardizes access to files, APIs, and databases across tools.
Context engineering is the discipline of assembling the right information into the prompt — every iteration — without blowing the token budget or leaking secrets.
Figure
Context layers per agent turn
What you will learn
- Explain MCP servers, clients, resources, and tools.
- Build a context stack for each agent iteration.
- Apply token budgeting and rolling summaries.
- Design multi-agent handoffs with structured state.
- Secure MCP and tool access with least privilege.
Before this lesson
What is MCP?
Model Context Protocol (MCP) is an open standard for connecting AI applications to external systems through a common interface.
| Piece | Role |
|---|---|
| MCP server | Exposes tools (actions) and resources (readable data) |
| MCP client | Your agent host — discovers capabilities, invokes tools |
| Transport | Often stdio (local process pipes) or HTTP/SSE (remote server pushing events over the web) |
Examples of MCP servers:
- Filesystem — read/write project files (with path allowlists)
- GitHub — issues, PRs, code search
- Postgres — read-only SQL with guardrails
- Slack — post messages (with channel allowlist — only approved channels)
Why MCP vs hand-rolled tools?
- Reuse — same server works in Claude Desktop, Cursor, your Next.js API
- Discovery — client lists tools at runtime
- Ecosystem — community servers for common SaaS
You still validate every call in production — MCP does not replace your security layer.
MCP mental model
Your Agent App (MCP client)
├── connects to → Weather MCP server (custom or HTTP wrapper)
├── connects to → Filesystem MCP server
└── connects to → Postgres MCP server (read-only role)
Each turn:
1. Client lists available tools from connected servers
2. LLM picks tool + server **namespace** (a prefix that groups tools, e.g. `weather/` vs `maps/`)
3. Client routes invocation to correct MCP server
4. Result returns as observation textNamespacing: weather/get_forecast vs maps/geocode avoids collisions when merging many servers.
Context engineering — the full stack
Each LLM call should assemble layers in a consistent order:
| Layer | Contents | Typical size |
|---|---|---|
| 1. System | Role, safety rules, output format, tool rules | 200–800 tokens |
| 2. Policies | Company travel policy (RAG snippet) if needed | 0–1500 tokens |
| 3. Long-term memory | User profile fields | 50–300 tokens |
| 4. Scratchpad | Structured session state (JSON) | 100–500 tokens |
| 5. Plan | Current planner steps | 100–400 tokens |
| 6. Recent messages | Last few user/assistant turns | variable |
| 7. Tool trace | Summarized observations from this session | variable |
| 8. Current task | “Execute step 2: …” | 50–200 tokens |
Anti-pattern: paste entire chat + all tool JSON since session start — quality drops as models attend to noise.
Token budgeting (worked example)
Assume 8k token context budget for a small model route:
| Reserve | Tokens |
|---|---|
| Model output | 1,500 |
| System + policies | 1,200 |
| Memory + scratchpad | 400 |
| Remaining for history + tools | 4,900 |
If tool trace exceeds 3k tokens → summarize older observations into 300 tokens before the next call.
[Tool history summary]
Weather: Sat rain 19°C, Sun clear 26°C.
Places: Explora Museum, Vatican (hours not verified).
[Current step only]
search_places(city=Rome, tags=family, indoor=true)Module 10 covers cost dashboards; this lesson is per-call discipline.
Rolling summary pattern
Every N tool calls (e.g. N=5):
- Call a cheap model or heuristic: “Summarize these observations in ≤200 tokens.”
- Replace raw observations 1–5 with the summary block.
- Keep observations 6+ in full until next summary.
Users keep continuity; context stays bounded.
MCP + memory agent loop (end-to-end)
1. User: "Plan Rome weekend — remember I prefer trains"
2. Load long-term memory → transport_pref: train
3. MCP client: list tools (weather, maps, places, calendar)
4. Planner LLM → JSON plan (no tool execution)
5. For each plan step:
a. Build context stack (system + memory + scratchpad + step)
b. Executor LLM → MCP tool call
c. MCP server runs → observation
d. Update scratchpad + trim context
6. Final synthesizer LLM → user-facing itinerary
7. Optionally persist trip summary to long-term vector memoryThis is the architecture your travel project implements — with or without formal MCP libraries (you can wrap APIs as MCP-style tools first).
Multi-agent context handoffs
A handoff is when one agent finishes its slice of work and passes a compact packet to the next agent.
Bad handoff: entire raw trace (50k tokens, conflicting instructions).
Good handoff — structured packet:
{
"goal": "2-day Rome itinerary, kid-friendly",
"constraints": ["train preferred", "vegetarian food"],
"facts": {
"weather": {"sat": "rain", "sun": "clear"},
"venues": ["Explora Museum"]
},
"open_questions": ["budget not confirmed"],
"completed_step_ids": [1, 2]
}Agent B gets a fresh system prompt for writing + the packet — not A’s full chain-of-thought.
LangGraph implements handoffs as state updates on graph edges (Lesson 6).
Context engineering for RAG + agents
When the agent needs company policy:
- Retrieve top-3 chunks (Module 7) only for this turn
- Insert as
## Policy contextbetween system and memory - Do not mix policy chunks with tool JSON in one unlabeled blob
Citation rule in system prompt: “If policy context conflicts with user request, cite policy and escalate.”
Security
| Risk | Mitigation |
|---|---|
MCP filesystem reads /etc/passwd | Path allowlist — only files under an approved folder |
| SQL MCP writes | Read-only DB user; no DDL |
| Tool output injection | Wrap untrusted text in delimiters; tell model “untrusted” |
| Cross-user memory | Enforce user_id on every memory fetch |
| Secret exfiltration | Block tools that POST to arbitrary URLs |
Log: user_id, tool_name, server, latency_ms, success — not full PII (personal data) payloads.
MCP in development vs production
| Dev | Production |
|---|---|
| Local stdio MCP servers on laptop | Hosted MCP gateways with auth |
| Broad filesystem access | Sandboxed per tenant |
| Console logging | Structured traces (Lesson 8) |
Start with two tools and one MCP server; expand after evals pass (Lesson 7).
Check yourself
- List five context layers in order.
- Why rolling summaries?
- What belongs in a handoff packet vs full chat history?
- How does MCP differ from registering JSON tools directly?