Agent memory — short and long term
Before we begin
LLMs only see what fits in the context window (Module 6). A multi-step agent run can consume thousands of tokens in tool traces alone. Without a memory strategy, you either truncate important facts or pay for huge prompts every turn.
Memory is not magic storage — it is what you choose to load into context each iteration, plus what you persist outside the model.
Figure
Two memory layers
What you will learn
- Distinguish short-term, long-term, and scratchpad memory.
- Choose what to store where — and what never to store.
- Inject memory into planner/executor prompts safely.
- Apply summarization and vector recall when history grows.
- Connect memory to RAG (Module 7) without duplicating it.
Before this lesson
Why agents need memory
| Problem | Without memory | With memory |
|---|---|---|
| User said “I hate flying” last week | Agent suggests flights again | Load transport_pref: train from profile |
| 15 tool calls in one session | Context overflow / truncation | Summarize early steps; keep scratchpad |
| Multi-agent handoff | Executor loses planner context | Shared state object — one JSON dict all agents read/write (Lesson 6) |
| Personalization | Generic answers | Long-term prefs in DB |
Chat history alone is weak memory — it is unbounded, noisy, and expensive. Production agents use structured memory layers.
Short-term memory (session)
What it holds:
- Current user messages
- Tool calls and observations from this session
- Planner JSON and executor traces
- Optional scratchpad (structured notes)
Where it lives: in-memory list or Redis session keyed by session_id; passed to each LLM call after trimming.
Trimming strategies:
| Strategy | When |
|---|---|
| Keep last N messages | Simple chat; N = 20–40 |
| Keep last N tokens | Token-budget aware (Module 10) |
| Summarize middle | Long research sessions — compress turns 5–20 into 200-token recap |
| Drop raw HTTP | Always — keep observation summaries only |
Example trim: after step 10, replace tool JSON blobs with:
Steps 1–5 summary: Fetched Rome weather (Sat rain, Sun clear); shortlisted 3 museums.Long-term memory (cross-session)
What it holds:
- User preferences (
home_airport,dietary,budget_tier) - Past trip summaries (not full chat logs)
- Explicit “remember that…” facts the user opted into
Where it lives:
- MongoDB / Postgres —
users+memoriestable keyed byuser_id - JSON file — demos only
- Vector store — semantic “remember when we discussed Kyoto?”
Load strategy (critical):
def load_memory_for_prompt(user_id: str, query: str) -> str:
profile = db.get_user_profile(user_id) # small fixed fields
relevant = vector_search(user_id, query, k=3) # optional semantic
return format_memory_block(profile, relevant)Do not dump 500 past messages into every prompt.
Scratchpad pattern
A scratchpad is structured working memory shared between planner and executor:
{
"destination": "Rome",
"dates": {"start": "2025-06-28", "end": "2025-06-29"},
"weather": {"sat": "rain", "sun": "clear"},
"candidates": ["Explora Museum", "Villa Borghese"],
"budget": "mid"
}Who writes it:
- Planner sets initial fields from user message.
- Executor updates after each tool observation.
- Final agent reads scratchpad to write the itinerary.
In LangGraph, this is the state dict mutated by each node — the canonical multi-agent pattern (Lesson 6).
Memory vs RAG
Both put text into context — different jobs:
| RAG | Agent memory | |
|---|---|---|
| Source | Company docs, wiki, PDFs | User-specific facts and session state |
| Updates | Re-index documents | Per user / per session |
| Retrieval | Similarity to question | Profile fields + optional semantic user history |
| Module | Module 7 | This lesson |
A travel agent might RAG your “company travel policy PDF” and memory the user’s “no red-eye flights.”
Vector memory (semantic)
When users say “like last time in Barcelona,” embed past trip summaries and retrieve top matches:
- On trip complete, summarize to 100 tokens → embed → store with
user_id. - On new query, embed query → nearest neighbor summaries → inject into planner prompt.
Same embedding intuition as Module 1 dot products — higher dimension.
Caution: retrieve summaries, not raw chats with PII (personally identifiable information — names, emails, etc.).
Privacy and compliance
| Rule | Implementation |
|---|---|
| Consent | Toggle “save preferences”; GDPR (EU privacy law) delete endpoint |
| TTL (time to live) | Expire session memory after 24h; anonymize old trips |
| Never store | API keys, passport numbers, full payment cards in memory tables |
| Audit | Log what memory was loaded for a given answer (support tickets) |
| Minimize | Store budget_tier: mid not full bank statements |
Worked example — memory in prompts
Long-term (from DB):
User profile: prefers trains over flights; vegetarian; traveling with child age 7.Short-term (this session scratchpad):
{"city": "Rome", "weather_sat": "rain"}Planner prompt tail:
Profile: {long_term}
Scratchpad: {scratchpad_json}
Goal: {user_message}Executor sees scratchpad + current step only — smaller, focused context.
Common mistakes
| Mistake | Fix |
|---|---|
| Entire chat history every call | Summarize + scratchpad |
| Storing tool errors forever | Clear or decay stale observations |
| No user_id isolation | Cross-user memory leak — catastrophic |
| Memory without eval | Test “user said X last session” cases (Lesson 7) |
Travel project checklist
- MongoDB or file store for
user_id → preferences - Session scratchpad updated after each tool
- Planner reads profile; executor reads scratchpad
- UI does not expose other users’ data
What's next
Lesson 5 — MCP & context engineering — standardizing tools and assembling context each turn.