Agent memory — short and long term

Before we begin

LLMs only see what fits in the context window (Module 6). A multi-step agent run can consume thousands of tokens in tool traces alone. Without a memory strategy, you either truncate important facts or pay for huge prompts every turn.

Memory is not magic storage — it is what you choose to load into context each iteration, plus what you persist outside the model.

Figure

Two memory layers

Session context (ephemeral) vs persisted profile (cross-visit).

What you will learn

Distinguish short-term, long-term, and scratchpad memory.
Choose what to store where — and what never to store.
Inject memory into planner/executor prompts safely.
Apply summarization and vector recall when history grows.
Connect memory to RAG (Module 7) without duplicating it.

Before this lesson

Why agents need memory

Problem	Without memory	With memory
User said “I hate flying” last week	Agent suggests flights again	Load `transport_pref: train` from profile
15 tool calls in one session	Context overflow / truncation	Summarize early steps; keep scratchpad
Multi-agent handoff	Executor loses planner context	Shared state object — one JSON dict all agents read/write (Lesson 6)
Personalization	Generic answers	Long-term prefs in DB

Chat history alone is weak memory — it is unbounded, noisy, and expensive. Production agents use structured memory layers.

Short-term memory (session)

What it holds:

Current user messages
Tool calls and observations from this session
Planner JSON and executor traces
Optional scratchpad (structured notes)

Where it lives: in-memory list or Redis session keyed by session_id; passed to each LLM call after trimming.

Trimming strategies:

Strategy	When
Keep last N messages	Simple chat; N = 20–40
Keep last N tokens	Token-budget aware (Module 10)
Summarize middle	Long research sessions — compress turns 5–20 into 200-token recap
Drop raw HTTP	Always — keep observation summaries only

Example trim: after step 10, replace tool JSON blobs with:

text

Steps 1–5 summary: Fetched Rome weather (Sat rain, Sun clear); shortlisted 3 museums.

Long-term memory (cross-session)

What it holds:

User preferences (home_airport, dietary, budget_tier)
Past trip summaries (not full chat logs)
Explicit “remember that…” facts the user opted into

Where it lives:

MongoDB / Postgres — users + memories table keyed by user_id
JSON file — demos only
Vector store — semantic “remember when we discussed Kyoto?”

Load strategy (critical):

python

def load_memory_for_prompt(user_id: str, query: str) -> str:
    profile = db.get_user_profile(user_id)  # small fixed fields
    relevant = vector_search(user_id, query, k=3)  # optional semantic
    return format_memory_block(profile, relevant)

Do not dump 500 past messages into every prompt.

Scratchpad pattern

A scratchpad is structured working memory shared between planner and executor:

json

{
  "destination": "Rome",
  "dates": {"start": "2025-06-28", "end": "2025-06-29"},
  "weather": {"sat": "rain", "sun": "clear"},
  "candidates": ["Explora Museum", "Villa Borghese"],
  "budget": "mid"
}

Who writes it:

Planner sets initial fields from user message.
Executor updates after each tool observation.
Final agent reads scratchpad to write the itinerary.

In LangGraph, this is the state dict mutated by each node — the canonical multi-agent pattern (Lesson 6).

Memory vs RAG

Both put text into context — different jobs:

	RAG	Agent memory
Source	Company docs, wiki, PDFs	User-specific facts and session state
Updates	Re-index documents	Per user / per session
Retrieval	Similarity to question	Profile fields + optional semantic user history
Module	Module 7	This lesson

A travel agent might RAG your “company travel policy PDF” and memory the user’s “no red-eye flights.”

Vector memory (semantic)

When users say “like last time in Barcelona,” embed past trip summaries and retrieve top matches:

On trip complete, summarize to 100 tokens → embed → store with user_id.
On new query, embed query → nearest neighbor summaries → inject into planner prompt.

Same embedding intuition as Module 1 dot products — higher dimension.

Caution: retrieve summaries, not raw chats with PII (personally identifiable information — names, emails, etc.).

Privacy and compliance

Rule	Implementation
Consent	Toggle “save preferences”; GDPR (EU privacy law) delete endpoint
TTL (time to live)	Expire session memory after 24h; anonymize old trips
Never store	API keys, passport numbers, full payment cards in memory tables
Audit	Log what memory was loaded for a given answer (support tickets)
Minimize	Store `budget_tier: mid` not full bank statements

Worked example — memory in prompts

Long-term (from DB):

text

User profile: prefers trains over flights; vegetarian; traveling with child age 7.

Short-term (this session scratchpad):

json

{"city": "Rome", "weather_sat": "rain"}

Planner prompt tail:

text

Profile: {long_term}
Scratchpad: {scratchpad_json}
Goal: {user_message}

Executor sees scratchpad + current step only — smaller, focused context.

Common mistakes

Mistake	Fix
Entire chat history every call	Summarize + scratchpad
Storing tool errors forever	Clear or decay stale observations
No user_id isolation	Cross-user memory leak — catastrophic
Memory without eval	Test “user said X last session” cases (Lesson 7)

Travel project checklist

MongoDB or file store for user_id → preferences
Session scratchpad updated after each tool
Planner reads profile; executor reads scratchpad
UI does not expose other users’ data

What's next

Lesson 5 — MCP & context engineering — standardizing tools and assembling context each turn.