← Back to curriculum

Module 8 — Agentic AI

Agent memory — short & long term

Session vs persisted memory, scratchpads, summarization, vector recall, memory vs RAG, and privacy rules.

~85 min read + exercises

Agent memory — short and long term

Before we begin

LLMs only see what fits in the context window (Module 6). A multi-step agent run can consume thousands of tokens in tool traces alone. Without a memory strategy, you either truncate important facts or pay for huge prompts every turn.

Memory is not magic storage — it is what you choose to load into context each iteration, plus what you persist outside the model.

Figure

Two memory layers

Short-termchat + scratchpadLong-termprefs in DB
Session context (ephemeral) vs persisted profile (cross-visit).

What you will learn

  • Distinguish short-term, long-term, and scratchpad memory.
  • Choose what to store where — and what never to store.
  • Inject memory into planner/executor prompts safely.
  • Apply summarization and vector recall when history grows.
  • Connect memory to RAG (Module 7) without duplicating it.

Before this lesson


Why agents need memory

ProblemWithout memoryWith memory
User said “I hate flying” last weekAgent suggests flights againLoad transport_pref: train from profile
15 tool calls in one sessionContext overflow / truncationSummarize early steps; keep scratchpad
Multi-agent handoffExecutor loses planner contextShared state object — one JSON dict all agents read/write (Lesson 6)
PersonalizationGeneric answersLong-term prefs in DB

Chat history alone is weak memory — it is unbounded, noisy, and expensive. Production agents use structured memory layers.


Short-term memory (session)

What it holds:

  • Current user messages
  • Tool calls and observations from this session
  • Planner JSON and executor traces
  • Optional scratchpad (structured notes)

Where it lives: in-memory list or Redis session keyed by session_id; passed to each LLM call after trimming.

Trimming strategies:

StrategyWhen
Keep last N messagesSimple chat; N = 20–40
Keep last N tokensToken-budget aware (Module 10)
Summarize middleLong research sessions — compress turns 5–20 into 200-token recap
Drop raw HTTPAlways — keep observation summaries only

Example trim: after step 10, replace tool JSON blobs with:

text
Steps 1–5 summary: Fetched Rome weather (Sat rain, Sun clear); shortlisted 3 museums.

Long-term memory (cross-session)

What it holds:

  • User preferences (home_airport, dietary, budget_tier)
  • Past trip summaries (not full chat logs)
  • Explicit “remember that…” facts the user opted into

Where it lives:

  • MongoDB / Postgresusers + memories table keyed by user_id
  • JSON file — demos only
  • Vector store — semantic “remember when we discussed Kyoto?”

Load strategy (critical):

python
def load_memory_for_prompt(user_id: str, query: str) -> str:
    profile = db.get_user_profile(user_id)  # small fixed fields
    relevant = vector_search(user_id, query, k=3)  # optional semantic
    return format_memory_block(profile, relevant)

Do not dump 500 past messages into every prompt.


Scratchpad pattern

A scratchpad is structured working memory shared between planner and executor:

json
{
  "destination": "Rome",
  "dates": {"start": "2025-06-28", "end": "2025-06-29"},
  "weather": {"sat": "rain", "sun": "clear"},
  "candidates": ["Explora Museum", "Villa Borghese"],
  "budget": "mid"
}

Who writes it:

  • Planner sets initial fields from user message.
  • Executor updates after each tool observation.
  • Final agent reads scratchpad to write the itinerary.

In LangGraph, this is the state dict mutated by each node — the canonical multi-agent pattern (Lesson 6).


Memory vs RAG

Both put text into context — different jobs:

RAGAgent memory
SourceCompany docs, wiki, PDFsUser-specific facts and session state
UpdatesRe-index documentsPer user / per session
RetrievalSimilarity to questionProfile fields + optional semantic user history
ModuleModule 7This lesson

A travel agent might RAG your “company travel policy PDF” and memory the user’s “no red-eye flights.”


Vector memory (semantic)

When users say “like last time in Barcelona,” embed past trip summaries and retrieve top matches:

  1. On trip complete, summarize to 100 tokens → embed → store with user_id.
  2. On new query, embed query → nearest neighbor summaries → inject into planner prompt.

Same embedding intuition as Module 1 dot products — higher dimension.

Caution: retrieve summaries, not raw chats with PII (personally identifiable information — names, emails, etc.).


Privacy and compliance

RuleImplementation
ConsentToggle “save preferences”; GDPR (EU privacy law) delete endpoint
TTL (time to live)Expire session memory after 24h; anonymize old trips
Never storeAPI keys, passport numbers, full payment cards in memory tables
AuditLog what memory was loaded for a given answer (support tickets)
MinimizeStore budget_tier: mid not full bank statements

Worked example — memory in prompts

Long-term (from DB):

text
User profile: prefers trains over flights; vegetarian; traveling with child age 7.

Short-term (this session scratchpad):

json
{"city": "Rome", "weather_sat": "rain"}

Planner prompt tail:

text
Profile: {long_term}
Scratchpad: {scratchpad_json}
Goal: {user_message}

Executor sees scratchpad + current step only — smaller, focused context.


Common mistakes

MistakeFix
Entire chat history every callSummarize + scratchpad
Storing tool errors foreverClear or decay stale observations
No user_id isolationCross-user memory leak — catastrophic
Memory without evalTest “user said X last session” cases (Lesson 7)

Travel project checklist

  • MongoDB or file store for user_id → preferences
  • Session scratchpad updated after each tool
  • Planner reads profile; executor reads scratchpad
  • UI does not expose other users’ data

What's next

Lesson 5 — MCP & context engineering — standardizing tools and assembling context each turn.