In this issue (10 sections)
A team at a mid-size SaaS company spent three weeks building a vector memory system for their AI assistant. They set up an embedding pipeline, provisioned a vector database, implemented HNSW indexing, built a semantic search layer, and added a reranking step. They wrote tests. They tuned similarity thresholds. Three weeks of careful engineering work.
The agent needed to remember five facts about each user: name, timezone, language preference, notification settings, and plan tier.
A Postgres table with five columns and one row per user would have solved the problem. Two hours of work. Zero ongoing maintenance. The vector infrastructure they built solved a categorically different problem — one they did not have.
This is not a memory problem. It is a pattern selection problem. The team had cargo-culted a popular architecture without asking the prior question: can you enumerate every fact this agent needs to remember? Yes, they could — five of them, clearly defined. If you can enumerate the facts, you do not need semantic search over a vector index. You need a lookup table.
The question is not how to implement memory. It is which implementation your use case actually requires.
Issues 2, 3, and 4 covered the WRITE, MANAGE, and READ phases of the memory loop — how to extract facts, resolve contradictions, and retrieve the right memories at query time. This issue steps back to ask which of the four patterns those phases apply to, and when.
The Four Patterns
Four patterns cover the full range of memory architectures used in production agents. They are ordered by complexity, not preference. The right pattern depends on three questions: How many facts does your agent need to remember? Do they persist across sessions? Does the agent need to answer open-ended questions about its own history?
Pattern 1 — In-Session Compression: No persistent storage. The context window is the only memory. Rolling summaries handle context limits. The agent forgets everything when the session ends.
Pattern 2 — KV Fact Store: Structured storage of typed facts. Entity/attribute/value triples in a relational database. Direct lookup by key. Cross-session persistence. No embeddings required.
Pattern 3 — Vector Semantic Recall: Embeddings stored alongside structured facts. Semantic search over memory when the query does not map to a known key. Layered on top of Pattern 2, not a replacement for it.
Pattern 4 — Episodic Structured Log: Full conversation history stored as timestamped entries with semantic retrieval. The agent can reason about its own past. The canonical example is Generative Agents (Park et al. 2023). Required only for long-horizon agents with continuity requirements over months of interaction.
Each pattern costs more to build and maintain than the one before it. Each upgrade is only justified when the previous pattern’s limits are actually hit.
MEMORY ARCHITECTURE COMPLEXITY AXIS
═══════════════════════════════════════════════════════════════════════
Pattern 1 Pattern 2 Pattern 3 Pattern 4
In-Session KV Fact Store Vector Semantic Episodic Log
Compression Recall
──────────────────────────────────────────────────────────────────────►
Complexity / Cost
Storage: None Relational DB Relational DB Relational DB
(5–7 cols) + Embeddings + Full history
Cross-session: No Yes Yes Yes
Semantic
search: No No Yes Yes
Facts/user: N/A 1–100 100–10,000 Unbounded
Contradiction
detection: N/A Deterministic Deterministic Semantic (slow)
(entity+attr) (entity+attr)
Relative
cost: $ $$ $$$ $$$$ Pattern 1: In-Session Compression
Pattern 1 is what every agent does by default, whether it chooses to or not. The context window is the memory system. Conversation history accumulates until it approaches the limit, at which point rolling summarization compresses the oldest turns — a cheap LLM call that replaces N turns with a 3-5 sentence summary.
This is the right starting point for a reason: it works, requires no infrastructure, and is sufficient for a large class of use cases. Customer service agents resolving a single ticket. One-shot coding assistants. Prototypes where cross-session continuity is not a requirement. Any agent where the conversation has a defined start and end, and the user does not expect the agent to remember who they are next time.
Two risks are worth naming explicitly.
Lost-in-the-middle (Liu et al. 2023, arXiv:2307.03172): Language models do not retrieve information uniformly across a long context. Performance is highest for content at the beginning and end of the context window, and degrades significantly for content in the middle — even for explicitly long-context models. A preference stated in turn 3 of a 200-turn conversation is in a risky position.
Summarization drift: Every summarization pass is a lossy compression. A preference that was concrete in turn 3 (“I always use Python for backend work, Go for CLIs”) becomes “User prefers Python” after one compression, then “User has language preferences” after two. If your agent’s responses start feeling generic despite a long shared history, drift is a likely cause. (Forward reference to Issue 6: Failure Modes.)
The upgrade trigger from Pattern 1 to Pattern 2 is simple: the user starts a new session and the agent does not know who they are.
Pattern 2: KV Fact Store
The principle behind Pattern 2 dates to the mid-1950s: if you know what you are looking for, compute where it should be rather than searching for it. The credit is contested — variously attributed to Hans Peter Luhn’s 1953 IBM memo or to H. A. M. Dumey’s 1956 paper in Computers and Automation — but the principle is not. Structure beats brute search for known-key lookups.
Seventy-three years later, the correct approach for storing a user’s timezone preference is still a lookup table. SELECT value FROM memories WHERE entity='user' AND attribute='timezone'. Microseconds. Via a B-tree index. Returns the right answer, not a semantically similar one.
Pattern 2 is a KV fact store. Each memory is a typed triple: who (entity), what property (attribute), and what value (value). The type is one of four: preference, fact, decision, or procedure. The taxonomy matters because different types have different contradiction behavior — a decision can be superseded by a later decision without touching a preference; two procedures can coexist even if they conflict in detail.
This is exactly how Recall implements the core storage layer. The memories table has entity, attribute, and value columns. When a new memory arrives for the same (user_id, entity, attribute), the old memory’s valid_until is set to now and excluded from all future queries. Contradiction detection happens deterministically, without embedding lookup.
Recall’s MemoryType enum defines the four types:
class MemoryType(str, Enum):
PREFERENCE = "preference" # stated likes/dislikes, settings
FACT = "fact" # verifiable information about the user
DECISION = "decision" # choices the user has made
PROCEDURE = "procedure" # steps or workflows the user follows
The case for Pattern 2 over Pattern 3 for structured facts: if you can name the attributes in advance, direct lookup is faster, more deterministic, and gives you contradiction detection as a free side effect. Semantic search over an embedding of “what is the user’s timezone preference” is slower, probabilistic, and will occasionally return the second-best match instead of the definitive one.
When to use Pattern 2: fewer than 100 facts per user, mostly typed preferences and decisions, cross-session personalization without open-ended history queries. This covers the vast majority of production personalization agents.
Pattern 3: Vector Semantic Recall
Pattern 3 is the layer you add when Pattern 2’s structured lookup is not sufficient — not the layer you start with.
The upgrade trigger is a specific query type: “tell me about my past projects,” “what do you know about my work on the auth service,” “remind me what we decided about the API design.” These queries do not map to a known entity/attribute pair. The user is asking the agent to search over memory by semantic relevance, not to retrieve a specific fact by key.
Recall adds Pattern 3 via a single install variant:
pip install "szl-recall[embeddings]"
This loads BAAI/bge-small-en-v1.5 (~500MB on first run) and enables dense vector retrieval. Embeddings are stored in the embedding column of the memories table — a BLOB of L2-normalized float32 values. BM25Plus keyword ranking is fused with dense cosine similarity via Reciprocal Rank Fusion (RRF, k=60). Without the embeddings package installed, Recall falls back to BM25-only — Pattern 2 continues to work fully.
This graceful degradation is the architecture point: Pattern 3 is a layer on top of Pattern 2, not a replacement. Structured facts still use direct lookup. Semantic search applies to the broader memory set when a query cannot be resolved by key.
What you take on when you add Pattern 3:
- Embedding latency: 5-20ms per query on CPU to encode the query vector
- Storage overhead: ~1.5KB per memory entry for a 384-dimension dense vector
- Re-embedding on update: when a memory’s text is revised, its embedding must be regenerated
- Model pinning: changing embedding models requires regenerating all stored embeddings
| Pattern 2 only | Pattern 2 + Pattern 3 | |
|---|---|---|
| Install | pip install szl-recall | pip install szl-recall[embeddings] |
| Schema | entity + attribute + value + valid_until | Same + embedding BLOB column populated |
| Search path | BM25Plus keyword ranking only | BM25Plus + dense vectors, fused via RRF (k=60) |
| Contradiction detection | Deterministic: same entity+attribute → supersede | Same (unaffected by embeddings) |
| Query latency | <5ms (B-tree index) | 5–20ms (embedding encode + ANN search) |
| When to use | ≤100 facts/user, structured preferences | Large fact sets, open-ended history queries |
Pattern 4: Episodic Structured Log
Pattern 4 is the full episodic memory architecture — full conversation history, stored as timestamped diary entries, retrievable by semantic similarity. The canonical example is Generative Agents (Park et al. 2023, arXiv:2304.03442).
In that architecture, each agent maintains a memory stream: a chronological log of every observation in natural language with timestamps. Retrieval scores each memory by three equal-weight components — recency (exponential decay over hours), importance (a 1-10 poignancy score assigned at write time by an LLM), and relevance (cosine similarity to the query). The agent periodically runs a reflection pass: when cumulative importance of recent observations crosses a threshold, it generates a higher-order insight and stores it as a new memory entry.
This architecture is necessary for a specific class of agent: one that needs to reason about its own history. “Last time we talked about this, you were leaning toward FastAPI — have you made a decision?” That question requires that the prior session’s content be retrievable as a specific moment. A rolling summary would not preserve the detail. A KV fact store would not have captured it as a structured fact.
The cost is real: storage grows with every session. For a daily-use agent over 6 months, that is potentially thousands of entries — each needing an embedding. A production deployment at scale requires persistent vector storage, ANN index rebuild cycles, a reflection scheduler, and separate storage for raw observations versus synthesized reflections — none of which is present in the original research prototype.
Most agents do not need Pattern 4. The question is whether your agent’s users have continuity requirements over months of interaction and need the agent to reason about history as a narrative. Research assistants, long-running project agents: yes. Customer service bots, code assistants, scheduling agents: no.
The Decision Matrix
| Pattern | Facts / User | Cross-session | Open-ended History | Relative Cost | Upgrade Trigger |
|---|---|---|---|---|---|
| 1 — In-session compression | None (stateless) | No | No | $ | User returns next session and agent does not know them |
| 2 — KV fact store | 1–100 (enumerable) | Yes | No | $$ | User asks "tell me about my projects" — lookup cannot answer |
| 3 — Vector semantic recall | 100–10,000 | Yes | Yes | $$$ | Agent must reason over its own session history as narrative |
| 4 — Episodic structured log | Unbounded | Yes | Yes (full) | $$$$ | You have hit Pattern 4 — consider cost controls instead |
The upgrade path is linear: 1 → 2 → 3 → 4. Each step adds complexity only when the prior step’s limits are actually reached. Skipping steps is not an efficiency gain — it is premature optimization.
Building Pattern 2: A Complete Implementation
The minimal viable KV memory store is small enough to show completely.
CREATE TABLE memories (
id TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
entity TEXT NOT NULL,
attribute TEXT NOT NULL,
value TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
valid_until TIMESTAMPTZ
);
CREATE INDEX idx_memories_lookup
ON memories(user_id, entity, attribute)
WHERE valid_until IS NULL;
import sqlite3
from datetime import datetime
def upsert_fact(conn, user_id: str, entity: str, attribute: str, value: str):
"""Store a fact. Supersedes any existing active fact for this entity+attribute."""
now = datetime.utcnow().isoformat()
conn.execute(
"UPDATE memories SET valid_until = ? "
"WHERE user_id = ? AND entity = ? AND attribute = ? AND valid_until IS NULL",
(now, user_id, entity, attribute),
)
conn.execute(
"INSERT INTO memories (user_id, entity, attribute, value, created_at) "
"VALUES (?, ?, ?, ?, ?)",
(user_id, entity, attribute, value, now),
)
conn.commit()
def get_facts(conn, user_id: str, entity: str) -> dict[str, str]:
"""Return all active facts for an entity as {attribute: value}."""
rows = conn.execute(
"SELECT attribute, value FROM memories "
"WHERE user_id = ? AND entity = ? AND valid_until IS NULL",
(user_id, entity),
).fetchall()
return {row[0]: row[1] for row in rows}
Seven columns. One index. Fifteen lines of Python. Works with SQLite for local agents, Postgres for production. Recall is the production version of this — with typed extraction, contradiction detection, scoring, decay, and hybrid search — but the foundation is what you see above.
If you are building a new agent and you need cross-session memory, start here. Add Pattern 3 when a user asks an open-ended history question that this cannot answer.
Failure Mode: Premature Optimization
The 3-week vector store story from the opening is not unusual. It is the default path for teams that discover “agents need memory” and reach directly for the tooling they have seen in demos and benchmarks.
The diagnostic question: Can you enumerate all the things your agent needs to remember? Write them down. If the list has fewer than 100 items and they all have clear attribute names — name, timezone, language preference, plan tier, recent project name — Pattern 2 is correct.
Two detection signals that you have over-built:
- The embedding model is your largest dependency but you never run queries that could not be answered by a key lookup.
- Contradiction detection is complex or absent because your architecture has no structured fields to compare — just semantic similarity between free-text memory entries.
The fix is not to rebuild from scratch. Add entity, attribute, and value columns to your existing memory table, populate them for structured facts, and route key lookups through direct SQL. Semantic search can coexist — keep it for the queries it actually helps.
Earning Pattern 3 or 4 means hitting Pattern 2’s limits first. The limit is when a user asks an open-ended question about their history that direct lookup cannot answer. Not before.
Production Checklist
| Item | Score | |
|---|---|---|
| You can enumerate the facts your agent needs to remember — or have consciously accepted that you cannot and chosen Pattern 3 accordingly. | ||
| You have assigned a pattern (1, 2, 3, or 4) based on the decision matrix before writing implementation code. | ||
| If Pattern 2: schema has at minimum (user_id, entity, attribute, value, valid_until). Contradiction detection is a single UPDATE + INSERT, not a semantic similarity check. | ||
| If Pattern 3: you have a plan for embedding maintenance (re-embedding on text changes, index rebuild schedule) and a fallback to direct lookup for structured facts. | ||
| If Pattern 4: you have justified full episodic logging over Pattern 3 by identifying a specific query type that requires reasoning over session history. | ||
| You have a written upgrade trigger: the exact condition under which you will move from your current pattern to the next one. |
Resources
Memory in AI Systems is a seven-issue series from Sentient Zero Labs. Issue 06 covers failure modes — what goes wrong when you get the pattern right but the implementation wrong. Summarization drift, embedding staleness, and contradiction storms.
Until next issue,
Sentient Zero Labs