Four months into a new job, a user built a firm morning routine. Standup at 9 AM, planning calls at 10. Their AI assistant, running a persistent memory system, stored the fact correctly:
entity: "user"
attribute: "preferred_meeting_time"
value: "morning"
confidence: 0.9
valid_until: NULL
The WRITE phase worked.
Two months later, the user changed companies. The new team runs async. No standups. The user’s morning blocks became protected deep-work time. Afternoon calls, 2–4 PM, became the norm. The user mentioned this in passing. The extraction pipeline caught it. Contradiction detection ran. The old morning preference got valid_until set. A new memory was written: value: "afternoon". Still working.
One week later, the old job — a part-time consulting contract — came back. Morning calls resumed for that context. The user mentioned this too. Another new memory: value: "morning". The system auto-resolved again. Set valid_until on the afternoon preference. Morning wins.
Three months pass. The consulting contract winds down to one call a week. The new job is the user’s reality. Afternoon blocks dominate their calendar. But the agent confidently proposes 9 AM slots. It has a memory that says “morning.” The memory was written correctly. Twice.
Nobody managed it.
This is not a write-time failure. The extraction was correct. The schema was correct. The contradiction detection ran correctly — it just ran on facts that kept oscillating, and nobody noticed. The system did everything right at write time and still ended up with a confidently wrong belief state three months later.
This is the MANAGE phase failure. And it is the failure mode that will happen to every production memory system that skips it.
Why Memory Rots
Every production memory system implements WRITE. Most implement READ. Almost none implement MANAGE — not because it is hard, but because it does not feel urgent until the system has been running for 60 days and the first users notice that their assistant “remembers things wrong.”
Memory rot is deterministic. It follows three failure modes, each with a specific root cause and a specific fix:
Contradiction rot: New information conflicts with stored memory. Without resolution, both facts sit in the store. At retrieval time, the agent may confidently serve the wrong one. The fix is contradiction detection on every write.
Recency drift: Time passes, but scores do not update. A memory written 18 months ago with importance: 0.9 scores the same as one written yesterday. The agent has no mechanism to distinguish “this was important long ago” from “this is important now.” The fix is decay scoring on a schedule.
Consolidation bloat: Agentic systems write many observations. “User mentioned preferring Python.” “User used Python in this task.” “User said Python is their go-to language.” Three memories. Same fact. Different phrasings. Together they crowd out genuinely distinct memories. The fix is periodic consolidation.
None of these require complex infrastructure. They require three focused operations running on a schedule. That is the MANAGE phase.
As we built in Issue 2, the WRITE phase produces memories with valid_until IS NULL. The MANAGE phase is what eventually sets those timestamps.
MANAGE PHASE LOOP

memories table (active: valid_until IS NULL)
      │
      ▼
NEW WRITE (WRITE phase) ──entity+attr match──▶ CONTRADICTION CHECK
      │     auto-resolve (set valid_until + superseded_by)
      ▼
DECAY SCORING (every hour)
      │     raw = exp(-lambda*age)
      │     boost = f(access_count)
      │     decay_score = combined
      ▼
CONSOLIDATION (weekly / on-demand)
      │     embed -> cluster (0.85)
      │     LLM merge group
      │     insert canonical
      │     set valid_until on N
      ▼
UPDATED MEMORY STORE (loops back to the memories table)
      - no active conflicts
      - fresh decay_scores
      - no redundant entries
      │
      ▼
READ phase (Issue 04)

Contradiction Detection
Contradiction detection answers one question: does this new memory conflict with something we already believe?
Most contradictions in memory systems are not about the text of the memory — they are about the structured fact underneath it. “User prefers morning meetings” and “User now prefers afternoon meetings” have different text but conflict on the same claim: the value of user.preferred_meeting_time.
This is why the entity/attribute/value pattern exists in the schema. Without those three fields, contradiction detection requires semantic similarity comparisons — expensive, probabilistic, and prone to false positives. With them, contradiction detection is a SQL query.
Here is how Recall detects contradictions (worker.py, _handle_contradiction):
from typing import Any

async def _handle_contradiction(db: Any, memory: dict, user_id: str) -> None:
    """Mark conflicting active memories as superseded when entity+attribute match but value differs."""
    # Memories without a structured entity/attribute cannot be compared this way.
    if not memory.get("entity") or not memory.get("attribute"):
        return
    # Active rows for the same user + entity + attribute whose value differs (or is NULL).
    existing = await db.execute_fetchall(
        "SELECT id FROM memories WHERE user_id = ? AND entity = ? AND attribute = ? "
        "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
        (user_id, memory["entity"], memory["attribute"], memory["value"]),
    )
    # Expire each conflicting row and link it to the memory that replaced it.
    for (ex_id,) in existing:
        await db.execute(
            "UPDATE memories SET valid_until = ?, superseded_by = ? WHERE id = ?",
            (memory["created_at"], memory["id"], ex_id),
        )
The logic:
- Only check memories with entity and attribute set; procedure and decision types rarely have these
- Query for active memories (valid_until IS NULL) with the same user_id + entity + attribute but a different value
- For each match: set valid_until to the new memory’s timestamp, set superseded_by to the new memory’s ID
The old memory is not deleted — it is timestamped as expired and linked to its replacement. Given any current memory, you can trace the superseded_by chain backward to reconstruct the full history of what the system believed about that entity+attribute pair.
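The supersede-and-trace behavior can be sketched end to end with the standard library. This is a minimal, synchronous illustration using sqlite3 rather than Recall's async worker; the trimmed-down table, the write_memory and history helpers, and the hard-coded user and attribute are all assumptions made for the example, not Recall's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of the memories table.
conn.execute(
    """CREATE TABLE memories (
        id TEXT PRIMARY KEY, user_id TEXT, entity TEXT, attribute TEXT,
        value TEXT, created_at TEXT, valid_until TEXT, superseded_by TEXT)"""
)

def write_memory(mem_id, value, created_at):
    """Expire conflicting active rows, then insert the new memory."""
    conn.execute(
        "UPDATE memories SET valid_until = ?, superseded_by = ? "
        "WHERE user_id = 'u1' AND entity = 'user' "
        "AND attribute = 'preferred_meeting_time' "
        "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
        (created_at, mem_id, value),
    )
    conn.execute(
        "INSERT INTO memories VALUES "
        "(?, 'u1', 'user', 'preferred_meeting_time', ?, ?, NULL, NULL)",
        (mem_id, value, created_at),
    )

def history(current_id):
    """Follow superseded_by links backward from a memory to its oldest ancestor."""
    chain, cursor_id = [], current_id
    while cursor_id is not None:
        (value,) = conn.execute(
            "SELECT value FROM memories WHERE id = ?", (cursor_id,)
        ).fetchone()
        chain.append(value)
        # Assumes at most one predecessor per memory, which holds in this toy data.
        prev = conn.execute(
            "SELECT id FROM memories WHERE superseded_by = ?", (cursor_id,)
        ).fetchone()
        cursor_id = prev[0] if prev else None
    return chain[::-1]  # oldest first

# The anchor story: morning, then afternoon, then morning again.
write_memory("m1", "morning", "2025-01-06T09:00:00")
write_memory("m2", "afternoon", "2025-03-03T09:00:00")
write_memory("m3", "morning", "2025-03-10T09:00:00")

active = conn.execute(
    "SELECT id, value FROM memories WHERE valid_until IS NULL"
).fetchall()
print(active)
print(history("m3"))
```

Only one row stays active after each write, and nothing is deleted, so the full sequence of beliefs remains queryable.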
The valid_until and superseded_by columns are in Recall’s schema:
superseded_by TEXT, -- FK to memories.id that superseded this
valid_until TEXT, -- ISO8601 — NULL means still active; set on contradiction
The partial index that makes contradiction detection fast:
CREATE INDEX IF NOT EXISTS idx_memories_entity_attr
ON memories(user_id, entity, attribute)
WHERE valid_until IS NULL;
This index covers only active memories — exactly the set the contradiction query scans.
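One way to check that the contradiction query can actually use the partial index is to ask SQLite for its query plan. The sketch below assumes a simplified memories table (only the columns the query touches) and an in-memory database; the exact plan wording varies by SQLite version, so it only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memories (
        id TEXT PRIMARY KEY, user_id TEXT, entity TEXT, attribute TEXT,
        value TEXT, valid_until TEXT, superseded_by TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_memories_entity_attr
        ON memories(user_id, entity, attribute)
        WHERE valid_until IS NULL;
    """
)

# Same WHERE clause as the contradiction query; the literal
# "valid_until IS NULL" term is what lets SQLite match the partial index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM memories WHERE user_id = ? AND entity = ? AND attribute = ? "
    "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
    ("u1", "user", "preferred_meeting_time", "morning"),
).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the index name is missing from the plan, the query is falling back to a table scan and the write path will slow down as the store grows.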
Auto-resolve vs. escalate: Recall auto-resolves all contradictions today. For most preference updates — “I now prefer TypeScript over JavaScript” — this is correct. But for context-dependent facts that genuinely alternate, auto-resolution creates churn. The anchor story is exactly this. The right response is escalation: flag the entity+attribute pair as contested and surface it for disambiguation. The failure mode and fix are covered below.
| Scenario | Contradiction Type | Recommended Action | Reasoning |
|---|---|---|---|
| User states new preference ("I now prefer TypeScript over JavaScript") | Preference update | Auto-resolve | Unambiguous supersession. New value reflects current state. |
| User corrects a fact ("My timezone is PST, not EST") | Factual correction | Auto-resolve | Definitive correction. Old value was simply wrong. |
| User's context genuinely alternates (morning meetings for one job, afternoons for another) | Ambiguous context change | Escalate to input-required | Auto-resolution creates churn. Needs context-scoped values or user-provided disambiguation. |
Memory Decay
In 1885, Hermann Ebbinghaus published Über das Gedächtnis, the first empirical study of memory retention. Experimenting on himself, he showed that forgetting is steep at first and then levels off, a curve commonly approximated as exponential decay: without reinforcement, you lose roughly 42% of newly learned material within 20 minutes, 56% within an hour, and 79% within a month. He also showed the inverse: repeated retrieval increases memory strength and extends the interval before the next reinforcement is needed.
This is the foundational model behind every spaced repetition system. It also maps directly to machine memory.
The intuition: a memory that has never been retrieved since it was written is more likely to be stale. A memory retrieved frequently has demonstrated its ongoing relevance — it should decay slower. Time-based decay alone misses this. A user who pauses their use of the system for six months — on sabbatical, between projects — would return to a memory store where everything has decayed near zero. The facts are still accurate. The scores do not reflect that. Access-count protection prevents this.
Here is Recall’s exact decay formula, from decay.py:
raw_decay = math.exp(-self._lambda * age_days)
access_boost = min(1.0, math.log(1 + ac) / math.log(1 + self._boost_cap))
decay_score = raw_decay + (1 - raw_decay) * access_boost
Where:
- age_days = days since last_accessed (falls back to created_at if never accessed)
- ac = access_count for this memory
- lambda defaults to 0.02 (~35-day half-life: ln(2) / 0.02 ≈ 34.7 days)
- boost_cap defaults to 10 (at 10 accesses, the memory has full decay protection)
The formula has a clean interpretation: raw_decay is the Ebbinghaus curve. access_boost scales the gap between raw_decay and 1.0 based on access frequency. At zero accesses, access_boost = 0 and decay_score = raw_decay. At boost_cap accesses, access_boost = 1.0 and decay_score = 1.0 — the memory never decays regardless of age.
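To see how the two terms interact, the three lines can be wrapped in a standalone function and evaluated for the six-month-pause scenario. The decay_score function name and its keyword defaults are assumptions for this sketch; the arithmetic is the formula quoted above.

```python
import math

def decay_score(age_days, access_count, lam=0.02, boost_cap=10):
    """Standalone version of the quoted formula (hypothetical wrapper)."""
    raw_decay = math.exp(-lam * age_days)
    access_boost = min(1.0, math.log(1 + access_count) / math.log(1 + boost_cap))
    return raw_decay + (1 - raw_decay) * access_boost

half_life = math.log(2) / 0.02  # about 34.7 days at the default lambda

# A memory untouched for 180 days, at increasing access counts.
for ac in (0, 1, 3, 10):
    print(f"access_count={ac:>2}  decay_score={decay_score(180, ac):.3f}")
```

Because the boost is logarithmic, the first few accesses buy most of the protection: a single access already recovers more than a quarter of the gap between raw_decay and 1.0.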
The DecayWorker class manages the scheduled execution:
import os

class DecayWorker:
    """Scheduled in-process asyncio task that updates decay_score on active memories."""

    def __init__(self) -> None:
        # All three settings can be overridden through environment variables.
        self._lambda = float(os.environ.get("RECALL_DECAY_LAMBDA", 0.02))
        self._boost_cap = float(os.environ.get("RECALL_DECAY_BOOST_CAP", 10))
        self._interval = int(os.environ.get("RECALL_DECAY_JOB_INTERVAL", 3600))

    async def run_once(self) -> int:
        """Apply decay to all active memories. Returns count of memories updated."""
        # Queries all memories WHERE valid_until IS NULL
        # Computes raw_decay + access_boost for each
        # Updates decay_score column
        # Returns count of memories updated
Default interval: 3600 seconds (every hour). The decay_score is then used in hybrid search as a multiplier on the importance component:
effective_importance = (m.get("importance") or 0.5) * (decay if decay is not None else 1.0)
A memory with importance: 0.9 and decay_score: 0.3 has effective importance 0.27. It does not disappear from retrieval — it ranks lower than a fresher memory with the same nominal importance. Decayed memories are deprioritized, not deleted.
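The effect on ordering is easy to check in isolation. The snippet below wraps the quoted expression in a hypothetical effective_importance helper and sorts three toy memories by it; in Recall this value is only one component of the hybrid search score, so treat the ranking as an illustration of the importance term alone.

```python
def effective_importance(m):
    """Importance scaled by decay; missing values fall back to 0.5 and 1.0."""
    decay = m.get("decay_score")
    return (m.get("importance") or 0.5) * (decay if decay is not None else 1.0)

# Toy records; field names follow the article, values are made up.
memories = [
    {"id": "old", "importance": 0.9, "decay_score": 0.3},
    {"id": "fresh", "importance": 0.9, "decay_score": 0.95},
    {"id": "unscored", "importance": None, "decay_score": None},
]

ranked = sorted(memories, key=effective_importance, reverse=True)
for m in ranked:
    print(m["id"], round(effective_importance(m), 3))
```

Note that a memory with no decay_score yet is treated as undecayed, so a freshly written row is not penalized before the first decay pass reaches it.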
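Tuning note: the default lambda = 0.02 (35-day half-life) is calibrated for weekly engagement. Because age is measured from last_accessed, the gap between sessions determines how far a memory decays before it can be refreshed. For high-frequency daily sessions, a higher λ (shorter half-life) is usually safe, since relevant memories are re-accessed often. For rare monthly sessions, consider a lower λ (longer half-life) so that memories are not heavily decayed by the time the user returns.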
Consolidation
After enough time, similar memories accumulate. Not contradictions — genuinely related facts that arrived at different times through different conversations.
A user working with an agent for six months might have:
- “User finds long meetings exhausting.”
- “User prefers async communication over synchronous calls.”
- “User mentioned preferring written summaries over verbal updates.”
- “User says they’re most productive in deep work blocks without interruptions.”
Four memories. One underlying preference: minimize synchronous communication load. Each arrived separately. Each is stored correctly. Together they create retrieval noise — and they compete for context window space when the agent needs relevant memories.
Consolidation reduces N similar memories to 1 canonical memory. The originals are superseded (soft-deleted), and the canonical memory captures the synthesized insight.
Recall’s consolidate_memories tool runs this flow:
@mcp.tool()
async def consolidate_memories(
    topic: str,
    similarity_threshold: float = 0.85,
    dry_run: bool = False,
) -> dict:
    """Find semantically similar memories in a topic and merge them into canonical facts.
    Requires embeddings extra: pip install 'szl-recall[embeddings]'. Returns a diff."""
The full flow:
- Fetch all active memories for the given topic where valid_until IS NULL
- Embed all memory texts using the configured embedding model
- Cluster using greedy cosine similarity at the threshold (default: 0.85). Groups with fewer than 2 members are skipped.
- LLM merge each group with Claude Haiku: produce one canonical memory preserving all distinct information without over-generalizing
- Persist: insert canonical memories, set valid_until on originals
The dry_run=True option returns a preview diff — the memories that would be merged and the proposed canonical text — without writing to the database. Always use this before the first production consolidation run.
When to run: on-demand after a long session that produced many memories on a single topic, or on a weekly schedule for active users. Consolidation requires embeddings and LLM calls; it should not run on every write.
The threshold matters. At 0.85, only near-identical phrasings cluster. At 0.70, conceptually related but differently-worded memories merge — appropriate for preferences, risky for facts where two similar-sounding facts might be genuinely distinct. Start at 0.85 and tune downward if redundant memories persist.
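The clustering step and its sensitivity to the threshold can be illustrated without an embedding model. The sketch below is one plausible reading of "greedy cosine similarity" (each vector joins the first cluster whose seed is similar enough), not necessarily Recall's exact algorithm, and the three-dimensional vectors are hand-made stand-ins for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def greedy_clusters(vectors, threshold=0.85):
    """Assign each vector to the first cluster whose seed clears the threshold."""
    clusters = []  # lists of indices; the first index in each list is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # nothing similar enough: start a new cluster
    # Singletons have nothing to merge with, so they are skipped.
    return [c for c in clusters if len(c) >= 2]

# Toy "embeddings": 0 and 1 are near-duplicates, 2 is related, 3 is unrelated.
toy = [
    [1.0, 0.0, 0.0],
    [0.95, 0.31, 0.0],
    [0.75, 0.66, 0.0],
    [0.0, 0.0, 1.0],
]

strict = greedy_clusters(toy, threshold=0.85)
loose = greedy_clusters(toy, threshold=0.70)
print("0.85:", strict)
print("0.70:", loose)
```

At the stricter threshold only the near-duplicates are grouped; lowering it pulls the merely related memory into the same merge group, which is the trade-off described above.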
Failure Mode: The Contradiction Loop
The anchor story is exactly this failure: the user’s context genuinely alternates between two states (morning meetings for consulting, afternoon blocks for the main job). Each state is correct for its context. The extraction pipeline correctly identifies both. The contradiction detection correctly fires. The result is a memory that keeps toggling — high valid_until churn on the same entity + attribute.
This is not a bug in the contradiction detection logic. It is a signal that auto-resolution is insufficient for this entity+attribute pair.
How to detect contradiction loops:
-- Find entity+attribute pairs with high valid_until churn (potential contradiction loops)
SELECT
entity,
attribute,
COUNT(*) AS superseded_count,
MAX(valid_until) AS last_superseded,
MIN(valid_from) AS first_seen,
GROUP_CONCAT(value, ' -> ') AS value_history
FROM memories
WHERE user_id = :user_id
AND valid_until IS NOT NULL
AND entity IS NOT NULL
AND attribute IS NOT NULL
AND valid_until >= datetime('now', '-30 days')
GROUP BY entity, attribute
HAVING superseded_count >= 3
ORDER BY superseded_count DESC;
A result of 3+ supersessions for the same entity + attribute in 30 days is a strong signal that the system is in a contradiction loop rather than tracking an ordinary preference update.
The fix is escalation rather than auto-resolution. When the loop is detected:
- Stop auto-resolving for this entity + attribute pair
- Surface the conflict to the user or a supervisor agent
- Allow explicit input: “When you mean the consulting context, say X; otherwise assume Y”
- Store the resolution as a meta-memory with higher importance and confidence
This is the input-required pattern: escalate rather than resolve autonomously when auto-resolution has demonstrably failed.
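One way to wire the first step into the write path is a small gate that counts recent supersessions before auto-resolving. Everything here is an assumption for illustration: the should_auto_resolve helper, its thresholds (mirroring the 3-in-30-days audit query), and the simplified table. The cutoff is computed in Python so that both sides of the comparison are ISO8601 strings in the same format.

```python
import sqlite3
from datetime import datetime, timedelta

def should_auto_resolve(conn, user_id, entity, attribute, now,
                        max_churn=3, window_days=30):
    """Return False once a pair has been superseded max_churn times in the window."""
    cutoff = (now - timedelta(days=window_days)).isoformat()
    (churn,) = conn.execute(
        "SELECT COUNT(*) FROM memories "
        "WHERE user_id = ? AND entity = ? AND attribute = ? "
        "AND valid_until IS NOT NULL AND valid_until >= ?",
        (user_id, entity, attribute, cutoff),
    ).fetchone()
    return churn < max_churn

# Hypothetical data: one attribute that has flipped three times this month.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id TEXT, user_id TEXT, entity TEXT, "
    "attribute TEXT, value TEXT, valid_until TEXT)"
)
conn.executemany(
    "INSERT INTO memories VALUES (?, 'u1', 'user', 'preferred_meeting_time', ?, ?)",
    [
        ("m1", "morning", "2025-06-05T09:00:00"),
        ("m2", "afternoon", "2025-06-12T09:00:00"),
        ("m3", "morning", "2025-06-20T09:00:00"),
        ("m4", "afternoon", None),  # currently active
    ],
)

now = datetime(2025, 6, 30, 12, 0, 0)
contested = not should_auto_resolve(conn, "u1", "user", "preferred_meeting_time", now)
print("escalate preferred_meeting_time:", contested)
```

When the gate returns False, the writer would leave the existing memory active and flag the pair as contested instead of setting valid_until, so the next step is a question to the user rather than another silent flip.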
Decision Guide: What to Run When
The MANAGE phase is three operations. They run at different frequencies and on different triggers:
Contradiction detection — runs synchronously on every memory write. It is an indexed SQL lookup that adds under 5ms to the write path. There is no reason to batch or delay it.
Decay scoring — runs on a schedule. Default: every hour. It touches all active memories but is compute-light (pure Python math, no LLM calls). Increasing the interval only introduces staleness between runs, not correctness problems.
Consolidation — runs on demand or on a weekly schedule. It requires embeddings and LLM calls. It is the most expensive operation and should not run continuously.
| Operation | Trigger | Cost | Notes |
|---|---|---|---|
| Contradiction detection | Every write (synchronous) | Very low — indexed SQL | Blocking; runs before INSERT |
| Decay scoring | Every hour (configurable) | Low — Python math only | RECALL_DECAY_JOB_INTERVAL=3600 |
| Consolidation | Weekly or on-demand | High — embeddings + LLM | Use dry_run=True first |
The MANAGE phase is not a continuous maintenance loop. It is three targeted operations, scheduled appropriately, running against a schema designed for them from the start. The full implementation in Recall — all three workers — runs in under 200 lines.
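For the scheduled operations, an in-process asyncio loop is usually enough. The sketch below shows the general shape, assuming a hypothetical run_periodically helper and a stand-in job; the max_runs parameter exists only so the example terminates, and a real worker would loop until cancelled.

```python
import asyncio

async def run_periodically(job, interval_seconds, max_runs=None):
    """Await job() every interval_seconds; stop after max_runs if it is set."""
    runs = 0
    while max_runs is None or runs < max_runs:
        await job()
        runs += 1
        # Skip the final sleep so a bounded run returns promptly.
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs

calls = []

async def fake_decay_pass():
    """Stand-in for a decay pass such as DecayWorker.run_once()."""
    calls.append("decay")

# A tiny interval keeps the demo fast; the article's default is 3600 seconds.
completed = asyncio.run(
    run_periodically(fake_decay_pass, interval_seconds=0.01, max_runs=3)
)
print(completed, calls)
```

The same helper could drive a weekly consolidation pass with a longer interval, while contradiction detection stays inline on the write path.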
Production Checklist: MANAGE Phase Readiness
| Item | Score | |
|---|---|---|
| Schema has valid_until and superseded_by columns — without these, neither contradiction detection nor consolidation can work correctly. Both columns must be present before the first memory is written. | ||
| Contradiction detection runs on every memory write — synchronously, before the new memory is inserted. Any delay allows the system to briefly hold two conflicting facts as active. | ||
| Decay scoring is on a schedule — the DecayWorker or equivalent is running. Verify that decay_score values in your DB are non-NULL and vary across memories (not all 1.0). | ||
| Decay formula includes access-count protection — time-based decay alone punishes memories from low-frequency users. Verify the formula lifts memories with high access_count toward decay_score = 1.0. | ||
| At least one consolidation pass has run — if your system has been storing memories for more than 30 days, you likely have redundant clusters. Run consolidate_memories with dry_run=True to audit before committing. | ||
| You have a contradiction loop audit query — SQL to find high valid_until churn (3+ supersessions in 30 days) on the same entity + attribute. Run it monthly or after any reported stale-memory incident. | ||
Resources
Issue 4 covers the READ phase — retrieval that weights decayed memories appropriately and suppresses superseded ones.
Until next issue,
Sentient Zero Labs