Memory in AI Systems Issue 3/7

The MANAGE Phase — The Work Nobody Does

Memory without curation rots. How contradiction detection, decay scoring, and consolidation keep your agent's belief state accurate over time.

May 12, 2026 · 15 min read · Sentient Zero Labs

Four months into a new job, a user built a firm morning routine. Standup at 9 AM, planning calls at 10. Their AI assistant, running a persistent memory system, stored the fact correctly:

entity: "user"
attribute: "preferred_meeting_time"
value: "morning"
confidence: 0.9
valid_until: NULL

The WRITE phase worked.

Two months later, the user changed companies. The new team runs async. No standups. The user’s morning blocks became protected deep-work time. Afternoon calls, 2–4 PM, became the norm. The user mentioned this in passing. The extraction pipeline caught it. Contradiction detection ran. The old morning preference got valid_until set. A new memory was written: value: "afternoon". Still working.

One week later, the old job — a part-time consulting contract — came back. Morning calls resumed for that context. The user mentioned this too. Another new memory: value: "morning". The system auto-resolved again. Set valid_until on the afternoon preference. Morning wins.

Three months pass. The consulting contract winds down to one call a week. The new job is the user’s reality. Afternoon blocks dominate their calendar. But the agent confidently proposes 9 AM slots. It has a memory that says “morning.” The memory was written correctly. Twice.

Nobody managed it.

This is not a write-time failure. The extraction was correct. The schema was correct. The contradiction detection ran correctly — it just ran on facts that kept oscillating, and nobody noticed. The system did everything right at write time and still ended up with a confidently wrong belief state three months later.

This is the MANAGE phase failure. And it is the failure mode that will happen to every production memory system that skips it.

Why Memory Rots

Every production memory system implements WRITE. Most implement READ. Almost none implement MANAGE — not because it is hard, but because it does not feel urgent until the system has been running for 60 days and the first users notice that their assistant “remembers things wrong.”

Memory rot is deterministic. It follows three failure modes, each with a specific root cause and a specific fix:

Contradiction rot: New information conflicts with stored memory. Without resolution, both facts sit in the store. At retrieval time, the agent may confidently serve the wrong one. The fix is contradiction detection on every write.

Recency drift: Time passes, but scores do not update. A memory written 18 months ago with importance: 0.9 scores the same as one written yesterday. The agent has no mechanism to distinguish “this was important long ago” from “this is important now.” The fix is decay scoring on a schedule.

Consolidation bloat: Agentic systems write many observations. “User mentioned preferring Python.” “User used Python in this task.” “User said Python is their go-to language.” Three memories. Same fact. Different phrasings. Together they crowd out genuinely distinct memories. The fix is periodic consolidation.

None of these require complex infrastructure. They require three focused operations running on a schedule. That is the MANAGE phase.

As covered in Issue 2, the WRITE phase produces memories with valid_until set to NULL, meaning still active. The MANAGE phase is what eventually sets those timestamps.

MANAGE PHASE LOOP
                                                           
memories table (active: valid_until IS NULL)              
      │                                                   
      ├─────────────────────────────────────────────┐     
      │                                             │     
      ▼                                             │     
NEW WRITE ──entity+attr match──▶ CONTRADICTION      │     
(WRITE phase)                       CHECK           │     
                                      │             │     
                                auto-resolve        │     
                                (set valid_until    │     
                                 + superseded_by)   │     
                                      │             │     
                            DECAY SCORING           │     
                            (every hour)            │     
                                      │             │     
                            raw = exp(-lambda*age)  │     
                            boost = f(access_count) │     
                            decay_score = combined  │     
                                      │             │     
                            CONSOLIDATION           │     
                            (weekly / on-demand)    │     
                                      │             │     
                            embed -> cluster (0.85) │     
                            LLM merge group         │     
                            insert canonical        │     
                            set valid_until on N    │     
                                      │             │     
                        UPDATED MEMORY STORE────────┘     
                        - no active conflicts             
                        - fresh decay_scores              
                        - no redundant entries            
                                      │                   
                                 READ phase (Issue 04)

Contradiction Detection

Contradiction detection answers one question: does this new memory conflict with something we already believe?

Most contradictions in memory systems are not about the text of the memory — they are about the structured fact underneath it. “User prefers morning meetings” and “User now prefers afternoon meetings” have different text but conflict on the same claim: the value of user.preferred_meeting_time.

This is why the entity/attribute/value pattern exists in the schema. Without those three fields, contradiction detection requires semantic similarity comparisons — expensive, probabilistic, and prone to false positives. With them, contradiction detection is a SQL query.

Here is how Recall detects contradictions (worker.py, _handle_contradiction):

async def _handle_contradiction(db: Any, memory: dict, user_id: str) -> None:
    """Mark conflicting active memories as superseded when entity+attribute match but value differs."""
    if not memory.get("entity") or not memory.get("attribute"):
        return
    existing = await db.execute_fetchall(
        "SELECT id FROM memories WHERE user_id = ? AND entity = ? AND attribute = ? "
        "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
        (user_id, memory["entity"], memory["attribute"], memory["value"]),
    )
    for (ex_id,) in existing:
        await db.execute(
            "UPDATE memories SET valid_until = ?, superseded_by = ? WHERE id = ?",
            (memory["created_at"], memory["id"], ex_id),
        )

The logic:

Only check memories with entity and attribute set — procedure and decision types rarely have these
Query for active memories (valid_until IS NULL) with the same user_id + entity + attribute but a different value
For each match: set valid_until to the new memory’s timestamp, set superseded_by to the new memory’s ID

The old memory is not deleted — it is timestamped as expired and linked to its replacement. Given any current memory, you can trace the superseded_by chain backward to reconstruct the full history of what the system believed about that entity+attribute pair.
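To make the supersession chain concrete, here is a minimal synchronous sketch that uses Python's built-in sqlite3 module in place of Recall's async driver. The table is trimmed to the columns mentioned in this section, and `make_db`, `write_memory`, and `belief_history` are hypothetical helper names, not Recall functions. It replays the three writes from the opening story and then walks the chain backward from the current memory.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory table trimmed to the columns discussed in this section."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memories (id TEXT PRIMARY KEY, user_id TEXT, entity TEXT, "
        "attribute TEXT, value TEXT, created_at TEXT, valid_until TEXT, superseded_by TEXT)"
    )
    return db

def write_memory(db: sqlite3.Connection, memory: dict) -> None:
    """Expire active rows that disagree with `memory`, then insert it."""
    db.execute(
        "UPDATE memories SET valid_until = ?, superseded_by = ? "
        "WHERE user_id = ? AND entity = ? AND attribute = ? "
        "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
        (memory["created_at"], memory["id"], memory["user_id"],
         memory["entity"], memory["attribute"], memory["value"]),
    )
    db.execute(
        "INSERT INTO memories (id, user_id, entity, attribute, value, created_at) "
        "VALUES (:id, :user_id, :entity, :attribute, :value, :created_at)",
        memory,
    )

def belief_history(db: sqlite3.Connection, memory_id: str) -> list[str]:
    """Follow superseded_by links backward from `memory_id`; oldest value first."""
    values: list[str] = []
    current = memory_id
    while current is not None:
        row = db.execute("SELECT value FROM memories WHERE id = ?", (current,)).fetchone()
        values.append(row[0])
        # One write can expire several rows; this sketch follows only the newest of them.
        prev = db.execute(
            "SELECT id FROM memories WHERE superseded_by = ? "
            "ORDER BY created_at DESC LIMIT 1",
            (current,),
        ).fetchone()
        current = prev[0] if prev else None
    return values[::-1]

# Replay the opening story: morning, then afternoon, then morning again.
db = make_db()
for mem_id, value, ts in [
    ("m1", "morning", "2026-01-05"),
    ("m2", "afternoon", "2026-03-02"),
    ("m3", "morning", "2026-03-09"),
]:
    write_memory(db, {"id": mem_id, "user_id": "u1", "entity": "user",
                      "attribute": "preferred_meeting_time",
                      "value": value, "created_at": ts})

chain = belief_history(db, "m3")
print(" -> ".join(chain))
```

Only `m3` remains active after the three writes. The two earlier rows are still in the table, each with valid_until set and a superseded_by pointer to its replacement.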

The valid_until and superseded_by columns are in Recall’s schema:

superseded_by   TEXT,   -- FK to memories.id that superseded this
valid_until     TEXT,   -- ISO8601 — NULL means still active; set on contradiction

The partial index that makes contradiction detection fast:

CREATE INDEX IF NOT EXISTS idx_memories_entity_attr
    ON memories(user_id, entity, attribute)
    WHERE valid_until IS NULL;

This index covers only active memories — exactly the set the contradiction query scans.
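One way to check that the contradiction query actually uses this index is to ask SQLite for its query plan. The sketch below assumes a SQLite backend and a trimmed table definition (the real schema has more columns). On typical SQLite builds the plan text should name `idx_memories_entity_attr`, but the exact wording varies by version.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed, assumed table definition: only the columns the query touches.
db.execute(
    "CREATE TABLE memories (id TEXT PRIMARY KEY, user_id TEXT, entity TEXT, "
    "attribute TEXT, value TEXT, valid_until TEXT)"
)
db.execute(
    "CREATE INDEX IF NOT EXISTS idx_memories_entity_attr "
    "ON memories(user_id, entity, attribute) WHERE valid_until IS NULL"
)

plan_rows = db.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM memories WHERE user_id = ? AND entity = ? AND attribute = ? "
    "AND valid_until IS NULL AND (value IS NULL OR value != ?)",
    ("u1", "user", "preferred_meeting_time", "morning"),
).fetchall()

# The human-readable detail is the last column of each plan row.
plan_text = " | ".join(row[-1] for row in plan_rows)
print(plan_text)
```

The partial index is only eligible because the query repeats the index's own condition, `valid_until IS NULL`. A query that omits that condition cannot use the index.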

Auto-resolve vs. escalate: Recall auto-resolves all contradictions today. For most preference updates — “I now prefer TypeScript over JavaScript” — this is correct. But for context-dependent facts that genuinely alternate, auto-resolution creates churn. The anchor story is exactly this. The right response is escalation: flag the entity+attribute pair as contested and surface it for disambiguation. The failure mode and fix are covered below.

Scenario	Contradiction Type	Recommended Action	Reasoning
User states new preference ("I now prefer TypeScript over JavaScript")	Preference update	Auto-resolve	Unambiguous supersession. New value reflects current state.
User corrects a fact ("My timezone is PST, not EST")	Factual correction	Auto-resolve	Definitive correction. Old value was simply wrong.
User's context genuinely alternates (morning meetings for one job, afternoons for another)	Ambiguous context change	Escalate to input-required	Auto-resolution creates churn. Needs context-scoped values or user-provided disambiguation.

Memory Decay

In 1885, Hermann Ebbinghaus published Über das Gedächtnis, the first empirical study of memory retention. Experimenting on himself, he showed that forgetting is steep at first and then flattens out, a curve that is commonly approximated as exponential decay. Without reinforcement, he lost roughly 42% of newly learned material within 20 minutes, 56% within an hour, and 79% within a month. He also showed the inverse: repeated retrieval increases memory strength and extends the interval before the next reinforcement is needed.

This is the foundational model behind every spaced repetition system. It also maps directly to machine memory.

The intuition: a memory that has never been retrieved since it was written is more likely to be stale. A memory retrieved frequently has demonstrated its ongoing relevance, so it should decay more slowly. Time-based decay alone misses this. A user who stops using the system for six months (on sabbatical, between projects) would return to a memory store where everything has decayed to near zero. The facts are still accurate. The scores do not reflect that. Access-count protection prevents this for the memories that were retrieved often before the pause.

Here is Recall’s exact decay formula, from decay.py:

raw_decay    = math.exp(-self._lambda * age_days)
access_boost = min(1.0, math.log(1 + ac) / math.log(1 + self._boost_cap))
decay_score  = raw_decay + (1 - raw_decay) * access_boost

Where:

age_days = days since last_accessed (falls back to created_at if never accessed)
ac = access_count for this memory
lambda defaults to 0.02 (~35-day half-life: ln(2) / 0.02 ≈ 34.7 days)
boost_cap defaults to 10 (at 10 accesses, the memory has full decay protection)

The formula has a clean interpretation: raw_decay is the Ebbinghaus curve. access_boost scales the gap between raw_decay and 1.0 based on access frequency. At zero accesses, access_boost = 0 and decay_score = raw_decay. At boost_cap accesses, access_boost = 1.0 and decay_score = 1.0 — the memory never decays regardless of age.
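A few worked values make the interpretation easier to check. The function below restates the three lines from decay.py as a standalone helper. The name `decay_score` and the keyword defaults are illustrative, taken from the defaults listed above.

```python
import math

def decay_score(age_days: float, access_count: int,
                lam: float = 0.02, boost_cap: float = 10) -> float:
    """Time decay lifted toward 1.0 in proportion to how often the memory was accessed."""
    raw_decay = math.exp(-lam * age_days)
    access_boost = min(1.0, math.log(1 + access_count) / math.log(1 + boost_cap))
    return raw_decay + (1 - raw_decay) * access_boost

# The same six-month-old memory at three access counts.
for ac in (0, 3, 10):
    print(f"age=180d access_count={ac:>2} decay_score={decay_score(180, ac):.3f}")

# A never-accessed memory at the half-life of ln(2) / 0.02 days.
half_life = math.log(2) / 0.02
print(f"half-life={half_life:.1f}d decay_score={decay_score(half_life, 0):.3f}")
```

With zero accesses the six-month-old memory has almost fully decayed. Three accesses already recover more than half of the gap to 1.0, because the boost is logarithmic in access_count.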

The DecayWorker class manages the scheduled execution:

class DecayWorker:
    """Scheduled in-process asyncio task that updates decay_score on active memories."""

    def __init__(self) -> None:
        self._lambda = float(os.environ.get("RECALL_DECAY_LAMBDA", 0.02))
        self._boost_cap = float(os.environ.get("RECALL_DECAY_BOOST_CAP", 10))
        self._interval = int(os.environ.get("RECALL_DECAY_JOB_INTERVAL", 3600))

    async def run_once(self) -> int:
        """Apply decay to all active memories. Returns count of memories updated."""
        # Queries all memories WHERE valid_until IS NULL
        # Computes raw_decay + access_boost for each
        # Updates decay_score column
        # Returns count of memories updated
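The body of run_once is elided above. As a rough illustration of the four commented steps, here is a synchronous sketch against an in-memory sqlite3 table. It is not Recall's implementation: the function name `run_decay_pass`, the trimmed table, and the explicit `now` argument (passed in so the result is reproducible) are all assumptions.

```python
import math
import sqlite3
from datetime import datetime, timezone

def run_decay_pass(db: sqlite3.Connection, now: datetime,
                   lam: float = 0.02, boost_cap: float = 10.0) -> int:
    """Recompute decay_score for every active memory; return how many were updated."""
    rows = db.execute(
        "SELECT id, COALESCE(last_accessed, created_at), access_count "
        "FROM memories WHERE valid_until IS NULL"
    ).fetchall()
    for mem_id, last_seen, access_count in rows:
        age_seconds = (now - datetime.fromisoformat(last_seen)).total_seconds()
        age_days = max(0.0, age_seconds / 86400)
        raw = math.exp(-lam * age_days)
        boost = min(1.0, math.log(1 + (access_count or 0)) / math.log(1 + boost_cap))
        db.execute("UPDATE memories SET decay_score = ? WHERE id = ?",
                   (raw + (1 - raw) * boost, mem_id))
    db.commit()
    return len(rows)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memories (id TEXT PRIMARY KEY, created_at TEXT, last_accessed TEXT, "
    "access_count INTEGER DEFAULT 0, valid_until TEXT, decay_score REAL)"
)
db.executemany(
    "INSERT INTO memories (id, created_at, last_accessed, access_count, valid_until) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        # Never accessed, 35 days old: falls back to created_at.
        ("a", "2026-04-07T00:00:00+00:00", None, 0, None),
        # Accessed at the moment of the run: age is zero.
        ("b", "2026-01-01T00:00:00+00:00", "2026-05-12T00:00:00+00:00", 0, None),
        # Already superseded: skipped by the pass.
        ("c", "2026-01-01T00:00:00+00:00", None, 0, "2026-02-01T00:00:00+00:00"),
    ],
)

updated = run_decay_pass(db, now=datetime(2026, 5, 12, tzinfo=timezone.utc))
scores = dict(db.execute("SELECT id, decay_score FROM memories").fetchall())
print(updated, scores)
```

Superseded rows keep whatever decay_score they had when they expired, since the pass only selects rows where valid_until IS NULL.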

Default interval: 3600 seconds (every hour). The decay_score is then used in hybrid search as a multiplier on the importance component:

effective_importance = (m.get("importance") or 0.5) * (decay if decay is not None else 1.0)

A memory with importance: 0.9 and decay_score: 0.3 has effective importance 0.27. It does not disappear from retrieval — it ranks lower than a fresher memory with the same nominal importance. Decayed memories are deprioritized, not deleted.
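The effect on ranking can be seen with three toy records. This sketch covers only the importance component. The other parts of the hybrid score are not shown in this issue and are left out here, and the record shapes are assumed.

```python
def effective_importance(m: dict) -> float:
    """Importance scaled by decay, with the same fallbacks as the line above."""
    decay = m.get("decay_score")
    return (m.get("importance") or 0.5) * (decay if decay is not None else 1.0)

memories = [
    {"id": "stale", "importance": 0.9, "decay_score": 0.3},
    {"id": "fresh", "importance": 0.9, "decay_score": 1.0},
    {"id": "unscored", "importance": None, "decay_score": None},
]

ranked = sorted(memories, key=effective_importance, reverse=True)
for m in ranked:
    print(m["id"], round(effective_importance(m), 2))
```

The stale memory stays in the list but drops below a record that has no scores at all, because missing values fall back to importance 0.5 and decay 1.0.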

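Tuning note: the default lambda = 0.02 (35-day half-life) is calibrated for weekly engagement. With high-frequency daily sessions, relevant memories are re-accessed often, so consider a higher λ (shorter half-life) to let unused ones fade faster. With rare monthly sessions, consider a lower λ (longer half-life) so that memories are not heavily decayed simply because the user was away between sessions.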
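Because λ and half-life are inversely related, it is often easier to choose a half-life in days and derive λ from it. The two helpers below are a small convenience sketch, and their names are illustrative.

```python
import math

def lambda_for_half_life(half_life_days: float) -> float:
    """Decay rate at which a never-accessed memory scores 0.5 after half_life_days."""
    return math.log(2) / half_life_days

def half_life_for_lambda(lam: float) -> float:
    """Inverse of lambda_for_half_life."""
    return math.log(2) / lam

print(f"lambda=0.02 -> half-life {half_life_for_lambda(0.02):.1f} days")
for days in (14, 35, 90):
    print(f"half-life {days:>2} days -> lambda {lambda_for_half_life(days):.4f}")
```

The result can be passed to the worker through the RECALL_DECAY_LAMBDA environment variable shown earlier.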

Consolidation

After enough time, similar memories accumulate. Not contradictions — genuinely related facts that arrived at different times through different conversations.

A user working with an agent for six months might have:

“User finds long meetings exhausting.”
“User prefers async communication over synchronous calls.”
“User mentioned preferring written summaries over verbal updates.”
“User says they’re most productive in deep work blocks without interruptions.”

Four memories. One underlying preference: minimize synchronous communication load. Each arrived separately. Each is stored correctly. Together they create retrieval noise — and they compete for context window space when the agent needs relevant memories.

Consolidation reduces N similar memories to 1 canonical memory. The originals are superseded (soft-deleted), and the canonical memory captures the synthesized insight.

Recall’s consolidate_memories tool runs this flow:

@mcp.tool()
async def consolidate_memories(
    topic: str,
    similarity_threshold: float = 0.85,
    dry_run: bool = False,
) -> dict:
    """Find semantically similar memories in a topic and merge them into canonical facts.
    Requires embeddings extra: pip install 'szl-recall[embeddings]'. Returns a diff."""

The full flow:

Fetch all active memories for the given topic where valid_until IS NULL
Embed all memory texts using the configured embedding model
Cluster using greedy cosine similarity at the threshold (default: 0.85). Groups with fewer than 2 members are skipped.
LLM merge each group with Claude Haiku: produce one canonical memory preserving all distinct information without over-generalizing
Persist: insert canonical memories, set valid_until on originals
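The clustering step is named above but not shown. The sketch below is one common seed-based greedy variant, written in pure Python with toy 2-D vectors standing in for real embeddings. Recall's exact algorithm may differ, and `cosine` and `greedy_clusters` are illustrative names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def greedy_clusters(vectors: list[list[float]], threshold: float = 0.85) -> list[list[int]]:
    """Assign each vector to the first cluster whose seed it matches, else start a new one.

    Returns index groups with at least 2 members; singletons are skipped.
    """
    clusters: list[list[int]] = []  # the first index in each group is its seed
    for i, vec in enumerate(vectors):
        for group in clusters:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    return [group for group in clusters if len(group) >= 2]

# Toy stand-ins for embeddings: 0 and 1 are near-duplicates, 2 and 3 are loosely related.
toy = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.6, 0.8]]
strict = greedy_clusters(toy, threshold=0.85)
loose = greedy_clusters(toy, threshold=0.70)
print("0.85:", strict)
print("0.70:", loose)
```

Each returned group would then go to the LLM merge step. Because membership is decided against the seed only, results depend on input order, which is one reason to inspect a dry run before persisting.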

The dry_run=True option returns a preview diff — the memories that would be merged and the proposed canonical text — without writing to the database. Always use this before the first production consolidation run.

When to run: on-demand after a long session that produced many memories on a single topic, or on a weekly schedule for active users. Consolidation requires embeddings and LLM calls; it should not run on every write.

The threshold matters. At 0.85, only near-identical phrasings cluster. At 0.70, conceptually related but differently-worded memories merge — appropriate for preferences, risky for facts where two similar-sounding facts might be genuinely distinct. Start at 0.85 and tune downward if redundant memories persist.

Failure Mode: The Contradiction Loop

The anchor story is exactly this failure: the user’s context genuinely alternates between two states (morning meetings for consulting, afternoon blocks for the main job). Each state is correct for its context. The extraction pipeline correctly identifies both. The contradiction detection correctly fires. The result is a memory that keeps toggling — high valid_until churn on the same entity + attribute.

This is not a bug in the contradiction detection logic. It is a signal that auto-resolution is insufficient for this entity+attribute pair.

How to detect contradiction loops:

-- Find entity+attribute pairs with high valid_until churn (potential contradiction loops)
SELECT 
    entity,
    attribute,
    COUNT(*) AS superseded_count,
    MAX(valid_until) AS last_superseded,
    MIN(valid_from) AS first_seen,
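    -- SQLite does not guarantee concatenation order here; read value_history as a set unless rows are pre-sorted
    GROUP_CONCAT(value, ' -> ') AS value_history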
FROM memories
WHERE user_id = :user_id
  AND valid_until IS NOT NULL
  AND entity IS NOT NULL
  AND attribute IS NOT NULL
  AND valid_until >= datetime('now', '-30 days')
GROUP BY entity, attribute
HAVING superseded_count >= 3
ORDER BY superseded_count DESC;

Three or more supersessions for the same entity + attribute within 30 days is a strong signal that the system is in a contradiction loop rather than tracking a one-off preference change.

The fix is escalation rather than auto-resolution. When the loop is detected:

Stop auto-resolving for this entity + attribute pair
Surface the conflict to the user or a supervisor agent
Allow explicit input: “When you mean the consulting context, say X; otherwise assume Y”
Store the resolution as a meta-memory with higher importance and confidence

This is the input-required pattern: escalate rather than resolve autonomously when auto-resolution has demonstrably failed.
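A minimal sketch of that gate is shown below. It assumes the caller already has the recent supersession count for the pair, for example from the audit query above. The class name `ContradictionPolicy`, its methods, and the two return strings are hypothetical. Recall auto-resolves everything today, so nothing like this exists in it yet.

```python
from dataclasses import dataclass, field

@dataclass
class ContradictionPolicy:
    """Decides whether a conflicting write is auto-resolved or escalated."""
    churn_threshold: int = 3
    contested: set = field(default_factory=set)

    def decide(self, entity: str, attribute: str, recent_supersessions: int) -> str:
        key = (entity, attribute)
        # Once a pair is contested it stays contested until explicitly cleared.
        if key in self.contested or recent_supersessions >= self.churn_threshold:
            self.contested.add(key)
            return "input_required"
        return "auto_resolve"

    def clear(self, entity: str, attribute: str) -> None:
        """Call after the user or a supervisor agent has disambiguated the pair."""
        self.contested.discard((entity, attribute))

policy = ContradictionPolicy()
first = policy.decide("user", "preferred_meeting_time", recent_supersessions=1)
looped = policy.decide("user", "preferred_meeting_time", recent_supersessions=3)
sticky = policy.decide("user", "preferred_meeting_time", recent_supersessions=0)
policy.clear("user", "preferred_meeting_time")
after_clear = policy.decide("user", "preferred_meeting_time", recent_supersessions=0)
print(first, looped, sticky, after_clear)
```

Keeping the contested state sticky is a deliberate choice in this sketch: a quiet week should not silently re-enable auto-resolution for a pair that has already looped.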

Decision Guide: What to Run When

The MANAGE phase is three operations. They run at different frequencies and on different triggers:

Contradiction detection — runs synchronously on every memory write. It is an indexed SQL lookup that adds under 5ms to the write path. There is no reason to batch or delay it.

Decay scoring — runs on a schedule. Default: every hour. It touches all active memories but is compute-light (pure Python math, no LLM calls). Increasing the interval only introduces staleness between runs, not correctness problems.

Consolidation — runs on demand or on a weekly schedule. It requires embeddings and LLM calls. It is the most expensive operation and should not run continuously.

Operation	Trigger	Cost	Notes
Contradiction detection	Every write (synchronous)	Very low — indexed SQL	Blocking; runs before INSERT
Decay scoring	Every hour (configurable)	Low — Python math only	RECALL_DECAY_JOB_INTERVAL=3600
Consolidation	Weekly or on-demand	High — embeddings + LLM	Use dry_run=True first

The MANAGE phase is not a continuous maintenance loop. It is three targeted operations, scheduled appropriately, running against a schema designed for them from the start. The full implementation in Recall — all three workers — runs in under 200 lines.
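For the scheduled operations, an in-process asyncio loop is enough. The sketch below shows one way to run a job at a fixed interval with a clean shutdown signal. `run_every` and the fake job are illustrative names, and the interval is shortened to 10 ms so the example finishes immediately.

```python
import asyncio
from collections.abc import Awaitable, Callable

async def run_every(interval_s: float,
                    job: Callable[[], Awaitable[None]],
                    stop: asyncio.Event) -> int:
    """Run `job`, wait up to interval_s, repeat until `stop` is set. Returns the run count."""
    runs = 0
    while not stop.is_set():
        await job()
        runs += 1
        try:
            # Sleeps for interval_s, but wakes early if a shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
    return runs

async def demo() -> int:
    stop = asyncio.Event()
    calls: list[str] = []

    async def fake_decay_pass() -> None:
        # Stand-in for a real decay pass; requests shutdown after three runs.
        calls.append("decay")
        if len(calls) == 3:
            stop.set()

    return await run_every(0.01, fake_decay_pass, stop)

completed = asyncio.run(demo())
print("decay passes completed:", completed)
```

In a real deployment the interval would come from RECALL_DECAY_JOB_INTERVAL for decay, and a much longer interval (or an on-demand trigger) would drive consolidation.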

Production Checklist: MANAGE Phase Readiness

	Item	Score
	Schema has valid_until and superseded_by columns — without these, neither contradiction detection nor consolidation can work correctly. Both columns must be present before the first memory is written.
	Contradiction detection runs on every memory write — synchronously, before the new memory is inserted. Any delay allows the system to briefly hold two conflicting facts as active.
	Decay scoring is on a schedule — the DecayWorker or equivalent is running. Verify that decay_score values in your DB are non-NULL and vary across memories (not all 1.0).
	Decay formula includes access-count protection — time-based decay alone punishes memories from low-frequency users. Verify the formula lifts memories with high access_count toward decay_score = 1.0.
	At least one consolidation pass has run — if your system has been storing memories for more than 30 days, you likely have redundant clusters. Run consolidate_memories with dry_run=True to audit before committing.
	You have a contradiction loop audit query — SQL to find high valid_until churn (3+ supersessions in 30 days) on the same entity + attribute. Run it monthly or after any reported stale-memory incident.

Resources

MemoryBank: Enhancing Large Language Models with Long-Term Memory

Zhong et al. 2023 — arXiv:2305.10250 (AAAI 2024)

The first LLM memory system to incorporate the Ebbinghaus forgetting curve as a memory update mechanism. Foundational reference for access-weighted decay: the system selectively retains and reinforces memories based on time elapsed and access frequency.

Generative Agents: Interactive Simulacra of Human Behavior

Park, O'Brien, Cai, Morris, Liang, Bernstein — arXiv:2304.03442, 2023

Defines the reflection and summarization mechanism — the conceptual predecessor to consolidation. The Memory Stream architecture and importance-scored retrieval are the direct ancestors of Recall's scoring model.

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Zep Research — arXiv:2501.13956, 2025

Production-scale bi-temporal memory tracking. Every graph edge carries validity intervals (t_valid, t_invalid). Motivated by real enterprise deployments where unmanaged memory stores accumulated contradictory facts within weeks.

A-MEM: Agentic Memory for LLM Agents

arXiv:2502.12110, 2025

Zettelkasten-inspired memory consolidation. When a new memory is added, it can trigger updates to existing memories' contextual representations — memory evolution rather than just merge-on-write.

Letta: Sleep-Time Compute

Letta (formerly MemGPT) — letta.com/blog/sleep-time-compute

Background consolidation via dream subagents that run between sessions. Practical architecture for between-session memory maintenance: consolidation, deduplication, archiving, and reflection as scheduled background processes.

Issue 4 covers the READ phase — retrieval that weights decayed memories appropriately and suppresses superseded ones.

Until next issue,

Sentient Zero Labs