Agent Design Fieldbook Issue 6/8

The Memory Layer: Conversations That Persist

Agents need memory: not just chat history, but structured context (what was fetched, what was selected, what tokens were used) persisted to a database across multi-turn conversations.

Apr 13, 2026 · 15 min read · Sentient Zero Labs

In Issue 5, we built filter extraction. The agent can now search. But here’s what we discovered: it forgets everything between turns.

user Find laptops under $1000
bot Here are 5 options: Dell XPS, HP Spectre, Lenovo...
user Tell me about the first one
bot Which product are you referring to?

The bot listed products 30 seconds ago. Now it has no idea what “the first one” means.

In this issue:

  • Why stateless agents fail at multi-turn conversations
  • Session context design (what to remember)
  • Database persistence (5-table schema)
  • Token tracking for cost control
Memory is what separates a chatbot from an agent.

History Anchor: RNN / LSTM / GRU -> Modern Agent Architectures

Sequence memory has evolved dramatically. RNNs and LSTMs (1990s-2010s) were the first architectures that could remember context across time steps — but they forgot long-range dependencies. Transformers solved this with self-attention, holding entire sequences in memory. For agents, we still need explicit session state because the context window is finite and expensive. The session context pattern is the engineering solution to a fundamental architectural limit.


The Disasters

Disaster 1: The “Which One?” Problem

Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
      1. Dell XPS 15 - $899
      2. HP Spectre x360 - $849
      3. Lenovo ThinkPad - $799
      4. Asus ZenBook - $749
      5. Acer Swift - $699"

Turn 2:
USER: "Tell me about the first one"
BOT: "Which product are you referring to?" ❌

The bot just listed 5 products. The user said “the first one.” This should be trivial.

Why it happened
No memory of what was fetched. Each turn was processed in isolation.

Disaster 2: The Token Explosion

MONTH 1: $50 in LLM costs
MONTH 2: $500 in LLM costs (10x growth!)
MONTH 3: $5,000 in LLM costs (100x growth!) ❌

INVESTIGATION:
• No visibility into token usage per session
• No tracking of which pipeline steps cost most
• No way to identify optimization opportunities
• One runaway session used $2,100 in a single night

PROBLEM: No token tracking = no cost control

We had no idea where the money was going. We couldn’t optimize what we couldn’t measure.

Why it happened
No per-step, per-turn, or per-session token tracking. We were flying blind.

Disaster 3: The Lost Cart

USER SESSION:
1. Searches for products
2. Compares options
3. Adds 3 items to selections
4. Refreshes page
5. Selections are empty ❌

USER: "Where did my selections go?!"

The user spent 10 minutes building a list. One page refresh wiped it out.

Why it happened
Selections stored only in memory, not persisted to database.

The Pattern

Memory isn't optional. It's the difference between a demo and a product.

The Diagnosis: Why Stateless Fails

Stateless vs. Stateful

┌──────────────────────────────────────────────────────────┐
│  STATELESS (What Fails)                                  │
├──────────────────────────────────────────────────────────┤
│  Each message is processed independently:                │
│                                                          │
│  Turn 1: "Find laptops" → Returns 5 results              │
│          (results discarded after response)              │
│                                                          │
│  Turn 2: "Tell me about the first one"                   │
│          → No context, can't resolve ❌                   │
│                                                          │
│  Problems:                                               │
│  • Can't resolve "the first one", "that laptop", "it"    │
│  • Can't track user selections                           │
│  • Can't refine searches ("make it cheaper")             │
│  • Can't track costs                                     │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│  STATEFUL (What Works)                                   │
├──────────────────────────────────────────────────────────┤
│  Session maintains context across turns:                 │
│                                                          │
│  Turn 1: "Find laptops" → Returns 5 results              │
│          (results stored in session context)             │
│                                                          │
│  Turn 2: "Tell me about the first one"                   │
│          → Context lookup: "first one" = Dell XPS 15 ✓   │
│                                                          │
│  Benefits:                                               │
│  • Resolves references ("first one" → Dell XPS 15)       │
│  • Tracks selections (survives refresh)                  │
│  • Enables refinement ("make it cheaper")                │
│  • Tracks token usage for optimization                   │
└──────────────────────────────────────────────────────────┘

Session Context Structure

What to Remember

Not everything needs to be remembered. Here’s what matters:

┌─────────────────────────────────────────────────────────┐
│  SESSION CONTEXT: What the Agent Remembers              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. FETCHED PRODUCTS                                    │
│     • Products returned from searches                   │
│     • Stored as: MPN → Product mapping                  │
│     • Used for: Resolving "the first one", comparisons  │
│     • Keep: Last N products (e.g., 20)                  │
│                                                         │
│  2. FOCUS                                               │
│     • Currently discussed product                       │
│     • Updated when user says "this one", "the Dell"     │
│     • Used for: Pronoun resolution ("it", "this")       │
│                                                         │
│  3. SELECTIONS                                          │
│     • Products user has added to cart/BOM               │
│     • Stored as: Ordered list of MPNs                   │
│     • Used for: Cart management, export, summary        │
│                                                         │
│  4. SEARCH HISTORY                                      │
│     • Last query and filters                            │
│     • Used for: Refinement ("make it cheaper")          │
│                                                         │
│  5. TOKEN USAGE                                         │
│     • Aggregated input/output tokens                    │
│     • Used for: Cost tracking, limits, optimization     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Session Context Implementation

from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime

@dataclass
class Product:
    mpn: str
    name: str
    price: float
    specs: dict
    position: int  # Position in last search results

@dataclass
class SessionContext:
    session_id: str
    
    # Products fetched in this session (MPN → Product)
    fetched: Dict[str, Product] = field(default_factory=dict)
    
    # Ordered list of MPNs from last search (for position resolution)
    last_results: List[str] = field(default_factory=list)
    
    # Current focus (last discussed product)
    focus_mpn: Optional[str] = None
    
    # User's selections (cart/BOM)
    selected: List[str] = field(default_factory=list)
    
    # Search history for refinement
    last_query: Optional[str] = None
    last_filters: Optional[Dict] = None
    
    # Token tracking
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_turns: int = 0
    
    def add_results(self, products: List[Product], keep: int = 20):
        """Add search results to context, capping the cache at `keep` products."""
        self.last_results = []
        for i, p in enumerate(products):
            p.position = i + 1
            self.fetched[p.mpn] = p
            self.last_results.append(p.mpn)
        # Evict oldest cached products beyond the cap, but never a current
        # selection or the focus product (select() relies on them)
        while len(self.fetched) > keep:
            for mpn in self.fetched:
                if mpn not in self.selected and mpn != self.focus_mpn:
                    del self.fetched[mpn]
                    break
            else:
                break
    
    def resolve_reference(self, ref: str) -> Optional[str]:
        """Resolve 'first one', 'the Dell', 'it' to MPN."""
        ref_lower = ref.lower().strip()
        # Drop a leading article so "the Dell" matches "Dell XPS 15"
        if ref_lower.startswith("the "):
            ref_lower = ref_lower[4:]
        
        # Positional references
        positions = {
            "first": 0, "first one": 0, "1st": 0,
            "second": 1, "second one": 1, "2nd": 1,
            "third": 2, "third one": 2, "3rd": 2,
            "last": -1, "last one": -1
        }
        if ref_lower in positions:
            idx = positions[ref_lower]
            # Guard against "third" when only two results exist
            if -len(self.last_results) <= idx < len(self.last_results):
                return self.last_results[idx]
        
        # Pronoun references (use focus)
        if ref_lower in ["it", "this", "that", "this one", "that one"]:
            return self.focus_mpn
        
        # Named references (fuzzy match)
        for mpn, product in self.fetched.items():
            if ref_lower in product.name.lower():
                return mpn
        
        return None
    
    def select(self, mpn: str) -> bool:
        """Add product to selections."""
        if mpn in self.fetched and mpn not in self.selected:
            self.selected.append(mpn)
            return True
        return False
💡 Key Insight
The session context is lightweight by design. It stores references (MPNs), not full product data. Full data lives in the database. This keeps the context small and fast.
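
As a quick sanity check, the resolution logic can be exercised with a trimmed-down stand-in for the classes above (fields and positional entries the demo does not touch are dropped):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Product:
    mpn: str
    name: str
    price: float
    position: int = 0

@dataclass
class SessionContext:
    session_id: str
    fetched: Dict[str, Product] = field(default_factory=dict)
    last_results: List[str] = field(default_factory=list)
    focus_mpn: Optional[str] = None

    def add_results(self, products: List[Product]) -> None:
        self.last_results = []
        for i, p in enumerate(products):
            p.position = i + 1
            self.fetched[p.mpn] = p
            self.last_results.append(p.mpn)

    def resolve_reference(self, ref: str) -> Optional[str]:
        ref_lower = ref.lower().strip()
        if ref_lower.startswith("the "):
            ref_lower = ref_lower[4:]
        positions = {"first": 0, "first one": 0, "second": 1, "last": -1}
        if ref_lower in positions:
            idx = positions[ref_lower]
            if -len(self.last_results) <= idx < len(self.last_results):
                return self.last_results[idx]
        if ref_lower in ("it", "this", "that", "this one", "that one"):
            return self.focus_mpn
        for mpn, product in self.fetched.items():
            if ref_lower in product.name.lower():
                return mpn
        return None

ctx = SessionContext(session_id="demo")
ctx.add_results([
    Product("XPS-15-2024", "Dell XPS 15", 899.0),
    Product("SPECTRE-X360", "HP Spectre x360", 849.0),
])
ctx.focus_mpn = "XPS-15-2024"

print(ctx.resolve_reference("the first one"))  # XPS-15-2024
print(ctx.resolve_reference("the Dell"))       # XPS-15-2024
print(ctx.resolve_reference("it"))             # XPS-15-2024
print(ctx.resolve_reference("last"))           # SPECTRE-X360
```

This is exactly the Turn 2 disaster from the opening, now resolvable because the Turn 1 results survived in context.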

Database Persistence

The 5-Table Schema

Memory needs to survive server restarts and page refreshes. Persist to database.

┌─────────────────────────────────────────────────────────┐
│  TABLE 1: chat_session                                  │
│  Session metadata with aggregated metrics               │
├─────────────────────────────────────────────────────────┤
│  session_id         UUID PRIMARY KEY                    │
│  focus_mpn          TEXT (current product)              │
│  total_input_tokens INTEGER                             │
│  total_output_tokens INTEGER                            │
│  total_turns        INTEGER                             │
│  created_at         TIMESTAMP                           │
│  last_active_at     TIMESTAMP                           │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  TABLE 2: chat_turn                                     │
│  User message + assistant response pairs                │
├─────────────────────────────────────────────────────────┤
│  turn_id            BIGSERIAL PRIMARY KEY               │
│  session_fk         UUID REFERENCES chat_session        │
│  turn_number        INTEGER                             │
│  user_message       TEXT                                │
│  assistant_response TEXT                                │
│  intent_data        JSONB (intents detected)            │
│  input_tokens       INTEGER (turn total)                │
│  output_tokens      INTEGER (turn total)                │
│  duration_ms        INTEGER                             │
│  created_at         TIMESTAMP                           │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  TABLE 3: turn_step                                     │
│  Individual pipeline steps with token tracking          │
├─────────────────────────────────────────────────────────┤
│  step_id            BIGSERIAL PRIMARY KEY               │
│  turn_fk            BIGINT REFERENCES chat_turn         │
│  step_type          TEXT (intent, filter, search, etc)  │
│  step_order         INTEGER (1, 2, 3...)                │
│  model              TEXT (gemini-1.5-flash, etc)        │
│  input_tokens       INTEGER                             │
│  output_tokens      INTEGER                             │
│  duration_ms        INTEGER                             │
│  success            BOOLEAN                             │
│  error              TEXT                                │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  TABLE 4: session_component                             │
│  Cached fetched products                                │
├─────────────────────────────────────────────────────────┤
│  id                 BIGSERIAL PRIMARY KEY               │
│  session_fk         UUID REFERENCES chat_session        │
│  mpn                TEXT                                │
│  component_type     TEXT (laptop, phone, etc)           │
│  name               TEXT                                │
│  summary            TEXT (short description)            │
│  full_data          JSONB (complete product data)       │
│  position           INTEGER (order in results)          │
│  fetched_at         TIMESTAMP                           │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  TABLE 5: session_selection                             │
│  User cart/BOM selections                               │
├─────────────────────────────────────────────────────────┤
│  id                 BIGSERIAL PRIMARY KEY               │
│  session_fk         UUID REFERENCES chat_session        │
│  mpn                TEXT                                │
│  position           INTEGER (order in cart)             │
│  selected_at        TIMESTAMP                           │
└─────────────────────────────────────────────────────────┘

Why 5 Tables?

Table               Purpose                Query Pattern
chat_session        Session metadata       Load on session start
chat_turn           Conversation history   Display chat history
turn_step           Token analytics        Cost optimization
session_component   Fetched products       Resolve references
session_selection   User cart              Persist selections
Start with Postgres. When you hit ~10k concurrent users, put a Redis cache in front of chat_session for sub-millisecond reads. The relational implementation is easier to debug, query, and join with business data (products, orders). Optimize for observability first, then speed.
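
One step the schema implies but the code never shows is rehydration: on page refresh or server restart, the in-memory context has to be rebuilt from these tables. A minimal sketch, assuming the rows have already been fetched from chat_session, session_component, and session_selection (the RehydratedContext shape and rehydrate helper are illustrative, not part of the original code):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical rehydration helper: rebuilds the in-memory session context
# from rows already fetched out of the database. It is a pure function over
# plain dicts, so it can be unit-tested without a database.

@dataclass
class RehydratedContext:
    session_id: str
    focus_mpn: Optional[str] = None
    fetched: Dict[str, dict] = field(default_factory=dict)
    last_results: List[str] = field(default_factory=list)
    selected: List[str] = field(default_factory=list)

def rehydrate(session_row: dict,
              component_rows: List[dict],
              selection_rows: List[dict]) -> RehydratedContext:
    ctx = RehydratedContext(
        session_id=session_row["session_id"],
        focus_mpn=session_row.get("focus_mpn"),
    )
    # Components keep their search-result order via the position column
    for row in sorted(component_rows, key=lambda r: r["position"]):
        ctx.fetched[row["mpn"]] = row
        ctx.last_results.append(row["mpn"])
    # Selections keep their cart order the same way
    ctx.selected = [r["mpn"]
                    for r in sorted(selection_rows, key=lambda r: r["position"])]
    return ctx
```

Because position is persisted in both session_component and session_selection, "the first one" still resolves correctly after a refresh.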
💡 Key Insight
Your turn_step table is a goldmine. Every step’s input/output tokens, latency, and success/failure is exactly the data you need to fine-tune a smaller model later (e.g., train a 3B model to replace GPT-4 for intent classification once you have 100k+ labeled examples).
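
A first pass over that goldmine is a cost breakdown per step_type. A sketch over turn_step-shaped rows; the per-million-token prices are placeholder assumptions, not published rates:

```python
# Placeholder (input, output) USD prices per 1M tokens -- substitute your
# provider's actual rates.
PRICE_PER_M = {
    "gemini-1.5-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
}

def cost_by_step_type(step_rows: list) -> dict:
    """Total estimated cost per step_type, most expensive first."""
    totals: dict = {}
    for row in step_rows:
        in_price, out_price = PRICE_PER_M[row["model"]]
        cost = (row["input_tokens"] * in_price
                + row["output_tokens"] * out_price) / 1_000_000
        totals[row["step_type"]] = totals.get(row["step_type"], 0.0) + cost
    # Sort descending by cost so the biggest line item comes first
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Run this over a day of turn_step rows and the most expensive pipeline step is the first key in the result.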

Token Tracking

The $2,100 Night

We had a bug. A retry loop ran all night. One session consumed 42 million tokens. We woke up to a $2,100 bill.

This is why you track tokens.

Three-Level Aggregation

┌─────────────────────────────────────────────────────────┐
│  LEVEL 1: PER-STEP TRACKING                             │
│  Track each LLM call individually                       │
├─────────────────────────────────────────────────────────┤
│  Step 1: Intent Classification                          │
│  • Model: gemini-1.5-flash                              │
│  • Input: 150 tokens                                    │
│  • Output: 20 tokens                                    │
│  • Duration: 180ms                                      │
│                                                         │
│  Step 2: Filter Extraction                              │
│  • Model: gemini-1.5-flash                              │
│  • Input: 300 tokens                                    │
│  • Output: 50 tokens                                    │
│  • Duration: 220ms                                      │
│                                                         │
│  Step 3: Response Generation                            │
│  • Model: gemini-1.5-pro                                │
│  • Input: 800 tokens                                    │
│  • Output: 200 tokens                                   │
│  • Duration: 450ms                                      │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  LEVEL 2: PER-TURN AGGREGATION                          │
│  Sum all steps in one turn                              │
├─────────────────────────────────────────────────────────┤
│  Turn Total:                                            │
│  • Input tokens: 1,250                                  │
│  • Output tokens: 270                                   │
│  • LLM calls: 3                                         │
│  • Duration: 850ms                                      │
│  • Estimated cost: $0.0008                              │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  LEVEL 3: PER-SESSION AGGREGATION                       │
│  Sum all turns in session                               │
├─────────────────────────────────────────────────────────┤
│  Session Total (8 turns):                               │
│  • Input tokens: 10,000                                 │
│  • Output tokens: 2,160                                 │
│  • LLM calls: 24                                        │
│  • Duration: 6,800ms                                    │
│  • Estimated cost: $0.0064                              │
└─────────────────────────────────────────────────────────┘

Why Track at Each Level?

Level        What You Learn                    Action
Per-Step     Which operations are expensive?   Optimize prompts, use smaller models
Per-Turn     Cost per user interaction         Set per-turn limits, detect anomalies
Per-Session  Total conversation cost           Set session limits, identify power users
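
Per-turn and per-session limits are what catch a runaway session like the $2,100 one. A sketch of such a guard, with illustrative default limits (tune them against your own cost model):

```python
# Runaway-session guard: call check() before each LLM call and refuse
# once a budget is crossed. Default limits are illustrative assumptions.
class TokenBudget:
    def __init__(self, per_turn_limit: int = 50_000,
                 per_session_limit: int = 500_000):
        self.per_turn_limit = per_turn_limit
        self.per_session_limit = per_session_limit
        self.turn_tokens = 0
        self.session_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one step's usage to the running totals."""
        used = input_tokens + output_tokens
        self.turn_tokens += used
        self.session_tokens += used

    def check(self) -> None:
        """Raise before the next LLM call if a limit is already exceeded."""
        if self.turn_tokens > self.per_turn_limit:
            raise RuntimeError("per-turn token limit exceeded")
        if self.session_tokens > self.per_session_limit:
            raise RuntimeError("per-session token limit exceeded")

    def end_turn(self) -> None:
        self.turn_tokens = 0
```

A retry loop that burns tokens all night now dies at the session limit instead of on the monthly invoice.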

Token Tracking Implementation

from dataclasses import dataclass
from typing import List, Optional
from contextlib import asynccontextmanager

@dataclass
class StepRecord:
    step_type: str
    model: str
    input_tokens: int
    output_tokens: int
    duration_ms: int
    success: bool
    error: Optional[str] = None

@dataclass 
class TurnRecord:
    turn_number: int
    user_message: str
    assistant_response: str
    steps: List[StepRecord]
    
    @property
    def total_input_tokens(self) -> int:
        return sum(s.input_tokens for s in self.steps)
    
    @property
    def total_output_tokens(self) -> int:
        return sum(s.output_tokens for s in self.steps)

class StepTracker:
    """Track token usage across pipeline steps."""
    
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.current_turn: Optional[TurnRecord] = None
        self.steps: List[StepRecord] = []
    
    @asynccontextmanager
    async def track_step(self, step_type: str, model: str):
        """Context manager to track a single step."""
        import time
        start = time.time()
        step = StepRecord(
            step_type=step_type,
            model=model,
            input_tokens=0,
            output_tokens=0,
            duration_ms=0,
            success=True
        )
        
        try:
            yield step  # Caller updates input/output tokens
        except Exception as e:
            step.success = False
            step.error = str(e)
            raise
        finally:
            step.duration_ms = int((time.time() - start) * 1000)
            self.steps.append(step)
    
    async def save_turn(self, user_message: str, response: str, db):
        """Save completed turn to database."""
        # Save turn (fetchval returns the RETURNING value; execute would not)
        turn_id = await db.fetchval("""
            INSERT INTO chat_turn 
                (session_fk, user_message, assistant_response, 
                 input_tokens, output_tokens)
            VALUES ($1, $2, $3, $4, $5)
            RETURNING turn_id
        """, self.session_id, user_message, response,
            sum(s.input_tokens for s in self.steps),
            sum(s.output_tokens for s in self.steps))
        
        # Save steps
        for i, step in enumerate(self.steps):
            await db.execute("""
                INSERT INTO turn_step
                    (turn_fk, step_type, step_order, model,
                     input_tokens, output_tokens, duration_ms,
                     success, error)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
            """, turn_id, step.step_type, i + 1, step.model,
                step.input_tokens, step.output_tokens, 
                step.duration_ms, step.success, step.error)
        
        # Update session totals
        await db.execute("""
            UPDATE chat_session SET
                total_input_tokens = total_input_tokens + $2,
                total_output_tokens = total_output_tokens + $3,
                total_turns = total_turns + 1,
                last_active_at = now()
            WHERE session_id = $1
        """, self.session_id,
            sum(s.input_tokens for s in self.steps),
            sum(s.output_tokens for s in self.steps))
        
        self.steps = []  # Reset for next turn
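
Usage looks like this: the caller opens a step, makes the LLM call, and copies the provider's usage metadata onto the yielded record. The snippet inlines a condensed copy of StepRecord/StepTracker from above (save_turn omitted) so it runs standalone; the token counts stand in for real usage metadata:

```python
import asyncio
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StepRecord:
    step_type: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    duration_ms: int = 0
    success: bool = True
    error: Optional[str] = None

class StepTracker:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.steps: List[StepRecord] = []

    @asynccontextmanager
    async def track_step(self, step_type: str, model: str):
        start = time.time()
        step = StepRecord(step_type=step_type, model=model)
        try:
            yield step
        except Exception as e:
            step.success = False
            step.error = str(e)
            raise
        finally:
            step.duration_ms = int((time.time() - start) * 1000)
            self.steps.append(step)

async def handle_turn(tracker: StepTracker) -> None:
    # Each pipeline step is wrapped; token counts here are illustrative
    # stand-ins for the provider's real usage metadata.
    async with tracker.track_step("intent", "gemini-1.5-flash") as step:
        step.input_tokens, step.output_tokens = 150, 20
    async with tracker.track_step("response", "gemini-1.5-pro") as step:
        step.input_tokens, step.output_tokens = 800, 200

tracker = StepTracker("demo-session")
asyncio.run(handle_turn(tracker))
print(sum(s.input_tokens for s in tracker.steps))  # 950
```

Because the try/finally runs even on exceptions, failed steps are still recorded with their error, which is exactly what you want when diagnosing a runaway retry loop.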

Selection Tracking

The Selection Flow

USER: "Add the Dell XPS to my list"

┌─────────────────────────────────────────────────────────┐
│ STEP 1: RESOLVE PRODUCT                                 │
│ "Dell XPS" → Fuzzy match in context                     │
│ Found: MPN = "XPS-15-2024"                              │
│ Verify: MPN exists in fetched? ✓                        │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ STEP 2: UPDATE SESSION CONTEXT                          │
│ context.selected.append("XPS-15-2024")                  │
│ context now has: ["XPS-15-2024"]                        │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ STEP 3: PERSIST TO DATABASE                             │
│ INSERT INTO session_selection                           │
│   (session_fk, mpn, position, selected_at)              │
│ VALUES (session_id, "XPS-15-2024", 1, now())            │
└─────────────────────────────────────────────────────────┘

RESPONSE: "Added Dell XPS 15 to your selections. You have 1 item."

Selection Manager

class SelectionManager:
    def __init__(self, session_id: str, db):
        self.session_id = session_id
        self.db = db
    
    async def add(self, mpn: str) -> bool:
        """Add product to selections."""
        existing = await self.db.fetchval("""
            SELECT 1 FROM session_selection 
            WHERE session_fk = $1 AND mpn = $2
        """, self.session_id, mpn)
        
        if existing:
            return False  # Already in cart
        
        await self.db.execute("""
            INSERT INTO session_selection (session_fk, mpn, position)
            VALUES ($1, $2, (
                SELECT COALESCE(MAX(position), 0) + 1
                FROM session_selection
                WHERE session_fk = $1
            ))
        """, self.session_id, mpn)
        
        return True
    
    async def remove(self, mpn: str) -> bool:
        """Remove product from selections."""
        # asyncpg-style execute returns a status tag such as "DELETE 1"
        result = await self.db.execute("""
            DELETE FROM session_selection
            WHERE session_fk = $1 AND mpn = $2
        """, self.session_id, mpn)
        return result != "DELETE 0"
    
    async def get_all(self) -> list:
        """Get all selections with product details."""
        return await self.db.fetch("""
            SELECT ss.mpn, ss.position, sc.name, sc.summary
            FROM session_selection ss
            JOIN session_component sc 
                ON ss.session_fk = sc.session_fk AND ss.mpn = sc.mpn
            WHERE ss.session_fk = $1
            ORDER BY ss.position
        """, self.session_id)
    
    async def get_summary(self) -> str:
        """Get cart summary for display."""
        items = await self.get_all()
        if not items:
            return "Your selections are empty."
        
        lines = [f"Your selections ({len(items)} items):"]
        for item in items:
            lines.append(f"  {item['position']}. {item['name']}")
        return "\n".join(lines)

The Proof: Before/After

Metrics That Changed

Metric                   Before                      After
'First one' resolution   0% (always fails)           95%
Multi-turn success       15% (context lost)          92%
Cart persistence         0% (lost on refresh)        100% (survives refresh)
Token visibility         None (flying blind)         Complete (step, turn, session)
Cost optimization        Impossible                  40% reduction (found waste)
Runaway sessions         Undetected ($2,100 bill)    Detected automatically (token limits)

What changed: We stopped treating each turn as independent and started building persistent memory.


The Checklist: Memory Layer Readiness

  • Context: Do you track fetched products in session?
  • Focus: Do you track the currently discussed product?
  • Resolution: Can you resolve 'first one', 'the Dell', 'it'?
  • Selections: Do you persist user selections to database?
  • History: Do you remember last query/filters for refinement?
  • Persistence: Are sessions persisted (survive refresh)?
  • Tokens: Do you track tokens per-step, per-turn, per-session?
  • Analytics: Can you query session data for insights?

Score Interpretation:

  • 0-3 checked: Memory layer not ready. Start with session context.
  • 4-6 checked: Prototype-ready. Add persistence and token tracking.
  • 7-8 checked: Production-ready.

What’s Next

Issue 7: The Sort & Rank Layer

Now that you can remember search results, how do you order them intelligently?

“User says ‘show me the cheapest.’ But what about ‘show me the best’? Best by what? Price? Rating? Value? And how do you sort when some fields are in JSONB?”

What You’ll Learn:

  • Sortable field definitions with preferred direction
  • JSONB SQL expressions for sorting
  • LLM-inferred sorting from vague terms (“best”, “top”)

Key Takeaways

  1. The Problem: Stateless agents can't handle multi-turn conversations
  2. The Solution: Session context (fetched products, focus, selections) + database persistence (5-table schema) + token tracking (per-step, per-turn, per-session)
  3. The $2,100 Lesson: Track tokens at every level or risk runaway costs
  4. Key Takeaway: Memory is the foundation of agent intelligence. Track everything, persist everything, measure everything.

Glossary

  • Session: A conversation with persistent state across turns
  • Context: What the agent remembers (fetched products, focus, selections)
  • Focus: The product currently being discussed
  • Turn: One user message + assistant response pair
  • Step: Individual pipeline operation (intent, filter, search, response)
  • Token Tracking: Counting input/output tokens for cost control
  • Reference Resolution: Converting “first one” or “the Dell” to actual product MPN
  • Redis Cache: In-memory store for sub-millisecond session reads at scale (>10k users)
  • Training Data: Step-level logs that can be used to fine-tune smaller models

Until next issue,

Sentient Zero Labs