In Issue 5, we built filter extraction. The agent can now search. But here’s what we discovered: it forgets everything between turns.
The bot listed products 30 seconds ago. Now it has no idea what “the first one” means.
In this issue:
- Why stateless agents fail at multi-turn conversations
- Session context design (what to remember)
- Database persistence (5-table schema)
- Token tracking for cost control
Memory is what separates a chatbot from an agent.
History Anchor: RNN / LSTM / GRU -> Modern Agent Architectures
Sequence memory has evolved dramatically. RNNs and LSTMs (1990s-2010s) were the first architectures that could remember context across time steps — but they forgot long-range dependencies. Transformers solved this with self-attention, holding entire sequences in memory. For agents, we still need explicit session state because the context window is finite and expensive. The session context pattern is the engineering solution to a fundamental architectural limit.
The Disasters
Disaster 1: The “Which One?” Problem
Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
1. Dell XPS 15 - $899
2. HP Spectre x360 - $849
3. Lenovo ThinkPad - $799
4. Asus ZenBook - $749
5. Acer Swift - $699"
Turn 2:
USER: "Tell me about the first one"
BOT: "Which product are you referring to?" ❌
The bot just listed 5 products. The user said “the first one.” This should be trivial.
Disaster 2: The Token Explosion
MONTH 1: $50 in LLM costs
MONTH 2: $500 in LLM costs (10x growth!)
MONTH 3: $5,000 in LLM costs (100x growth!) ❌
INVESTIGATION:
• No visibility into token usage per session
• No tracking of which pipeline steps cost most
• No way to identify optimization opportunities
• One runaway session used $2,100 in a single night
PROBLEM: No token tracking = no cost control
We had no idea where the money was going. We couldn’t optimize what we couldn’t measure.
Disaster 3: The Lost Cart
USER SESSION:
1. Searches for products
2. Compares options
3. Adds 3 items to selections
4. Refreshes page
5. Selections are empty ❌
USER: "Where did my selections go?!"
The user spent 10 minutes building a list. One page refresh wiped it out.
The Pattern
Memory isn't optional. It's the difference between a demo and a product.
The Diagnosis: Why Stateless Fails
Stateless vs. Stateful
┌──────────────────────────────────────────────────────────┐
│ STATELESS (What Fails) │
├──────────────────────────────────────────────────────────┤
│ Each message is processed independently: │
│ │
│ Turn 1: "Find laptops" → Returns 5 results │
│ (results discarded after response) │
│ │
│ Turn 2: "Tell me about the first one" │
│ → No context, can't resolve ❌ │
│ │
│ Problems: │
│ • Can't resolve "the first one", "that laptop", "it" │
│ • Can't track user selections │
│ • Can't refine searches ("make it cheaper") │
│ • Can't track costs │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ STATEFUL (What Works) │
├──────────────────────────────────────────────────────────┤
│ Session maintains context across turns: │
│ │
│ Turn 1: "Find laptops" → Returns 5 results │
│ (results stored in session context) │
│ │
│ Turn 2: "Tell me about the first one" │
│ → Context lookup: "first one" = Dell XPS 15 ✓ │
│ │
│ Benefits: │
│ • Resolves references ("first one" → Dell XPS 15) │
│ • Tracks selections (survives refresh) │
│ • Enables refinement ("make it cheaper") │
│ • Tracks token usage for optimization │
└──────────────────────────────────────────────────────────┘
Session Context Structure
What to Remember
Not everything needs to be remembered. Here’s what matters:
┌─────────────────────────────────────────────────────────┐
│ SESSION CONTEXT: What the Agent Remembers │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. FETCHED PRODUCTS │
│ • Products returned from searches │
│ • Stored as: MPN → Product mapping │
│ • Used for: Resolving "the first one", comparisons │
│ • Keep: Last N products (e.g., 20) │
│ │
│ 2. FOCUS │
│ • Currently discussed product │
│ • Updated when user says "this one", "the Dell" │
│ • Used for: Pronoun resolution ("it", "this") │
│ │
│ 3. SELECTIONS │
│ • Products user has added to cart/BOM │
│ • Stored as: Ordered list of MPNs │
│ • Used for: Cart management, export, summary │
│ │
│ 4. SEARCH HISTORY │
│ • Last query and filters │
│ • Used for: Refinement ("make it cheaper") │
│ │
│ 5. TOKEN USAGE │
│ • Aggregated input/output tokens │
│ • Used for: Cost tracking, limits, optimization │
│ │
└─────────────────────────────────────────────────────────┘
Session Context Implementation
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Product:
    mpn: str
    name: str
    price: float
    specs: dict
    position: int  # Position in last search results

@dataclass
class SessionContext:
    session_id: str

    # Products fetched in this session (MPN → Product)
    fetched: Dict[str, Product] = field(default_factory=dict)

    # Ordered list of MPNs from last search (for position resolution)
    last_results: List[str] = field(default_factory=list)

    # Current focus (last discussed product)
    focus_mpn: Optional[str] = None

    # User's selections (cart/BOM)
    selected: List[str] = field(default_factory=list)

    # Search history for refinement
    last_query: Optional[str] = None
    last_filters: Optional[Dict] = None

    # Token tracking
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_turns: int = 0

    def add_results(self, products: List[Product]) -> None:
        """Add search results to context."""
        self.last_results = []
        for i, p in enumerate(products):
            p.position = i + 1
            self.fetched[p.mpn] = p
            self.last_results.append(p.mpn)

    def resolve_reference(self, ref: str) -> Optional[str]:
        """Resolve 'first one', 'the Dell', 'it' to an MPN."""
        ref_lower = ref.lower()

        # Positional references
        positions = {
            "first": 0, "first one": 0, "1st": 0,
            "second": 1, "second one": 1, "2nd": 1,
            "third": 2, "third one": 2, "3rd": 2,
            "last": -1, "last one": -1,
        }
        if ref_lower in positions:
            idx = positions[ref_lower]
            # Bounds check: "third one" with only two results should not crash
            if self.last_results and -len(self.last_results) <= idx < len(self.last_results):
                return self.last_results[idx]
            return None

        # Pronoun references (use focus)
        if ref_lower in ("it", "this", "that", "this one", "that one"):
            return self.focus_mpn

        # Named references (substring match against product names)
        for mpn, product in self.fetched.items():
            if ref_lower in product.name.lower():
                return mpn
        return None

    def select(self, mpn: str) -> bool:
        """Add product to selections (only if it was actually fetched)."""
        if mpn in self.fetched and mpn not in self.selected:
            self.selected.append(mpn)
            return True
        return False
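The context table earlier says to keep only the last N fetched products (e.g., 20), but `add_results` above never evicts anything. A minimal eviction sketch — `evict_oldest` and the default cap of 20 are assumptions, not part of the implementation above; it relies on Python dicts preserving insertion order:

```python
from typing import Dict

def evict_oldest(fetched: Dict[str, object], max_items: int = 20) -> None:
    """Drop the oldest-fetched products once the cap is exceeded.

    Assumes `fetched` is insertion-ordered (true for Python 3.7+ dicts),
    so the first key is the oldest entry.
    """
    while len(fetched) > max_items:
        oldest_mpn = next(iter(fetched))  # first-inserted key
        del fetched[oldest_mpn]
```

Calling this at the end of `add_results` keeps session memory bounded. One caveat: a re-fetched product keeps its original insertion slot unless you delete and reinsert it, so "oldest" here means oldest first fetch, not least recently discussed.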
Database Persistence
The 5-Table Schema
Memory needs to survive server restarts and page refreshes. Persist to database.
┌─────────────────────────────────────────────────────────┐
│ TABLE 1: chat_session │
│ Session metadata with aggregated metrics │
├─────────────────────────────────────────────────────────┤
│ session_id UUID PRIMARY KEY │
│ focus_mpn TEXT (current product) │
│ total_input_tokens INTEGER │
│ total_output_tokens INTEGER │
│ total_turns INTEGER │
│ created_at TIMESTAMP │
│ last_active_at TIMESTAMP │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ TABLE 2: chat_turn │
│ User message + assistant response pairs │
├─────────────────────────────────────────────────────────┤
│ turn_id BIGSERIAL PRIMARY KEY │
│ session_fk UUID REFERENCES chat_session │
│ turn_number INTEGER │
│ user_message TEXT │
│ assistant_response TEXT │
│ intent_data JSONB (intents detected) │
│ input_tokens INTEGER (turn total) │
│ output_tokens INTEGER (turn total) │
│ duration_ms INTEGER │
│ created_at TIMESTAMP │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ TABLE 3: turn_step │
│ Individual pipeline steps with token tracking │
├─────────────────────────────────────────────────────────┤
│ step_id BIGSERIAL PRIMARY KEY │
│ turn_fk BIGINT REFERENCES chat_turn │
│ step_type TEXT (intent, filter, search, etc) │
│ step_order INTEGER (1, 2, 3...) │
│ model TEXT (gemini-1.5-flash, etc) │
│ input_tokens INTEGER │
│ output_tokens INTEGER │
│ duration_ms INTEGER │
│ success BOOLEAN │
│ error TEXT │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ TABLE 4: session_component │
│ Cached fetched products │
├─────────────────────────────────────────────────────────┤
│ id BIGSERIAL PRIMARY KEY │
│ session_fk UUID REFERENCES chat_session │
│ mpn TEXT │
│ component_type TEXT (laptop, phone, etc) │
│ name TEXT │
│ summary TEXT (short description) │
│ full_data JSONB (complete product data) │
│ position INTEGER (order in results) │
│ fetched_at TIMESTAMP │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ TABLE 5: session_selection │
│ User cart/BOM selections │
├─────────────────────────────────────────────────────────┤
│ id BIGSERIAL PRIMARY KEY │
│ session_fk UUID REFERENCES chat_session │
│ mpn TEXT │
│ position INTEGER (order in cart) │
│ selected_at TIMESTAMP │
└─────────────────────────────────────────────────────────┘
Why 5 Tables?
| Table | Purpose | Query Pattern |
|---|---|---|
| chat_session | Session metadata | Load on session start |
| chat_turn | Conversation history | Display chat history |
| turn_step | Token analytics | Cost optimization |
| session_component | Fetched products | Resolve references |
| session_selection | User cart | Persist selections |
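The turn_step table is what makes the "cost optimization" query pattern possible: group by step type and see where tokens go. A hedged sketch of the query shape — the production schema above is PostgreSQL; sqlite3 is used here only as a self-contained harness, and the sample rows are invented:

```python
import sqlite3

# Trimmed-down turn_step table, just enough columns for the analytics query.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE turn_step (
        step_id INTEGER PRIMARY KEY,
        step_type TEXT,
        model TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER
    )
""")
conn.executemany(
    "INSERT INTO turn_step (step_type, model, input_tokens, output_tokens)"
    " VALUES (?, ?, ?, ?)",
    [
        ("intent", "gemini-1.5-flash", 150, 20),
        ("filter", "gemini-1.5-flash", 300, 50),
        ("response", "gemini-1.5-pro", 800, 200),
        ("response", "gemini-1.5-pro", 900, 250),
    ],
)

# Which step types consume the most tokens?
top = conn.execute("""
    SELECT step_type, SUM(input_tokens + output_tokens) AS total_tokens
    FROM turn_step
    GROUP BY step_type
    ORDER BY total_tokens DESC
""").fetchall()
# With these sample rows, "response" dominates — which is typical,
# since response generation runs on the larger model with the longest prompts.
```

The same GROUP BY works per model, per session, or per day once the table has real traffic.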
Token Tracking
The $2,100 Night
We had a bug. A retry loop ran all night. One session consumed 42 million tokens. We woke up to a $2,100 bill.
This is why you track tokens.
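A runaway loop like that one can also be stopped mechanically: check the session's running total before every LLM call and refuse once a hard cap is hit. A minimal sketch — the 200k-token cap and the exception name are assumptions; tune the cap to your own pricing:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a session hits its hard token cap."""

def check_session_budget(total_input: int, total_output: int,
                         max_tokens: int = 200_000) -> int:
    """Return the remaining budget, or raise before the next LLM call."""
    used = total_input + total_output
    if used >= max_tokens:
        raise TokenBudgetExceeded(
            f"Session used {used:,} tokens (cap {max_tokens:,}); "
            "refusing further LLM calls"
        )
    return max_tokens - used
```

Called at the top of each pipeline step with the session totals, this turns a $2,100 night into a capped, logged failure.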
Three-Level Aggregation
┌─────────────────────────────────────────────────────────┐
│ LEVEL 1: PER-STEP TRACKING │
│ Track each LLM call individually │
├─────────────────────────────────────────────────────────┤
│ Step 1: Intent Classification │
│ • Model: gemini-1.5-flash │
│ • Input: 150 tokens │
│ • Output: 20 tokens │
│ • Duration: 180ms │
│ │
│ Step 2: Filter Extraction │
│ • Model: gemini-1.5-flash │
│ • Input: 300 tokens │
│ • Output: 50 tokens │
│ • Duration: 220ms │
│ │
│ Step 3: Response Generation │
│ • Model: gemini-1.5-pro │
│ • Input: 800 tokens │
│ • Output: 200 tokens │
│ • Duration: 450ms │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LEVEL 2: PER-TURN AGGREGATION │
│ Sum all steps in one turn │
├─────────────────────────────────────────────────────────┤
│ Turn Total: │
│ • Input tokens: 1,250 │
│ • Output tokens: 270 │
│ • LLM calls: 3 │
│ • Duration: 850ms │
│ • Estimated cost: $0.0008 │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LEVEL 3: PER-SESSION AGGREGATION │
│ Sum all turns in session │
├─────────────────────────────────────────────────────────┤
│ Session Total (8 turns): │
│ • Input tokens: 10,000 │
│ • Output tokens: 2,160 │
│ • LLM calls: 24 │
│ • Duration: 6,800ms │
│ • Estimated cost: $0.0064 │
└─────────────────────────────────────────────────────────┘
Why Track at Each Level?
| Level | What You Learn | Action |
|---|---|---|
| Per-Step | Which operations are expensive? | Optimize prompts, use smaller models |
| Per-Turn | Cost per user interaction | Set per-turn limits, detect anomalies |
| Per-Session | Total conversation cost | Set session limits, identify power users |
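The "estimated cost" lines in the diagrams above come from multiplying token counts by per-model prices. A sketch of that arithmetic — the prices below are placeholders, not current Gemini rates, so check your provider's pricing page before relying on the numbers:

```python
# Placeholder per-1M-token prices: (input_price, output_price) in USD.
PRICES = {
    "gemini-1.5-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
}

def step_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single LLM call (Level 1)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def turn_cost(steps) -> float:
    """Sum step costs into the per-turn aggregate (Level 2)."""
    return sum(step_cost(m, i, o) for m, i, o in steps)

# The three steps from the Level 1 diagram:
turn = [
    ("gemini-1.5-flash", 150, 20),   # intent classification
    ("gemini-1.5-flash", 300, 50),   # filter extraction
    ("gemini-1.5-pro", 800, 200),    # response generation
]
```

Even with placeholder prices, the shape of the result is instructive: the single pro-model response step dwarfs both flash steps combined, which is exactly the kind of fact per-step tracking surfaces.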
Token Tracking Implementation
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StepRecord:
    step_type: str
    model: str
    input_tokens: int
    output_tokens: int
    duration_ms: int
    success: bool
    error: Optional[str] = None

@dataclass
class TurnRecord:
    turn_number: int
    user_message: str
    assistant_response: str
    steps: List[StepRecord]

    @property
    def total_input_tokens(self) -> int:
        return sum(s.input_tokens for s in self.steps)

    @property
    def total_output_tokens(self) -> int:
        return sum(s.output_tokens for s in self.steps)

class StepTracker:
    """Track token usage across pipeline steps."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.steps: List[StepRecord] = []

    @asynccontextmanager
    async def track_step(self, step_type: str, model: str):
        """Context manager to track a single step."""
        start = time.perf_counter()
        step = StepRecord(
            step_type=step_type,
            model=model,
            input_tokens=0,
            output_tokens=0,
            duration_ms=0,
            success=True,
        )
        try:
            yield step  # Caller sets input/output tokens on the yielded record
        except Exception as e:
            step.success = False
            step.error = str(e)
            raise
        finally:
            step.duration_ms = int((time.perf_counter() - start) * 1000)
            self.steps.append(step)

    async def save_turn(self, user_message: str, response: str, db):
        """Save completed turn to database (asyncpg-style driver assumed)."""
        # Save turn. fetchval returns the RETURNING value;
        # turn_number bookkeeping is omitted here for brevity.
        turn_id = await db.fetchval("""
            INSERT INTO chat_turn
                (session_fk, user_message, assistant_response,
                 input_tokens, output_tokens)
            VALUES ($1, $2, $3, $4, $5)
            RETURNING turn_id
        """, self.session_id, user_message, response,
            sum(s.input_tokens for s in self.steps),
            sum(s.output_tokens for s in self.steps))

        # Save steps
        for i, step in enumerate(self.steps):
            await db.execute("""
                INSERT INTO turn_step
                    (turn_fk, step_type, step_order, model,
                     input_tokens, output_tokens, duration_ms,
                     success, error)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
            """, turn_id, step.step_type, i + 1, step.model,
                step.input_tokens, step.output_tokens,
                step.duration_ms, step.success, step.error)

        # Update session totals
        await db.execute("""
            UPDATE chat_session SET
                total_input_tokens = total_input_tokens + $2,
                total_output_tokens = total_output_tokens + $3,
                total_turns = total_turns + 1,
                last_active_at = now()
            WHERE session_id = $1
        """, self.session_id,
            sum(s.input_tokens for s in self.steps),
            sum(s.output_tokens for s in self.steps))

        self.steps = []  # Reset for next turn
Selection Tracking
The Selection Flow
USER: "Add the Dell XPS to my list"
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 1: RESOLVE PRODUCT │
│ "Dell XPS" → Fuzzy match in context │
│ Found: MPN = "XPS-15-2024" │
│ Verify: MPN exists in fetched? ✓ │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 2: UPDATE SESSION CONTEXT │
│ context.selected.append("XPS-15-2024") │
│ context now has: ["XPS-15-2024"] │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 3: PERSIST TO DATABASE │
│ INSERT INTO session_selection │
│ (session_fk, mpn, position, selected_at) │
│ VALUES (session_id, "XPS-15-2024", 1, now()) │
└─────────────────────────────────────────────────────────┘
↓
RESPONSE: "Added Dell XPS 15 to your selections. You have 1 item."
Selection Manager
class SelectionManager:
    def __init__(self, session_id: str, db):
        self.session_id = session_id
        self.db = db  # asyncpg-style connection/pool assumed

    async def add(self, mpn: str) -> bool:
        """Add product to selections."""
        existing = await self.db.fetchval("""
            SELECT 1 FROM session_selection
            WHERE session_fk = $1 AND mpn = $2
        """, self.session_id, mpn)
        if existing:
            return False  # Already in cart
        await self.db.execute("""
            INSERT INTO session_selection (session_fk, mpn, position)
            VALUES ($1, $2, (
                SELECT COALESCE(MAX(position), 0) + 1
                FROM session_selection
                WHERE session_fk = $1
            ))
        """, self.session_id, mpn)
        return True

    async def remove(self, mpn: str) -> bool:
        """Remove product from selections."""
        # RETURNING tells us whether a row was actually deleted
        # (asyncpg's execute() returns a status string, not a rowcount).
        deleted = await self.db.fetchval("""
            DELETE FROM session_selection
            WHERE session_fk = $1 AND mpn = $2
            RETURNING mpn
        """, self.session_id, mpn)
        return deleted is not None

    async def get_all(self) -> list:
        """Get all selections with product details."""
        return await self.db.fetch("""
            SELECT ss.mpn, ss.position, sc.name, sc.summary
            FROM session_selection ss
            JOIN session_component sc
              ON ss.session_fk = sc.session_fk AND ss.mpn = sc.mpn
            WHERE ss.session_fk = $1
            ORDER BY ss.position
        """, self.session_id)

    async def get_summary(self) -> str:
        """Get cart summary for display."""
        items = await self.get_all()
        if not items:
            return "Your selections are empty."
        lines = [f"Your selections ({len(items)} items):"]
        for item in items:
            lines.append(f"  {item['position']}. {item['name']}")
        return "\n".join(lines)
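One subtlety the manager glosses over: removing an item leaves a gap in position (delete the second of three items and the summary reads "1." then "3."). A renumbering pass closes the gaps. The sqlite3 harness below is only to exercise the logic with a toy schema (session_fk simplified to TEXT); production would run the same UPDATEs against PostgreSQL, ideally inside a transaction:

```python
import sqlite3

# Toy in-memory version of session_selection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_selection (session_fk TEXT, mpn TEXT, position INTEGER)"
)
conn.executemany(
    "INSERT INTO session_selection VALUES (?, ?, ?)",
    [("s1", "A", 1), ("s1", "C", 3)],  # item "B" at position 2 was removed
)

# Renumber: read back in display order, then reassign positions 1..N.
rows = conn.execute(
    "SELECT mpn FROM session_selection WHERE session_fk = 's1' ORDER BY position"
).fetchall()
for new_pos, (mpn,) in enumerate(rows, start=1):
    conn.execute(
        "UPDATE session_selection SET position = ? "
        "WHERE session_fk = 's1' AND mpn = ?",
        (new_pos, mpn),
    )

positions = [r[0] for r in conn.execute(
    "SELECT position FROM session_selection WHERE session_fk = 's1' ORDER BY position"
)]
```

Whether you renumber eagerly on every removal or lazily at display time is a design choice; eager renumbering keeps the stored positions matching what the user sees.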
The Proof: Before/After
Metrics That Changed
| Before | After |
|---|---|
| 'First one' resolution: 0% (always fails) | 'First one' resolution: 95% |
| Multi-turn success: 15% (context lost) | Multi-turn success: 92% |
| Cart persistence: 0% (lost on refresh) | Cart persistence: 100% (survives refresh) |
| Token visibility: None (flying blind) | Token visibility: Complete (step, turn, session) |
| Cost optimization: Impossible | Cost optimization: 40% reduction (found waste) |
| Runaway sessions: Undetected ($2,100 bill) | Runaway detection: Automatic (token limits) |
What changed: We stopped treating each turn as independent and started building persistent memory.
The Checklist: Memory Layer Readiness
| Item | Area | ✓ |
|---|---|---|
| Do you track fetched products in session? | Context | |
| Do you track the currently discussed product? | Focus | |
| Can you resolve 'first one', 'the Dell', 'it'? | Resolution | |
| Do you persist user selections to database? | Selections | |
| Do you remember last query/filters for refinement? | History | |
| Are sessions persisted (survive refresh)? | Persistence | |
| Do you track tokens per-step, per-turn, per-session? | Tokens | |
| Can you query session data for insights? | Analytics | |
Score Interpretation:
- 0-3 checked: Memory layer not ready. Start with session context.
- 4-6 checked: Prototype-ready. Add persistence and token tracking.
- 7-8 checked: Production-ready.
What’s Next
Issue 7: The Sort & Rank Layer
Now that you can remember search results, how do you order them intelligently?
“User says ‘show me the cheapest.’ But what about ‘show me the best’? Best by what? Price? Rating? Value? And how do you sort when some fields are in JSONB?”
What You’ll Learn:
- Sortable field definitions with preferred direction
- JSONB SQL expressions for sorting
- LLM-inferred sorting from vague terms (“best”, “top”)
Key Takeaways
1. The Problem: Stateless agents can't handle multi-turn conversations
2. The Solution: Session context (fetched products, focus, selections) + database persistence (5-table schema) + token tracking (per-step, per-turn, per-session)
3. The $2,100 Lesson: Track tokens at every level or risk runaway costs
4. Key Takeaway: Memory is the foundation of agent intelligence. Track everything, persist everything, measure everything.
Glossary
- Session: A conversation with persistent state across turns
- Context: What the agent remembers (fetched products, focus, selections)
- Focus: The product currently being discussed
- Turn: One user message + assistant response pair
- Step: Individual pipeline operation (intent, filter, search, response)
- Token Tracking: Counting input/output tokens for cost control
- Reference Resolution: Converting “first one” or “the Dell” to actual product MPN
- Redis Cache: In-memory store for sub-millisecond session reads at scale (>10k users)
- Training Data: Step-level logs that can be used to fine-tune smaller models
Until next issue,
Sentient Zero Labs