Agent Design Fieldbook
An 8-layer architecture for production AI agents, from data layer to RAG. Real code, real failures, battle-tested patterns for engineers and technical founders.
By the end of this series, you'll be able to:
- Diagnose why your AI agent fails in production, and use the 5-gap framework to identify exactly which layer is missing. (Issue 1)
- Design a database schema that LLMs can reason about, including a Field Registry with aliases, units, ranges, and vague-term mappings that eliminates hallucinated field names. (Issue 2)
- Build a 4-step ingestion pipeline (Fetch → Transform → Validate → Upsert) that normalizes multi-vendor chaos into a clean schema, with raw data preserved for debugging. (Issue 3)
- Implement intent classification and routing: a Classify-Route-Execute pattern that handles multi-intent messages and resolves entity references. (Issue 4)
- Convert natural language to validated database queries, using field-aware extraction prompts, range clamping, and constraint relaxation that handles zero-result searches without dead-ending the user. (Issue 5)
- Build persistent agent memory: a 5-table session schema that survives page refreshes, resolves pronouns and positional references, and tracks token usage to prevent runaway API costs. (Issue 6)
- Define what "best" means for every field: a Sortable Field Registry with preferred direction per metric, JSONB SQL expression generation, and LLM-inferred sort extraction. (Issue 7)
- Decide when RAG is actually needed and implement a hybrid architecture that serves structured specs from a database and unstructured knowledge from a vector store. (Issue 8)
- Track and optimize LLM costs at three levels (step, turn, session) with per-step token records that double as a fine-tuning dataset. (Issue 6)
- Assess any existing agent's production readiness using eight layer-specific checklists, each scored so you know whether to stop, prototype with caution, or ship. (All issues)
The Foundation: Why Most AI Agents Fail in Production
Most AI agents are wrappers that crash in production; production agents are systems with 8 deterministic layers where the LLM is constrained, not trusted.
The Data Layer: Your Agent Is Only As Good As Your Data
LLMs will hallucinate field names and values unless you explicitly define what's real through schemas, field registries, and validation boundaries.
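As a rough illustration of the Field Registry idea, here is a minimal sketch. The field names (`weight_kg`, `price_usd`), their ranges, aliases, and vague-term mappings are all hypothetical examples, not the series' actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str          # canonical column name the LLM is allowed to use
    unit: str
    min_value: float   # lower bound of the plausible range
    max_value: float   # upper bound of the plausible range
    aliases: tuple = ()
    vague_terms: dict = field(default_factory=dict)


# Hypothetical registry entries, for illustration only.
REGISTRY = {
    spec.name: spec
    for spec in (
        FieldSpec("weight_kg", "kg", 0.5, 5.0,
                  aliases=("weight", "mass"),
                  vague_terms={"lightweight": (0.5, 1.5)}),
        FieldSpec("price_usd", "USD", 100.0, 5000.0,
                  aliases=("price", "cost"),
                  vague_terms={"cheap": (100.0, 600.0)}),
    )
}


def resolve_field(term: str) -> Optional[str]:
    """Map a user or LLM term to a canonical field name, or None if unknown."""
    key = term.strip().lower()
    for spec in REGISTRY.values():
        if key == spec.name or key in spec.aliases:
            return spec.name
    return None


def resolve_vague_term(term: str) -> Optional[tuple]:
    """Map a vague word to (field name, low, high), or None if unmapped."""
    key = term.strip().lower()
    for spec in REGISTRY.values():
        if key in spec.vague_terms:
            low, high = spec.vague_terms[key]
            return spec.name, low, high
    return None
```

Anything that resolves to `None` can be rejected before it reaches a query, which is one way to keep a hallucinated field name from becoming a database error.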
The Ingestion Layer: Getting Messy Data Into Your Clean Schema
Vendor data is chaos: different units, formats, nulls, and duplicates. Ingestion is 80% of the work, and scripts beat pipelines for flexibility.
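The Fetch → Transform → Validate → Upsert flow can be sketched as a plain script. Everything here is an assumption made for illustration: the vendor rows, the field names, the validation thresholds, and the in-memory dict standing in for a real table.

```python
from typing import Any

LB_TO_KG = 0.45359237


def fetch() -> list:
    """Stand-in for a vendor API call; returns messy, hypothetical rows."""
    return [
        {"sku": "A1", "weight": "3.0 lb", "price": "999"},
        {"sku": "B2", "weight": "1.4 kg", "price": None},   # missing price
        {"sku": "A1", "weight": "3.0 lb", "price": "999"},  # duplicate
    ]


def transform(raw: dict) -> dict:
    """Normalize units and types; keep the raw row for debugging."""
    value, unit = raw["weight"].split()
    kg = float(value) * LB_TO_KG if unit == "lb" else float(value)
    price = float(raw["price"]) if raw["price"] is not None else None
    return {"sku": raw["sku"], "weight_kg": round(kg, 2),
            "price_usd": price, "raw": raw}


def validate(row: dict) -> list:
    """Return a list of problems; an empty list means the row is clean."""
    errors = []
    if row["price_usd"] is None:
        errors.append("missing price")
    if not 0.1 <= row["weight_kg"] <= 10:
        errors.append("weight out of range")
    return errors


def upsert(store: dict, row: dict) -> None:
    """Insert or overwrite by key, so duplicates collapse to one row."""
    store[row["sku"]] = row


def ingest() -> tuple:
    store: dict = {}
    rejected: list = []
    for raw in fetch():
        row = transform(raw)
        errors = validate(row)
        if errors:
            rejected.append((raw, errors))
        else:
            upsert(store, row)
    return store, rejected
```

Rejected rows are kept alongside their error list rather than dropped silently, so a bad vendor feed can be inspected later.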
The Intent Layer: Classify Before You Act
The first LLM call should classify what the user wants (search, compare, select), not execute it. Then route to specialized handlers that each do one job well.
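A minimal sketch of the Classify-Route-Execute shape follows. The `classify` function here is a keyword matcher standing in for the LLM classification call, and the handler names and return strings are hypothetical.

```python
from typing import Callable, Dict, List

# Keyword table used ONLY as a stand-in for the LLM classifier.
KEYWORDS = {
    "search": ("find", "show", "search"),
    "compare": ("compare", "versus"),
    "select": ("pick", "choose", "select"),
}


def classify(message: str) -> List[str]:
    """Return every intent found in the message (multi-intent aware)."""
    text = message.lower()
    return [intent for intent, words in KEYWORDS.items()
            if any(word in text for word in words)]


def handle_search(message: str) -> str:
    return "search: ran product query"


def handle_compare(message: str) -> str:
    return "compare: built comparison table"


def handle_select(message: str) -> str:
    return "select: stored the chosen product"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "search": handle_search,
    "compare": handle_compare,
    "select": handle_select,
}


def handle(message: str) -> List[str]:
    """Classify first, then route each intent to its own handler."""
    intents = classify(message)
    if not intents:
        return ["clarify: asked the user what they want"]
    return [HANDLERS[intent](message) for intent in intents]
```

Because classification returns a list, a message such as "find X and compare Y" is routed to two handlers in order instead of being forced into a single intent.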
The Filter Extraction Layer: From Natural Language to Query
LLMs are excellent at extracting filters from natural language but terrible at enforcing boundaries: inject your field registry, validate everything, and relax constraints when needed.
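One way to picture range clamping and constraint relaxation is shown below. The field ranges, the 20% widening step, and the round limit are assumptions chosen for the example, and `search` filters an in-memory list rather than querying a database.

```python
# Hypothetical valid ranges, as a field registry might supply them.
RANGES = {"price_usd": (100.0, 5000.0), "weight_kg": (0.5, 5.0)}


def clamp_filters(filters: dict) -> dict:
    """Drop unknown fields and clamp each (low, high) pair into range."""
    clean = {}
    for name, (low, high) in filters.items():
        if name not in RANGES:
            continue  # field the LLM made up; never reaches the query
        rmin, rmax = RANGES[name]
        clean[name] = (min(max(low, rmin), rmax), min(max(high, rmin), rmax))
    return clean


def search(products: list, filters: dict) -> list:
    """In-memory stand-in for the real database query."""
    return [p for p in products
            if all(low <= p[name] <= high
                   for name, (low, high) in filters.items())]


def search_with_relaxation(products: list, filters: dict,
                           step: float = 0.2, max_rounds: int = 3) -> tuple:
    """Widen every range by `step` per round until something matches."""
    current = clamp_filters(filters)
    for round_no in range(max_rounds + 1):
        hits = search(products, current)
        if hits:
            return hits, round_no
        widened = {name: (low * (1 - step), high * (1 + step))
                   for name, (low, high) in current.items()}
        current = clamp_filters(widened)
    return [], max_rounds
```

Returning the round number alongside the hits lets the caller tell the user that the results are slightly outside what they asked for, rather than showing an empty page.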
The Memory Layer: Conversations That Persist
Agents need memory: not just chat history, but structured context (what was fetched, what was selected, what tokens were used) persisted to a database for multi-turn conversations.
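The token-tracking part of this layer can be sketched with per-step records that roll up to turn and session totals. The record fields, step names, and budget figure are hypothetical, and a plain list stands in for the database table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepUsage:
    session_id: str
    turn: int
    step: str               # e.g. "classify_intent" (hypothetical name)
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def tokens_used(records: list, session_id: str,
                turn: Optional[int] = None) -> int:
    """Sum tokens for a whole session, or for one turn if `turn` is given."""
    return sum(r.total_tokens for r in records
               if r.session_id == session_id
               and (turn is None or r.turn == turn))


def over_budget(records: list, session_id: str, session_budget: int) -> bool:
    """True once a session has spent more than its token budget."""
    return tokens_used(records, session_id) > session_budget


# Example records; in practice these would be rows written after each LLM call.
records = [
    StepUsage("s1", 1, "classify_intent", 120, 10),
    StepUsage("s1", 1, "extract_filters", 300, 45),
    StepUsage("s1", 2, "classify_intent", 150, 12),
]
```

Checking `over_budget` before each new LLM call is one simple guard against runaway costs in a long session.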
The Sort & Rank Layer: Ordering Results Intelligently
Sorting isn't just ORDER BY: define what "best" means for each field, handle JSONB with SQL expressions, and let the LLM infer user intent from keywords like "cheapest" or "best".
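Here is a minimal sketch of a sortable field registry that generates an ORDER BY clause, assuming PostgreSQL syntax. The field names, the `specs` JSONB column, and the choice of `NULLS LAST` are assumptions for illustration.

```python
from typing import Optional

# Hypothetical registry: preferred direction per field, plus the JSONB
# column the value lives in (None means it is a regular column).
SORTABLE_FIELDS = {
    "price_usd": {"direction": "ASC", "jsonb_column": None},
    "battery_hours": {"direction": "DESC", "jsonb_column": "specs"},
}


def order_by_clause(field_name: str, direction: Optional[str] = None) -> str:
    """Build an ORDER BY clause using only names found in the registry."""
    spec = SORTABLE_FIELDS.get(field_name)
    if spec is None:
        raise ValueError(f"not a sortable field: {field_name!r}")
    direction = (direction or spec["direction"]).upper()
    if direction not in ("ASC", "DESC"):
        raise ValueError(f"invalid sort direction: {direction!r}")
    if spec["jsonb_column"]:
        # ->> extracts the JSONB value as text, so cast it before sorting.
        expr = f"({spec['jsonb_column']}->>'{field_name}')::numeric"
    else:
        expr = field_name
    return f"ORDER BY {expr} {direction} NULLS LAST"
```

Because the field name and direction are checked against the registry before being placed in the SQL string, an LLM-suggested sort key cannot inject arbitrary text into the query.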
The Product Deep-Dive Layer: When Users Want More Than Specs
When users want to go deep on a product (reading reviews, understanding thermal performance, checking compatibility), RAG lets you search unstructured knowledge that doesn't fit in structured columns.