In Issues 2-3, we built clean data infrastructure. Now we add intelligence: understanding what users actually want.
Here’s what we learned the hard way: we had one giant prompt doing everything — search, compare, explain, select. It was 2,000 tokens long. And it broke constantly. “Find laptops and compare the top two” would return search results and completely ignore the comparison.
In this issue:
- Why monolithic prompts fail (our 2,000-token disaster)
- The intent taxonomy (search, refine, compare, deep-dive, select)
- Multi-intent detection (“Find X and compare Y”)
- Entity extraction (resolving “the first one” to a product)
History Anchor: Symbolic AI -> Instruction Tuning / RLHF
Intent classification has deep roots. Early chatbots used regex pattern matching (ELIZA, 1966). Spoken dialog systems used statistical slot-filling (VoiceXML, Nuance). Rasa NLU brought ML-based intent classification. LLMs collapsed all of this into a single prompt — but the fundamental insight remains: classify first, execute second.
The Disasters
Disaster 1: The Monolithic Prompt
We had one prompt doing everything:
THE 2,000-TOKEN MONSTER PROMPT
You are a shopping assistant. You can:
1. Search for products based on criteria...
2. Compare products by analyzing...
3. Explain specifications like...
4. Add products to cart when user says...
5. Refine searches by...
... [1,800 more tokens of instructions]
USER: "Find laptops and compare the top two"
|
v
BOT: Returns 5 laptops, ignores "compare" part
The prompt was so long that the LLM focused on the first instruction it could execute and ignored the rest.
Why it happened: Too many instructions. No prioritization. The LLM picked one task and called it done.
Disaster 2: The Multi-Intent Blindness
USER: "Show me the Dell XPS and add the HP to cart"
|
v
TWO INTENTS:
1. DEEP_DIVE ("Show me the Dell XPS")
2. SELECT ("add the HP to cart")
|
v
BOT: "I can help you with that!" (does neither)
The user wanted two things. Our system could only handle one intent at a time — so it handled neither.
Why it happened: No multi-intent detection. The system assumed one message = one intent.
Disaster 3: The Context Blindness
Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
1. Dell XPS 15 - $899
2. HP Spectre x360 - $849
3. Lenovo ThinkPad - $799"
Turn 2:
USER: "Tell me about the first one"
BOT: "Which product are you referring to?"
The bot listed products. The user said “the first one.” The bot had no idea what that meant.
Why it happened: No entity extraction. “The first one” is a reference that requires context to resolve.
The Pattern
One prompt can't handle complexity. Classify first, route to specialized handlers, execute cleanly.
The Diagnosis: Why Monolithic Prompts Fail
The Monolithic Prompt Problem
ONE PROMPT DOING EVERYTHING (What We Had):
User Message -> Giant Prompt (2,000 tokens) -> Response
Problems:
- Prompt is huge (expensive, slow)
- LLM gets confused with too many instructions
- Can't handle multi-intent ("Find X and compare Y")
- No routing (all logic in one place)
- Hard to debug (which instruction failed?)
The Classification Solution
CLASSIFY -> ROUTE -> EXECUTE (What Works):
User Message
|
v
STEP 1: CLASSIFY (Cheap, 200 tokens)
"What does the user want?"
-> Intent: NEW_SEARCH
|
v
STEP 2: ROUTE (Deterministic, 0 tokens)
NEW_SEARCH -> SearchHandler
|
v
STEP 3: EXECUTE (Specialized handler)
SearchHandler extracts filters, queries DB
Benefits:
- Classification is cheap (200 tokens)
- Each handler does ONE job
- Can detect multiple intents
- Easy to debug (know which handler failed)
This is the Routing Pattern (or Gateway Pattern) in agentic design. You’ll see similar ideas in “Mixture of Experts” architectures — a small router decides which specialized model handles each request.
Alternative: Semantic Routing. For simple intents like “Greeting” or “FAQ,” you can use embedding similarity instead of an LLM call (0 tokens, <50ms). We chose LLM routing because e-commerce intents are complex and overlapping — but for simpler domains, Semantic Routing is faster and cheaper.
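Semantic routing can be sketched in a few lines. This toy version uses bag-of-words cosine similarity in place of real embeddings — the `embed` function, the `INTENT_EXAMPLES` utterances, and the 0.3 threshold are all illustrative assumptions. A production version would swap in an embedding model and precompute the example vectors.

```python
from collections import Counter
from math import sqrt
from typing import Optional

# Toy "embedding": a bag-of-words vector. A real implementation would use
# sentence embeddings from an embedding model; this is illustrative only.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A few example utterances per simple intent (hypothetical data)
INTENT_EXAMPLES = {
    "greeting": ["hello there", "hi how are you", "good morning"],
    "faq": ["what is your return policy", "how does shipping work"],
}

def semantic_route(message: str, threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching simple intent, or None to fall back to LLM routing."""
    msg_vec = embed(message)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(msg_vec, embed(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

The key design property: when nothing scores above the threshold, the router returns `None` and you fall through to the LLM classifier — semantic routing handles the cheap, obvious cases only.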
The Intent Taxonomy
What Intents Exist
Before you can classify, you need to enumerate what users can do in your domain.
Search Intents (Finding Products)
- NEW_SEARCH: “Find laptops under $1000”
- REFINE_SEARCH: “Actually, make it under $800”
- ALTERNATIVE_SEARCH: “Show me similar options”
Analysis Intents (Understanding Products)
- DEEP_DIVE: “Tell me more about the Dell XPS”
- COMPARE: “Compare the Dell vs HP”
- EXPLAIN_SPEC: “What does ‘Rds(on)’ mean?”
Action Intents (Making Decisions)
- SELECT: “Add the Dell to my list”
- DESELECT: “Remove the HP”
- EXPORT: “Export my selections to CSV”
Meta Intents (Conversation Management)
- SUMMARIZE: “What have I selected so far?”
- GENERAL: “What’s the best laptop brand?”
- GREETING: “Hello!”
Intent Decision Tree
USER MESSAGE
|
v
Does it mention specific products? (Dell, HP, "first one")
YES -> DEEP_DIVE or COMPARE or SELECT
NO -> Continue
|
v
Does it have search criteria? (price, specs, features)
YES -> NEW_SEARCH or REFINE_SEARCH
NO -> Continue
|
v
Is it asking a question? ("What is...", "How does...")
YES -> EXPLAIN_SPEC or GENERAL
NO -> Continue
|
v
Is it a greeting or meta request?
YES -> GREETING or SUMMARIZE
NO -> UNCLEAR (ask for clarification)
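The tree above translates almost directly into a deterministic pre-classifier. The keyword lists and regexes below are illustrative placeholders, not a tuned production ruleset; a hybrid system might run something like this first and fall back to the LLM classifier on "unclear".

```python
import re

# Rule-of-thumb coding of the decision tree. All keyword lists are
# illustrative; a real system would derive PRODUCT_NAMES from the session.
PRODUCT_NAMES = ["dell", "hp", "lenovo"]
POSITION_REFS = ["first one", "second", "last one"]
CRITERIA = re.compile(r"\$\d+|under|cheap|battery|screen|ram", re.I)
QUESTION = re.compile(r"^(what|how|why|which)\b", re.I)
GREETINGS = {"hello", "hi", "hey"}

def triage(message: str) -> str:
    m = message.lower()
    # Branch 1: mentions specific products or positional references
    if any(p in m for p in PRODUCT_NAMES + POSITION_REFS):
        return "analysis_or_action"   # DEEP_DIVE / COMPARE / SELECT
    # Branch 2: contains search criteria (price, specs, features)
    if CRITERIA.search(m):
        return "search"               # NEW_SEARCH / REFINE_SEARCH
    # Branch 3: phrased as a question
    if QUESTION.match(m):
        return "question"             # EXPLAIN_SPEC / GENERAL
    # Branch 4: greeting or meta request
    if m.split() and m.split()[0].strip("!,.") in GREETINGS:
        return "meta"                 # GREETING / SUMMARIZE
    return "unclear"                  # ask for clarification
```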
The Classification
The Classification Prompt (Lean and Focused)
INTENT_PROMPT = """
Classify the user's intent into ONE or MORE of these categories:

SEARCH INTENTS:
- new_search: User wants to find new products
- refine_search: User wants to modify the current search
- alternative_search: User wants similar options

ANALYSIS INTENTS:
- deep_dive: User wants details about a specific product
- compare: User wants to compare products
- explain_spec: User is asking about a specification

ACTION INTENTS:
- select: User is choosing/adding a product
- deselect: User is removing a product
- export: User wants to export their selections

META INTENTS:
- summarize: User wants to see their selections
- general: General question about products
- greeting: User is greeting

CONTEXT:
Products in session: {product_list}
User selections: {selected_list}

USER MESSAGE: "{message}"

Return JSON:
{{
  "intents": ["intent1", "intent2"],
  "referenced_products": ["Dell XPS", "HP Spectre"],
  "confidence": 0.95,
  "reasoning": "User wants to..."
}}
"""
Note the doubled braces around the JSON example: the prompt is filled with `str.format`, so single braces would be treated as placeholders and raise a KeyError.
Key design choices:
- Low temperature (0.1): We want consistent classification, not creativity
- Small output (~200 tokens): Just JSON, no explanation
- Context injection: Products in session help resolve “first one”
- Multi-intent support: Array of intents, not single value
Classification Flow
USER: "Find laptops and compare the top two"
|
v
LLM CLASSIFICATION (200 tokens):
{
"intents": ["new_search", "compare"],
"referenced_products": [],
"confidence": 0.92,
"reasoning": "User wants to search AND compare"
}
|
v
VALIDATION (Deterministic):
- Check intents are valid
- Check for conflicts (no conflicts)
- Order by dependency: [new_search, compare]
|
v
ROUTE TO HANDLERS
The Implementation
import json
from enum import Enum
from dataclasses import dataclass
from typing import List

class Intent(Enum):
    NEW_SEARCH = "new_search"
    REFINE_SEARCH = "refine_search"
    ALTERNATIVE_SEARCH = "alternative_search"
    DEEP_DIVE = "deep_dive"
    COMPARE = "compare"
    EXPLAIN_SPEC = "explain_spec"
    SELECT = "select"
    DESELECT = "deselect"
    EXPORT = "export"
    SUMMARIZE = "summarize"
    GENERAL = "general"
    GREETING = "greeting"

@dataclass
class ClassificationResult:
    intents: List[Intent]
    referenced_products: List[str]
    confidence: float
    reasoning: str

async def classify_intent(message: str, context) -> ClassificationResult:
    # Build prompt with context
    prompt = INTENT_PROMPT.format(
        message=message,
        product_list=context.fetched_products[:5],  # Limit context
        selected_list=context.selected_products
    )
    # Call LLM (small, fast model)
    response = await llm.generate(
        prompt=prompt,
        temperature=0.1,  # Low temperature for consistency
        max_tokens=200    # Small output
    )
    # Parse and validate
    result = json.loads(response)
    return ClassificationResult(
        intents=[Intent(i) for i in result["intents"]],
        referenced_products=result.get("referenced_products", []),
        confidence=result.get("confidence", 0.5),
        reasoning=result.get("reasoning", "")
    )
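One caveat with the bare `json.loads(response)` above: LLMs occasionally wrap JSON in code fences or emit intent names outside the taxonomy. A defensive parsing sketch (the fallback to `general` is our assumption, not a fixed rule; the enum is trimmed here for brevity):

```python
import json
from enum import Enum

class Intent(Enum):
    NEW_SEARCH = "new_search"
    GENERAL = "general"
    # ... remaining intents as in the taxonomy above

def parse_classification(raw: str) -> dict:
    """Parse LLM output defensively: strip code fences, drop unknown
    intent names, fall back to GENERAL if nothing valid survives."""
    text = raw.strip()
    if text.startswith("```"):
        # Common LLM habit: ```json ... ``` around the payload
        text = text.strip("`").lstrip("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"intents": [Intent.GENERAL], "referenced_products": [], "confidence": 0.0}
    valid = {i.value for i in Intent}
    intents = [Intent(i) for i in data.get("intents", []) if i in valid]
    return {
        "intents": intents or [Intent.GENERAL],
        "referenced_products": data.get("referenced_products", []),
        "confidence": float(data.get("confidence", 0.5)),
    }
```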
Multi-Intent Handling
The Multi-Intent Problem
USER: "Find cheap laptops and compare the top two"
CONTAINS TWO INTENTS:
1. NEW_SEARCH ("Find cheap laptops")
2. COMPARE ("compare the top two")
EXECUTION ORDER MATTERS:
1. Execute NEW_SEARCH first -> get results
2. Then execute COMPARE -> compare top 2 from results
Multi-Intent Flow
STEP 1: DETECT ALL INTENTS
Intents: [new_search, compare]
|
v
STEP 2: ORDER BY DEPENDENCY
new_search must run before compare (need results first)
Ordered: [new_search, compare]
|
v
STEP 3: EXECUTE SEQUENTIALLY
1. Execute new_search
-> Returns 5 laptops
-> Update context with results
2. Execute compare
-> Uses top 2 from updated context
-> Returns comparison
|
v
STEP 4: COMBINE RESPONSES
"Found 5 laptops under $800:
1. Dell XPS - $799
2. HP Spectre - $749
...
Comparing top 2:
Dell XPS vs HP Spectre..."
Intent Dependencies
Not all intents can run in any order. Define dependencies:
INTENT_DEPENDENCIES = {
    Intent.COMPARE: [Intent.NEW_SEARCH, Intent.REFINE_SEARCH],
    Intent.DEEP_DIVE: [Intent.NEW_SEARCH],
    Intent.SELECT: [Intent.NEW_SEARCH, Intent.DEEP_DIVE],
    Intent.REFINE_SEARCH: [Intent.NEW_SEARCH],
}

def order_intents(intents: List[Intent]) -> List[Intent]:
    """Order intents by dependency (search before compare, etc.).

    The priority map is a flattened topological order of
    INTENT_DEPENDENCIES: anything an intent depends on has a lower number.
    """
    priority = {
        Intent.NEW_SEARCH: 1,
        Intent.REFINE_SEARCH: 2,
        Intent.DEEP_DIVE: 3,
        Intent.COMPARE: 4,
        Intent.SELECT: 5,
    }
    return sorted(intents, key=lambda i: priority.get(i, 10))
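Run in isolation, the ordering behaves like this (self-contained sketch with the enum trimmed to three intents; the priority values mirror the map above):

```python
from enum import Enum

class Intent(Enum):
    NEW_SEARCH = "new_search"
    COMPARE = "compare"
    SELECT = "select"

# Same priorities as above, reduced to the three intents used here
PRIORITY = {Intent.NEW_SEARCH: 1, Intent.COMPARE: 4, Intent.SELECT: 5}

def order_intents(intents):
    # Unknown intents sink to the end (priority 10)
    return sorted(intents, key=lambda i: PRIORITY.get(i, 10))

# "Find cheap laptops, compare the top two, and add the winner to my list"
ordered = order_intents([Intent.SELECT, Intent.COMPARE, Intent.NEW_SEARCH])
# -> [NEW_SEARCH, COMPARE, SELECT]: search runs first, selection last
```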
Entity Extraction
The Entity Problem
Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
1. Dell XPS 15 - $899
2. HP Spectre x360 - $849
3. Lenovo ThinkPad - $799"
Turn 2:
USER: "Tell me about the first one"
^
What is "the first one"?
Need to resolve to "Dell XPS 15"
Entity Extraction Flow
USER: "Tell me about the first one"
|
v
CLASSIFICATION WITH CONTEXT:
Context injected into prompt:
Products in session:
1. Dell XPS 15 (SKU: XPS-15-2024)
2. HP Spectre x360 (SKU: SPECTRE-X360)
3. Lenovo ThinkPad (SKU: THINKPAD-X1)
LLM Output:
{
"intents": ["deep_dive"],
"referenced_products": ["Dell XPS 15"],
"reasoning": "'first one' refers to position 1"
}
|
v
VALIDATION:
- "Dell XPS 15" exists in context? Yes
- Resolve to SKU: "XPS-15-2024"
|
v
ROUTE TO DEEP_DIVE HANDLER with SKU: "XPS-15-2024"
Entity Types
| Type | Examples | Resolution |
|---|---|---|
| Positional | "first one", "second option", "last one" | Map to list index |
| Named | "Dell XPS", "HP Spectre" | Fuzzy match against context |
| Pronoun | "it", "that one", "this laptop" | Use most recently mentioned |
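The three resolution strategies in the table can share one resolver. This sketch uses `difflib.get_close_matches` for the fuzzy named match; the product list, the ordinal map, and the 0.4 cutoff are illustrative choices, not tuned values.

```python
from difflib import get_close_matches
from typing import List, Optional

# Products currently in session, in display order (illustrative data)
SESSION_PRODUCTS = ["Dell XPS 15", "HP Spectre x360", "Lenovo ThinkPad"]

ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}

def resolve_reference(phrase: str, products: List[str],
                      last_mentioned: Optional[str] = None) -> Optional[str]:
    """Resolve a positional, named, or pronoun reference to a product name."""
    p = phrase.lower()
    # Positional: "the first one", "last option" -> list index
    for word, idx in ORDINALS.items():
        if word in p:
            return products[idx] if products else None
    # Pronoun: "it", "that one" -> most recently mentioned product
    if p.strip() in {"it", "that one", "this one"}:
        return last_mentioned
    # Named: fuzzy match against product names in session
    matches = get_close_matches(phrase, products, n=1, cutoff=0.4)
    return matches[0] if matches else None
```

A deterministic resolver like this doubles as a validation step for the LLM's `referenced_products` output: if the two disagree, ask the user to clarify rather than guess.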
The Routing Pattern
Intent -> Handler Mapping
class ProductAgent:
    def __init__(self, session_store):
        self.session_store = session_store
        # Map each intent to its handler
        self.handlers = {
            Intent.NEW_SEARCH: self._handle_search,
            Intent.REFINE_SEARCH: self._handle_refine,
            Intent.COMPARE: self._handle_compare,
            Intent.DEEP_DIVE: self._handle_deep_dive,
            Intent.SELECT: self._handle_select,
            Intent.DESELECT: self._handle_deselect,
            Intent.SUMMARIZE: self._handle_summarize,
            Intent.GENERAL: self._handle_question,
            Intent.GREETING: self._handle_greeting,
        }

    async def process(self, message: str, session_id: str):
        # 1. Load context
        context = self.session_store.get(session_id)
        # 2. Classify intent (LLM, ~200 tokens)
        result = await classify_intent(message, context)
        # 3. Order intents by dependency
        ordered = order_intents(result.intents)
        # 4. Execute each intent
        responses = []
        for intent in ordered:
            handler = self.handlers.get(intent)
            if handler:
                response = await handler(
                    message=message,
                    referenced=result.referenced_products,
                    context=context
                )
                responses.append(response)
                # Update context after each handler
                context = self.session_store.get(session_id)
        # 5. Combine responses
        return "\n\n".join(responses)
Why this works:
- Each handler does ONE job
- Easy to test (test each handler separately)
- Easy to debug (know which handler failed)
- Easy to extend (add new intent = add new handler)
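"Easy to test" is concrete: because each handler is a plain async function, it can be exercised with a stub context and no LLM in the loop. The `Context` dataclass and `handle_summarize` below are simplified stand-ins for the real handlers, which would query the session store and database.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List

# Stub context (illustrative; the real context carries search results too)
@dataclass
class Context:
    selected_products: List[str] = field(default_factory=list)

async def handle_summarize(message: str, referenced: list, context: Context) -> str:
    """Minimal SUMMARIZE handler: report the user's current selections."""
    if not context.selected_products:
        return "You haven't selected anything yet."
    return "Selected so far: " + ", ".join(context.selected_products)

# Each handler can be exercised in isolation, no LLM call needed:
ctx = Context(selected_products=["Dell XPS 15"])
reply = asyncio.run(handle_summarize("What have I selected?", [], ctx))
# reply == "Selected so far: Dell XPS 15"
```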
The Proof: Before/After
| Before | After |
|---|---|
| Prompt size: 2,000 tokens | Prompt size: 200 tokens (classification only) |
| Multi-intent success: 15% | Multi-intent success: 92% |
| Entity resolution: 0% ("first one" failed) | Entity resolution: 87% ("first one" works) |
| Cost per request: $0.04 | Cost per request: $0.01 (4x cheaper) |
| Debugging time: Hours (which instruction failed?) | Debugging time: Minutes (know which handler failed) |
What changed: We separated classification from execution. Small, focused prompts. Deterministic routing.
The Checklist: Intent Layer Readiness
| Item | Category | Done |
|---|---|---|
| Have you enumerated all user intents? | Taxonomy | |
| Is the classification prompt < 500 tokens? | Prompt | |
| Do you detect multiple intents? | Multi-Intent | |
| Do you extract referenced products? | Entities | |
| Is intent -> handler mapping deterministic? | Routing | |
| Do you handle unclear intents? | Fallback | |
| Do you inject session context for resolution? | Context | |
| Do you validate extracted intents? | Validation | |
Score Interpretation:
- 0-3: Intent layer not ready
- 4-6: Prototype-ready
- 7-8: Production-ready
What’s Next
Issue 5: The Filter Extraction Layer
Now that you can classify intent as “search,” how do you execute it?
“User says ‘cheap laptops with good battery.’ What is ‘cheap’? What is ‘good’? How do you convert vague terms to specific database queries?”
What You’ll Learn:
- Filter extraction with field-aware prompts
- Validation and clamping (reject unknown fields, clamp to valid ranges)
- Constraint relaxation (what if 0 results?)
Key Takeaways
1. The Problem: One prompt doing everything = chaos.
2. The Solution: Classify first (cheap, 200 tokens) -> Route to specialized handlers (deterministic) -> Execute with focused prompts (expensive only where needed).
3. Key Takeaway: Intent classification is the router of your agent. Get it right, and everything else becomes simpler.
Glossary
- Intent: What the user wants to do (search, compare, select)
- Classification: Detecting which intent(s) apply to a message
- Routing Pattern: Architecture where a small classifier routes to specialized handlers (also called Gateway Pattern)
- Semantic Routing: Using embedding similarity (not LLM) to route simple intents — faster but less flexible
- Multi-Intent: One message with multiple goals (“Find X and compare Y”)
- Entity Extraction: Identifying what the user is referring to (“the first one” -> product SKU)
- Routing: Mapping intent to specialized handler function
- Handler: Function that executes one specific intent
- Monolithic Prompt: One giant prompt trying to do everything (don’t do this)
Until next issue,
Sentient Zero Labs