In Issues 2-3, we built clean data infrastructure. Now we add intelligence: understanding what users actually want.
Here’s what we learned the hard way: we had one giant prompt doing everything — search, compare, explain, select. It was 2,000 tokens long. And it broke constantly. “Find laptops and compare the top two” would return search results and completely ignore the comparison.
In this issue:
- Why monolithic prompts fail (our 2,000-token disaster)
- The intent taxonomy (search, refine, compare, deep-dive, select)
- Multi-intent detection (“Find X and compare Y”)
- Entity extraction (resolving “the first one” to a product)
History Anchor: Symbolic AI -> Instruction Tuning / RLHF
Intent classification has deep roots. Early chatbots used regex pattern matching (ELIZA, 1966). Spoken dialog systems used statistical slot-filling (VoiceXML, Nuance). Rasa NLU brought ML-based intent classification. LLMs collapsed all of this into a single prompt — but the fundamental insight remains: classify first, execute second.
The Disasters
Disaster 1: The Monolithic Prompt
We had one prompt doing everything:
THE 2,000-TOKEN MONSTER PROMPT
You are a shopping assistant. You can:
1. Search for products based on criteria...
2. Compare products by analyzing...
3. Explain specifications like...
4. Add products to cart when user says...
5. Refine searches by...
... [1,800 more tokens of instructions]
USER: "Find laptops and compare the top two"
|
v
BOT: Returns 5 laptops, ignores "compare" part
The prompt was so long that the LLM focused on the first instruction it could execute and ignored the rest.
Why it happened: Too many instructions. No prioritization. The LLM picked one task and called it done.
Disaster 2: The Multi-Intent Blindness
USER: "Show me the Dell XPS and add the HP to cart"
|
v
TWO INTENTS:
1. DEEP_DIVE ("Show me the Dell XPS")
2. SELECT ("add the HP to cart")
|
v
BOT: "I can help you with that!" (does neither)
The user wanted two things. Our system could only handle one intent at a time — so it handled neither.
Why it happened: No multi-intent detection. The system assumed one message = one intent.
Disaster 3: The Context Blindness
Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
1. Dell XPS 15 - $899
2. HP Spectre x360 - $849
3. Lenovo ThinkPad - $799"
Turn 2:
USER: "Tell me about the first one"
BOT: "Which product are you referring to?"
The bot listed products. The user said “the first one.” The bot had no idea what that meant.
Why it happened: No entity extraction. “The first one” is a reference that requires context to resolve.
The Pattern
One prompt can't handle complexity. Classify first, route to specialized handlers, execute cleanly.
The Diagnosis: Why Monolithic Prompts Fail
The Monolithic Prompt Problem
ONE PROMPT DOING EVERYTHING (What We Had):
User Message -> Giant Prompt (2,000 tokens) -> Response
Problems:
- Prompt is huge (expensive, slow)
- LLM gets confused with too many instructions
- Can't handle multi-intent ("Find X and compare Y")
- No routing (all logic in one place)
- Hard to debug (which instruction failed?)
The Classification Solution
CLASSIFY -> ROUTE -> EXECUTE (What Works):
User Message
|
v
STEP 1: CLASSIFY (Cheap, 200 tokens)
"What does the user want?"
-> Intent: NEW_SEARCH
|
v
STEP 2: ROUTE (Deterministic, 0 tokens)
NEW_SEARCH -> SearchHandler
|
v
STEP 3: EXECUTE (Specialized handler)
SearchHandler extracts filters, queries DB
Benefits:
- Classification is cheap (200 tokens)
- Each handler does ONE job
- Can detect multiple intents
- Easy to debug (know which handler failed)
This is the Routing Pattern (or Gateway Pattern) in agentic design. You’ll see similar ideas in “Mixture of Experts” architectures — a small router decides which specialized model handles each request.
Alternative: Semantic Routing. For simple intents like “Greeting” or “FAQ,” you can use embedding similarity instead of an LLM call (0 tokens, <50ms). We chose LLM routing because e-commerce intents are complex and overlapping — but for simpler domains, Semantic Routing is faster and cheaper.
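Semantic routing can be sketched in a few lines. This toy version uses bag-of-words cosine similarity in place of real embeddings — the `embed` function, the `INTENT_EXAMPLES` utterances, and the 0.3 threshold are all illustrative assumptions. A production version would swap in an embedding model and precompute the example vectors.

```python
from collections import Counter
from math import sqrt
from typing import Optional

# Toy "embedding": a bag-of-words vector. A real implementation would use
# sentence embeddings from an embedding model; this is illustrative only.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A few example utterances per simple intent (hypothetical data)
INTENT_EXAMPLES = {
    "greeting": ["hello there", "hi how are you", "good morning"],
    "faq": ["what is your return policy", "how does shipping work"],
}

def semantic_route(message: str, threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching simple intent, or None to fall back to LLM routing."""
    msg_vec = embed(message)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(msg_vec, embed(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

The key design property: when nothing scores above the threshold, the router returns `None` and you fall through to the LLM classifier — semantic routing handles the cheap, obvious cases only.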
The Intent Taxonomy
What Intents Exist
Before you can classify, you need to enumerate what users can do in your domain.
Search Intents (Finding Products)
- NEW_SEARCH: “Find laptops under $1000”
- REFINE_SEARCH: “Actually, make it under $800”
- ALTERNATIVE_SEARCH: “Show me similar options”
Analysis Intents (Understanding Products)
- DEEP_DIVE: “Tell me more about the Dell XPS”
- COMPARE: “Compare the Dell vs HP”
- EXPLAIN_SPEC: “What does ‘Rds(on)’ mean?”
Action Intents (Making Decisions)
- SELECT: “Add the Dell to my list”
- DESELECT: “Remove the HP”
- EXPORT: “Export my selections to CSV”
Meta Intents (Conversation Management)
- SUMMARIZE: “What have I selected so far?”
- GENERAL: “What’s the best laptop brand?”
- GREETING: “Hello!”
Intent Decision Tree
USER MESSAGE
|
v
Does it mention specific products? (Dell, HP, "first one")
YES -> DEEP_DIVE or COMPARE or SELECT
NO -> Continue
|
v
Does it have search criteria? (price, specs, features)
YES -> NEW_SEARCH or REFINE_SEARCH
NO -> Continue
|
v
Is it asking a question? ("What is...", "How does...")
YES -> EXPLAIN_SPEC or GENERAL
NO -> Continue
|
v
Is it a greeting or meta request?
YES -> GREETING or SUMMARIZE
NO -> UNCLEAR (ask for clarification)
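The tree above translates almost directly into a deterministic pre-classifier. The keyword lists and regexes below are illustrative placeholders, not a tuned production ruleset; a hybrid system might run something like this first and fall back to the LLM classifier on "unclear".

```python
import re

# Rule-of-thumb coding of the decision tree. All keyword lists are
# illustrative; a real system would derive PRODUCT_NAMES from the session.
PRODUCT_NAMES = ["dell", "hp", "lenovo"]
POSITION_REFS = ["first one", "second", "last one"]
CRITERIA = re.compile(r"\$\d+|under|cheap|battery|screen|ram", re.I)
QUESTION = re.compile(r"^(what|how|why|which)\b", re.I)
GREETINGS = {"hello", "hi", "hey"}

def triage(message: str) -> str:
    m = message.lower()
    # Branch 1: mentions specific products or positional references
    if any(p in m for p in PRODUCT_NAMES + POSITION_REFS):
        return "analysis_or_action"   # DEEP_DIVE / COMPARE / SELECT
    # Branch 2: contains search criteria (price, specs, features)
    if CRITERIA.search(m):
        return "search"               # NEW_SEARCH / REFINE_SEARCH
    # Branch 3: phrased as a question
    if QUESTION.match(m):
        return "question"             # EXPLAIN_SPEC / GENERAL
    # Branch 4: greeting or meta request
    if m.split() and m.split()[0].strip("!,.") in GREETINGS:
        return "meta"                 # GREETING / SUMMARIZE
    return "unclear"                  # ask for clarification
```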
The Classification
The Classification Prompt (Lean and Focused)
INTENT_PROMPT = """
Classify the user's intent into ONE or MORE of these categories:

SEARCH INTENTS:
- new_search: User wants to find new products
- refine_search: User wants to modify the current search
- alternative_search: User wants similar options

ANALYSIS INTENTS:
- deep_dive: User wants details about a specific product
- compare: User wants to compare products
- explain_spec: User is asking about a specification

ACTION INTENTS:
- select: User is choosing/adding a product
- deselect: User is removing a product
- export: User wants to export their selections

META INTENTS:
- summarize: User wants to see their selections
- general: General question about products
- greeting: User is greeting

CONTEXT:
Products in session: {product_list}
User selections: {selected_list}

USER MESSAGE: "{message}"

Return JSON:
{{
  "intents": ["intent1", "intent2"],
  "referenced_products": ["Dell XPS", "HP Spectre"],
  "confidence": 0.95,
  "reasoning": "User wants to..."
}}
"""
Note the doubled braces around the JSON example: the prompt is filled with `str.format`, so single braces would be treated as placeholders and raise a KeyError.
Key design choices:
- Low temperature (0.1): We want consistent classification, not creativity
- Small output (~200 tokens): Just JSON, no explanation
- Context injection: Products in session help resolve “first one”
- Multi-intent support: Array of intents, not single value
Classification Flow
USER: "Find laptops and compare the top two"
|
v
LLM CLASSIFICATION (200 tokens):
{
"intents": ["new_search", "compare"],
"referenced_products": [],
"confidence": 0.92,
"reasoning": "User wants to search AND compare"
}
|
v
VALIDATION (Deterministic):
- Check intents are valid
- Check for conflicts (no conflicts)
- Order by dependency: [new_search, compare]
|
v
ROUTE TO HANDLERS
The Implementation
import json
from enum import Enum
from dataclasses import dataclass
from typing import List

class Intent(Enum):
    NEW_SEARCH = "new_search"
    REFINE_SEARCH = "refine_search"
    ALTERNATIVE_SEARCH = "alternative_search"
    DEEP_DIVE = "deep_dive"
    COMPARE = "compare"
    EXPLAIN_SPEC = "explain_spec"
    SELECT = "select"
    DESELECT = "deselect"
    EXPORT = "export"
    SUMMARIZE = "summarize"
    GENERAL = "general"
    GREETING = "greeting"

@dataclass
class ClassificationResult:
    intents: List[Intent]
    referenced_products: List[str]
    confidence: float
    reasoning: str

async def classify_intent(message: str, context) -> ClassificationResult:
    # Build prompt with context
    prompt = INTENT_PROMPT.format(
        message=message,
        product_list=context.fetched_products[:5],  # Limit context
        selected_list=context.selected_products
    )
    # Call LLM (small, fast model)
    response = await llm.generate(
        prompt=prompt,
        temperature=0.1,  # Low temperature for consistency
        max_tokens=200    # Small output
    )
    # Parse and validate
    result = json.loads(response)
    return ClassificationResult(
        intents=[Intent(i) for i in result["intents"]],
        referenced_products=result.get("referenced_products", []),
        confidence=result.get("confidence", 0.5),
        reasoning=result.get("reasoning", "")
    )
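One caveat with the bare `json.loads(response)` above: LLMs occasionally wrap JSON in code fences or emit intent names outside the taxonomy. A defensive parsing sketch (the fallback to `general` is our assumption, not a fixed rule; the enum is trimmed here for brevity):

```python
import json
from enum import Enum

class Intent(Enum):
    NEW_SEARCH = "new_search"
    GENERAL = "general"
    # ... remaining intents as in the taxonomy above

def parse_classification(raw: str) -> dict:
    """Parse LLM output defensively: strip code fences, drop unknown
    intent names, fall back to GENERAL if nothing valid survives."""
    text = raw.strip()
    if text.startswith("```"):
        # Common LLM habit: ```json ... ``` around the payload
        text = text.strip("`").lstrip("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"intents": [Intent.GENERAL], "referenced_products": [], "confidence": 0.0}
    valid = {i.value for i in Intent}
    intents = [Intent(i) for i in data.get("intents", []) if i in valid]
    return {
        "intents": intents or [Intent.GENERAL],
        "referenced_products": data.get("referenced_products", []),
        "confidence": float(data.get("confidence", 0.5)),
    }
```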
Multi-Intent Handling
The Multi-Intent Problem
USER: "Find cheap laptops and compare the top two"
CONTAINS TWO INTENTS:
1. NEW_SEARCH ("Find cheap laptops")
2. COMPARE ("compare the top two")
EXECUTION ORDER MATTERS:
1. Execute NEW_SEARCH first -> get results
2. Then execute COMPARE -> compare top 2 from results
Multi-Intent Flow
STEP 1: DETECT ALL INTENTS
Intents: [new_search, compare]
|
v
STEP 2: ORDER BY DEPENDENCY
new_search must run before compare (need results first)
Ordered: [new_search, compare]
|
v
STEP 3: EXECUTE SEQUENTIALLY
1. Execute new_search
-> Returns 5 laptops
-> Update context with results
2. Execute compare
-> Uses top 2 from updated context
-> Returns comparison
|
v
STEP 4: COMBINE RESPONSES
"Found 5 laptops under $800:
1. Dell XPS - $799
2. HP Spectre - $749
...
Comparing top 2:
Dell XPS vs HP Spectre..."
Intent Dependencies
Not all intents can run in any order. Define dependencies:
INTENT_DEPENDENCIES = {
    Intent.COMPARE: [Intent.NEW_SEARCH, Intent.REFINE_SEARCH],
    Intent.DEEP_DIVE: [Intent.NEW_SEARCH],
    Intent.SELECT: [Intent.NEW_SEARCH, Intent.DEEP_DIVE],
    Intent.REFINE_SEARCH: [Intent.NEW_SEARCH],
}

def order_intents(intents: List[Intent]) -> List[Intent]:
    """Order intents by dependency (search before compare, etc.).

    The priority map is a flattened topological order of
    INTENT_DEPENDENCIES: anything an intent depends on has a lower number.
    """
    priority = {
        Intent.NEW_SEARCH: 1,
        Intent.REFINE_SEARCH: 2,
        Intent.DEEP_DIVE: 3,
        Intent.COMPARE: 4,
        Intent.SELECT: 5,
    }
    return sorted(intents, key=lambda i: priority.get(i, 10))
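Run in isolation, the ordering behaves like this (self-contained sketch with the enum trimmed to three intents; the priority values mirror the map above):

```python
from enum import Enum

class Intent(Enum):
    NEW_SEARCH = "new_search"
    COMPARE = "compare"
    SELECT = "select"

# Same priorities as above, reduced to the three intents used here
PRIORITY = {Intent.NEW_SEARCH: 1, Intent.COMPARE: 4, Intent.SELECT: 5}

def order_intents(intents):
    # Unknown intents sink to the end (priority 10)
    return sorted(intents, key=lambda i: PRIORITY.get(i, 10))

# "Find cheap laptops, compare the top two, and add the winner to my list"
ordered = order_intents([Intent.SELECT, Intent.COMPARE, Intent.NEW_SEARCH])
# -> [NEW_SEARCH, COMPARE, SELECT]: search runs first, selection last
```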
Entity Extraction
The Entity Problem
Turn 1:
USER: "Find laptops under $1000"
BOT: "Here are 5 options:
1. Dell XPS 15 - $899
2. HP Spectre x360 - $849
3. Lenovo ThinkPad - $799"
Turn 2:
USER: "Tell me about the first one"
^
What is "the first one"?
Need to resolve to "Dell XPS 15"
Entity Extraction Flow
USER: "Tell me about the first one"
|
v
CLASSIFICATION WITH CONTEXT:
Context injected into prompt:
Products in session:
1. Dell XPS 15 (SKU: XPS-15-2024)
2. HP Spectre x360 (SKU: SPECTRE-X360)
3. Lenovo ThinkPad (SKU: THINKPAD-X1)
LLM Output:
{
"intents": ["deep_dive"],
"referenced_products": ["Dell XPS 15"],
"reasoning": "'first one' refers to position 1"
}
|
v
VALIDATION:
- "Dell XPS 15" exists in context? Yes
- Resolve to SKU: "XPS-15-2024"
|
v
ROUTE TO DEEP_DIVE HANDLER with SKU: "XPS-15-2024"
Entity Types
| Type | Examples | Resolution |
|---|---|---|
| Positional | "first one", "second option", "last one" | Map to list index |
| Named | "Dell XPS", "HP Spectre" | Fuzzy match against context |
| Pronoun | "it", "that one", "this laptop" | Use most recently mentioned |
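The three resolution strategies in the table can share one resolver. This sketch uses `difflib.get_close_matches` for the fuzzy named match; the product list, the ordinal map, and the 0.4 cutoff are illustrative choices, not tuned values.

```python
from difflib import get_close_matches
from typing import List, Optional

# Products currently in session, in display order (illustrative data)
SESSION_PRODUCTS = ["Dell XPS 15", "HP Spectre x360", "Lenovo ThinkPad"]

ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}

def resolve_reference(phrase: str, products: List[str],
                      last_mentioned: Optional[str] = None) -> Optional[str]:
    """Resolve a positional, named, or pronoun reference to a product name."""
    p = phrase.lower()
    # Positional: "the first one", "last option" -> list index
    for word, idx in ORDINALS.items():
        if word in p:
            return products[idx] if products else None
    # Pronoun: "it", "that one" -> most recently mentioned product
    if p.strip() in {"it", "that one", "this one"}:
        return last_mentioned
    # Named: fuzzy match against product names in session
    matches = get_close_matches(phrase, products, n=1, cutoff=0.4)
    return matches[0] if matches else None
```

A deterministic resolver like this doubles as a validation step for the LLM's `referenced_products` output: if the two disagree, ask the user to clarify rather than guess.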
The Routing Pattern
Intent -> Handler Mapping
class ProductAgent:
    def __init__(self, session_store):
        self.session_store = session_store
        # Map each intent to its handler
        self.handlers = {
            Intent.NEW_SEARCH: self._handle_search,
            Intent.REFINE_SEARCH: self._handle_refine,
            Intent.COMPARE: self._handle_compare,
            Intent.DEEP_DIVE: self._handle_deep_dive,
            Intent.SELECT: self._handle_select,
            Intent.DESELECT: self._handle_deselect,
            Intent.SUMMARIZE: self._handle_summarize,
            Intent.GENERAL: self._handle_question,
            Intent.GREETING: self._handle_greeting,
        }

    async def process(self, message: str, session_id: str):
        # 1. Load context
        context = self.session_store.get(session_id)
        # 2. Classify intent (LLM, ~200 tokens)
        result = await classify_intent(message, context)
        # 3. Order intents by dependency
        ordered = order_intents(result.intents)
        # 4. Execute each intent
        responses = []
        for intent in ordered:
            handler = self.handlers.get(intent)
            if handler:
                response = await handler(
                    message=message,
                    referenced=result.referenced_products,
                    context=context
                )
                responses.append(response)
                # Update context after each handler
                context = self.session_store.get(session_id)
        # 5. Combine responses
        return "\n\n".join(responses)
Why this works:
- Each handler does ONE job
- Easy to test (test each handler separately)
- Easy to debug (know which handler failed)
- Easy to extend (add new intent = add new handler)
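"Easy to test" is concrete: because each handler is a plain async function, it can be exercised with a stub context and no LLM in the loop. The `Context` dataclass and `handle_summarize` below are simplified stand-ins for the real handlers, which would query the session store and database.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List

# Stub context (illustrative; the real context carries search results too)
@dataclass
class Context:
    selected_products: List[str] = field(default_factory=list)

async def handle_summarize(message: str, referenced: list, context: Context) -> str:
    """Minimal SUMMARIZE handler: report the user's current selections."""
    if not context.selected_products:
        return "You haven't selected anything yet."
    return "Selected so far: " + ", ".join(context.selected_products)

# Each handler can be exercised in isolation, no LLM call needed:
ctx = Context(selected_products=["Dell XPS 15"])
reply = asyncio.run(handle_summarize("What have I selected?", [], ctx))
# reply == "Selected so far: Dell XPS 15"
```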
The Proof: Before/After
| Before | After |
|---|---|
| Prompt size: 2,000 tokens | Prompt size: 200 tokens (classification only) |
| Multi-intent success: 15% | Multi-intent success: 92% |
| Entity resolution: 0% ("first one" failed) | Entity resolution: 87% ("first one" works) |
| Cost per request: $0.04 | Cost per request: $0.01 (4x cheaper) |
| Debugging time: Hours (which instruction failed?) | Debugging time: Minutes (know which handler failed) |
What changed: We separated classification from execution. Small, focused prompts. Deterministic routing.
The Checklist: Intent Layer Readiness
| Item | Category | Done |
|---|---|---|
| Have you enumerated all user intents? | Taxonomy | |
| Is the classification prompt < 500 tokens? | Prompt | |
| Do you detect multiple intents? | Multi-Intent | |
| Do you extract referenced products? | Entities | |
| Is intent -> handler mapping deterministic? | Routing | |
| Do you handle unclear intents? | Fallback | |
| Do you inject session context for resolution? | Context | |
| Do you validate extracted intents? | Validation | |
Score Interpretation:
- 0-3: Intent layer not ready
- 4-6: Prototype-ready
- 7-8: Production-ready
What’s Next
Issue 5: The Filter Extraction Layer
Now that you can classify intent as “search,” how do you execute it?
“User says ‘cheap laptops with good battery.’ What is ‘cheap’? What is ‘good’? How do you convert vague terms to specific database queries?”
What You’ll Learn:
- Filter extraction with field-aware prompts
- Validation and clamping (reject unknown fields, clamp to valid ranges)
- Constraint relaxation (what if 0 results?)
Key Takeaways
1. The Problem: One prompt doing everything = chaos.
2. The Solution: Classify first (cheap, 200 tokens) -> Route to specialized handlers (deterministic) -> Execute with focused prompts (expensive only where needed).
3. Key Takeaway: Intent classification is the router of your agent. Get it right, and everything else becomes simpler.
Glossary
- Intent: What the user wants to do (search, compare, select)
- Classification: Detecting which intent(s) apply to a message
- Routing Pattern: Architecture where a small classifier routes to specialized handlers (also called Gateway Pattern)
- Semantic Routing: Using embedding similarity (not LLM) to route simple intents — faster but less flexible
- Multi-Intent: One message with multiple goals (“Find X and compare Y”)
- Entity Extraction: Identifying what the user is referring to (“the first one” -> product SKU)
- Routing: Mapping intent to specialized handler function
- Handler: Function that executes one specific intent
- Monolithic Prompt: One giant prompt trying to do everything (don’t do this)
Until next issue,
Sentient Zero Labs