The Product Deep-Dive Layer: When Users Want More Than Specs
When users want to go deep on a product -- reading reviews, understanding thermal performance, checking compatibility -- RAG lets you search unstructured knowledge that doesn't fit in structured columns.
We’ve built 7 layers. Users can search, filter, sort, and get results from structured data. But here’s what we kept hearing:
“What do people say about the battery life?” “How does the cooling system work?” “Is this compatible with my setup?”
These questions can’t be answered from structured columns. The answers are in user reviews, product manuals, compatibility guides — unstructured knowledge.
In this final issue:
- When users need a product deep-dive (reviews, docs, detailed specs)
- Do You Need RAG? (decision framework)
- Hybrid architecture (database + RAG)
- The complete 8-layer architecture
History Anchor: Attention + Seq2Seq to Transformers to Tool Use + ReAct
Retrieval-Augmented Generation (RAG) combines two traditions: information retrieval (finding relevant documents) and language generation (producing coherent text). Before Transformers, retrieval was keyword-based and generation was template-based. The attention mechanism (2014) and Transformer architecture (2017) made it possible to retrieve semantically and generate contextually. RAG is the bridge between what your database knows and what the LLM can explain.
The Disasters
Disaster 1: The “What Do Users Say?” Problem
USER: "What do people say about the battery life?"
BOT: "The Dell XPS has a 10-hour battery" (from structured spec)
USER: "No, what do USERS say?"
BOT: "I don't have that information" ❌
REALITY:
• 1,000 user reviews exist in our database
• Reviews mention: "battery drains fast", "only 6-7 hours real-world"
• But reviews are stored as text blobs, not searchable
PROBLEM: We only exposed structured data (specs), not unstructured (reviews)
The user wanted real-world feedback. We gave them marketing specs.
Disaster 2: The “Explain This Spec” Problem
USER: "Explain the thermal performance"
BOT: "Operating temp: 0-40°C" (from structured spec)
USER: "No, HOW does the cooling system work?"
BOT: "I don't have that information" ❌
REALITY:
• 50-page technical manual exists
• Explains: dual-fan active cooling, vapor chamber, thermal paste
• But only basic specs were extracted to database
PROBLEM: Detailed documentation not accessible via search
The user wanted to understand the product deeply. We could only parrot the spec sheet.
Disaster 3: The “Is This Compatible?” Problem
USER: "Will this work with my motherboard?"
BOT: "I don't have compatibility information" ❌
REALITY:
• Compatibility guide exists in product documentation
• Lists: compatible motherboards, power requirements, clearance specs
• But stored as unstructured PDF, not in database
PROBLEM: Compatibility knowledge exists but isn't searchable
The answer existed. We just couldn’t find it.
The Pattern
Users don't just want specs. They want to understand products: reviews, documentation, compatibility, comparisons. That's where RAG becomes useful.
The Use Case: Product Deep-Dive
When users have found a product (via Issues 4-7), they often want to go deeper:
┌─────────────────────────────────────────────────────────┐
│ PRODUCT DEEP-DIVE QUESTIONS │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. USER REVIEWS │
│ "What do people say about battery life?" │
│ "Are there common complaints?" │
│ "How's the build quality according to users?" │
│ │
│ 2. DETAILED SPECS │
│ "Explain the thermal performance" │
│ "How does the cooling system work?" │
│ "What's the display technology?" │
│ │
│ 3. COMPATIBILITY │
│ "Will this work with my motherboard?" │
│ "What power supply do I need?" │
│ "Does it fit in my case?" │
│ │
│ 4. COMPARISONS │
│ "How does this compare to the HP Spectre?" │
│ "What's better: this or the MacBook?" │
│ "Differences vs last year's model?" │
│ │
│ 5. DOCUMENTATION │
│ "What's the warranty policy?" │
│ "How do I upgrade the RAM?" │
│ "What's covered under warranty?" │
│ │
└─────────────────────────────────────────────────────────┘
None of these are in structured columns!
- Reviews: Full text, not just star rating
- Detailed specs: Narrative explanations, not just numbers
- Compatibility: Lists and guides, not boolean flags
- Documentation: PDFs, manuals, warranty text
Do You Need RAG?
RAG isn’t always the answer. Here’s the decision framework:
The Decision Tree
┌─────────────────────────────────────────────────────────┐
│ DO YOU NEED RAG? DECISION TREE │
├─────────────────────────────────────────────────────────┤
│ │
│ Q1: Do you have unstructured documents? │
│ (PDFs, manuals, reviews, articles) │
│ ├─ NO → Use structured context only (database) │
│ └─ YES → Continue... │
│ │
│ Q2: Does it fit in context window (< 100k tokens)? │
│ ├─ YES → Consider full context injection │
│ └─ NO → Continue... │
│ │
│ Q3: Is the content frequently updated? │
│ ├─ NO → Full context may work │
│ └─ YES → RAG (re-embed on update) │
│ │
│ Q4: Do you need source citations? │
│ ├─ YES → RAG (provides chunk sources) │
│ └─ NO → Either approach works │
│ │
│ RECOMMENDATION: │
│ • All NO → Structured context injection (database) │
│ • Some YES → Consider RAG │
│ • All YES → Definitely RAG │
│ │
└─────────────────────────────────────────────────────────┘
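The decision tree above is easy to encode. A minimal sketch (the `RagDecisionInputs` and `recommend_approach` names are illustrative, not from any library):

```python
from dataclasses import dataclass

@dataclass
class RagDecisionInputs:
    has_unstructured_docs: bool    # Q1: PDFs, manuals, reviews, articles?
    fits_in_context_window: bool   # Q2: < ~100k tokens?
    frequently_updated: bool       # Q3: content changes often?
    needs_citations: bool          # Q4: must show chunk sources?

def recommend_approach(inputs: RagDecisionInputs) -> str:
    """Walk the four-question decision tree and return a recommendation."""
    if not inputs.has_unstructured_docs:
        return "structured-context"          # database alone is enough
    if (inputs.fits_in_context_window
            and not inputs.frequently_updated
            and not inputs.needs_citations):
        return "full-context-injection"      # just stuff it in the prompt
    return "rag"                             # retrieve, then generate
```

For example, a product with thousands of reviews that change daily and need citations lands squarely on `"rag"`.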
When RAG Makes Sense
For product deep-dive:
| Content Type | Volume | RAG Needed? |
|---|---|---|
| User reviews | 1000s per product | Yes |
| Product manuals | 50+ pages | Yes |
| Compatibility guides | Variable | Yes |
| Detailed specs | Narrative text | Yes |
| Basic specs | Structured fields | No (database) |
Hybrid Architecture
The Best of Both Worlds
Structured data stays in the database. Unstructured knowledge goes to RAG. Combine them in context.
┌─────────────────────────────────────────────────────────┐
│ HYBRID SERVICE ARCHITECTURE │
├─────────────────────────────────────────────────────────┤
│ │
│ USER: "What do users say about the Dell XPS battery?" │
│ ↓ │
│ ┌────────────────────────────────────────────┐ │
│ │ STEP 1: Get Structured Data (Database) │ │
│ │ • Product: Dell XPS 15 │ │
│ │ • Price: $899 │ │
│ │ • Battery (spec): 10 hours │ │
│ │ • Rating: 4.5/5 (from 1,234 reviews) │ │
│ └────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────┐ │
│ │ STEP 2: Detect RAG Need │ │
│ │ Query contains "what do users say" │ │
│ │ → Trigger RAG lookup for reviews │ │
│ └────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────┐ │
│ │ STEP 3: Retrieve from Vector Store │ │
│ │ • Embed query: "battery life reviews" │ │
│ │ • Filter: sku = "XPS-15-2024" │ │
│ │ • Top 5 review chunks retrieved │ │
│ └────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────┐ │
│ │ STEP 4: Combine and Generate │ │
│ │ Structured specs + Review chunks → LLM │ │
│ │ → Comprehensive answer │ │
│ └────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
Hybrid Service Implementation
class HybridProductService:
    def __init__(self, db, vector_store, llm):
        self.db = db
        self.vector_store = vector_store
        self.llm = llm

    async def deep_dive(self, query: str, product_sku: str) -> str:
        # Step 1: Get structured data
        product = await self.db.get_product(product_sku)
        structured_context = self._format_product(product)

        # Step 2: Detect if RAG is needed
        rag_keywords = ["users say", "reviews", "people think",
                        "how does", "explain", "warranty", "compatible"]
        needs_rag = any(kw in query.lower() for kw in rag_keywords)

        rag_context = ""
        if needs_rag:
            # Step 3: Retrieve from vector store
            chunks = await self.vector_store.search(
                query=query,
                filter={"sku": product_sku},
                top_k=5,
            )
            rag_context = self._format_chunks(chunks)

        # Step 4: Generate response
        prompt = f"""
Answer the user's question using the provided context.

STRUCTURED DATA:
{structured_context}

{"RETRIEVED KNOWLEDGE:" if rag_context else ""}
{rag_context}

USER QUESTION: {query}
"""
        return await self.llm.generate(prompt)
Chunking Strategies
Review Chunking: One Review = One Chunk
Reviews are self-contained. Keep them whole.
┌─────────────────────────────────────────────────────────┐
│ REVIEW CHUNKING STRATEGY │
├─────────────────────────────────────────────────────────┤
│ │
│ Each review = one chunk with metadata │
│ │
│ Chunk 1: │
│ text: "Battery lasts 8-9 hours with normal use. │
│ Great for a workday. Charging is fast too." │
│ metadata: { │
│ sku: "XPS-15-2024", │
│ type: "user_review", │
│ rating: 4.5, │
│ date: "2024-01-15", │
│ verified: true, │
│ aspects: ["battery", "charging"] │
│ } │
│ │
│ Chunk 2: │
│ text: "Disappointed with battery. Only 6 hours │
│ with my workflow. Advertised 10 is a lie." │
│ metadata: { │
│ sku: "XPS-15-2024", │
│ type: "user_review", │
│ rating: 2.0, │
│ date: "2024-02-20", │
│ verified: true, │
│ aspects: ["battery"] │
│ } │
│ │
└─────────────────────────────────────────────────────────┘
Documentation Chunking: Section-Aware
Manuals have sections. Respect them.
┌─────────────────────────────────────────────────────────┐
│ DOCUMENTATION CHUNKING STRATEGY │
├─────────────────────────────────────────────────────────┤
│ │
│ Detect sections, keep them together │
│ │
│ Chunk 1 (section: "Thermal Performance"): │
│ text: "Thermal Performance │
│ The XPS 15 uses a dual-fan active cooling │
│ system with a vapor chamber. Under normal │
│ load, fan noise stays below 30dB..." │
│ metadata: { │
│ sku: "XPS-15-2024", │
│ type: "documentation", │
│ section: "Thermal Performance", │
│ page: 12 │
│ } │
│ │
│ Chunk 2 (section: "Warranty"): │
│ text: "Warranty Policy │
│ Your device is covered for 2 years from │
│ date of purchase. This includes hardware │
│ defects under normal operating conditions..." │
│ metadata: { │
│ sku: "XPS-15-2024", │
│ type: "documentation", │
│ section: "Warranty", │
│ page: 45 │
│ } │
│ │
└─────────────────────────────────────────────────────────┘
Chunking Implementation
import re

def chunk_reviews(reviews: list, sku: str) -> list:
    """Chunk reviews: one review = one chunk."""
    chunks = []
    for review in reviews:
        chunks.append({
            "text": review["text"],
            "metadata": {
                "sku": sku,
                "type": "user_review",
                "rating": review.get("rating"),
                "date": review.get("date"),
                "verified": review.get("verified", False),
            }
        })
    return chunks

def chunk_documentation(doc_text: str, sku: str) -> list:
    """Chunk documentation: section-aware splitting."""
    # Split before headings (lines in ALL CAPS or with a ## prefix)
    sections = re.split(r'\n(?=[A-Z][A-Z\s]+\n|##)', doc_text)
    chunks = []
    for i, section in enumerate(sections):
        if len(section.strip()) < 50:  # Skip tiny sections
            continue
        # Extract section title (first line)
        lines = section.strip().split('\n')
        title = lines[0].strip('#').strip()
        chunks.append({
            "text": section.strip(),
            "metadata": {
                "sku": sku,
                "type": "documentation",
                "section": title,
                "chunk_index": i,
            }
        })
    return chunks
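The heading regex is worth sanity-checking on a toy document. A standalone example of the same lookahead split (the sample text is invented for illustration):

```python
import re

doc_text = (
    "## Thermal Performance\n"
    "The XPS 15 uses a dual-fan active cooling system.\n"
    "## Warranty\n"
    "Your device is covered for 2 years from purchase.\n"
)

# Split *before* lines that look like headings (ALL CAPS or "##"-prefixed);
# the zero-width lookahead keeps the heading with its section body.
sections = re.split(r'\n(?=[A-Z][A-Z\s]+\n|##)', doc_text)
titles = [s.strip().split('\n')[0].lstrip('#').strip()
          for s in sections if s.strip()]
print(titles)  # ['Thermal Performance', 'Warranty']
```

Because the lookahead consumes only the newline, each chunk starts with its own heading, which is exactly what the section-aware strategy wants.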
Retrieval Flow
End-to-End RAG
USER QUERY: "What do users say about battery life?"
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 1: EMBED QUERY │
│ "battery life user reviews" │
│ → [0.23, -0.45, 0.67, 0.12, ...] (768 dimensions) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 2: VECTOR SIMILARITY SEARCH │
│ • Filter: sku = "XPS-15-2024" AND type = "user_review" │
│ • Top K: 5 chunks │
│ • Metric: Cosine similarity │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 3: RANK BY SIMILARITY │
│ 1. "Battery lasts 8-9 hours..." (0.94 similarity) │
│ 2. "Disappointed, only 6 hours..." (0.91 similarity) │
│ 3. "Great battery, 10+ hours..." (0.89 similarity) │
│ 4. "Battery drains fast when..." (0.85 similarity) │
│ 5. "Solid 9 hours of daily use..." (0.82 similarity) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ STEP 4: COMBINE WITH STRUCTURED DATA │
│ Product specs + Top 5 review chunks → LLM context │
└─────────────────────────────────────────────────────────┘
↓
RESPONSE: "Users have mixed experiences with battery life.
While Dell advertises 10 hours, most users report:
- 8-9 hours with normal use (common)
- 6-7 hours with heavy workloads (some complaints)
- Fast charging is consistently praised
Overall, real-world battery is 6-9 hours depending on usage."
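The similarity ranking in Step 3 is plain cosine similarity over embedding vectors. A minimal sketch with toy 3-dimensional "embeddings" (real ones have hundreds of dimensions, and you would get them from an embedding model, not by hand):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.2]  # stand-in for an embedded query
chunks = {
    "Battery lasts 8-9 hours with normal use": [0.8, 0.2, 0.1],
    "Great display, vivid colors":             [0.1, 0.9, 0.3],
}
ranked = sorted(chunks,
                key=lambda c: cosine_similarity(query_vec, chunks[c]),
                reverse=True)
```

A vector store does this at scale with approximate-nearest-neighbor indexes, but the ranking criterion is the same.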
Token Economics
Cost Comparison: RAG vs. Full Context
SCENARIO: User asks about a product with 100 reviews
┌─────────────────────────────────────────────────────────┐
│ OPTION 1: RAG APPROACH │
├─────────────────────────────────────────────────────────┤
│ Costs: │
│ 1. Embed query: ~$0.0001 │
│ 2. Vector search: Negligible (database) │
│ 3. LLM with 5 retrieved chunks (~1,000 tokens): │
│ - Input: 1,000 tokens = ~$0.0025 │
│ - Output: 500 tokens = ~$0.005 │
│ │
│ Total per query: ~$0.008 │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ OPTION 2: FULL CONTEXT INJECTION │
├─────────────────────────────────────────────────────────┤
│ Costs: │
│ 1. LLM with all 100 reviews (~20,000 tokens): │
│ - Input: 20,000 tokens = ~$0.050 │
│ - Output: 500 tokens = ~$0.005 │
│ │
│ Total per query: ~$0.055 │
└─────────────────────────────────────────────────────────┘
VERDICT: RAG is ~7x cheaper when you have many reviews!
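The arithmetic above is simple enough to check in a few lines, using the illustrative per-1k-token prices from the diagrams (your provider's real prices will differ):

```python
def query_cost(input_tokens: int, output_tokens: int,
               input_price_per_1k: float = 0.0025,
               output_price_per_1k: float = 0.01) -> float:
    """Rough per-query LLM cost in dollars, at illustrative token prices."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)

rag_cost = query_cost(1_000, 500) + 0.0001   # 5 chunks + query embedding
full_cost = query_cost(20_000, 500)          # all 100 reviews in context
```

With these numbers, RAG comes out around $0.008 per query versus $0.055 for full-context injection, a roughly 7x difference that grows with review volume.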
When Each Approach Wins
| Query Volume | Reviews/Product | Recommendation |
|---|---|---|
| < 1k/month | < 20 | Full context (simpler) |
| 1-10k/month | 20-100 | Either works |
| > 10k/month | > 100 | RAG (much cheaper) |
Generative UI and Latency Management
The Latency Problem
RAG is slow. Here’s why:
┌─────────────────────────────────────────────────────────┐
│ RAG LATENCY BREAKDOWN │
├─────────────────────────────────────────────────────────┤
│ │
│ Step 1: Embed query ~100ms │
│ Step 2: Vector search ~200ms │
│ Step 3: Rerank chunks ~300ms │
│ Step 4: Generate response ~2000ms │
│ ───────────────────────────────────────── │
│ TOTAL: ~2600ms │
│ │
│ User perception: "Why is this so slow?" │
│ │
└─────────────────────────────────────────────────────────┘
The Solution: Streaming and Visual Citations
Users hate staring at a spinner for 2+ seconds. The fix is Generative UI — show progress as it happens:
┌─────────────────────────────────────────────────────────┐
│ STREAMING UI PATTERN │
├─────────────────────────────────────────────────────────┤
│ │
│ 0ms: Show "Thinking..." skeleton │
│ 100ms: Show "Searching reviews..." status │
│ 300ms: Show retrieved sources (citations appear) │
│ 500ms: Start streaming text (words appear as │
│ they're generated) │
│ 2600ms: Complete response with all citations │
│ │
│ User perception: "This is fast and transparent!" │
│ │
└─────────────────────────────────────────────────────────┘
Implementation Pattern
async def stream_deep_dive(query: str, product_sku: str):
    """Stream RAG response with progressive UI updates."""
    # 1. Immediately show status
    yield {"type": "status", "message": "Searching product knowledge..."}

    # 2. Retrieve and show sources
    chunks = await vector_store.search(query, sku=product_sku)
    yield {
        "type": "sources",
        "sources": [
            {"title": c.source, "snippet": c.text[:100]}
            for c in chunks[:3]
        ],
    }

    # 3. Stream generated response
    async for token in llm.stream_generate(
        prompt=build_rag_prompt(query, chunks),
        max_tokens=500,
    ):
        yield {"type": "token", "content": token}

    # 4. Final citations
    yield {
        "type": "complete",
        "citations": [
            {"source": c.source, "page": c.metadata.get("page")}
            for c in chunks
        ],
    }
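On the consuming side, the client loops over the events and dispatches on `type`. A self-contained sketch with a stubbed stream standing in for the real retrieval and generation (both `fake_stream` and `render` are hypothetical names for illustration):

```python
import asyncio

async def fake_stream():
    """Stub emitting the same event shapes as the streaming endpoint."""
    yield {"type": "status", "message": "Searching product knowledge..."}
    yield {"type": "sources",
           "sources": [{"title": "Review #1",
                        "snippet": "Battery lasts 8-9 hours"}]}
    for token in ["Battery ", "life ", "is ", "solid."]:
        yield {"type": "token", "content": token}
    yield {"type": "complete",
           "citations": [{"source": "Review #1", "page": None}]}

async def render(stream):
    """Accumulate streamed events into the final text and citations."""
    text_parts, citations = [], []
    async for event in stream:
        if event["type"] == "token":
            text_parts.append(event["content"])   # append as tokens arrive
        elif event["type"] == "complete":
            citations = event["citations"]
    return "".join(text_parts), citations

response, citations = asyncio.run(render(fake_stream()))
```

A real UI would paint each event as it arrives instead of accumulating, but the event-dispatch shape is the same.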
Visual Citations
Show users where the answer came from:
┌─────────────────────────────────────────────────────────┐
│ VISUAL CITATION EXAMPLE │
├─────────────────────────────────────────────────────────┤
│ │
│ Response: │
│ "Battery life is excellent at 12+ hours [1]. │
│ Users note fast charging via USB-C [2]." │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ [1] Review by TechReviewer, 4.5 stars │ │
│ │ "Easily lasted 12 hours with heavy use" │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ [2] Product Manual, Page 15 │ │
│ │ "USB-C Power Delivery supports 65W..." │ │
│ └───────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
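Rendering footnotes like these from retrieved chunks is mechanical. A minimal sketch (the `format_with_citations` helper and sample chunks are illustrative; it assumes the LLM was prompted to emit matching `[n]` markers):

```python
def format_with_citations(answer: str, chunks: list[dict]) -> str:
    """Append numbered source footnotes matching [n] markers in the answer."""
    footnotes = [
        f'[{i}] {c["source"]}: "{c["text"][:60]}"'
        for i, c in enumerate(chunks, start=1)
    ]
    return answer + "\n\n" + "\n".join(footnotes)

chunks = [
    {"source": "Review by TechReviewer",
     "text": "Easily lasted 12 hours with heavy use"},
    {"source": "Product Manual, Page 15",
     "text": "USB-C Power Delivery supports 65W charging"},
]
out = format_with_citations(
    "Battery life is excellent at 12+ hours [1]. "
    "Users note fast charging via USB-C [2].",
    chunks,
)
```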
Why this matters: Streaming sharply reduces perceived latency. Users see progress within the first few hundred milliseconds instead of staring at a spinner, which builds trust. Citations build credibility, because users can verify answers against the sources.
The Proof: Before/After
| Before | After |
|---|---|
| 'What do users say?' answered: 0% | 'What do users say?' answered: 95% |
| 'How does X work?' answered: 0% | 'How does X work?' answered: 87% |
| Documentation coverage: 0% | Documentation coverage: 92% |
| User satisfaction: Low | User satisfaction: High |
What changed: Added RAG for unstructured knowledge while keeping structured data in the database.
The Complete 8-Layer Architecture
Series Finale: All Layers Together
┌─────────────────────────────────────────────────────────┐
│ THE 8-LAYER AGENT ARCHITECTURE (COMPLETE!) │
├─────────────────────────────────────────────────────────┤
│ │
│ LAYER 0: FOUNDATION (Issue 1) │
│ • LLM as one component in deterministic chassis │
│ • Validation and structure beat raw prompting │
│ │
│ LAYER 1: DATA LAYER (Issue 2) │
│ • Field registry, schema design, validation │
│ • Single source of truth for what fields exist │
│ │
│ LAYER 2: INGESTION LAYER (Issue 3) │
│ • Scripts, normalization, full_data preservation │
│ • 80% of the work happens here │
│ │
│ LAYER 3: INTENT LAYER (Issue 4) │
│ • Classify first, execute second │
│ • Multi-intent handling, entity extraction │
│ │
│ LAYER 4: FILTER EXTRACTION (Issue 5) │
│ • NL → structured query with validation │
│ • Field-aware, validated, clamped │
│ │
│ LAYER 5: MEMORY LAYER (Issue 6) │
│ • Session context, database persistence │
│ • Token tracking for cost control │
│ │
│ LAYER 6: SORT & RANK (Issue 7) │
│ • Sortable fields with preferred direction │
│ • JSONB expressions, default sort logic │
│ │
│ LAYER 7: PRODUCT DEEP-DIVE (Issue 8) │
│ • Hybrid: Database (specs) + RAG (reviews/docs) │
│ • Section-aware chunking, filtered retrieval │
│ │
└─────────────────────────────────────────────────────────┘
How Layers Connect
USER MESSAGE: "What do users say about the cheapest MOSFET?"
↓
┌─────────────────────────────────────────────────────────┐
│ LAYER 3: INTENT │
│ Intents: [new_search, deep_dive] │
│ Multi-intent detected! │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LAYER 4: FILTER EXTRACTION │
│ Filters: component_type = "mosfet" │
│ Sort: pricing ASC │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LAYER 6: SORT & RANK │
│ ORDER BY (pricing->>'unit_price')::NUMERIC ASC │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ DATABASE SEARCH │
│ Found: MOSFET XYZ at $0.42 │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LAYER 5: MEMORY │
│ Store in session context, update focus │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LAYER 7: PRODUCT DEEP-DIVE (RAG) │
│ "What do users say" → Retrieve reviews for XYZ │
│ Top 5 review chunks added to context │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ RESPONSE GENERATION │
│ Combine: product specs + user reviews → answer │
└─────────────────────────────────────────────────────────┘
The Moat
Anyone can wrap an LLM API in a weekend. Few can build a production agent with all 8 layers. That's your competitive advantage.
- Field registries that validate
- Ingestion that normalizes
- Intent that classifies
- Filters that extract and clamp
- Memory that persists
- Sorting that understands “best”
- RAG that finds answers in docs
Complete Architecture Checklist
| Item | Yes/No |
|---|---|
| Layer 0: Do you treat the LLM as one component, not the whole system? | |
| Layer 1: Do you have a field registry? | |
| Layer 2: Do you normalize vendor data at ingestion? | |
| Layer 3: Do you classify intent before executing? | |
| Layer 4: Do you validate extracted filters against the registry? | |
| Layer 5: Do you persist session context across turns? | |
| Layer 6: Do you have sortable field definitions with preferred direction? | |
| Layer 7: Do you have RAG for unstructured content? | |
Score:
- 0-3: Foundation missing. Start with Issues 1-3.
- 4-5: Good start. Add Issues 4-5.
- 6-7: Almost there. Complete Issues 6-7.
- 8: Production-ready. Consider Issue 8 if needed.
Your Next Steps
Where to Start
Just starting? Read Issues 1-3 first. Build field registry, ingestion pipeline. Get structured data working.
Have basic search? Add Issues 4-5. Intent classification, filter extraction. Make search conversational.
Want multi-turn conversations? Implement Issue 6. Session memory, token tracking. Remember context across turns.
Need better results? Add Issue 7. Sortable fields, smart defaults. “Best” means something specific.
Want deep product knowledge? Implement Issue 8. RAG for reviews, documentation. Answer “what do users say?”
Key Takeaways
1. The Problem: Structured data alone can't answer "what do users say?" or "how does this work?"
2. The Solution: Hybrid architecture -- database for specs, RAG for reviews and docs. Use the decision framework to know when RAG is needed. Chunking: reviews stay whole; docs split section-aware.
3. Key Takeaway: RAG complements structured data. Series complete -- you now have the full 8-layer agent architecture.
Until next issue,
Sentient Zero Labs