Agent Design Fieldbook Issue 8/8

The Product Deep-Dive Layer: When Users Want More Than Specs

When users want to go deep on a product -- reading reviews, understanding thermal performance, checking compatibility -- RAG lets you search unstructured knowledge that doesn't fit in structured columns.

Apr 13, 2026 · 16 min read · Sentient Zero Labs

We’ve built 7 layers. Users can search, filter, sort, and get results from structured data. But here’s what we kept hearing:

“What do people say about the battery life?” “How does the cooling system work?” “Is this compatible with my setup?”

These questions can’t be answered from structured columns. The answers are in user reviews, product manuals, compatibility guides — unstructured knowledge.

In this final issue:

  • When users need product deep-dive (reviews, docs, detailed specs)
  • Do You Need RAG? (decision framework)
  • Hybrid architecture (database + RAG)
  • The complete 8-layer architecture
This is the series finale. We’ll tie everything together.

History Anchor: Attention + Seq2Seq → Transformers → Tool Use + ReAct

Retrieval-Augmented Generation (RAG) combines two traditions: information retrieval (finding relevant documents) and language generation (producing coherent text). Before Transformers, retrieval was keyword-based and generation was template-based. The attention mechanism (2014) and Transformer architecture (2017) made it possible to retrieve semantically and generate contextually. RAG is the bridge between what your database knows and what the LLM can explain.


The Disasters

Disaster 1: The “What Do Users Say?” Problem

USER: "What do people say about the battery life?"
BOT: "The Dell XPS has a 10-hour battery" (from structured spec)
USER: "No, what do USERS say?"
BOT: "I don't have that information" ❌

REALITY:
• 1,000 user reviews exist in our database
• Reviews mention: "battery drains fast", "only 6-7 hours real-world"
• But reviews are stored as text blobs, not searchable

PROBLEM: We only exposed structured data (specs), not unstructured (reviews)

The user wanted real-world feedback. We gave them marketing specs.

Disaster 2: The “Explain This Spec” Problem

USER: "Explain the thermal performance"
BOT: "Operating temp: 0-40°C" (from structured spec)
USER: "No, HOW does the cooling system work?"
BOT: "I don't have that information" ❌

REALITY:
• 50-page technical manual exists
• Explains: dual-fan active cooling, vapor chamber, thermal paste
• But only basic specs were extracted to database

PROBLEM: Detailed documentation not accessible via search

The user wanted to understand the product deeply. We could only parrot the spec sheet.

Disaster 3: The “Is This Compatible?” Problem

USER: "Will this work with my motherboard?"
BOT: "I don't have compatibility information" ❌

REALITY:
• Compatibility guide exists in product documentation
• Lists: compatible motherboards, power requirements, clearance specs
• But stored as unstructured PDF, not in database

PROBLEM: Compatibility knowledge exists but isn't searchable

The answer existed. We just couldn’t find it.

The Pattern

Users don't just want specs. They want to understand products: reviews, documentation, compatibility, comparisons. That's where RAG becomes useful.

The Use Case: Product Deep-Dive

When users have found a product (via Issues 4-7), they often want to go deeper:

┌─────────────────────────────────────────────────────────┐
│  PRODUCT DEEP-DIVE QUESTIONS                            │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. USER REVIEWS                                        │
│     "What do people say about battery life?"            │
│     "Are there common complaints?"                      │
│     "How's the build quality according to users?"       │
│                                                         │
│  2. DETAILED SPECS                                      │
│     "Explain the thermal performance"                   │
│     "How does the cooling system work?"                 │
│     "What's the display technology?"                    │
│                                                         │
│  3. COMPATIBILITY                                       │
│     "Will this work with my motherboard?"               │
│     "What power supply do I need?"                      │
│     "Does it fit in my case?"                           │
│                                                         │
│  4. COMPARISONS                                         │
│     "How does this compare to the HP Spectre?"          │
│     "What's better: this or the MacBook?"               │
│     "Differences vs last year's model?"                 │
│                                                         │
│  5. DOCUMENTATION                                       │
│     "What's the warranty policy?"                       │
│     "How do I upgrade the RAM?"                         │
│     "What's covered under warranty?"                    │
│                                                         │
└─────────────────────────────────────────────────────────┘

None of these are in structured columns!

  • Reviews: Full text, not just star rating
  • Detailed specs: Narrative explanations, not just numbers
  • Compatibility: Lists and guides, not boolean flags
  • Documentation: PDFs, manuals, warranty text

Do You Need RAG?

RAG isn’t always the answer. Here’s the decision framework:

The Decision Tree

┌─────────────────────────────────────────────────────────┐
│  DO YOU NEED RAG? DECISION TREE                         │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Q1: Do you have unstructured documents?                │
│      (PDFs, manuals, reviews, articles)                 │
│      ├─ NO → Use structured context only (database)     │
│      └─ YES → Continue...                               │
│                                                         │
│  Q2: Does it fit in context window (< 100k tokens)?     │
│      ├─ YES → Consider full context injection           │
│      └─ NO → Continue...                                │
│                                                         │
│  Q3: Is the content frequently updated?                 │
│      ├─ NO → Full context may work                      │
│      └─ YES → RAG (re-embed on update)                  │
│                                                         │
│  Q4: Do you need source citations?                      │
│      ├─ YES → RAG (provides chunk sources)              │
│      └─ NO → Either approach works                      │
│                                                         │
│  RECOMMENDATION:                                        │
│  • All NO → Structured context injection (database)     │
│  • Some YES → Consider RAG                              │
│  • All YES → Definitely RAG                             │
│                                                         │
└─────────────────────────────────────────────────────────┘
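The tree above can be encoded as a small helper. This is a sketch: the argument names and the 100k-token threshold mirror the diagram, not any particular library's API.

```python
def choose_retrieval_strategy(has_unstructured_docs: bool,
                              corpus_tokens: int,
                              frequently_updated: bool,
                              needs_citations: bool,
                              context_budget: int = 100_000) -> str:
    """Walk the decision tree: 'database', 'full_context', or 'rag'."""
    # Q1: no unstructured documents -> structured context from the database
    if not has_unstructured_docs:
        return "database"
    # Q2-Q4: a small, stable corpus with no citation needs -> inject it all
    if (corpus_tokens < context_budget
            and not frequently_updated
            and not needs_citations):
        return "full_context"
    # Large, frequently updated, or citation-requiring corpus -> RAG
    return "rag"
```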

When RAG Makes Sense

For product deep-dive:

| Content Type | Volume | RAG Needed? |
|---|---|---|
| User reviews | 1,000s per product | Yes |
| Product manuals | 50+ pages | Yes |
| Compatibility guides | Variable | Yes |
| Detailed specs | Narrative text | Yes |
| Basic specs | Structured fields | No (database) |
💡 Key Insight
Don’t default to RAG. Use it when you have unstructured knowledge that doesn’t fit in structured columns or context window.

Hybrid Architecture

The Best of Both Worlds

Structured data stays in the database. Unstructured knowledge goes to RAG. Combine them in context.

┌─────────────────────────────────────────────────────────┐
│  HYBRID SERVICE ARCHITECTURE                            │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  USER: "What do users say about the Dell XPS battery?"  │
│    ↓                                                    │
│  ┌────────────────────────────────────────────┐         │
│  │ STEP 1: Get Structured Data (Database)     │         │
│  │ • Product: Dell XPS 15                     │         │
│  │ • Price: $899                              │         │
│  │ • Battery (spec): 10 hours                 │         │
│  │ • Rating: 4.5/5 (from 1,234 reviews)       │         │
│  └────────────────────────────────────────────┘         │
│    ↓                                                    │
│  ┌────────────────────────────────────────────┐         │
│  │ STEP 2: Detect RAG Need                    │         │
│  │ Query contains "what do users say"         │         │
│  │ → Trigger RAG lookup for reviews           │         │
│  └────────────────────────────────────────────┘         │
│    ↓                                                    │
│  ┌────────────────────────────────────────────┐         │
│  │ STEP 3: Retrieve from Vector Store         │         │
│  │ • Embed query: "battery life reviews"      │         │
│  │ • Filter: sku = "XPS-15-2024"              │         │
│  │ • Top 5 review chunks retrieved            │         │
│  └────────────────────────────────────────────┘         │
│    ↓                                                    │
│  ┌────────────────────────────────────────────┐         │
│  │ STEP 4: Combine and Generate               │         │
│  │ Structured specs + Review chunks → LLM     │         │
│  │ → Comprehensive answer                     │         │
│  └────────────────────────────────────────────┘         │
│                                                         │
└─────────────────────────────────────────────────────────┘

Hybrid Service Implementation

class HybridProductService:
    def __init__(self, db, vector_store, llm):
        self.db = db
        self.vector_store = vector_store
        self.llm = llm

    async def deep_dive(self, query: str, product_sku: str) -> str:
        # Step 1: Get structured data
        product = await self.db.get_product(product_sku)
        structured_context = self._format_product(product)

        # Step 2: Detect if RAG needed
        rag_keywords = ["users say", "reviews", "people think",
                        "how does", "explain", "warranty", "compatible"]
        needs_rag = any(kw in query.lower() for kw in rag_keywords)

        rag_context = ""
        if needs_rag:
            # Step 3: Retrieve from vector store
            chunks = await self.vector_store.search(
                query=query,
                filter={"sku": product_sku},
                top_k=5
            )
            rag_context = self._format_chunks(chunks)

        # Step 4: Generate response
        prompt = f"""
        Answer the user's question using the provided context.

        STRUCTURED DATA:
        {structured_context}

        {"RETRIEVED KNOWLEDGE:" if rag_context else ""}
        {rag_context}

        USER QUESTION: {query}
        """

        return await self.llm.generate(prompt)

Chunking Strategies

Review Chunking: One Review = One Chunk

Reviews are self-contained. Keep them whole.

┌─────────────────────────────────────────────────────────┐
│  REVIEW CHUNKING STRATEGY                               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Each review = one chunk with metadata                  │
│                                                         │
│  Chunk 1:                                               │
│    text: "Battery lasts 8-9 hours with normal use.      │
│           Great for a workday. Charging is fast too."   │
│    metadata: {                                          │
│      sku: "XPS-15-2024",                                │
│      type: "user_review",                               │
│      rating: 4.5,                                       │
│      date: "2024-01-15",                                │
│      verified: true,                                    │
│      aspects: ["battery", "charging"]                   │
│    }                                                    │
│                                                         │
│  Chunk 2:                                               │
│    text: "Disappointed with battery. Only 6 hours       │
│           with my workflow. Advertised 10 is a lie."    │
│    metadata: {                                          │
│      sku: "XPS-15-2024",                                │
│      type: "user_review",                               │
│      rating: 2.0,                                       │
│      date: "2024-02-20",                                │
│      verified: true,                                    │
│      aspects: ["battery"]                               │
│    }                                                    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Documentation Chunking: Section-Aware

Manuals have sections. Respect them.

┌─────────────────────────────────────────────────────────┐
│  DOCUMENTATION CHUNKING STRATEGY                        │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Detect sections, keep them together                    │
│                                                         │
│  Chunk 1 (section: "Thermal Performance"):              │
│    text: "Thermal Performance                           │
│           The XPS 15 uses a dual-fan active cooling     │
│           system with a vapor chamber. Under normal     │
│           load, fan noise stays below 30dB..."          │
│    metadata: {                                          │
│      sku: "XPS-15-2024",                                │
│      type: "documentation",                             │
│      section: "Thermal Performance",                    │
│      page: 12                                           │
│    }                                                    │
│                                                         │
│  Chunk 2 (section: "Warranty"):                         │
│    text: "Warranty Policy                               │
│           Your device is covered for 2 years from       │
│           date of purchase. This includes hardware      │
│           defects under normal operating conditions..." │
│    metadata: {                                          │
│      sku: "XPS-15-2024",                                │
│      type: "documentation",                             │
│      section: "Warranty",                               │
│      page: 45                                           │
│    }                                                    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Chunking Implementation

def chunk_reviews(reviews: list, sku: str) -> list:
    """Chunk reviews: one review = one chunk."""
    chunks = []
    for review in reviews:
        chunks.append({
            "text": review["text"],
            "metadata": {
                "sku": sku,
                "type": "user_review",
                "rating": review.get("rating"),
                "date": review.get("date"),
                "verified": review.get("verified", False),
            }
        })
    return chunks

def chunk_documentation(doc_text: str, sku: str) -> list:
    """Chunk documentation: section-aware splitting."""
    import re

    # Split on headings (lines in ALL CAPS or with ## prefix)
    sections = re.split(r'\n(?=[A-Z][A-Z\s]+\n|##)', doc_text)

    chunks = []
    for i, section in enumerate(sections):
        if len(section.strip()) < 50:  # Skip tiny sections
            continue

        # Extract section title (first line)
        lines = section.strip().split('\n')
        title = lines[0].strip('#').strip()

        chunks.append({
            "text": section.strip(),
            "metadata": {
                "sku": sku,
                "type": "documentation",
                "section": title,
                "chunk_index": i
            }
        })

    return chunks
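To sanity-check the heading regex above, here is what it does to a tiny invented manual (the sample text is made up for illustration):

```python
import re

# An invented three-section manual: two ALL-CAPS headings, one '##' heading
manual = (
    "INTRODUCTION\n"
    "Welcome to the XPS 15. This guide covers setup and care.\n"
    "THERMAL PERFORMANCE\n"
    "The XPS 15 uses dual-fan cooling with a vapor chamber. "
    "Fan noise stays below 30dB under normal load.\n"
    "## Warranty\n"
    "Your device is covered for 2 years from date of purchase."
)

# Same heading pattern as chunk_documentation above
sections = re.split(r'\n(?=[A-Z][A-Z\s]+\n|##)', manual)
titles = [s.strip().split('\n')[0].strip('#').strip() for s in sections]
print(titles)  # → ['INTRODUCTION', 'THERMAL PERFORMANCE', 'Warranty']
```

Note the lookahead: the newline is consumed by the split, while the heading itself stays at the top of its chunk, which is what lets `chunk_documentation` recover the section title from the first line.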

Retrieval Flow

End-to-End RAG

USER QUERY: "What do users say about battery life?"

┌─────────────────────────────────────────────────────────┐
│ STEP 1: EMBED QUERY                                     │
│ "battery life user reviews"                             │
│ → [0.23, -0.45, 0.67, 0.12, ...]  (768 dimensions)      │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ STEP 2: VECTOR SIMILARITY SEARCH                        │
│ • Filter: sku = "XPS-15-2024" AND type = "user_review"  │
│ • Top K: 5 chunks                                       │
│ • Metric: Cosine similarity                             │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ STEP 3: RANK BY SIMILARITY                              │
│ 1. "Battery lasts 8-9 hours..." (0.94 similarity)       │
│ 2. "Disappointed, only 6 hours..." (0.91 similarity)    │
│ 3. "Great battery, 10+ hours..." (0.89 similarity)      │
│ 4. "Battery drains fast when..." (0.85 similarity)      │
│ 5. "Solid 9 hours of daily use..." (0.82 similarity)    │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ STEP 4: COMBINE WITH STRUCTURED DATA                    │
│ Product specs + Top 5 review chunks → LLM context       │
└─────────────────────────────────────────────────────────┘

RESPONSE: "Users have mixed experiences with battery life.
           While Dell advertises 10 hours, most users report:
           - 8-9 hours with normal use (common)
           - 6-7 hours with heavy workloads (some complaints)
           - Fast charging is consistently praised
           Overall, real-world battery is 6-9 hours depending on usage."
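Step 3's ranking is just cosine similarity over embedding vectors. A toy example with 3-dimensional vectors (real embeddings have hundreds of dimensions, and these numbers are invented):

```python
import math

def cosine_similarity(a, b):
    """Dot product over the product of vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

# Toy 3-dimensional embeddings; real ones have hundreds of dimensions
query = [0.9, 0.1, 0.3]  # "battery life user reviews"
chunks = {
    "Battery lasts 8-9 hours with normal use": [0.8, 0.2, 0.4],
    "Great screen, vivid colors":              [0.1, 0.9, 0.2],
}
ranked = sorted(chunks, key=lambda c: cosine_similarity(query, chunks[c]),
                reverse=True)
print(ranked[0])  # → Battery lasts 8-9 hours with normal use
```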

Token Economics

Cost Comparison: RAG vs. Full Context

SCENARIO: User asks about a product with 100 reviews

┌─────────────────────────────────────────────────────────┐
│  OPTION 1: RAG APPROACH                                 │
├─────────────────────────────────────────────────────────┤
│  Costs:                                                 │
│  1. Embed query: ~$0.0001                               │
│  2. Vector search: Negligible (database)                │
│  3. LLM with 5 retrieved chunks (~1,000 tokens):        │
│     - Input: 1,000 tokens = ~$0.0025                    │
│     - Output: 500 tokens = ~$0.005                      │
│                                                         │
│  Total per query: ~$0.008                               │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  OPTION 2: FULL CONTEXT INJECTION                       │
├─────────────────────────────────────────────────────────┤
│  Costs:                                                 │
│  1. LLM with all 100 reviews (~20,000 tokens):          │
│     - Input: 20,000 tokens = ~$0.050                    │
│     - Output: 500 tokens = ~$0.005                      │
│                                                         │
│  Total per query: ~$0.055                               │
└─────────────────────────────────────────────────────────┘

VERDICT: RAG is ~7x cheaper when you have many reviews!

When Each Approach Wins

| Query Volume | Reviews/Product | Recommendation |
|---|---|---|
| < 1k/month | < 20 | Full context (simpler) |
| 1-10k/month | 20-100 | Either works |
| > 10k/month | > 100 | RAG (much cheaper) |
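The break-even math is easy to parameterize. The rates below are the illustrative prices from the cost breakdown above ($2.50 per million input tokens, $10 per million output tokens, $0.0001 per query embedding), not any vendor's actual pricing:

```python
EMBED_COST = 0.0001           # per-query embedding, from the breakdown above
IN_PRICE = 2.50 / 1_000_000   # per input token  (~$0.0025 per 1k tokens)
OUT_PRICE = 10.0 / 1_000_000  # per output token (~$0.005 per 500 tokens)

def query_cost(input_tokens, output_tokens, embed=True):
    """Per-query cost under the illustrative prices above."""
    return ((EMBED_COST if embed else 0.0)
            + input_tokens * IN_PRICE + output_tokens * OUT_PRICE)

rag_cost = query_cost(1_000, 500)                 # 5 retrieved chunks
full_cost = query_cost(20_000, 500, embed=False)  # all 100 reviews inlined
print(f"RAG ${rag_cost:.4f} vs full ${full_cost:.4f}")
```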

Generative UI and Latency Management

The Latency Problem

RAG is slow. Here’s why:

┌─────────────────────────────────────────────────────────┐
│  RAG LATENCY BREAKDOWN                                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Step 1: Embed query              ~100ms                │
│  Step 2: Vector search            ~200ms                │
│  Step 3: Rerank chunks            ~300ms                │
│  Step 4: Generate response        ~2000ms               │
│  ─────────────────────────────────────────              │
│  TOTAL:                           ~2600ms               │
│                                                         │
│  User perception: "Why is this so slow?"                │
│                                                         │
└─────────────────────────────────────────────────────────┘

The Solution: Streaming and Visual Citations

Users hate staring at a spinner for 2+ seconds. The fix is Generative UI — show progress as it happens:

┌─────────────────────────────────────────────────────────┐
│  STREAMING UI PATTERN                                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  0ms:     Show "Thinking..." skeleton                   │
│  100ms:   Show "Searching reviews..." status            │
│  300ms:   Show retrieved sources (citations appear)     │
│  500ms:   Start streaming text (words appear as         │
│           they're generated)                            │
│  2600ms:  Complete response with all citations          │
│                                                         │
│  User perception: "This is fast and transparent!"       │
│                                                         │
└─────────────────────────────────────────────────────────┘

Implementation Pattern

async def stream_deep_dive(query: str, product_sku: str):
    """Stream RAG response with progressive UI updates."""

    # 1. Immediately show status
    yield {"type": "status", "message": "Searching product knowledge..."}

    # 2. Retrieve and show sources
    chunks = await vector_store.search(
        query=query,
        filter={"sku": product_sku},
        top_k=5
    )
    yield {
        "type": "sources",
        "sources": [
            {"title": c.source, "snippet": c.text[:100]}
            for c in chunks[:3]
        ]
    }

    # 3. Stream generated response
    async for token in llm.stream_generate(
        prompt=build_rag_prompt(query, chunks),
        max_tokens=500
    ):
        yield {"type": "token", "content": token}

    # 4. Final citations
    yield {
        "type": "complete",
        "citations": [
            {"source": c.source, "page": c.metadata.get("page")}
            for c in chunks
        ]
    }
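On the consuming side, a frontend (or a test) just iterates the event stream and dispatches on `type`. A minimal sketch with retrieval and the LLM stubbed out (`fake_stream` is a stand-in, not part of the service above):

```python
import asyncio

async def fake_stream():
    """Stand-in for stream_deep_dive with retrieval and the LLM stubbed."""
    yield {"type": "status", "message": "Searching product knowledge..."}
    yield {"type": "sources",
           "sources": [{"title": "review_123", "snippet": "Battery..."}]}
    yield {"type": "token", "content": "Battery "}
    yield {"type": "token", "content": "life is solid."}
    yield {"type": "complete",
           "citations": [{"source": "review_123", "page": None}]}

async def render(stream):
    """Collect events the way a frontend would: stream text, then citations."""
    text, citations = [], []
    async for event in stream:
        if event["type"] == "token":
            text.append(event["content"])
        elif event["type"] == "complete":
            citations = event["citations"]
    return "".join(text), citations

answer, cites = asyncio.run(render(fake_stream()))
print(answer)  # → Battery life is solid.
```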

Visual Citations

Show users where the answer came from:

┌─────────────────────────────────────────────────────────┐
│  VISUAL CITATION EXAMPLE                                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Response:                                              │
│  "Battery life is excellent at 12+ hours [1].           │
│   Users note fast charging via USB-C [2]."              │
│                                                         │
│  ┌───────────────────────────────────────────┐          │
│  │ [1] Review by TechReviewer, 4.5 stars     │          │
│  │ "Easily lasted 12 hours with heavy use"   │          │
│  └───────────────────────────────────────────┘          │
│                                                         │
│  ┌───────────────────────────────────────────┐          │
│  │ [2] Product Manual, Page 15               │          │
│  │ "USB-C Power Delivery supports 65W..."    │          │
│  └───────────────────────────────────────────┘          │
│                                                         │
└─────────────────────────────────────────────────────────┘
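Rendering the bracketed markers is straightforward string assembly. A sketch (the fragment/source pairing format here is an assumption for illustration, not a standard API):

```python
def format_with_citations(parts):
    """Join (fragment, source) pairs into text with [n] markers plus a source list."""
    body, sources = [], []
    for fragment, source in parts:
        if source is None:
            body.append(fragment)
        else:
            sources.append(source)
            body.append(f"{fragment} [{len(sources)}]")
    lines = [" ".join(body), ""]
    for n, src in enumerate(sources, 1):
        lines.append(f'[{n}] {src["title"]}: "{src["snippet"]}"')
    return "\n".join(lines)

answer = format_with_citations([
    ("Battery life is excellent at 12+ hours.",
     {"title": "Review by TechReviewer, 4.5 stars",
      "snippet": "Easily lasted 12 hours with heavy use"}),
    ("Users note fast charging via USB-C.",
     {"title": "Product Manual, Page 15",
      "snippet": "USB-C Power Delivery supports 65W"}),
])
print(answer)
```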

Why this matters: Streaming reduces perceived latency by 60-70%. Users see progress immediately, building trust. Citations build credibility — users can verify answers.


The Proof: Before/After

| Metric | Before | After |
|---|---|---|
| "What do users say?" answered | 0% | 95% |
| "How does X work?" answered | 0% | 87% |
| Documentation coverage | 0% | 92% |
| User satisfaction | Low | High |

What changed: Added RAG for unstructured knowledge while keeping structured data in the database.


The Complete 8-Layer Architecture

Series Finale: All Layers Together

┌─────────────────────────────────────────────────────────┐
│  THE 8-LAYER AGENT ARCHITECTURE (COMPLETE!)             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  LAYER 0: FOUNDATION (Issue 1)                          │
│    • LLM as one component in deterministic chassis      │
│    • Validation and structure beat raw prompting        │
│                                                         │
│  LAYER 1: DATA LAYER (Issue 2)                          │
│    • Field registry, schema design, validation          │
│    • Single source of truth for what fields exist       │
│                                                         │
│  LAYER 2: INGESTION LAYER (Issue 3)                     │
│    • Scripts, normalization, full_data preservation     │
│    • 80% of the work happens here                       │
│                                                         │
│  LAYER 3: INTENT LAYER (Issue 4)                        │
│    • Classify first, execute second                     │
│    • Multi-intent handling, entity extraction           │
│                                                         │
│  LAYER 4: FILTER EXTRACTION (Issue 5)                   │
│    • NL → structured query with validation              │
│    • Field-aware, validated, clamped                    │
│                                                         │
│  LAYER 5: MEMORY LAYER (Issue 6)                        │
│    • Session context, database persistence              │
│    • Token tracking for cost control                    │
│                                                         │
│  LAYER 6: SORT & RANK (Issue 7)                         │
│    • Sortable fields with preferred direction           │
│    • JSONB expressions, default sort logic              │
│                                                         │
│  LAYER 7: PRODUCT DEEP-DIVE (Issue 8)                   │
│    • Hybrid: Database (specs) + RAG (reviews/docs)      │
│    • Section-aware chunking, filtered retrieval         │
│                                                         │
└─────────────────────────────────────────────────────────┘

How Layers Connect

USER MESSAGE: "What do users say about the cheapest MOSFET?"

┌─────────────────────────────────────────────────────────┐
│ LAYER 3: INTENT                                         │
│ Intents: [new_search, deep_dive]                        │
│ Multi-intent detected!                                  │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ LAYER 4: FILTER EXTRACTION                              │
│ Filters: component_type = "mosfet"                      │
│ Sort: pricing ASC                                       │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ LAYER 6: SORT & RANK                                    │
│ ORDER BY (pricing->>'unit_price')::NUMERIC ASC          │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ DATABASE SEARCH                                         │
│ Found: MOSFET XYZ at $0.42                              │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ LAYER 5: MEMORY                                         │
│ Store in session context, update focus                  │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ LAYER 7: PRODUCT DEEP-DIVE (RAG)                        │
│ "What do users say" → Retrieve reviews for XYZ          │
│ Top 5 review chunks added to context                    │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ RESPONSE GENERATION                                     │
│ Combine: product specs + user reviews → answer          │
└─────────────────────────────────────────────────────────┘
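The flow above, compressed into one function. Every layer here is a toy stub with hypothetical values; the real implementations are the subject of Issues 4-8:

```python
def handle_message(message: str, session: dict) -> dict:
    """One pass through the stack; every layer is a toy stub."""
    msg = message.lower()

    # Layer 3: intent -- keyword stub for the deep-dive trigger
    intents = ["new_search"]
    if "what do users say" in msg:
        intents.append("deep_dive")

    # Layer 4: filter extraction -- detect component type and "cheapest"
    filters = {"component_type": "mosfet"} if "mosfet" in msg else {}
    sort = ("pricing", "asc") if "cheapest" in msg else None

    # Layer 6 + database: pretend the sorted search returned this product
    product = {"sku": "MOSFET-XYZ", "unit_price": 0.42}

    # Layer 5: memory -- persist focus for follow-up turns
    session["focus_sku"] = product["sku"]

    # Layer 7: RAG fires only when the deep-dive intent was detected
    reviews = ["Runs cool even at high currents"] if "deep_dive" in intents else []

    return {"intents": intents, "filters": filters, "sort": sort,
            "product": product, "reviews": reviews}

session = {}
result = handle_message("What do users say about the cheapest MOSFET?", session)
```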

The Moat

Anyone can wrap an LLM API in a weekend. Few can build a production agent with all 8 layers. That's your competitive advantage:
  • A chassis that treats the LLM as one component
  • Field registries that validate
  • Ingestion that normalizes
  • Intent that classifies
  • Filters that extract and clamp
  • Memory that persists
  • Sorting that understands “best”
  • RAG that finds answers in docs

Complete Architecture Checklist

Score one point for each “yes”:

  • Layer 0: Do you treat the LLM as one component, not the whole system?
  • Layer 1: Do you have a field registry?
  • Layer 2: Do you normalize vendor data at ingestion?
  • Layer 3: Do you classify intent before executing?
  • Layer 4: Do you validate extracted filters against the registry?
  • Layer 5: Do you persist session context across turns?
  • Layer 6: Do you have sortable field definitions with preferred direction?
  • Layer 7: Do you have RAG for unstructured content?

Score:

  • 0-3: Foundation missing. Start with Issues 1-3.
  • 4-5: Good start. Add Issues 4-5.
  • 6-7: Almost there. Complete Issues 6-8.
  • 8: Production-ready.

Your Next Steps

Where to Start

Just starting? Read Issues 1-3 first. Build field registry, ingestion pipeline. Get structured data working.

Have basic search? Add Issues 4-5. Intent classification, filter extraction. Make search conversational.

Want multi-turn conversations? Implement Issue 6. Session memory, token tracking. Remember context across turns.

Need better results? Add Issue 7. Sortable fields, smart defaults. “Best” means something specific.

Want deep product knowledge? Implement Issue 8. RAG for reviews, documentation. Answer “what do users say?”


Key Takeaways

  1. The Problem: Structured data alone can't answer 'what do users say?' or 'how does this work?'
  2. The Solution: Hybrid architecture -- database for specs, RAG for reviews and docs. The decision framework tells you when RAG is worth adding. Chunking: reviews stay whole; docs split by section.
  3. Key Takeaway: RAG complements structured data; it doesn't replace it. Series complete -- you now have the full 8-layer agent architecture.

Until next time,

Sentient Zero Labs