Pragmatic AI for Founders Issue 4/6

Where AI Breaks and The Data Layer Solution

AI fails predictably in two ways -- hallucination and bad retrieval -- and designing your data layer first prevents 80% of production disasters.

Apr 13, 2026 · 10 min read · Sentient Zero Labs


$100 billion. $500 million. $812.

Those are the costs of AI failures at Google, Zillow, and Air Canada. Different scales, same root causes: hallucination and bad retrieval. The pattern is predictable. The fixes are systematic. And yet most teams skip the one thing that prevents 80% of disasters: designing the data layer first.

In this issue, we focus on the two most expensive AI failure modes and the solution that catches them early. We show why “data layer first” is not a nice-to-have but a requirement for production AI. The goal is not to scare you. The goal is to help you avoid building systems that look fine in demos but collapse under real use.

What you will take away: a clear failure taxonomy, a data layer design pattern, and a pre-mortem exercise to find your weaknesses before they find you.

History Anchor: From Keyword Search to Grounded Generation

Before Transformers, search systems relied on keyword matching (TF-IDF) — if your query did not use the exact same words as the document, you got nothing back. Transformers (2017) enabled semantic search: systems that understand meaning, not just words, so “laptop for video editing” could match documents about “high-performance notebooks.” But this power came with a new failure mode — hallucination. Models that understand meaning can also generate meaning that does not exist. Retrieval-Augmented Generation (RAG) was developed to fix this: ground the model’s output in retrieved facts, not its imagination. The catch is that RAG only works when the data layer underneath it is well-structured. A brilliant retrieval system pulling from messy, outdated, or unlabeled data is just a faster way to deliver the wrong answer.

The $100 Billion Oops: When AI Hallucinates

February 2023: Google launches Bard to compete with ChatGPT. In the demo, Bard claims the James Webb Space Telescope took “the very first images of a planet outside our own solar system.” This is false. The first exoplanet image was taken in 2004.

Result: Google’s stock drops 7.4% in one day. $100 billion in market cap evaporated.

What Is Hallucination?

Hallucination happens when an AI model generates information that is confident but false. It is not making things up maliciously — it is doing what it was trained to do: predict the next most likely token based on patterns in its training data. When the pattern is incomplete or the data is missing, the model guesses. It does not know the difference between a fact and a plausible-sounding fiction.

Why it happens:

The training data does not cover the topic.
Retrieval fails (no relevant documents found).
The model is not grounded in external facts.

💡 Mental Model: The Confident Intern

Imagine an intern who read the entire internet but has zero common sense. You ask them a question they don’t know. Instead of saying “I don’t know,” they confidently make up an answer that sounds plausible. That is hallucination.

The Real Cost

Hallucination is not just embarrassing — it is legally and financially expensive:

Air Canada: A chatbot told a customer they could get a bereavement discount retroactively. They could not. The customer took the airline to British Columbia's Civil Resolution Tribunal, which held Air Canada responsible for what its chatbot said. The award came to about $812 in damages, interest, and fees. The precedent was set: you own what your AI says.
Hallucination rates: Depending on the model and task, LLMs hallucinate 3-27% of the time without grounding.

The Fix: Citation or Rejection

The solution is simple but non-negotiable:

Require citations: Every factual claim must cite a source from retrieved documents.
Reject unsupported claims: If the model cannot cite a source, it must say “I don’t have that information.”

This is not a model problem. This is a system design problem. You cannot fix hallucination by waiting for better models. You fix it by grounding the model in real data and validating its outputs.
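The "citation or rejection" rule can be sketched as a small gate that runs after generation. This is a minimal illustration, not a production validator: the `[doc_id]` citation format, the `RetrievedDoc` type, and the `enforce_citations` function are all assumptions made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

REFUSAL = "I don't have that information."


@dataclass
class RetrievedDoc:
    """Hypothetical shape of a retrieved document."""
    doc_id: str
    text: str


def enforce_citations(answer: str, docs: list[RetrievedDoc]) -> str:
    """Return the answer only if every sentence cites a retrieved document.

    Assumes the model was prompted to cite sources as [doc_id].
    """
    known_ids = {d.doc_id for d in docs}
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return REFUSAL
    for sentence in sentences:
        cited = set(re.findall(r"\[([^\[\]]+)\]", sentence))
        # Reject when a sentence has no citation or cites an unknown source.
        if not cited or not cited <= known_ids:
            return REFUSAL
    return answer
```

Note what this does and does not check: it confirms that each sentence points at a document that was actually retrieved, but it does not confirm that the document supports the claim. That second check needs a separate step (for example, a human review or an entailment check).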

Hallucination is the flashy failure -- the $100B mistake. But there's a quieter killer: when your AI retrieves something, just the wrong something.

The Data Layer First Solution

2021-2022: Zillow’s AI-powered home-buying algorithm bought thousands of homes at inflated prices. The algorithm was trained on historical data, but market conditions shifted rapidly during the pandemic. The model could not adapt. Zillow had to sell the homes at a loss.

Cost: $500 million write-off. The iBuying program was shut down.

The root cause was not the model itself but the data it depended on. RAG systems fail the same way: RAG failures are data layer failures.

Most teams think of RAG as a model problem: “We need better embeddings” or “We need a better vector database.” But the real issue is deeper. If your data layer is not designed properly, even the best retrieval system will fail.

Here is the pattern that prevents 80% of RAG disasters:

Step 1: Design the Schema Before Anything

Before you ingest a single document, define the schema. What metadata will you track?

Bad approach:

Dump all documents into a database without labels.
Hope the embedding model figures out what is relevant.
Cannot tell what is outdated, restricted, or irrelevant.

Good approach (track before ingesting):

Document type: policy, API doc, tutorial, internal memo, public FAQ.
Created date: filter out old information.
Version: track which version of a policy or product this applies to.
Department: support, engineering, legal, sales.
Access level: public, internal, restricted.
Tags: categorize by topic (billing, shipping, returns, etc.).

Why this matters: You can filter to 3 relevant documents instead of 100 irrelevant ones. The model sees only what it needs, not everything it could see.
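As a rough sketch of what "schema before ingestion" can look like, the metadata fields listed above map naturally onto a typed record. The class names, enum values, and the `prefilter` helper below are illustrative assumptions, not a prescribed design.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class DocType(Enum):
    POLICY = "policy"
    API_DOC = "api_doc"
    TUTORIAL = "tutorial"
    INTERNAL_MEMO = "internal_memo"
    PUBLIC_FAQ = "public_faq"


class AccessLevel(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class DocumentMeta:
    """Metadata every document must carry before it is ingested."""
    doc_id: str
    doc_type: DocType
    created_at: date
    version: str
    department: str
    access_level: AccessLevel
    tags: frozenset[str] = field(default_factory=frozenset)


def prefilter(
    docs: list[DocumentMeta],
    *,
    doc_type: DocType,
    tag: str,
    created_after: date,
) -> list[DocumentMeta]:
    """Narrow the candidate set by metadata before any similarity search."""
    return [
        d for d in docs
        if d.doc_type is doc_type and tag in d.tags and d.created_at > created_after
    ]
```

Because the record is typed, a document with a missing date or an unknown type fails at ingestion time rather than surfacing later as a bad answer.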

Step 2: Build Field Metadata (Discovery Layer)

Once you have a schema, make it discoverable. The system (and the AI) should know:

What fields can be searched?
What operators work? (>, <, =, contains, between)
What values are valid? (e.g., doc_type can only be one of 5 options)

This prevents impossible queries. If the AI tries to filter by a field that does not exist, the system rejects it before wasting a database call.
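One way to picture the discovery layer is a small registry that lists each searchable field, the operators it accepts, and (where applicable) its valid values. The registry contents and the `validate_filter` function are hypothetical; a real system would generate them from the schema.

```python
from __future__ import annotations

# Hypothetical registry: field name -> allowed operators and valid values.
# "values": None means any value is accepted for that field.
FIELD_REGISTRY = {
    "doc_type": {
        "operators": {"="},
        "values": {"policy", "api_doc", "tutorial", "internal_memo", "public_faq"},
    },
    "created_at": {"operators": {">", "<", "=", "between"}, "values": None},
    "tags": {"operators": {"contains"}, "values": None},
}


def validate_filter(field_name: str, operator: str, value: object) -> list[str]:
    """Return a list of problems with a proposed filter; empty means valid."""
    spec = FIELD_REGISTRY.get(field_name)
    if spec is None:
        return [f"unknown field: {field_name}"]
    errors = []
    if operator not in spec["operators"]:
        errors.append(f"operator {operator!r} not supported for {field_name}")
    if spec["values"] is not None and value not in spec["values"]:
        errors.append(f"invalid value {value!r} for {field_name}")
    return errors
```

A filter proposed by the AI is checked against the registry first, so an invalid one is rejected with a readable error before any database call is made.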

Step 3: Constraint Prioritization

Not all constraints are equal. When retrieval returns zero results, you need a strategy for relaxation.

Example: “Show me the refund policy for California orders over $500”

CRITICAL (never drop):

Must be a Policy document.
Must be about refunds.

PREFERRED (drop if no results):

State = California
Order value > $500

Result: It is better to return the general refund policy than to hallucinate California-specific rules that do not exist.
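The relaxation strategy above can be sketched as follows. Critical constraints are always applied; preferred constraints are dropped one at a time, lowest priority first, until something matches. The function name, the `(label, predicate)` constraint shape, and the document fields are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable

Doc = dict[str, object]
Constraint = tuple[str, Callable[[Doc], bool]]  # (label, predicate)


def search_with_relaxation(
    docs: list[Doc],
    critical: list[Constraint],
    preferred: list[Constraint],
) -> tuple[list[Doc], list[str]]:
    """Return (matches, labels of preferred constraints that were dropped).

    `preferred` is ordered most important first; the last entry is dropped first.
    Critical constraints are never relaxed, so the result may be empty.
    """
    candidates = [d for d in docs if all(pred(d) for _, pred in critical)]
    active = list(preferred)
    while True:
        hits = [d for d in candidates if all(pred(d) for _, pred in active)]
        if hits or not active:
            dropped = [label for label, _ in preferred[len(active):]]
            return hits, dropped
        active.pop()  # relax the lowest-priority preferred constraint
```

Returning the dropped labels matters: the answer can then say "this is the general refund policy; no California-specific version was found" instead of silently presenting a broader result as an exact match.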

Step 4: Validation at Every Boundary

Before sending documents to the LLM, validate them:

Freshness check: Is the document older than your threshold (e.g., 6 months)? If yes, reject it.
Relevance check: Is the similarity score below 70%? If yes, reject it.
Permission check: Is this document restricted, and does the user lack access? If yes, reject it.
If all documents are rejected: Return “I don’t have information on that” instead of guessing.
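The three checks above can be expressed as one filter that runs between retrieval and the LLM call. This is a sketch under stated assumptions: the thresholds are the example values from the text, access is modeled as ranked levels (public < internal < restricted), and `Candidate`, `validate_candidates`, and `respond` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable

MAX_AGE = timedelta(days=183)  # roughly 6 months, the example threshold
MIN_SIMILARITY = 0.70          # the example relevance threshold
NO_ANSWER = "I don't have information on that"
ACCESS_RANK = {"public": 0, "internal": 1, "restricted": 2}  # assumed ordering


@dataclass
class Candidate:
    doc_id: str
    created_at: date
    similarity: float
    access_level: str


def validate_candidates(
    candidates: list[Candidate], user_clearance: str, today: date
) -> list[Candidate]:
    """Keep only documents that pass freshness, relevance, and permission checks."""
    kept = []
    for c in candidates:
        if today - c.created_at > MAX_AGE:
            continue  # stale
        if c.similarity < MIN_SIMILARITY:
            continue  # not relevant enough
        if ACCESS_RANK[c.access_level] > ACCESS_RANK[user_clearance]:
            continue  # user is not allowed to see it
        kept.append(c)
    return kept


def respond(kept: list[Candidate], generate: Callable[[list[Candidate]], str]) -> str:
    """Call the model only when at least one document survived validation."""
    return generate(kept) if kept else NO_ANSWER
```

The key design choice is that the "no documents" branch never reaches the model, so there is nothing for it to guess from.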

💡 Real Win: Medical Documentation System

Problem: Retrieving outdated treatment guidelines (some from 2019).
Fix: Added created_at > 2024-01-01 filter.
Result: Outdated info dropped from 22% to less than 2%.
Business impact: Prevented potential lawsuit from incorrect medical guidance.

The Swiss Cheese Model

This mental model comes from accident analysis in safety-critical fields such as aviation and healthcare, and it is now common in site reliability engineering (SRE). The idea is simple: every guardrail has holes (failure modes). But when you stack multiple layers, the holes rarely align.

The Four Layers

Layer 1: Input Validation

Sanitize user prompts (block injection attacks).
Route queries to the correct handler (search vs. chat vs. command).
Reject malformed inputs.

Layer 2: Retrieval Validation

Filter by metadata (type, date, version, access level).
Score relevance (reject below threshold).
Rerank results (cross-encoder for better ordering).
Require citations (every claim must point to a source).

Layer 3: Output Validation

Validate structure (does it match the expected schema?).
Verify grounding (does the output cite retrieved docs?).
Check business rules (is the price positive? is the count correct?).

Layer 4: Review Loop

Human approval for high-stakes actions (delete, payment, legal).
Escalate low-confidence answers (model says “I’m not sure”).
Log everything for debugging and compliance.
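As one illustration of Layer 3, the sketch below validates a model's JSON output for structure, business rules, and grounding in a single pass. The output shape (`items`, `count`, `sources`) and the `validate_output` function are assumptions invented for the example; only the standard library is used.

```python
from __future__ import annotations

import json


def validate_output(raw: str, retrieved_ids: set[str]) -> list[str]:
    """Return a list of validation errors; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    # Structure: required fields with the expected types.
    errors = []
    for key, expected in {"items": list, "count": int, "sources": list}.items():
        if not isinstance(data.get(key), expected):
            errors.append(f"missing or mistyped field: {key}")
    if errors:
        return errors

    # Business rules: count matches the results, every price is positive.
    if data["count"] != len(data["items"]):
        errors.append("count does not match number of items")
    for i, item in enumerate(data["items"]):
        price = item.get("price") if isinstance(item, dict) else None
        if not isinstance(price, (int, float)) or price <= 0:
            errors.append(f"item {i} has a missing or non-positive price")

    # Grounding: at least one source, and all sources were actually retrieved.
    sources = data["sources"]
    if not sources or not all(isinstance(s, str) and s in retrieved_ids for s in sources):
        errors.append("sources missing or not among retrieved documents")
    return errors
```

Any non-empty error list is a natural trigger for Layer 4: retry, escalate to a human, or decline to answer, and log the failure either way.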

Real success (e-commerce):

Before: 18% hallucinations, 12% schema errors.
After: 3% hallucinations, less than 1% schema errors.
Business impact: Saves $2M/year in support escalations and refunds.

ℹ

In the Air Canada case, the chatbot's answer reached the customer without being checked against the airline's actual bereavement policy. The direct cost was $812; the lasting cost was the legal precedent. The lesson: single guardrails fail. Layered guardrails catch what others miss.

Pre-Mortem Exercise

Instead of waiting for your AI to fail in production, imagine it failed today. Write the post-mortem now.

Instructions:

Pick a failure mode: hallucination or RAG failure (bad retrieval).
Describe what went wrong: What did users experience?
Calculate the cost: Revenue lost, legal fees, reputation damage.
Identify the guardrail that would have caught it.

Example 1:

Failure: Chatbot retrieved a 2022 return policy (90-day window). Current policy is 30 days. User returns product on day 60, expects refund, is denied, sues.
Cost: $50K legal settlement + 20% NPS drop.
Guardrail: created_at filter + policy version tracking + citation requirements.

Example 2:

Failure: Bot leaked internal pricing docs to public users (no permission check).
Cost: Legal + PR disaster. Competitors saw wholesale prices.
Guardrail: access_level field + permission filtering in Layer 2.

Now do this for your system. Be brutally honest. The goal is not to scare you. The goal is to find the weakness before it becomes a $500M mistake.

Data Layer Audit (9 Questions)

Score one point for each Yes. 0-3 = high risk, 4-6 = medium risk, 7-9 = low risk.

Schema (3 questions)

	Item	Score
	Do you have a defined schema for your data before ingesting?	Y/N
	Can you filter by created_at, doc_type, version?	Y/N
	Do you know what fields are searchable and what operators work?	Y/N


Validation (3 questions)

	Item	Score
	Do you validate outputs against expected structure (JSON schema, required fields)?	Y/N
	Do you have business rule validators (e.g., price > 0, count matches results)?	Y/N
	Do you verify grounding (check if outputs cite retrieved sources)?	Y/N


Monitoring (3 questions)

	Item	Score
	Do you track retrieval relevance scores (are docs actually relevant)?	Y/N
	Do you monitor schema violation rates (outputs failing validation)?	Y/N
	Do you alert on stale data (average document age increasing)?	Y/N


Your score: ___ / 9
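If the last monitoring question (alerting on stale data) is a "No" for you, a starting point might look like the sketch below. It assumes you record the corpus's average document age once a day; the function names, the 7-sample window, and the 5-day threshold are illustrative choices, not recommendations.

```python
from __future__ import annotations

from datetime import date
from statistics import mean


def average_age_days(created_dates: list[date], today: date) -> float:
    """Average document age in days for the current corpus."""
    return mean((today - d).days for d in created_dates)


def stale_data_alert(
    daily_avg_ages: list[float], window: int = 7, max_increase_days: float = 5.0
) -> bool:
    """True when average age rose by more than `max_increase_days` over the window.

    With no new documents, average age grows by one day per day, so a 7-sample
    window (6 days apart) rises by 6. A threshold of 5 therefore fires when the
    corpus has gone roughly a week without meaningful refresh.
    """
    if len(daily_avg_ages) < window:
        return False  # not enough history yet
    recent = daily_avg_ages[-window:]
    return recent[-1] - recent[0] > max_increase_days
```

Feeding `stale_data_alert` the daily output of `average_age_days` gives a simple trend check that can run alongside relevance-score and schema-violation tracking.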

Activity: Run a Pre-Mortem

Imagine your AI failed 6 months from now. Write a 3-sentence incident report.
Identify the missing guardrail (Layer 1-4 above).
Estimate the cost if you do nothing vs. the cost to add the guardrail now.

Resources

Air Canada Chatbot Ruling (CBC News). Reporting on Moffatt v. Air Canada. Key legal takeaway: companies own what their AI says, even if a chatbot said it.
Why Zillow’s Algorithmic Home Buying Imploded (Stanford GSB). The clearest business-audience post-mortem on the $500M iBuying failure. What went wrong with the training data and why no monitoring caught it.
Medallion Architecture: Data Layer Design (Databricks). A schema-first data design pattern that complements the “Data Layer First” solution. Explains Bronze/Silver/Gold tiers and why schema design precedes ingestion.
Pydantic: Data Validation for Python. A widely used third-party library for enforcing output schemas and catching constraint violations. Relevant for any team implementing the validation checks described above.
Swiss Cheese Model (Wikipedia). An accident-causation model widely applied in aviation and healthcare safety. Explains why layered defenses are necessary: a single guardrail always has holes.

What’s Next

Next issue: Silent Failures and Monitoring — how to catch drift and silent errors before users leave.

Teaser: Even with perfect data layer design, you will face silent errors and drift. Issue 5 shows how to monitor for invisible failures before users notice.

Until next issue,

Sentient Zero Labs