Most people think “agent = chatbot with tools.” That mental model is incomplete and leads to fragile systems. The truth is simpler and harder: agents are orchestrated systems where the LLM is the decision layer, but 80% of the work is software engineering.
The biggest mistake teams make is building agents without designing the data layer and tool interfaces first. They write prompts, wire up tools, and end up with a black box that is impossible to debug or scale.
In this issue, we focus on agents as three-layer systems: Tools (what can it do?), Memory (what does it know?), and Orchestration (how does it decide?). We show why you must design the data layer first, then tools, then orchestration. And we give you the minimal viable agent pattern you can implement immediately.
What you will take away: the three-layer mental model, a production-grade scorecard, and patterns that separate demos from systems.
History Anchor: From Rule Books to Reasoning Agents
Early AI “agents” were glorified if-then rule books — hard-coded decision trees that could only handle scenarios their programmers had anticipated. If the situation was not in the rules, the system froze or failed. The ReAct pattern (2022) showed that LLMs could do something fundamentally different: reason about a problem, decide which tool to use, observe the result, and then decide the next step — all without pre-programmed rules for every scenario. Modern agent architectures combine this LLM reasoning layer with deterministic tool execution, giving you the flexibility of human-like judgment with the reliability of software. For founders, this means agents are no longer science fiction — they are engineering problems with known patterns and predictable failure modes.
Mental Model: Agents = Tools + Memory + Orchestration
An agent is not just an LLM with function calling. It is a three-layer system where each layer has clear responsibilities:
Layer 1: Tools (The “What Can It Do?”)
Tools are callable functions with clear inputs, outputs, and error handling. They are the agent’s interface to the outside world: databases, APIs, calculators, file systems.
Tools should be small and composable, not monolithic.
Anti-pattern:
search_products(query)
- Parses query
- Extracts filters
- Executes search
- Formats results
- Explains recommendations
This tool does too much. When it fails, you cannot tell which step broke.
Good pattern:
get_product_categories() -> returns available categories
search(category, filters) -> executes query
relax_filters(filters, priority) -> widens constraints
explain_tradeoffs(original, relaxed) -> shows impact
Each tool has one job. They compose into workflows. Debugging is straightforward.
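As a sketch, the composable pattern above might look like the following, run against a tiny in-memory catalog. The catalog contents, field names, and (field, op, value) filter format are illustrative assumptions, not a prescribed interface.

```python
# A sketch of the composable tools above against a tiny in-memory
# catalog. Catalog contents and the filter format are illustrative.
CATALOG = {
    "laptops": [
        {"id": 1, "brand": "Dell", "price": 799, "ram": 16},
        {"id": 2, "brand": "HP", "price": 449, "ram": 8},
    ],
}

OPS = {"=": lambda a, b: a == b,
       "<=": lambda a, b: a <= b,
       ">=": lambda a, b: a >= b}

def get_product_categories() -> list[str]:
    """One job: report which categories exist."""
    return list(CATALOG)

def search(category: str, filters: list[tuple]) -> list[dict]:
    """One job: return products matching every (field, op, value) filter."""
    return [p for p in CATALOG.get(category, [])
            if all(OPS[op](p[field], value) for field, op, value in filters)]

def relax_filters(filters: list[tuple], priority: str) -> list[tuple]:
    """One job: drop the named field's constraint to widen the search."""
    return [f for f in filters if f[0] != priority]

def explain_tradeoffs(original: list[tuple], relaxed: list[tuple]) -> str:
    """One job: report which constraints were dropped."""
    dropped = [f[0] for f in original if f not in relaxed]
    return f"Relaxed constraints on: {', '.join(dropped)}"
```

When search returns empty, the orchestrator can call relax_filters and search again, then explain_tradeoffs to show the user what changed: each failure is isolated to one small function.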
Layer 2: Memory (The “What Does It Know?”)
Memory is the agent’s context across steps. It includes:
Short-term memory: Current conversation, intermediate results, session state.
- Implementation: Context window, rolling buffer.
- Example: A chatbot remembering the last 3 messages to maintain conversational flow.
Long-term memory: User preferences, past decisions, domain knowledge.
- Types:
- Episodic: Specific events (“User booked a trip to London last month”).
- Procedural: Learned skills (“Best process for booking flights includes checking layover times”).
- Semantic: General knowledge (“Visa requirements for UK travel”).
- Implementation: Vector databases (Pinecone, ChromaDB), knowledge graphs.
- Example: After 3 failed searches for budget laptops under $500, agent remembers the user’s price sensitivity and automatically prioritizes cheaper options in future queries.
Why long-term memory compounds value: Without it, the agent asks the same questions every session (“What’s your budget?” “Do you need HDMI ports?”). With long-term memory, the agent proactively filters: “Based on your past searches, I’m showing laptops with HDMI ports under $600.” This reduces friction and increases trust over time.
Most agents start with short-term memory only. Add long-term memory when workflows require personalization or when the system needs to learn from past interactions.
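A minimal sketch of both memory layers together: a rolling buffer for short-term context and a plain dict for long-term preferences. A production system would back the long-term store with a vector database or knowledge graph as noted above; this shape and its method names are illustrative.

```python
from collections import deque

class AgentMemory:
    """Minimal sketch: rolling short-term buffer + long-term preference
    dict. Real systems back the long-term store with a vector DB or
    knowledge graph; this is an illustrative shape only."""
    def __init__(self, short_term_size: int = 3):
        self.short_term = deque(maxlen=short_term_size)  # last N messages
        self.long_term: dict = {}  # e.g., {"max_price": 500}

    def remember_message(self, msg: str) -> None:
        self.short_term.append(msg)  # oldest message falls off automatically

    def learn_preference(self, key: str, value) -> None:
        self.long_term[key] = value  # persists across sessions

    def context(self) -> dict:
        """What gets injected into the next prompt."""
        return {"recent": list(self.short_term),
                "preferences": dict(self.long_term)}
```

The payoff described above falls out of context(): the next prompt automatically carries both the recent conversation and learned preferences, so the agent can filter proactively instead of re-asking.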
Layer 3: Orchestration (The “How Does It Decide?”)
Orchestration is the logic that decides:
- Which tool to call next?
- When to retry after a failure?
- When to stop and return results?
Two approaches:
- LLM-Driven Orchestration: The model decides which tool to call next based on the current state. Flexible, but harder to debug.
- Explicit State Machine: You define the workflow with branching logic (if search returns empty, relax filters, then search again). Reliable, but less adaptive.
Best practice: Hybrid orchestration. Use an explicit workflow for critical paths (search -> relax -> explain) and let the LLM handle edge cases.
Agents are not magic. They are systems where the LLM is the control plane, but the tools, memory, and orchestration are the infrastructure.
Design Pattern: Data Layer First, Then Tools, Then Orchestration
The most common mistake is writing orchestration prompts before understanding the data and tools. That leads to brittle agents that guess instead of query.
Here is the correct order:
Step 1: Design the Data Schema
Before building tools, understand the data:
- What fields exist? What are their types (string, number, enum)?
- What operators are supported (=, >, in, between)?
- What are the constraints (e.g., voltage rating must be 2x nominal or greater)?
Example (e-commerce product search):
Product Category: Laptops
Fields:
- price: number (USD)
- ram: number (GB)
- storage: number (GB)
- brand: enum (Dell, HP, Lenovo, Apple, ...)
- rating: number (1-5 stars)
Operators: =, !=, >, >=, <, <=, in, between
Why this matters: You cannot design good tools without knowing what queries are possible and what data exists.
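One way to make the schema actionable is to express it as data the tools can validate against. This dict mirrors the laptop example above; it is a sketch, not a full schema language, and validate_filter is a hypothetical helper.

```python
# The laptop schema above, expressed as data tools can validate
# against. A sketch, not a full schema language.
LAPTOP_SCHEMA = {
    "fields": {
        "price":   {"type": "number", "unit": "USD"},
        "ram":     {"type": "number", "unit": "GB"},
        "storage": {"type": "number", "unit": "GB"},
        "brand":   {"type": "enum", "values": ["Dell", "HP", "Lenovo", "Apple"]},
        "rating":  {"type": "number", "min": 1, "max": 5},
    },
    "operators": ["=", "!=", ">", ">=", "<", "<=", "in", "between"],
}

def validate_filter(field: str, op: str, schema: dict = LAPTOP_SCHEMA) -> bool:
    """Reject queries on fields or operators the schema does not define,
    before they ever hit the database."""
    return field in schema["fields"] and op in schema["operators"]
```

With the schema in one place, tools and orchestration prompts can both be generated from it, so the agent never guesses at fields that do not exist.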
Step 2: Build Tool Interfaces
Once you understand the data, build small, composable tools. Example (e-commerce pattern):
- list_product_categories() — Returns available categories (laptops, phones, tablets, …).
- get_category_filters(category) — Returns filterable fields and operators for a category.
- get_sample_products(category, limit) — Returns sample products to understand data distribution.
- search_products(category, filters, limit) — Executes a query with filters.
- find_similar_products(product_id) — Suggests similar or alternative products.
Why this order: Each tool builds on the previous. You cannot search without knowing fields. You cannot relax filters without knowing what alternatives exist.
Step 3: Write Orchestration Prompts
Only now can you write the orchestration logic. The prompt defines:
- The agent’s role and responsibilities.
- The tools available and how to use them.
- The decision loop (what to do when tools return empty, error, or success).
- Fallback paths (what if retrieval fails?).
Example (product search agent):
You are an e-commerce shopping assistant. Your job is to help users find products.
Available tools:
- list_product_categories()
- get_category_filters(category)
- search_products(category, filters, limit)
- find_similar_products(product_id)
Workflow:
1. Parse the user query and extract constraints (must-have, preferred, nice-to-have).
2. Call get_category_filters(category) to verify filter fields exist.
3. Call search_products(category, filters) with must-have constraints only.
4. If results > 20, add preferred constraints to narrow.
5. If results = 0, relax nice-to-have constraints, then search again.
6. Explain which constraints were relaxed and the trade-offs.
7. Format results for comparison.
If a tool fails, explain the error and ask for clarification.
Why this works: The orchestration prompt is specific, testable, and explicit. No guessing.
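In code, the orchestration prompt drives a small dispatch loop. Here the LLM is stubbed out as decide; in production it would be a function-calling model returning the next tool name and arguments. Tool names and the state keys are illustrative.

```python
# Minimal dispatch loop for an orchestration prompt like the one above.
# `decide` stubs the LLM; all tool names and state keys are illustrative.
TOOLS = {
    "list_product_categories": lambda args: ["laptops", "phones"],
    "search_products": lambda args: [],  # stub: pretend the query came back empty
}

def decide(state: dict) -> dict:
    """Stub for the LLM: pick the next action from the current state."""
    if "categories" not in state:
        return {"tool": "list_product_categories", "args": {}}
    if "results" not in state:
        return {"tool": "search_products", "args": {"category": "laptops"}}
    return {"tool": None}  # nothing left to do

def run_agent(max_steps: int = 10) -> dict:
    state: dict = {}
    for _ in range(max_steps):  # hard step limit: never loop forever
        action = decide(state)
        if action["tool"] is None:
            break
        observation = TOOLS[action["tool"]](action["args"])
        key = ("categories" if action["tool"] == "list_product_categories"
               else "results")
        state[key] = observation  # observations feed the next decision
    return state
```

The loop makes the decision cycle explicit (decide, act, observe, repeat) and the hard step limit is a cheap guardrail against runaway tool-calling.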
Reliability Patterns for Agents
Here are patterns that make agents production-ready:
Clear Boundaries: Define what the agent can and cannot do. Enforce this in the tool layer, not just the prompt.
- If the agent should never delete data, do not give it a delete tool.
- If the agent should always explain decisions, make explanation a required step in the workflow.
Fallback Paths: When a tool fails or returns no results, have a default behavior.
Query -> Empty results
|
Relax optional constraints -> Query again
| (if still empty)
Relax preferred constraints -> Query again
| (if still empty)
Explain why no results exist (constraints too strict, data gap)
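The fallback ladder above can be sketched as a loop that drops one tier of constraints at a time and re-queries. The (tier, field, op, value) filter format and the search callable are assumptions for illustration.

```python
# The fallback ladder above as code: drop one tier of constraints at a
# time and re-query. Filter format and `search` signature are assumed.
def search_with_fallback(search, filters):
    """Try the full query, then progressively drop optional and
    preferred constraints before giving up with an explanation."""
    for drop_tiers in ([], ["optional"], ["optional", "preferred"]):
        active = [f for f in filters if f[0] not in drop_tiers]
        results = search(active)
        if results:
            return {"results": results, "relaxed": drop_tiers}
    return {"results": [], "relaxed": ["optional", "preferred"],
            "explanation": "No results even after relaxing constraints; "
                           "the must-have constraints may be too strict."}
```

The return value always says which tiers were relaxed, so the explanation step downstream never has to guess what the fallback logic did.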
Review Loops: For high-stakes actions (data modification, approvals, financial transactions), add a review step. The agent proposes; a human or validator confirms.
Explainability: The agent should log every tool call and decision. Users should be able to trace the workflow from input to output.
Validation: Check tool outputs before using them in the next step.
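A sketch of that validation step: check the shape of a tool's output and fail loudly before bad data reaches the next step. The required fields here are illustrative; in practice, validate against the tool's declared output schema.

```python
# Sketch: validate a tool's output before the next step uses it.
# The required fields ("id", "price") are illustrative.
def validate_products(output) -> list[dict]:
    """Raise early on malformed search results instead of passing
    bad data downstream."""
    if not isinstance(output, list):
        raise ValueError(f"expected a list of products, got {type(output).__name__}")
    for item in output:
        if not isinstance(item, dict) or "id" not in item or "price" not in item:
            raise ValueError(f"malformed product record: {item!r}")
    return output
```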
Multi-Agent Patterns (When to Use Which)
When workflows are complex, use multiple agents instead of one monolithic agent.
Pattern 1: Manager (Centralized Control)
A central “manager” LLM delegates tasks to specialized agents.
- Example: Manager receives user query, delegates to “Search Agent” (finds parts), then “Comparison Agent” (ranks options), then “Recommendation Agent” (explains trade-offs).
- Use when: Tasks are clearly separable. Centralized control and consistency are critical. Workflow is predictable.
- Trade-off: Adds latency, but easier to debug.
Pattern 2: Decentralized (Peer-to-Peer)
Agents hand off tasks to each other without central coordination.
- Example: Search Agent finds parts, hands to Comparison Agent, which hands to Recommendation Agent.
- Use when: Environment is dynamic and unpredictable. High resilience needed. Tasks can be parallelized.
- Trade-off: Harder to debug, but scales better and adapts faster.
Pattern 3: Orchestrator-Workers (Parallel Execution)
A central LLM breaks down a complex task, delegates to workers, and synthesizes results.
- Example: User asks for complete home office setup. Orchestrator splits into sub-tasks (desk, chair, monitor, laptop), delegates to category specialists, synthesizes into a single recommendation.
- Use when: Tasks are parallel and independent. Results need to be aggregated.
- Trade-off: Good for heavy parallelization, but coordination overhead can be high.
Orchestration: LLM-Driven vs. State Machine
LLM-Driven Orchestration: The model decides which tool to call next based on the current state.
- Use when workflow is exploratory or ambiguous (research, analysis).
- Trade-off: Flexible but harder to debug.
State Machine Orchestration: Predefined workflow with explicit transitions.
- Use when workflow is well-defined and repeatable (order processing, data ingestion).
- Trade-off: Reliable and debuggable, but less adaptive.
Hybrid (Best Practice): State machine with LLM filling in gaps. Define the skeleton workflow (states + transitions), let the LLM decide how to execute each step. Combines reliability (known workflow) with flexibility (adaptive execution).
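A skeleton of the hybrid approach: the states and transitions are fixed in code, while each state's handler is free to call an LLM to decide how to execute that step. The handler names and the search -> relax -> explain path mirror the earlier example; everything else is an illustrative assumption.

```python
# Hybrid skeleton: fixed states and transitions, adaptive handlers.
# Each handler may call an LLM internally; here they are plain stubs.
def run_workflow(handlers, start="search", max_steps=20):
    """Drive a fixed state machine; handlers carry the adaptive logic.
    Each handler mutates ctx and returns the next state name."""
    state, ctx = start, {}
    while state != "done" and max_steps > 0:
        state = handlers[state](ctx)
        max_steps -= 1  # hard cap: the skeleton can never loop forever
    return ctx

# Illustrative handlers for the search -> relax -> explain path:
handlers = {
    "search":  lambda ctx: "explain" if ctx.setdefault("results", []) else "relax",
    "relax":   lambda ctx: (ctx.update(results=["fallback item"], relaxed=True)
                            or "search"),
    "explain": lambda ctx: (ctx.update(summary="done") or "done"),
}
```

The skeleton is unit-testable without any model in the loop (swap in stub handlers, as above), which is exactly the debuggability the state-machine half buys you.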
When Agents Add Value (vs. When They Add Risk)
Agents add value when:
- The workflow has multiple decision points (if/else, retries, branching).
- The task requires retrieval or external data (database queries, API calls).
- The user needs explanations or trade-off analysis.
- The workflow must adapt to varied inputs.
Agents add risk when:
- The tools can modify data or trigger irreversible actions (delete, payment, deployment).
- The orchestration logic is unclear or untested.
- There is no review loop for high-stakes decisions.
- Failure modes are not well-defined.
Disaster Scenario (Real Production Incident): An AI coding agent was asked to “clear the cache.” It misinterpreted the request and executed rm -rf / on the production server, wiping the entire drive. Root cause: Agent had a delete_file() tool with no review loop or scope restrictions. Lesson: Never give agents destructive tools without human-in-the-loop approval or strict scoping.
A simple rule: If the workflow is linear and deterministic, a structured prompt is enough. If the workflow has branching, retries, or external dependencies, an agent makes sense.
The Minimal Viable Agent (MVA)
If you want to build your first agent, start with this pattern:
Components:
- One tool (e.g., database search).
- One LLM call (decide what to query based on user input).
- One validation step (check that the tool returned valid data).
- One fallback (if the tool returns empty, explain why and suggest alternatives).
Workflow:
User input -> Parse query -> Call search tool -> Validate result
| (if empty)
Explain failure -> Suggest relaxing constraints -> End
| (if success)
Format results -> Return to user
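The workflow above fits in a few dozen lines. In this sketch, parse_query stands in for the single LLM call and the catalog is a hypothetical in-memory list; every name here is illustrative.

```python
# The minimal viable agent, end to end: one tool, one decision, one
# validation, one fallback. `parse_query` stubs the single LLM call.
def parse_query(user_input: str) -> dict:
    """Stub for the LLM call: extract a max-price constraint."""
    words = user_input.replace("$", " ").split()
    prices = [int(w) for w in words if w.isdigit()]
    return {"max_price": prices[0] if prices else None}

def search_tool(max_price, catalog):
    """The one tool: filter the catalog by price."""
    return [p for p in catalog
            if max_price is None or p["price"] <= max_price]

def minimal_agent(user_input: str, catalog: list[dict]) -> str:
    query = parse_query(user_input)                     # 1. one LLM call
    results = search_tool(query["max_price"], catalog)  # 2. one tool
    if not isinstance(results, list):                   # 3. one validation
        return "Search failed; please try again."
    if not results:                                     # 4. one fallback
        return (f"No products under ${query['max_price']}. "
                "Try raising your budget.")
    cheapest = min(p["price"] for p in results)
    return f"Found {len(results)} product(s), cheapest at ${cheapest}."
```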
This is simple enough to debug but useful enough to demonstrate value. Once it works, you can:
- Add more tools (comparison, recommendation, alternatives).
- Add memory (track user preferences, past searches).
- Add review loops (human confirmation for high-stakes actions).
Production-Grade Agent Design Scorecard
Part 1: Tool Quality (0-10 Points)
Are the tools well-designed?
| Item | Score |
|---|---|
| Single Responsibility: Does each tool do one thing well? (Yes = 2, Monolithic = 0) | /2 |
| Clear Interfaces: Are inputs/outputs well-defined? (Yes = 2, Vague = 0) | /2 |
| Composability: Can tools be chained together? (Yes = 2, No = 0) | /2 |
| Error Handling: Do tools return errors vs. crashing? (Yes = 2, No = 0) | /2 |
| Idempotency: Can tools be retried safely? (Yes = 2, No = 0) | /2 |
Total: ___ / 10. If below 6, the tools are too fragile for production.
Part 2: Orchestration Logic (0-10 Points)
Is the decision-making clear and testable?
| Item | Score |
|---|---|
| Explicit Workflow: Is the decision flow documented? (Yes = 2, Implicit = 0) | /2 |
| Fallback Paths: Are failures handled gracefully? (Yes = 2, No = 0) | /2 |
| Review Loops: Is there human oversight for high-risk actions? (Yes = 2, No = 0) | /2 |
| Explainability: Can you trace every decision? (Yes = 2, No = 0) | /2 |
| Testability: Can you unit-test the agent's flow? (Yes = 2, No = 0) | /2 |
Total: ___ / 10. If below 6, the agent is a black box; debugging will be painful.
Decision Guide
| Total Score | Verdict |
|---|---|
| 0 - 8 | Don't Build. Agent is too brittle. Redesign tools and flows. |
| 9 - 15 | Prototype Only. Good foundation, but needs reliability work (fallbacks, review loops). |
| 16 - 20 | Production-Ready. Well-architected, explainable, and testable. |
Activity: Design One Minimal Agent
Pick one workflow with 2-3 decision points. Design:
- Data schema: What fields exist? What operators?
- Tool interfaces: What tools are needed? What are their inputs/outputs?
- Orchestration logic: What is the decision flow? What are the fallback paths?
Map this on paper before writing any code. Identify where tools might fail, where the LLM might make bad decisions, and where you need validation or review loops.
Resources
What’s Next
Next issue: Limits, risks, and misconceptions — where agents break, what failures look like, and how to build guardrails.
Until next issue,
Sentient Zero Labs