Building Effective Tools for AI

Seven issues on building production-grade MCP servers, A2A agents, and observable tool systems. Grounded in the open-source Recall memory layer.

By the end of this series you'll be able to...

Design idempotent tools that are safe to retry under network failures, without duplicating state or silently swallowing errors. (Issue 1)
Write LLM-readable error messages that tell an agent what went wrong, what was expected, and what to do next, eliminating retry loops caused by opaque errors. (Issue 1)
Choose the right MCP transport and place auth, secrets, and session context at the correct layer so credentials never appear in model context. (Issue 2)
Build a production FastMCP server with a three-layer middleware stack (auth, timeout, logging) and async-acknowledge for slow operations. (Issue 3)
Reduce planning errors by designing tools around user intent: collapse overlapping tools into fewer, better-scoped interfaces and measure the resulting error rates. (Issue 4)
Implement A2A coordination with a full task lifecycle (SUBMITTED → WORKING → INPUT_REQUIRED → COMPLETED) so multi-agent systems can pause, surface blockers, and resume. (Issue 5)
Instrument every tool call with a ToolCallRecord and configure the four alert queries that catch 90% of production failures. (Issue 6)
Defend against tool poisoning and tool shadowing with startup description validation and namespaced tool names. (Issue 7)
Trace cost back to the originating session in a multi-agent chain using per-call cost_usd logging. (Issues 6–7)
Evaluate any third-party MCP server or A2A sub-agent against a security checklist: description whitelist, namespace isolation, Agent Card signature verification. (Issues 5, 7)

What Makes a Good Tool

Five properties every production MCP tool must have — and why most demo tools satisfy only one of them.

Idempotency by Design · LLM-Readable Errors · Structured Return Shape · Call-Level Observability

17 min read · mcp, tools, idempotency


MCP Architecture In Depth

Three transports, three primitives, and the trust boundary that determines where auth and secrets belong in an MCP server.

Transport Selection · Trust Boundary · Auth at the Transport Layer · Resource vs Tool Decision

18 min read · mcp, auth, architecture


Building Your First Production MCP Server

How to wire auth, timeout, and logging middleware before your first tool — and the async-acknowledge pattern that prevents hanging tool calls.

Schema From Code · Middleware-First Architecture · Async-Acknowledge Pattern · Docstring as Runtime Instruction

22 min read · mcp, fastmcp, middleware


Tool Design in the Real World

How Recall's 8-tool design collapsed to 5 tools — and why designing for the LLM's decision surface, not your backend's capability, is the key to lower planning error rates.

Intent-Aligned Tool Design · Planning Error Rate · Parameter vs Tool Boundary · Start Coarse Heuristic

21 min read · mcp, tool-design, llm


A2A — When Agents Need to Talk to Each Other

A2A gives multi-agent systems a task lifecycle that makes every state in a sub-agent's execution visible, pausable, and recoverable — solving the coordination failures that async function calls cannot.

A2A Task Lifecycle State Machine · INPUT_REQUIRED (Pause and Resume) · Agent Card Discovery · REJECTED vs FAILED Semantics

21 min read · a2a, mcp, multi-agent
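
The task lifecycle named in this issue can be pictured as a small state machine. The sketch below uses the state names that appear on this page; the transition table is an assumption made for illustration and is not copied from the A2A specification. `advance` is a hypothetical helper.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input_required"
    COMPLETED = "completed"
    REJECTED = "rejected"
    FAILED = "failed"


# Assumed transitions: REJECTED means the agent never accepted the task,
# FAILED means it accepted the task and could not finish it.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.REJECTED},
    TaskState.WORKING: {
        TaskState.INPUT_REQUIRED,
        TaskState.COMPLETED,
        TaskState.FAILED,
    },
    # A paused task resumes to WORKING once the blocker is resolved.
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.REJECTED: set(),
    TaskState.FAILED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```

Making terminal states map to an empty set means a completed or failed task cannot be silently restarted; a caller has to submit a new task instead.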


Tool Observability

Every tool call produces one record that answers three questions: did it succeed, how long did it take, and how much did it cost? Asked consistently, those three questions are the foundation of what you can know about your tool layer in production.

ToolCallRecord (One Record Per Call) · Cost-Per-Session Alerting · P95 Latency Trending · inputs_hash (Cardinality-Safe Deduplication)

22 min read · observability, monitoring, mcp
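
A minimal sketch of what one record per call might look like follows. `ToolCallRecord`, `inputs_hash`, and `cost_usd` are names used on this page; the remaining fields and the helpers `hash_inputs` and `cost_per_session` are assumptions for illustration, not the series' actual schema.

```python
import hashlib
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCallRecord:
    session_id: str    # assumed field: ties cost back to the originating session
    tool_name: str
    success: bool      # did it succeed?
    latency_ms: float  # how long did it take?
    cost_usd: float    # how much did it cost?
    inputs_hash: str   # fixed-size stand-in for the raw inputs


def hash_inputs(inputs: dict) -> str:
    """Hash canonical JSON so equal inputs give the same short, bounded-size key."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cost_per_session(records: list[ToolCallRecord]) -> dict[str, float]:
    """Sum cost_usd by session, the input to a cost-per-session alert."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.session_id] += record.cost_usd
    return dict(totals)
```

Logging the hash instead of the raw arguments keeps label cardinality and payload size bounded while still letting identical calls be grouped.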


The Tool Ecosystem in 2026

MCP is no longer an emerging standard — it's infrastructure, with real security threats, five open problems, and a clear picture of what teams can solve today versus what requires ecosystem-level coordination.

Tool Poisoning (MCPTox Attack Surface) · Namespace Isolation · Five Open Problems · Signed Agent Cards

21 min read · mcp, a2a, security
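
One way to picture namespace isolation and startup description validation is sketched below. It assumes tool names are prefixed with a server name and that approved descriptions are pinned by hash at review time; `namespaced`, `pin`, `build_registry`, and `changed_descriptions` are hypothetical helpers, not part of any MCP SDK.

```python
import hashlib


def namespaced(server: str, tool: str) -> str:
    """Prefix a tool with its server so same-named tools cannot shadow each other."""
    return f"{server}.{tool}"


def pin(description: str) -> str:
    """Fingerprint of a reviewed tool description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def build_registry(servers: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map each namespaced tool name to its description; refuse duplicates."""
    registry: dict[str, str] = {}
    for server, tools in servers.items():
        for tool, description in tools.items():
            name = namespaced(server, tool)
            if name in registry:
                raise ValueError(f"duplicate tool name: {name}")
            registry[name] = description
    return registry


def changed_descriptions(registry: dict[str, str], pinned: dict[str, str]) -> list[str]:
    """Names whose description is unreviewed or differs from the pinned hash."""
    return sorted(
        name
        for name, description in registry.items()
        if pinned.get(name) != pin(description)
    )
```

Run at startup, a non-empty result from `changed_descriptions` is a reason to refuse to expose those tools until a human has reviewed the new text.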
