A2A — When Agents Need to Talk to Each Other
A2A gives multi-agent systems a task lifecycle that makes every state in a sub-agent's execution visible, pausable, and recoverable — solving the coordination failures that async function calls cannot.
The consolidation worker accepted the task at 14:32:07 and went silent.
The orchestrating agent had submitted a batch — 47 conversation transcripts from a single user’s session history. Recall’s consolidation worker was designed for exactly this: receive the batch, run LLM extraction on each transcript, deduplicate the extracted facts, detect contradictions, store the final memory set. For a batch this size, the worker normally finished in two to four minutes. The orchestrator had submitted the task, received a task ID, and started waiting.
14:33. Nothing.
14:34. Still nothing.
14:35. The orchestrator had been designed to retry on timeout. But what timeout? The task hadn’t timed out — the task was still running. Or it had crashed. Or it was waiting. There was no way to tell the difference. The orchestrator was holding a task ID and three minutes of silence.
The batch had hit a contradiction. Two memories the worker had extracted from different transcripts contradicted each other directly: one said the user was vegetarian, one said they had ordered steak three weeks ago. A human would pause and ask which was correct. The consolidation worker had no mechanism to pause and ask. It could only complete or fail. It had done neither. It had hung — processing loop blocked on a question it couldn’t surface, holding its allocated memory, doing nothing.
The problem wasn’t that the worker was slow. The problem was that slow, failed, and waiting-for-input were all indistinguishable from the outside. The orchestrator had submitted a task. The task had entered a state that existed nowhere in its design. There was no error to catch, no timeout to trigger, no signal to act on.
A2A exists to solve that. Not with a different queue primitive or a fancier job system — with a task lifecycle that makes every state in the agent’s execution visible, pausable, and recoverable from the outside. The consolidation worker rebuilt on A2A can say WORKING while it extracts, INPUT_REQUIRED when it finds a contradiction it can’t resolve, and COMPLETED or FAILED when it finishes. The orchestrator can poll, receive the question, resolve it, and watch the worker resume. Three minutes of silence becomes a conversation.
Mental Model
MCP and A2A Are Complementary, Not Competing
Modern agent architectures evolved from single-model systems calling tools to multi-agent systems where specialized agents delegate work to one another. The ReAct pattern (Yao et al., 2022) demonstrated that LLMs could use tools reliably — but tools were always functions: call them, get a result, continue. When the work to be delegated requires its own reasoning loop, persistent state across multiple steps, and the ability to pause and surface questions, a function call is the wrong primitive. A sub-agent is not a tool. It needs a different protocol.
MCP and A2A are both under AAIF (the Agentic AI Foundation, under Linux Foundation governance as of December 9, 2025). They are designed as complements. MCP connects an agent to tools — database calls, API calls, file operations, fast operations that return immediately or nearly so. A2A connects an agent to other agents — sub-agents with their own reasoning loops, their own state, and tasks that may run for minutes, surface blockers, and need to be driven to completion by the caller. The question is not which protocol. The question is which pattern fits the work.
┌─────────────────────┬────────────────────────────────┬──────────────────────────────────┐
│                     │  MCP (tool call)               │  A2A (agent task)                │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Caller             │  LLM or application            │  Orchestrating agent             │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Callee             │  Tool function                 │  Sub-agent with its own          │
│                     │                                │  reasoning loop + state          │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Response model     │  Synchronous result or         │  Task ID, then polling or        │
│                     │  async-acknowledge             │  push notification               │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  State machine      │  None (call and return)        │  SUBMITTED → WORKING →           │
│                     │                                │  COMPLETED (or branch states)    │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Pause / resume     │  Not possible                  │  INPUT_REQUIRED → resolve →      │
│                     │                                │  resume WORKING                  │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Discovery          │  Hardcoded server URL          │  Agent Card at                   │
│                     │                                │  /.well-known/agent-card.json    │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Governance         │  AAIF (MCP spec)               │  AAIF (A2A v1.0 spec)            │
├─────────────────────┼────────────────────────────────┼──────────────────────────────────┤
│  Use when           │  DB calls, API calls, fast     │  Sub-agent needs own reasoning   │
│                     │  operations returning <30s     │  loop, long tasks, INPUT_REQUIRED│
└─────────────────────┴────────────────────────────────┴──────────────────────────────────┘
Relationship: MCP + A2A are both under AAIF (Linux Foundation, founded Dec 9 2025).
They are complementary — no unification planned. A tool is not a sub-agent.
MCP is for tools. A2A is for sub-agents. The difference is more than latency.
The A2A Task Lifecycle
A2A v1.0 (the current stable spec under AAIF governance) defines a task as the fundamental unit of work between agents. Every task progresses through a state machine. The orchestrator submits a task, receives a task ID, and polls (or receives push notifications) to track state transitions. The sub-agent drives the transitions — it moves itself from SUBMITTED to WORKING when it begins, to INPUT_REQUIRED when it needs the caller to resolve something, back to WORKING when the resolution is received, and to a terminal state when it finishes.
┌─────────────────────────────────────────┐
│        Orchestrator submits task        │
└────────────────────┬────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │    SUBMITTED    │  ← task accepted, not yet running
            └────────┬────────┘
                     │
                     ▼
            ┌─────────────────┐
     ┌────► │     WORKING     │  ← sub-agent is executing
     │      └────────┬────────┘
     │               │
     │      ┌────────┴─────────────────┐
     │      │                          │
     │      ▼                          ▼
     │  ┌──────────────────┐  ┌─────────────────┐
     │  │  INPUT_REQUIRED  │  │   COMPLETED ✓   │  ← terminal
     │  └────────┬─────────┘  └─────────────────┘
     │           │
     │  orchestrator calls resolve()
     │  with answer to the question
     │           │
     └───────────┘  (resumes WORKING)
Terminal states (no resume from these):
COMPLETED ✓ — task finished successfully
FAILED ✗ — task started and could not finish (retry with new task)
CANCELED ✗ — canceled by caller or by sub-agent timeout
REJECTED ✗ — sub-agent declined before starting (capacity, unsupported type, auth)
AUTH_REQUIRED ✗ — sub-agent requires authentication before it will accept the task
Key distinction:
REJECTED = declined before start (retry semantics: check capacity, resubmit)
FAILED = started and broke (retry semantics: new task, investigate root cause)
INPUT_REQUIRED is not an error. It is a first-class mechanism to surface a blocker and resume.
The REJECTED vs. FAILED distinction is worth holding onto — it determines retry strategy. If REJECTED, the sub-agent never started; the task can be resubmitted once the constraint is resolved (capacity freed, task type supported, auth provided). If FAILED, the sub-agent started and broke during execution; resubmitting the same task may hit the same failure. Different root causes, different handling.
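The lifecycle and retry guidance above can be written down as a small transition table. This is a minimal sketch, not the A2A reference implementation: the state names follow this issue, while the `ALLOWED` edges and the `retry_advice` strings are assumptions read off the diagram and the surrounding text.

```python
from enum import Enum


class TaskState(str, Enum):
    """Task states, using the lowercase wire names this issue's client checks."""
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    REJECTED = "rejected"
    AUTH_REQUIRED = "auth-required"


# Terminal states as listed in this issue: no resume from any of these.
TERMINAL = frozenset({
    TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED,
    TaskState.REJECTED, TaskState.AUTH_REQUIRED,
})

# Assumed edges, read off the lifecycle diagram; the spec may permit others.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.REJECTED,
                          TaskState.AUTH_REQUIRED, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}


def can_transition(src: TaskState, dst: TaskState) -> bool:
    """True if the sketch's transition table allows src -> dst."""
    return dst in ALLOWED.get(src, set())


def retry_advice(state: TaskState) -> str:
    """Map a terminal state to the retry handling described in the text."""
    if state not in TERMINAL:
        raise ValueError(f"{state.value} is not terminal; keep polling")
    advice = {
        TaskState.COMPLETED: "none",
        TaskState.REJECTED: "fix the constraint, then resubmit",
        TaskState.FAILED: "investigate root cause, then submit a new task",
        TaskState.CANCELED: "submit a new task if the work is still wanted",
        TaskState.AUTH_REQUIRED: "refresh credentials, then resubmit",
    }
    return advice[state]
```

Keeping the table explicit makes the REJECTED and FAILED branches hard to conflate: each terminal state has exactly one entry, and a non-terminal state raises instead of silently returning advice.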
Agent Cards
A2A agents advertise their capabilities via an Agent Card — a JSON document served at /.well-known/agent-card.json. The orchestrator fetches the card before its first task submission and uses it to validate that the sub-agent supports the capabilities required for the work at hand.
{
  "name": "recall-consolidation",
  "description": "Batch memory consolidation worker: deduplicates and extracts memories from conversation transcripts.",
  "url": "https://agents.recall.internal/consolidation",
  "version": "1.2.0",
  "capabilities": {
    "streaming": false,
    "pushNotifications": true,
    "stateTransitionHistory": true,
    "inputRequired": true
  },
  "skills": [
    {
      "id": "consolidate_memories",
      "name": "Consolidate Memory Batch",
      "description": "Accepts a batch of conversation transcripts, extracts facts, deduplicates, and detects contradictions.",
      "inputModes": ["text"],
      "outputModes": ["text"],
      "examples": ["Consolidate 50 transcripts for user usr_123"]
    }
  ],
  "authentication": {
    "schemes": ["bearer"]
  }
}
────────────────────────────────────────────────────────────────────────────────
Key fields the orchestrator validates before submitting a task:
capabilities.inputRequired → confirm sub-agent supports INPUT_REQUIRED
capabilities.pushNotifications → prefer push over polling if true
version → cache this, re-fetch on any submission failure (Agent Card drift)
skills[*].id → use to target a specific capability in the task submission
────────────────────────────────────────────────────────────────────────────────
The Agent Card is how sub-agents advertise what they can do. The orchestrator validates it before submitting any task.
Agent Cards replace hardcoded endpoint assumptions with a discovery mechanism. A fleet of sub-agents can be discovered dynamically; the orchestrator validates capabilities before submitting, not after the first failure. The card version field is particularly important: when a sub-agent is redeployed with changed capabilities, the version should change. The orchestrator can detect drift.
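Drift detection can be as simple as comparing the cached card with a freshly fetched one. The sketch below assumes the card layout shown in the example above (`version`, `capabilities`, `skills[*].id`); the function name `card_drift` and the shape of its return value are hypothetical, chosen only for illustration.

```python
def card_drift(cached: dict, fresh: dict) -> dict:
    """Compare a cached Agent Card with a freshly fetched one.

    Reports a version change plus any skills or capabilities the cached
    card declared that the fresh card no longer declares.
    """
    def skill_ids(card: dict) -> set:
        return {skill["id"] for skill in card.get("skills", [])}

    def enabled_caps(card: dict) -> set:
        return {k for k, v in card.get("capabilities", {}).items() if v is True}

    return {
        "version_changed": cached.get("version") != fresh.get("version"),
        "removed_skills": sorted(skill_ids(cached) - skill_ids(fresh)),
        "removed_capabilities": sorted(enabled_caps(cached) - enabled_caps(fresh)),
    }
```

A non-empty `removed_skills` or `removed_capabilities` list is the case that breaks callers; a version change with nothing removed may only warrant a log line.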
Understanding the lifecycle is the mental model. The next question is what the code looks like when you’re the caller: you submit a task to a sub-agent, you don’t know how long it will take, and you need to handle every state — including the one where it stops and asks you something.
Implementation
Earlier issues in this series covered the FastMCP server pattern: the @mcp.tool() decorator, request context, and error propagation. The ConsolidationClient below is the caller side of that architecture: the orchestrator that drives a sub-agent through its task lifecycle. The FastMCP server pattern handles what the sub-agent exposes; the A2A client pattern handles how the orchestrator calls it. Both sides are required for production multi-agent coordination. The primary pattern for any A2A caller is a three-part loop: submit, poll, and branch. The branch on INPUT_REQUIRED is the part most implementations skip, until they need it.
The A2A Client: Submit, Poll, and Handle INPUT_REQUIRED
The ConsolidationClient below is the full A2A caller for Recall’s consolidation worker. It handles task submission, an async poll loop with exponential backoff, every terminal state, and INPUT_REQUIRED resolution via a callback pattern. The resolution callback is the key architectural decision: the client stays generic, and the caller decides how to resolve any question the sub-agent surfaces.
import asyncio
import httpx
from typing import Awaitable, Callable

TaskResolutionCallback = Callable[[str, dict], Awaitable[str]]


class ConsolidationClient:
    """A2A client for the consolidation worker. Handles all task states."""

    def __init__(
        self,
        agent_url: str,
        bearer_token: str,
        resolution_callback: TaskResolutionCallback,
        poll_interval: float = 2.0,
    ) -> None:
        self.agent_url = agent_url
        self.headers = {"Authorization": f"Bearer {bearer_token}"}
        self.resolution_callback = resolution_callback
        self.poll_interval = max(2.0, poll_interval)  # 2s minimum - see notes

    async def consolidate(self, user_id: str, transcripts: list[str]) -> dict:
        """Submit a consolidation task and drive it to completion.

        Returns the final task result or raises on unrecoverable failure."""
        task_id = await self._submit(user_id, transcripts)
        return await self._poll(task_id)

    async def _submit(self, user_id: str, transcripts: list[str]) -> str:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{self.agent_url}/tasks",
                json={"skill": "consolidate_memories",
                      "input": {"user_id": user_id, "transcripts": transcripts}},
                headers=self.headers,
                timeout=30.0,
            )
            resp.raise_for_status()
            return resp.json()["task_id"]

    async def _poll(self, task_id: str) -> dict:
        consecutive_working = 0
        current_interval = self.poll_interval
        while True:
            async with httpx.AsyncClient() as client:
                resp = await client.get(
                    f"{self.agent_url}/tasks/{task_id}",
                    headers=self.headers,
                    timeout=10.0,
                )
                resp.raise_for_status()
                task = resp.json()
            state = task["state"]
            # A2A v1.0 wire format uses lowercase with hyphens:
            # "submitted", "working", "input-required", "completed",
            # "failed", "canceled", "rejected", "auth-required"
            if state == "completed":
                return task["result"]
            elif state == "failed":
                raise RuntimeError(f"Task {task_id} failed: {task.get('reason', 'unknown')}")
            elif state in ("canceled", "rejected"):
                raise RuntimeError(f"Task {task_id} ended with state {state}: {task.get('reason')}")
            elif state == "auth-required":
                # Sub-agent requires authentication before it will accept the task.
                # AUTH_REQUIRED is a terminal state added in A2A v1.0: refresh
                # credentials and resubmit as a new task.
                raise RuntimeError(
                    f"Task {task_id} requires authentication: {task.get('reason')}. "
                    "Refresh bearer token and resubmit."
                )
            elif state == "input-required":
                # Sub-agent stopped and is asking a question: resolve and resume
                question = task["input_required"]
                answer = await self.resolution_callback(task_id, question)
                await self._resolve(task_id, answer)
                consecutive_working = 0
                current_interval = self.poll_interval  # reset backoff after INPUT_REQUIRED
            elif state in ("submitted", "working"):
                consecutive_working += 1
                # Exponential backoff after 10 consecutive WORKING responses
                if consecutive_working > 10:
                    current_interval = min(current_interval * 2, 30.0)
            await asyncio.sleep(current_interval)

    async def _resolve(self, task_id: str, answer: str) -> None:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{self.agent_url}/tasks/{task_id}/resolve",
                json={"answer": answer},
                headers=self.headers,
                timeout=10.0,
            )
            resp.raise_for_status()


# ── Usage ────────────────────────────────────────────────────────────────────
# The caller decides how to resolve INPUT_REQUIRED: present to user,
# use a lookup table, call another agent. The client just shuttles the Q+A.

async def resolve_contradiction(task_id: str, question: dict) -> str:
    """Example: resolve a contradiction by asking the user."""
    print(f"Contradiction detected: {question['description']}")
    # In production: surface to user via UI, or route to a resolution agent
    return "keep_most_recent"


async def main():
    client = ConsolidationClient(
        agent_url="https://agents.recall.internal/consolidation",
        bearer_token="...",
        resolution_callback=resolve_contradiction,
    )
    result = await client.consolidate(user_id="usr_123", transcripts=[...])
    print(result)
Why 2s minimum poll interval. At 100ms intervals, a client polling an LLM-backed sub-agent sends 600 status requests per minute. The sub-agent spends more CPU answering status checks than doing the actual work. 2s is a practical lower bound for any task that involves LLM calls — the worker completes one extraction pass in roughly 2-4 seconds, so polling faster than that adds no information and costs CPU on both sides. For production systems where the sub-agent declares capabilities.pushNotifications: true in its Agent Card, prefer push notification over polling entirely. Push eliminates polling for the happy path.
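To make the arithmetic concrete, the sketch below reproduces the client's backoff rule as a standalone generator and computes request rates for a given interval. The function names are hypothetical; the numbers (2s floor, doubling after 10 consecutive WORKING responses, 30s cap) are the ones used in this issue.

```python
from itertools import islice
from typing import Iterator


def poll_intervals(base: float = 2.0, cap: float = 30.0,
                   patience: int = 10) -> Iterator[float]:
    """Yield the sleep after each consecutive WORKING response.

    Mirrors the client's rule: hold the base interval for `patience`
    polls, then double on every further poll until the cap is reached.
    """
    interval = max(2.0, base)  # enforce the 2s floor
    consecutive = 0
    while True:
        consecutive += 1
        if consecutive > patience:
            interval = min(interval * 2, cap)
        yield interval


def requests_per_minute(interval_seconds: float) -> float:
    """Status requests one caller sends per minute at a fixed interval."""
    return 60.0 / interval_seconds


# First 15 sleeps: ten at the base interval, then doubling up to the cap.
schedule = list(islice(poll_intervals(), 15))
```

With the defaults, the doubling passes through 16s on its way to the 30s cap. At a fixed 2s interval a single caller sends 30 status requests per minute, against roughly 600 at 100ms.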
The resolution callback pattern. The resolution_callback is a Callable[[str, dict], Awaitable[str]] — the task ID and the question payload in, the answer out. The client doesn’t know or care what “resolve” means in your system. Maybe it presents a UI to the user. Maybe it routes to a different agent that specializes in contradiction resolution. Maybe it applies a deterministic rule (“always keep most recent”). The callback pattern keeps the client reusable across those three strategies without modification. The client’s job is to recognize INPUT_REQUIRED, call the callback, and relay the answer. Your system’s job is to decide what the answer is.
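As an illustration of how the strategies can share one client, here is a hedged sketch of two callbacks with the same signature: a deterministic rule, and a router that dispatches on the question payload. The `kind` field and the `route_by_kind` helper are assumptions made for this example; the real question schema is whatever the sub-agent defines.

```python
import asyncio
from typing import Awaitable, Callable, Dict

TaskResolutionCallback = Callable[[str, dict], Awaitable[str]]


async def keep_most_recent(task_id: str, question: dict) -> str:
    """Deterministic rule: always keep the most recent memory."""
    return "keep_most_recent"


def route_by_kind(
    handlers: Dict[str, TaskResolutionCallback],
    default: TaskResolutionCallback,
) -> TaskResolutionCallback:
    """Build a callback that picks a handler from question["kind"].

    "kind" is a hypothetical field; unknown or missing kinds fall
    through to the default handler.
    """
    async def callback(task_id: str, question: dict) -> str:
        handler = handlers.get(question.get("kind", ""), default)
        return await handler(task_id, question)

    return callback
```

Because the router itself has the callback signature, it can be passed as `resolution_callback` without the client knowing that routing happens at all.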
REJECTED vs. FAILED. These two terminal states look similar (in both, the task didn’t complete), but they have different retry semantics. REJECTED means the sub-agent evaluated the task before starting and declined it: capacity limit, unsupported skill ID, authentication not satisfied. The task never ran. Retry after addressing the constraint: wait for capacity, check the skill ID in the Agent Card, refresh the auth token. FAILED means the sub-agent started the task and could not finish: extraction threw an exception, or the LLM API returned an error. The task ran partway. Retry with a new task ID after investigating the root cause; resubmitting the same task ID will not restart it. An INPUT_REQUIRED question that goes unanswered past its timeout is a separate case: the sub-agent ends that task as CANCELED, not FAILED.
Task state persistence. The sub-agent must persist its task state to durable storage — not in memory only. If the sub-agent restarts while a task is WORKING, in-memory state is gone. The orchestrator polling for that task ID will receive a 404 or an unexpected state. The fix is straightforward: write state transitions to a database as they happen. The orchestrator sees a consistent state regardless of sub-agent restarts. This is not optional for any task that runs longer than the sub-agent’s uptime guarantee.
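One way to write transitions as they happen is an append-only table. The sketch below uses SQLite from the standard library purely as an example of durable storage; the `TaskStore` class, the table name, and its columns are hypothetical and not taken from the Recall codebase.

```python
import sqlite3
import time
from typing import Optional


class TaskStore:
    """Append-only log of task state transitions backed by SQLite."""

    def __init__(self, path: str) -> None:
        # Use a file path in production so state survives a restart.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS task_transitions ("
            " task_id TEXT NOT NULL,"
            " state TEXT NOT NULL,"
            " reason TEXT,"
            " at REAL NOT NULL)"
        )
        self.conn.commit()

    def record(self, task_id: str, state: str,
               reason: Optional[str] = None) -> None:
        """Persist a transition before acknowledging it to any caller."""
        self.conn.execute(
            "INSERT INTO task_transitions (task_id, state, reason, at)"
            " VALUES (?, ?, ?, ?)",
            (task_id, state, reason, time.time()),
        )
        self.conn.commit()

    def current_state(self, task_id: str) -> Optional[str]:
        """Return the most recently recorded state, or None if unknown."""
        row = self.conn.execute(
            "SELECT state FROM task_transitions"
            " WHERE task_id = ? ORDER BY rowid DESC LIMIT 1",
            (task_id,),
        ).fetchone()
        return row[0] if row else None
```

Keeping every transition, not just the latest state, also gives the sub-agent the data it would need to back a `stateTransitionHistory` capability like the one declared in the example card.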
Agent Card Discovery and Validation
Before submitting any task, the orchestrator should fetch and validate the sub-agent’s Agent Card. This catches capability mismatches before they become task failures, and the version field in the response gives you a baseline for detecting drift after deployments.
import httpx


class AgentCardError(Exception):
    pass


async def fetch_and_validate_agent_card(
    agent_url: str,
    required_capabilities: set[str] | None = None,
) -> dict:
    """Fetch the Agent Card and validate it before submitting any task.

    Raises AgentCardError if the card is unreachable or missing required capabilities.
    The orchestrator should call this before the first task submission and re-fetch
    on any submission failure (handles Agent Card version drift).
    """
    if required_capabilities is None:
        required_capabilities = {"inputRequired"}
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.get(
                f"{agent_url}/.well-known/agent-card.json",
                timeout=5.0,
            )
            resp.raise_for_status()
            card = resp.json()  # parse while the client is still open
        except httpx.HTTPError as e:
            raise AgentCardError(f"Agent Card unreachable at {agent_url}: {e}") from e
    # Validate required capabilities are declared
    agent_caps = set(
        k for k, v in card.get("capabilities", {}).items() if v is True
    )
    missing = required_capabilities - agent_caps
    if missing:
        raise AgentCardError(
            f"Agent '{card.get('name')}' is missing required capabilities: {missing}. "
            f"Declared: {agent_caps}. Check agent deployment."
        )
    return card


# ── Usage in orchestrator ─────────────────────────────────────────────────────

async def submit_with_validation(user_id: str, transcripts: list[str]) -> dict:
    card = await fetch_and_validate_agent_card(
        agent_url="https://agents.recall.internal/consolidation",
        required_capabilities={"inputRequired", "pushNotifications"},
    )
    # Card version is available: log it for drift detection
    version = card.get("version", "unknown")
    client = ConsolidationClient(
        agent_url=card["url"],
        bearer_token="...",
        resolution_callback=resolve_contradiction,
    )
    return await client.consolidate(user_id=user_id, transcripts=transcripts)
Why validate before submission, not after failure. A missing capability discovered at task submission time surfaces as an opaque error: the sub-agent receives a task it cannot process, fails it, and the orchestrator sees a FAILED task state without a clear root cause. Validating the Agent Card before submission means the error is specific (an AgentCardError naming the missing capabilities, such as {'inputRequired'}) and is raised before any task state is created. Note that submit_with_validation as written fetches the card once per call and does not retry. To cover the deployment drift case, re-fetch the card whenever a submission fails with a validation error, before deciding whether to retry, because the sub-agent may have been redeployed with a different schema.
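A re-fetch-on-failure wrapper might look like the following. This is a sketch under stated assumptions: `fetch_card` and `submit` are injected callables (for example, thin wrappers around the card fetch and the client call shown earlier), and `SubmissionError` is a hypothetical exception standing in for whatever your submit path raises on a validation failure.

```python
import asyncio
from typing import Awaitable, Callable, Optional

FetchCard = Callable[[], Awaitable[dict]]
SubmitTask = Callable[[dict], Awaitable[dict]]


class SubmissionError(Exception):
    """Hypothetical error raised when the sub-agent refuses a submission."""


async def submit_with_refetch(
    fetch_card: FetchCard,
    submit: SubmitTask,
    cached_card: Optional[dict] = None,
) -> dict:
    """Submit using the cached card; on failure, re-fetch and retry once.

    The retry only happens if the fresh card's version differs from the
    one just used. If the card is unchanged, drift is not the cause and
    the original error propagates.
    """
    card = cached_card if cached_card is not None else await fetch_card()
    try:
        return await submit(card)
    except SubmissionError:
        fresh = await fetch_card()
        if fresh.get("version") == card.get("version"):
            raise
        return await submit(fresh)
```

Retrying at most once keeps a genuinely broken sub-agent from turning every failed submission into a fetch-and-retry loop.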
With the implementation pattern in place, the question shifts from “what code do I write” to “what can go wrong in production.” Three failure modes surface reliably once A2A coordination scales past a single orchestrator.
Failure Modes
POLLING THUNDERSTORM

What happens:   Multiple orchestrators poll the consolidation worker at
                100ms intervals. Worker spends more CPU responding to
                status checks than running extractions. Throughput
                collapses. New tasks queue while the worker is busy
                answering "are you done yet?"

Root cause:     Poll interval too short. No backoff. No coordination
                between callers. Each orchestrator acts independently,
                each assuming its task is the only one.

How to detect:  Sub-agent request volume vs. task throughput ratio.
                If HTTP GET /tasks/{id} requests >> completed tasks,
                polling is the bottleneck. Observable in access logs
                within minutes of a load increase.

Fix:            Minimum poll interval 2s. Exponential backoff after 10
                consecutive WORKING polls (2s → 4s → 8s → cap at 30s).
                Prefer push notification when the sub-agent declares
                capabilities.pushNotifications = true. Push eliminates
                polling entirely for the happy path.
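The detection ratio above is simple enough to compute from access-log counts. The sketch below is illustrative only: both function names are hypothetical, and what counts as "too many" polls per task is a threshold you would have to choose for your own workload.

```python
def polls_per_completed_task(status_requests: int, completed_tasks: int) -> float:
    """Ratio of GET /tasks/{id} requests to completed tasks in a window."""
    if completed_tasks == 0:
        # Polling with nothing finishing is the worst case; no traffic is 0.
        return float("inf") if status_requests else 0.0
    return status_requests / completed_tasks


def expected_polls(task_seconds: float, poll_interval: float) -> float:
    """Polls one caller sends for one task at a fixed interval."""
    return task_seconds / poll_interval
```

For a three-minute task, a fixed 2s interval works out to 90 polls, while 100ms works out to about 1,800. An observed ratio far above what your configured interval predicts suggests some callers are polling faster than intended.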
HUNG INPUT_REQUIRED

What happens:   Sub-agent transitions to INPUT_REQUIRED ("input-required"
                on the wire). Orchestrator receives the question. Then
                the answer never arrives: the caller crashes before
                calling resolve(), the resolution callback throws, or
                the user dismisses the UI. The task holds allocated
                memory indefinitely on the sub-agent. Across 50 batches,
                the sub-agent leaks state until it OOMs and restarts,
                losing all in-progress tasks.

Root cause:     No timeout on INPUT_REQUIRED state. No cleanup path for
                tasks where the caller disappears after receiving the
                question.

How to detect:  Monitor tasks in INPUT_REQUIRED state for longer than a
                configurable threshold. Default: 10 minutes. Count of
                tasks stuck in INPUT_REQUIRED is the key metric.

Fix:            Sub-agent auto-cancels tasks that remain in INPUT_REQUIRED
                past the timeout. Returns CANCELED ("canceled" on the wire)
                with reason: "INPUT_REQUIRED not resolved within 10m timeout."
                Caller receives a clean terminal state, not a hung task.
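On the sub-agent side, the auto-cancel fix can be a periodic sweep over stored tasks. This is a minimal sketch, assuming an in-process list of records; `TaskRecord` and `cancel_stale_input_required` are hypothetical names, and a real worker would run the same check against its durable task store.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    task_id: str
    state: str
    state_entered_at: float  # epoch seconds when the current state began
    reason: Optional[str] = None


def cancel_stale_input_required(
    tasks: List[TaskRecord],
    now: Optional[float] = None,
    timeout_seconds: float = 600.0,  # 10 minutes, the default named above
) -> List[str]:
    """Cancel tasks stuck in input-required past the timeout.

    Mutates the matching records in place and returns their task IDs.
    """
    now = time.time() if now is None else now
    minutes = int(timeout_seconds // 60)
    canceled = []
    for task in tasks:
        waited = now - task.state_entered_at
        if task.state == "input-required" and waited > timeout_seconds:
            task.state = "canceled"
            task.state_entered_at = now
            task.reason = f"INPUT_REQUIRED not resolved within {minutes}m timeout."
            canceled.append(task.task_id)
    return canceled
```

The length of the returned list doubles as the "tasks stuck in INPUT_REQUIRED" metric: a sweep that keeps canceling tasks points at callers that disappear after receiving a question.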
AGENT CARD VERSION DRIFT

What happens:   Orchestrator caches the Agent Card at startup. Sub-agent
                is redeployed: the new version removes a skill or changes
                the input schema. Orchestrator continues submitting tasks
                targeting the old skill ID. Tasks fail immediately with
                schema validation errors. Error rate spikes. The root
                cause is invisible: the orchestrator sees failures, not
                the deployment that caused them.

Root cause:     Agent Card treated as static. Cached once, never re-fetched
                unless explicitly triggered.

How to detect:  Validation error rate spike on task submission to a specific
                sub-agent immediately following a deployment. The correlation
                with deployment time is the signal.

Fix:            On any submission failure, re-fetch the Agent Card before
                retrying. Pin the card version in deployment manifests.
                Integration test: submit a sample task to every registered
                Agent Card after each deploy, before routing real traffic.
These three failures have a common shape: they’re invisible until they compound. The polling thunderstorm looks like a slow sub-agent. The hung INPUT_REQUIRED looks like a memory leak. The Agent Card drift looks like random task failures. All three are diagnosable from metrics — which is the subject of Issue 6. Before reaching for observability, though, the decision of whether to use A2A at all is the right first question.
Decision Guide
A2A adds protocol surface area. The poll loop, the Agent Card fetch, the INPUT_REQUIRED handler, the resolution callback — each is a piece of code that can fail, version-drift, or misbehave. Don’t pay for it until you need what it buys.
Use A2A when:                             Use async MCP tool when:
──────────────────────────────────────    ──────────────────────────────────────
Sub-agent needs its own reasoning loop    Task completes in <30 seconds
Multi-step execution with state           Fire-and-forget acceptable
Long-running tasks (minutes, not sec)     No contradiction or pause needed
INPUT_REQUIRED is a realistic path        Single call, single result
Human-in-the-loop possible                Endpoint is known and static
Agent Card discovery needed               No Agent Card required
Sub-agent must persist state across       In-process queue sufficient
  its own restarts
For most teams, start with async MCP tools. A job-ID pattern — submit an async MCP tool call, receive a job ID, poll a status endpoint — covers the majority of background work. It’s simpler, introduces less surface area, and is sufficient for any task that completes in under 30 seconds and never needs to pause and surface a question.
Consider A2A when any of these three conditions holds: the sub-agent genuinely needs INPUT_REQUIRED behavior (it will find questions it cannot answer itself), the task runs long enough that state persistence across the sub-agent’s own restarts matters, or you need Agent Card-based discovery across a deployment of multiple sub-agents. If only one of them is true, first evaluate whether a simpler pattern covers it.
Resources
Production Checklist
| Item | Score |
|---|---|
| Sub-agent implements all required states: SUBMITTED, WORKING, INPUT_REQUIRED, COMPLETED, FAILED, CANCELED, REJECTED, AUTH_REQUIRED | |
| Poll interval is minimum 2s with exponential backoff after 10 consecutive WORKING responses (2s → 4s → 8s → cap at 30s) | |
| INPUT_REQUIRED state has a configurable auto-cancel timeout (default: 10 min) | |
| Agent Card is versioned and updated on every capability-changing deployment | |
| Task state is persisted to durable storage, not held in memory only | |
| FAILED and CANCELED states return structured reason field | |
| Orchestrator poll loop handles all terminal states: COMPLETED, FAILED, CANCELED, REJECTED, AUTH_REQUIRED | |
| Agent Card is validated against required capabilities before task submission | |
| Orchestrator re-fetches Agent Card on any submission validation failure | |
The ConsolidationClient above satisfies the orchestrator-side items. The sub-agent implementation — particularly state persistence and the INPUT_REQUIRED timeout — is the side that most teams underinvest in. A sub-agent that handles every state correctly in development but loses in-memory task state on a pod restart is not production-ready. The checklist items for the sub-agent are harder to satisfy than the client-side items, and they are the ones that determine whether your A2A coordination holds up under load.
There is one dimension the checklist does not cover: when A2A coordination is working, how do you know? When it breaks, how fast do you find out? That’s the subject of Issue 6 — tool observability: what to log at every tool call, the three dashboards that catch 90% of production failures, and how to trace LLM API cost back to the originating session. The patterns apply to A2A tasks as directly as they apply to MCP tool calls. The ToolCallRecord pattern from Issue 1 extends naturally to task-level records — submit time, state transitions, resolution events, final result, and cost. Issue 6 builds that instrumentation in full.
Until next issue,
Sentient Zero Labs
Building Effective Tools for AI is a seven-issue series from Sentient Zero Labs. Each issue ships with working code from the Recall memory server — a production MCP tool and A2A coordination pattern built in public alongside the series. The consolidation worker, Agent Card, and client code are at github.com/Sentient-Zero-Labs/szl-recall.