Capability Discovery & Negotiation

Deep Dive · Protocols & Interop

Capability discovery and negotiation: knowing what a peer can do at runtime.

Hard-coding what a tool or agent can do binds you to build time and breaks on every change. Mature interop protocols make capabilities discoverable and version-negotiated at connection time. This essay generalises the discovery pattern, explains why version negotiation is non-optional, and shows why discovery and trust are separate concerns.

STEP 1

The cost of build-time capability knowledge.

The naive way to integrate is to encode a peer's capabilities into the client: this server has these three tools with these schemas; this agent accepts this request shape. It works until the first change. Add a tool, rename a parameter, deprecate a skill — every client that hard-coded the old assumption is now wrong, and nothing told it. With the M+N graph from the interop-problem essay, "nothing told it" multiplies across every client of every changed server.

Runtime discovery inverts this. The client does not assume; it asks. Capabilities become data fetched at connection time, not constants compiled into the caller. A server can grow new tools and conformant clients pick them up without redeployment, because the client's behaviour is driven by what discovery returned, not by what its author believed when they wrote it.
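
A minimal sketch of that inversion, under stated assumptions: `discover_before` and `discover_after` are hypothetical stand-ins for a protocol's enumeration call, and the tool names are invented for illustration. The point is only that the client's knowledge comes from what discovery returned.

```python
from typing import Callable

# Hypothetical stand-ins for a protocol's enumeration call (for example a
# tools/list round trip). They return canned data so the sketch runs alone.
def discover_before() -> list[dict]:
    return [{"name": "search"}, {"name": "fetch"}]

def discover_after() -> list[dict]:
    # The server has since grown a tool; nothing in Client changes.
    return [{"name": "search"}, {"name": "fetch"}, {"name": "summarise"}]

class Client:
    def __init__(self, discover: Callable[[], list[dict]]) -> None:
        self._discover = discover
        self.tools: dict[str, dict] = {}

    def connect(self) -> None:
        # Capabilities are data fetched at connection time, not constants.
        self.tools = {tool["name"]: tool for tool in self._discover()}

    def can(self, name: str) -> bool:
        return name in self.tools

old = Client(discover_before)
old.connect()
new = Client(discover_after)
new.connect()
print(old.can("summarise"), new.can("summarise"))  # prints: False True
```

The same `Client` class serves both server generations; only the data it fetched differs.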

STEP 2

The discovery pattern, abstracted.

Across the protocols in this track the discovery move is the same three steps, with different spellings:

Announce. Each side states who it is and the broad capability classes it supports. MCP: the initialize capabilities object. A2A: the Agent Card. Tool calling: the declared tools array sent with each request.
Enumerate. The client lists concrete capabilities. MCP: tools/list, resources/list, prompts/list. A2A: the skills array in the card.
Bind. The client invokes a capability by name, with arguments shaped by the schema that discovery returned. MCP: tools/call. A2A: message/send.

announce   ── who am I, what classes do I support ──>
enumerate  ── list the concrete tools/skills/resources ──>
bind       ── call one by name, args per its schema ──>
(re-enumerate on a listChanged notification)
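
The three steps can be sketched against an in-memory peer. This is an illustration, not an MCP client: `FakeServer` is hypothetical, there is no transport or JSON-RPC envelope, and the `tools/call` result shape is simplified. The method names and the `name` / `description` / `inputSchema` fields follow the MCP spellings used above.

```python
from typing import Any

class FakeServer:
    """Hypothetical in-memory peer; a real one would sit behind a transport."""

    def handle(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        if method == "initialize":
            return {"capabilities": {"tools": {}}}
        if method == "tools/list":
            return {"tools": [{
                "name": "add",
                "description": "Add two integers.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"a": {"type": "integer"},
                                   "b": {"type": "integer"}},
                    "required": ["a", "b"],
                },
            }]}
        if method == "tools/call" and params["name"] == "add":
            args = params["arguments"]
            return {"result": args["a"] + args["b"]}  # simplified result shape
        raise ValueError(f"unsupported method: {method}")

def run_session(server: FakeServer) -> Any:
    # Announce: learn which capability classes the peer supports.
    announced = server.handle("initialize", {"capabilities": {}})
    if "tools" not in announced["capabilities"]:
        return None  # tools were never offered, so do not enumerate them
    # Enumerate: fetch concrete tools together with their schemas.
    tools = server.handle("tools/list", {})["tools"]
    by_name = {tool["name"]: tool for tool in tools}
    # Bind: call by name, with arguments driven by the discovered schema.
    tool = by_name["add"]
    args = {key: 2 for key in tool["inputSchema"]["required"]}
    return server.handle("tools/call",
                         {"name": tool["name"], "arguments": args})["result"]

print(run_session(FakeServer()))  # prints: 4
```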

The crucial property is that enumerate returns schemas, not just names. Discovery yields enough to construct a valid call without prior knowledge: the tool's name, its description, and its input JSON Schema. That is what makes a client generic: it can present and invoke a tool it has never seen, because the schema is machine-readable and the description is model-readable.
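
To make "enough to construct a valid call" concrete, here is a deliberately tiny argument check driven only by a discovered input schema. It covers just `required` and primitive `type` keywords; a real client would more likely use a full JSON Schema validator. The `forecast`-style schema is invented for illustration.

```python
from typing import Any

# Mapping from JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}

def check_arguments(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        expected = properties.get(name, {}).get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # undeclared or untyped property: not checked here
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            ok = False
        else:
            ok = isinstance(value, py_type)
        if not ok:
            problems.append(f"{name}: expected {expected}")
    return problems

# Hypothetical schema, as it might arrive from enumeration.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}
print(check_arguments(schema, {"city": "Oslo", "days": 3}))  # prints: []
print(check_arguments(schema, {"days": "3"}))
```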

Discovery and the model are coupled. The descriptions returned by tools/list or carried in an Agent Card skill are not only for the client's UI — they are fed to the model so it can decide whether and how to use the capability. Discovery metadata is prompt material, which is why the tool-calling-standards essay calls description the highest-leverage field.
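
One way to picture the coupling: discovered metadata is reshaped into whatever tool declaration the model API expects. The output shape below (`name`, `description`, `parameters`) is a generic assumption, not any particular vendor's format, and `missing_descriptions` is a hypothetical helper for flagging tools the model would have to guess about.

```python
def to_model_tools(discovered: list[dict]) -> list[dict]:
    """Turn discovery results into model-facing tool declarations."""
    declarations = []
    for tool in discovered:
        declarations.append({
            "name": tool["name"],
            # The description is prompt material: the model reads it to decide
            # whether and how to use the tool.
            "description": tool.get("description", "").strip(),
            "parameters": tool.get("inputSchema", {"type": "object"}),
        })
    return declarations

def missing_descriptions(discovered: list[dict]) -> list[str]:
    """Names of tools whose description is absent or blank."""
    return [t["name"] for t in discovered if not t.get("description", "").strip()]

# Invented discovery result for illustration.
discovered = [
    {"name": "search", "description": "Search the knowledge base by keyword.",
     "inputSchema": {"type": "object",
                     "properties": {"query": {"type": "string"}}}},
    {"name": "fetch", "description": "  "},
]
print([d["name"] for d in to_model_tools(discovered)])  # prints: ['search', 'fetch']
print(missing_descriptions(discovered))                 # prints: ['fetch']
```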

STEP 3

Version negotiation is not optional.

The instant M clients and N servers share an interface, that interface must be able to change without a synchronized upgrade. There is no flag day across an open ecosystem. So discovery is paired with negotiation: both sides exchange a protocol version and a capability set, and each must degrade gracefully when the other does not support something.

# Negotiation: agree a version, advertise optional features
# client → {"protocolVersion": "2025-06-18",
#           "capabilities": {"sampling": {}}}
# server → {"protocolVersion": "2025-06-18",
#           "capabilities": {"tools": {"listChanged": true},
#                            "resources": {}}}
# Outcome: tools + resources usable; prompts absent;
# client must NOT call prompts/* because it was never offered.
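
The version half of that exchange can be sketched as follows. The rule is modelled loosely on MCP's lifecycle: the client offers its newest version, the server echoes it if supported or answers with one it does support, and the client disconnects if it cannot speak the answer. The function names and the sets of supported versions are illustrative assumptions.

```python
from typing import Optional

def server_choose(requested: str, server_supported: set[str]) -> str:
    """Echo the requested version if supported, else offer the server's newest."""
    if requested in server_supported:
        return requested
    return max(server_supported)  # date-style strings sort chronologically

def negotiate(client_supported: set[str],
              server_supported: set[str]) -> Optional[str]:
    """Return the agreed version, or None if the client should disconnect."""
    requested = max(client_supported)  # client leads with its newest version
    answered = server_choose(requested, server_supported)
    return answered if answered in client_supported else None

print(negotiate({"2025-06-18", "2025-03-26"}, {"2025-03-26"}))  # prints: 2025-03-26
print(negotiate({"2025-06-18"}, {"2024-11-05"}))                # prints: None
```

Returning `None` rather than guessing is the point: with no common version, there is no safe request shape to fall back on.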

Two disciplines follow directly.

Feature-test, do not version-sniff. Branch on "did the peer advertise listChanged?", not on "is the version ≥ X?". Capability flags are designed for exactly this and survive future versions.
Treat the unadvertised as absent. If a capability was not announced, calling it is a protocol violation; degrade or refuse, never assume.

Together these are what let the M+N graph upgrade one node at a time instead of all at once.
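
Both disciplines reduce to one small check on the capability object the peer advertised. In this sketch, `supports`, `require`, and `NotAdvertised` are hypothetical names, and the dotted-path convention is an assumption made for brevity; the capability object is the server reply from the exchange above.

```python
class NotAdvertised(Exception):
    """Raised when code tries to use a capability the peer never announced."""

def supports(capabilities: dict, path: str) -> bool:
    """True only if every key on the dotted path was advertised."""
    node = capabilities
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False  # unadvertised means absent
        node = node[key]
    # An empty object such as {} still counts as advertised; an explicit
    # false or null does not.
    return node is not False and node is not None

def require(capabilities: dict, path: str) -> None:
    """Refuse up front instead of sending a call the peer never offered."""
    if not supports(capabilities, path):
        raise NotAdvertised(path)

server_caps = {"tools": {"listChanged": True}, "resources": {}}

print(supports(server_caps, "tools.listChanged"))      # prints: True
print(supports(server_caps, "resources"))              # prints: True
print(supports(server_caps, "resources.listChanged"))  # prints: False
print(supports(server_caps, "prompts"))                # prints: False
```

Nothing here inspects the protocol version, so the check keeps working when a future version adds or removes optional features.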

A2A reflects the same principle in the Agent Card: a capabilities block (for example streaming, pushNotifications) tells the caller which interaction modes are available before it commits to a request shape, so it can choose a synchronous call against a peer that does not stream.
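
A sketch of that choice, assuming the card has already been fetched and parsed into a dictionary. `message/send` appears earlier in this essay; the streaming method name `message/stream` is assumed here and should be checked against the A2A specification. The card contents are invented.

```python
def choose_method(agent_card: dict) -> str:
    """Pick an interaction mode from what the Agent Card advertises."""
    capabilities = agent_card.get("capabilities", {})
    # An absent flag is treated the same as false: not advertised, not used.
    if capabilities.get("streaming"):
        return "message/stream"  # assumed name of the streaming method
    return "message/send"

# Invented cards for illustration.
streaming_peer = {"name": "Summariser",
                  "capabilities": {"streaming": True, "pushNotifications": False}}
plain_peer = {"name": "Translator", "capabilities": {}}

print(choose_method(streaming_peer))  # prints: message/stream
print(choose_method(plain_peer))      # prints: message/send
```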

STEP 4

Discovery describes ability, not permission.

The most consequential conceptual error in this whole area: conflating "the peer says it can do X" with "the peer should be allowed to do X here." Discovery answers what is possible. It says nothing about what is authorized.

A discovered tool list is an advertisement, and the advertiser may be wrong, outdated, or hostile. A server can list a delete_all tool; discovering it does not mean your host should expose it to the model, and certainly not without consent. The host still owns three decisions discovery cannot make for it: which discovered servers/agents to connect at all, which of their capabilities to surface to the model, and which require explicit user approval before they run.

Capability discovery is an input to a policy decision, never the decision itself. "It was in tools/list" is not authorization. Allowlisting, scoping, and consent gates sit above discovery and are mandatory; the threat model and the scoping mechanics live in the Safety & Agentic Security deep-dives. The protocol's contribution is making the option set explicit so policy has something concrete to act on.
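
A minimal sketch of a policy layer sitting above discovery, with default-deny semantics. The `Policy` class, the `Decision` values, and the server and tool names are all hypothetical; real hosts will have richer scoping. What it illustrates is that a tool appearing in the discovered list changes nothing until policy says so.

```python
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    HIDE = "hide"          # never shown to the model
    ALLOW = "allow"        # surfaced and callable
    ASK_USER = "ask_user"  # surfaced, but each run needs explicit approval

@dataclass
class Policy:
    allowed_servers: set[str]
    allowed_tools: dict[str, set[str]]  # server name -> permitted tool names
    needs_consent: set[str] = field(default_factory=set)

    def decide(self, server: str, tool: str) -> Decision:
        # Default deny: anything not explicitly allowlisted stays hidden.
        if server not in self.allowed_servers:
            return Decision.HIDE
        if tool not in self.allowed_tools.get(server, set()):
            return Decision.HIDE
        if tool in self.needs_consent:
            return Decision.ASK_USER
        return Decision.ALLOW

def surface(policy: Policy, server: str, discovered: list[dict]) -> list[str]:
    """Names of discovered tools the host is willing to show the model."""
    return [t["name"] for t in discovered
            if policy.decide(server, t["name"]) is not Decision.HIDE]

policy = Policy(
    allowed_servers={"files"},
    allowed_tools={"files": {"read_file", "write_file"}},
    needs_consent={"write_file"},
)
discovered = [{"name": "read_file"}, {"name": "write_file"}, {"name": "delete_all"}]

print(surface(policy, "files", discovered))  # prints: ['read_file', 'write_file']
print(policy.decide("files", "delete_all"))  # prints: Decision.HIDE
```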

The throughline: discovery plus negotiation is what turns M × N hard-coded couplings into an M + N graph that can evolve — announce, enumerate (with schemas), bind, feature-test, and degrade gracefully. It tells you what is possible; deciding what is permitted is a separate, non-negotiable layer that sits on top of everything described here.