Tool / function calling explained.
A language model can only emit text. Tool calling (a.k.a. function calling) is the protocol that turns "emit text" into "do things in the world": the model requests a function, your code runs it, and the result goes back into context. This entry explains the exact request/response shape, the loop it implies, why the model never executes anything itself, and the design rules that keep tool-using agents safe.
The model proposes; your code disposes.
The single most important fact: the model does not call your function. It outputs a structured request asking you to call it. There is no code execution inside the model. Tool calling is a disciplined text protocol with three steps:
- You describe the available tools (name, purpose, parameter schema) in the request.
- The model, instead of answering in prose, may emit a tool-use message: "call
get_weatherwith{"city": "Paris"}." - Your application executes the real function, then sends the result back as a tool-result message. The model continues, now able to use that result.
Everything the model does is still next-token prediction. It was post-trained to emit a specific JSON-shaped structure when a tool would help; the "calling" is your code reacting to that structure.
The wire shape.
# 1. You declare tools in the request tools = [{ "name": "get_weather", "description": "Current weather for a city. Use when asked about weather.", "input_schema": { "type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"] } }]
# 2. Model replies with a tool-use request (not prose)
{ "type": "tool_use", "id": "t_01", "name": "get_weather",
"input": { "city": "Paris" } }
# 3. YOUR code runs get_weather("Paris") and returns a tool_result
{ "type": "tool_result", "tool_use_id": "t_01",
"content": "12C, light rain" }
# 4. Model now answers in prose, grounded in the result
"It's 12C with light rain in Paris right now."
The tool description and parameter schema are prompt engineering, not boilerplate. The model decides whether and how to call a tool based almost entirely on that text. A vague description ("does stuff with users") produces unreliable calling; a precise one ("Fetch a user by ID. Use only when you have a numeric user_id; do not guess IDs") produces reliable calling.
The loop this implies.
One tool call is rarely the end. The result may prompt another call, then another, until the model has enough to answer. That cycle — model → tool request → execute → result → model — repeated until the model emits a final answer instead of a tool request, is the agent loop (covered in the Agentic AI section). Tool calling is the primitive; the loop is the pattern built on it. Two consequences for beginners:
- You own the loop. The API does not auto-execute. You write the code that detects a tool-use message, runs the function, appends the result, and calls the model again.
- You need a stop condition. A buggy tool that always errors can make the model retry forever. Always cap iterations.
Why this primitive is powerful.
Tool calling removes three structural weaknesses of a bare LLM:
- Stale knowledge → live data. The model's training has a cutoff. A
searchorget_accounttool gives it current facts instead of memorized guesses. - Bad at exact computation → real computation. Models are unreliable at arithmetic and precise logic. A
calculatororrun_sqltool offloads exactness to code that is actually exact. - Can't act → can act.
send_email,create_ticket,deploy— tools are how an LLM affects the world rather than only describing it.
This is also the through-line of this section: RAG is essentially a retrieval tool wired into context; structured output is the same JSON-shaping machinery pointed at the final answer instead of a function call.
The safety rules that are not optional.
A tool argument produced by the model is untrusted input. The model can be wrong, or steered by prompt injection in retrieved/tool content. Treat every tool call as if a stranger on the internet sent that request to your function.
- Validate before executing. Re-check the arguments against the schema and your own business rules in code. Never pass model output straight into a shell, SQL string, or filesystem path.
- Least privilege. Expose the narrowest tool that does the job.
refund_order(order_id, amount)with server-side limits, notrun_arbitrary_sql. - Gate irreversible actions. Deletes, payments, sends — require explicit confirmation or human approval. Capability limits hold even when the prompt is subverted; a sentence telling the model "be careful" does not.
- Return errors as data. When a tool fails, send a structured error back as the tool result so the model can react ("that user does not exist") rather than crashing the loop.
- Make tools idempotent where possible. The loop may retry; a retried
charge_cardmust not double-charge.
Deliverable
You understand tool calling as a text protocol: you declare tools with precise descriptions and schemas, the model emits a structured tool-use request, your code executes the real function and returns a tool-result, and the model continues grounded in it. You own the surrounding loop and its stop condition. You know it exists to fix stale knowledge, weak computation, and inability to act. And you treat every model-produced argument as untrusted: validate it, scope tools to least privilege, gate irreversible actions with capability limits rather than instructions, and make execution idempotent.