Tools and function calling
Before we begin
An LLM outputs text. It cannot by itself call OpenWeatherMap or query Postgres. Tool calling (also function calling) is the bridge: the model emits a structured request (“run get_weather with city=Paris”), and your application executes it and returns facts.
The model proposes. Your code disposes. Never let the model hold API keys or run SQL without validation.
Figure: Tool call lifecycle
What you will learn
- Trace the full tool-calling lifecycle from registration to final answer.
- Write JSON schemas models can use reliably.
- Implement validation, auth, timeouts, and error messages in application code.
- Format tool results so the model continues correctly.
- Compare parallel vs sequential tool calls and provider differences.
The lifecycle (step by step)
Step 1 — Register tools
You send the model a list of tools, each with:
- name: stable identifier (get_weather, not weatherTool2)
- description: when to use it (models read this heavily)
- parameters: JSON Schema for arguments
The model does not execute anything yet — it only knows what could be called.
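Registration differs slightly by provider. The sketch below shows one way to keep a single neutral definition and wrap it per provider. The wrapper shapes reflect the OpenAI Chat Completions and Anthropic Messages formats as commonly documented; `to_openai_tool` and `to_anthropic_tool` are hypothetical helper names, so check the current provider docs before relying on them.

```python
# A provider-neutral definition with the three fields listed above.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def to_openai_tool(tool: dict) -> dict:
    """Wrap the definition as an entry for a Chat Completions `tools` list."""
    return {"type": "function", "function": dict(tool)}


def to_anthropic_tool(tool: dict) -> dict:
    """Anthropic's format carries the JSON Schema under `input_schema`."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


openai_tool = to_openai_tool(WEATHER_TOOL)
anthropic_tool = to_anthropic_tool(WEATHER_TOOL)
print(sorted(openai_tool), sorted(anthropic_tool))
```

Keeping one neutral definition means the description and schema are written once, and only the thin wrapper changes when you switch providers.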
Step 2 — User message arrives
Example: “What should I pack for Paris tomorrow?”
Step 3 — Model returns a tool call (not final text)
Instead of answering directly, the API response includes something like:
{
  "name": "get_weather",
  "arguments": "{\"city\": \"Paris\", \"units\": \"metric\"}"
}
Different providers wrap this differently (OpenAI tool_calls, Anthropic tool_use), but the idea is identical.
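Note that `arguments` in the example above is a JSON-encoded string, not an object, so the application has to decode it before use. A minimal sketch, assuming the simplified shape shown above; `parse_tool_call` is a hypothetical helper, not part of any SDK:

```python
import json


def parse_tool_call(call: dict) -> tuple[str, dict]:
    """Decode a tool call of the shape {"name": ..., "arguments": "<JSON string>"}."""
    name = call["name"]
    try:
        args = json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        # Models occasionally emit malformed JSON; surface it as a clear error.
        raise ValueError(f"arguments for {name} are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError(f"arguments for {name} must be a JSON object")
    return name, args


raw_call = {
    "name": "get_weather",
    "arguments": "{\"city\": \"Paris\", \"units\": \"metric\"}",
}
name, args = parse_tool_call(raw_call)
print(name, args)
```

Decoding in one place gives you a single spot to reject malformed output before any tool code runs.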
Step 4 — Your app validates and executes
Never pass arguments straight to SQL or shell. Validate first:
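import re
def run_get_weather(args: dict) -> dict:
    # Reject anything that is not a short, plain city name.
    city = args.get("city")
    if not isinstance(city, str):
        raise ValueError("invalid city")
    city = city.strip()
    if not city or len(city) > 80:
        raise ValueError("invalid city")
    if not re.match(r"^[\w\s\-'.]+$", city):
        raise ValueError("city contains disallowed characters")
    # Fall back to metric when units is missing or not an allowed value.
    units = args.get("units", "metric")
    if units not in ("metric", "imperial"):
        units = "metric"
    # HTTP call with key from os.environ, never from the prompt.
    # fetch_weather_api is assumed to be defined elsewhere in the application.
    return fetch_weather_api(city, units)
Step 5 — Append tool result to conversation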
You add a message the API understands as “tool output”:
Tool get_weather → {"temp_c": 18, "condition": "cloudy", "precip_mm": 2}
Step 6 — Model continues
It may call another tool (get_forecast) or produce the final packing advice using observed temperatures.
Example schema (weather)
{
  "name": "get_weather",
  "description": "Get current weather for a city. Use when user asks about temperature, rain, or what to wear.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name in English, e.g. Paris, Tokyo"
      },
      "units": {
        "type": "string",
        "enum": ["metric", "imperial"],
        "description": "Temperature units"
      }
    },
    "required": ["city"]
  }
}
Description tips:
- Say when to use the tool (“Use when user asks about…”).
- Say when not to (“Do not use for historical climate essays”).
- Match parameter names to what your code expects.
Poor descriptions cause wrong tool selection — the most common agent bug after bad prompts.
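One way to keep parameter names and the code in step is to check incoming arguments against the same schema the model sees. The sketch below hand-rolls only four checks (required fields, unknown keys, string type, and enum membership). `check_args` is a hypothetical helper; a full JSON Schema validator library would cover far more of the spec.

```python
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
    },
    "required": ["city"],
}


def check_args(args: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the args pass."""
    problems = []
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in args:
            problems.append(f"missing required field: {field}")
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            # The model used a parameter name the code does not expect.
            problems.append(f"unknown field: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems


print(check_args({"city": "Paris", "units": "metric"}, WEATHER_PARAMETERS))
print(check_args({"units": "kelvin"}, WEATHER_PARAMETERS))
```

Returning the problems as strings, rather than raising on the first one, lets you send the whole list back to the model in a single tool error.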
A second tool — geocoding
Agents often chain tools. Add:
{
  "name": "geocode_city",
  "description": "Convert city name to latitude/longitude. Use before map or distance tools if coords unknown.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": { "type": "string" }
    },
    "required": ["city"]
  }
}
Chain example: user says “distance from Eiffel Tower to Louvre” → geocode × 2 → get_route → answer.
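The chain can be sketched with stand-in functions. Everything below is illustrative: `geocode_city` returns hard-coded approximate coordinates instead of calling a service, and `get_route` (named in the chain but not defined in this lesson) is replaced by a straight-line haversine distance.

```python
import math

# Hard-coded approximate coordinates stand in for a real geocoding service.
_FAKE_GEOCODER = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Louvre": (48.8606, 2.3376),
}


def geocode_city(place: str) -> dict:
    if place not in _FAKE_GEOCODER:
        raise ValueError(f"unknown place: {place}")
    lat, lon = _FAKE_GEOCODER[place]
    return {"lat": lat, "lon": lon}


def straight_line_km(a: dict, b: dict) -> float:
    """Haversine distance in km; a placeholder for a real get_route tool."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a["lat"]), math.radians(b["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(b["lon"] - a["lon"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


# Steps 1 and 2: geocode both endpoints (independent of each other).
start = geocode_city("Eiffel Tower")
end = geocode_city("Louvre")
# Step 3: the distance step needs both results, so it runs last.
distance = straight_line_km(start, end)
print(f"{distance:.1f} km")
```

In an agent, each of these three calls would be a separate tool call, with the model passing the geocoding results into the final call.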
Your responsibilities (not the model’s)
| Responsibility | Why |
|---|---|
| Validate arguments | Models typo city names, invent enum values |
| Auth & secrets | Keys in server env / vault only |
| Timeouts | httpx.get(..., timeout=10) — hung tools block the whole agent |
| Retries with backoff | 429/503 from upstream — retry 2×, then return error to model |
| Sanitize outputs | Strip HTML, truncate huge JSON before re-prompting |
| Confirm destructive actions | delete_account, charge_card → human or explicit UI confirm |
| Idempotent tools | Same action twice must not duplicate — e.g. create_ticket with an idempotency key (a unique ID so retries are safe) |
The model is a planner, not a trusted runtime.
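The retry row of the table can be sketched without any HTTP library. `UpstreamError`, `call_with_retries`, and the flaky stub are hypothetical names, and the delay values are arbitrary; a real tool would wrap an HTTP call that already has a timeout set.

```python
import time


class UpstreamError(Exception):
    """Raised by a tool when the upstream API answers with an HTTP error."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


RETRYABLE = {429, 503}


def call_with_retries(fn, retries: int = 2, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(); on a retryable status wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "data": fn()}
        except UpstreamError as exc:
            if exc.status not in RETRYABLE or attempt == retries:
                # Give up and hand the model a short, actionable error.
                return {"ok": False, "error": f"upstream failed with {exc}"}
            sleep(base_delay * 2 ** attempt)


attempts = []


def flaky_weather():
    """Fails twice with 503, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise UpstreamError(503)
    return {"temp_c": 18}


# The demo passes a no-op sleep so it does not actually wait.
result = call_with_retries(flaky_weather, sleep=lambda seconds: None)
print(result, len(attempts))
```

Non-retryable statuses such as 404 return immediately, so the model gets the error on the first try instead of after two wasted waits.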
Formatting tool results for the model
Do:
get_weather(city="Paris") succeeded:
{"temp_c": 18, "condition": "cloudy", "precip_mm": 2}
Do not dump raw HTTP headers, 50 KB JSON, or stack traces into context.
On error:
get_weather(city="Pariss") failed: HTTP 404 — city not found. Try corrected spelling.
Clear errors help the model self-correct on the next iteration.
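A small formatter can produce both the success and the error shapes shown above and cap oversized payloads. This is a sketch: `format_tool_result` is a hypothetical helper and the 2000-character budget is an arbitrary choice for illustration.

```python
import json
from typing import Optional

MAX_RESULT_CHARS = 2000  # arbitrary budget chosen for illustration


def format_tool_result(name: str, args: dict, result=None, error: Optional[str] = None) -> str:
    """Render a tool outcome as short 'succeeded' or 'failed' text for the model."""
    arg_text = ", ".join(f"{key}={json.dumps(value)}" for key, value in args.items())
    header = f"{name}({arg_text})"
    if error is not None:
        return f"{header} failed: {error}"
    body = json.dumps(result)
    if len(body) > MAX_RESULT_CHARS:
        # Truncate rather than flood the context with a huge payload.
        body = body[:MAX_RESULT_CHARS] + " ...[truncated]"
    return f"{header} succeeded:\n{body}"


ok_text = format_tool_result(
    "get_weather", {"city": "Paris"}, {"temp_c": 18, "condition": "cloudy", "precip_mm": 2}
)
err_text = format_tool_result(
    "get_weather", {"city": "Pariss"}, error="HTTP 404: city not found. Try corrected spelling."
)
print(ok_text)
print(err_text)
```

Echoing the arguments back in the header helps the model see exactly which call failed when several ran in the same turn.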
Minimal Python loop (conceptual)
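import json
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
def run_agent(user_query: str) -> str:
    # MAX_STEPS, TOOL_REGISTRY, and the two schemas are assumed to be defined
    # elsewhere; each tools entry must already be in the provider's tool format.
    messages = [{"role": "user", "content": user_query}]
    tools = [WEATHER_SCHEMA, GEOCODE_SCHEMA]
    for step in range(MAX_STEPS):
        response = client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer
        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            name = call.function.name
            try:
                # Parse inside the try so malformed JSON or an unknown tool name
                # is reported back to the model instead of crashing the loop.
                args = json.loads(call.function.arguments)
                result = TOOL_REGISTRY[name](args)
                status = "succeeded"
            except Exception as e:
                result = {"error": str(e)}
                status = "failed"
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": f"{name} {status}: {json.dumps(result)}",
            })
    return "Max steps reached — please try a simpler question."
This is the control loop from Lesson 1, made concrete.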
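The loop relies on a `TOOL_REGISTRY` that maps tool names to Python callables. A minimal sketch of what that could look like, using canned stub tools so it runs offline; the stub functions and the `dispatch` helper are hypothetical.

```python
def get_weather_stub(args: dict) -> dict:
    # Canned data so the example runs offline; a real tool would call an API.
    if not args.get("city"):
        raise ValueError("invalid city")
    return {"temp_c": 18, "condition": "cloudy", "precip_mm": 2}


def geocode_city_stub(args: dict) -> dict:
    # Placeholder coordinates, not a real lookup.
    return {"lat": 48.8566, "lon": 2.3522}


# Only names listed here can ever be executed, whatever the model asks for.
TOOL_REGISTRY = {
    "get_weather": get_weather_stub,
    "geocode_city": geocode_city_stub,
}


def dispatch(name: str, args: dict) -> tuple[str, dict]:
    """Look up and run a tool, returning (status, result) for the tool message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return "failed", {"error": f"unknown tool: {name}"}
    try:
        return "succeeded", handler(args)
    except Exception as exc:
        return "failed", {"error": str(exc)}


print(dispatch("get_weather", {"city": "Paris"}))
print(dispatch("delete_everything", {}))
```

An explicit lookup also gives a clearer error for unknown tool names than a bare `KeyError` would.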
Parallel vs sequential tool calls
Some APIs let the model request multiple tools in one turn:
- get_weather(Paris) and get_weather(Lyon) in parallel: good for comparison queries.
- Your app can asyncio.gather independent calls to save latency.
Dependencies must stay sequential: geocode before route if route needs coordinates.
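The difference can be sketched with `asyncio`. The three coroutines below are stubs that sleep briefly to stand in for network latency; their return values are placeholders, not real data.

```python
import asyncio


async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.05)  # stands in for network latency
    return {"city": city, "temp_c": 18}


async def geocode_city(city: str) -> dict:
    await asyncio.sleep(0.05)
    return {"city": city, "lat": 0.0, "lon": 0.0}  # placeholder coordinates


async def get_route(start: dict, end: dict) -> dict:
    await asyncio.sleep(0.05)
    return {"from": start["city"], "to": end["city"]}


async def main() -> tuple[list, dict]:
    # Independent calls: run them together.
    weather = await asyncio.gather(get_weather("Paris"), get_weather("Lyon"))
    # The two geocode calls are independent of each other too...
    start, end = await asyncio.gather(geocode_city("Paris"), geocode_city("Lyon"))
    # ...but the route depends on both, so it must wait for them.
    route = await get_route(start, end)
    return list(weather), route


weather, route = asyncio.run(main())
print(weather, route)
```

`asyncio.gather` returns results in the order the awaitables were passed, which makes it easy to match each result back to its tool call ID.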
Tool design checklist
Before shipping a tool:
- Description explains when and when not
- Required fields marked in schema
- Server-side validation for every parameter
- Timeout and rate limit on external API
- Errors return actionable strings
- Logged with trace_id (unique request ID for debugging), user_id, latency
- Destructive ops behind confirmation
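The logging item in the checklist can be sketched as a small wrapper around any tool function. `run_logged` and the log field layout are assumptions for illustration; adapt them to whatever logging setup the application already uses.

```python
import logging
import time
import uuid

logger = logging.getLogger("tools")


def run_logged(name: str, fn, args: dict, user_id: str) -> dict:
    """Run a tool and log trace_id, user_id, outcome, and latency."""
    trace_id = uuid.uuid4().hex
    started = time.perf_counter()
    outcome = "failed"
    try:
        result = fn(args)
        outcome = "succeeded"
        return {"trace_id": trace_id, "result": result}
    except Exception as exc:
        return {"trace_id": trace_id, "error": str(exc)}
    finally:
        # Runs on both paths, so every call leaves exactly one log line.
        latency_ms = (time.perf_counter() - started) * 1000
        logger.info(
            "tool=%s trace_id=%s user_id=%s outcome=%s latency_ms=%.1f",
            name, trace_id, user_id, outcome, latency_ms,
        )


logging.basicConfig(level=logging.INFO)
out = run_logged("get_weather", lambda args: {"temp_c": 18}, {"city": "Paris"}, user_id="u-123")
print(out["result"])
```

Returning the `trace_id` alongside the result lets you quote it in user-facing error messages and then find the matching log line.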
Common mistakes
| Mistake | Consequence |
|---|---|
| 20 tools in one agent | Wrong tool picked — split agents (Lesson 6) |
| Vague description | Model calls calculator for weather |
| Returning secrets in tool output | Leak into logs and next prompts |
| No max steps | Infinite loop on confused model |
| Trusting model-generated SQL | Injection — use parameterized queries only |
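The last row can be illustrated with the standard-library `sqlite3` module. The table, its rows, and this `search_places` implementation are made up for the example. The point is that the model supplies only a value, while the SQL text stays fixed in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("Louvre", "Paris"), ("Eiffel Tower", "Paris"), ("Colosseum", "Rome")],
)


def search_places(args: dict) -> list[str]:
    """Look up places by city using a parameterized query."""
    city = str(args.get("city", ""))[:80]
    rows = conn.execute(
        # The ? placeholder binds city as data; it is never spliced into the SQL.
        "SELECT name FROM places WHERE city = ? ORDER BY name",
        (city,),
    ).fetchall()
    return [name for (name,) in rows]


print(search_places({"city": "Paris"}))
# An injection attempt is treated as a literal (non-matching) city name.
print(search_places({"city": "Paris' OR '1'='1"}))
```

If the model needs more flexible queries, expose more narrowly scoped tools rather than a tool that accepts raw SQL.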
Connect to the travel project
Your project registers tools like get_weather, geocode, search_places. Each follows this lesson’s lifecycle. The executor agent (Lesson 3) focuses on getting arguments right; this lesson is the mechanics underneath.
What's next
Lesson 3 — Planning vs execution — ReAct loops and splitting planner vs executor roles.