Tools and function calling

Before we begin

An LLM only outputs text. By itself it cannot call OpenWeatherMap or query Postgres. Tool calling (also called function calling) is the bridge: the model emits a structured request (“run get_weather with city=Paris”), and your application executes it and returns the result to the model.

The model proposes. Your code disposes. Never let the model hold API keys or run SQL without validation.

Figure: Tool call lifecycle. Model proposes → app validates & runs → result back to model → repeat or answer.

What you will learn

Trace the full tool-calling lifecycle from registration to final answer.
Write JSON schemas models can use reliably.
Implement validation, auth, timeouts, and error messages in application code.
Format tool results so the model continues correctly.
Compare parallel vs sequential tool calls and provider differences.

The lifecycle (step by step)

Step 1 — Register tools

You send the model a list of tools, each with:

name — stable identifier (get_weather, not weatherTool2)
description — when to use it (models read this heavily)
parameters — JSON Schema for arguments

The model does not execute anything yet — it only knows what could be called.
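
Registration has two halves: the schema you send to the model and the handler that stays in your process. A minimal sketch of keeping the two together, assuming a hypothetical `register_tool` decorator and a `TOOL_SCHEMAS` list (neither comes from a provider SDK; the handler body is a placeholder):

```python
from typing import Callable

TOOL_REGISTRY: dict[str, Callable[[dict], dict]] = {}  # name -> handler, stays server-side
TOOL_SCHEMAS: list[dict] = []                           # what the model is allowed to see


def register_tool(name: str, description: str, parameters: dict):
    """Hypothetical decorator: record a tool's schema and handler in one place."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TOOL_SCHEMAS.append(
            {"name": name, "description": description, "parameters": parameters}
        )
        TOOL_REGISTRY[name] = func
        return func
    return decorator


@register_tool(
    name="get_weather",
    description="Get current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
def get_weather(args: dict) -> dict:
    # Placeholder body: a real handler would call a weather API here.
    return {"city": args["city"], "temp_c": 18}
```

Only `TOOL_SCHEMAS` is sent with the request; `TOOL_REGISTRY` is consulted later, when the model asks for a tool by name.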

Step 2 — User message arrives

Example: “What should I pack for Paris tomorrow?”

Step 3 — Model returns a tool call (not final text)

Instead of answering directly, the API response includes something like:

json

{
  "name": "get_weather",
  "arguments": "{\"city\": \"Paris\", \"units\": \"metric\"}"
}

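Providers wrap this differently: OpenAI returns a tool_calls list whose arguments field is a JSON-encoded string (as above), while Anthropic returns a tool_use content block whose input is already a parsed object. The idea is identical.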
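
Since arguments may arrive as a JSON string or as an already-parsed object, it can help to normalize them before validation. A small sketch, where `parse_tool_arguments` is an illustrative helper name rather than part of any SDK:

```python
import json


def parse_tool_arguments(raw) -> dict:
    """Return tool arguments as a dict, whether raw is a JSON string or a dict."""
    if isinstance(raw, dict):
        return raw
    if not isinstance(raw, str):
        raise ValueError("tool arguments must be a JSON string or an object")
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Models occasionally emit truncated or malformed JSON.
        raise ValueError(f"arguments are not valid JSON: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("arguments must decode to a JSON object")
    return parsed
```

Raising a single exception type here keeps the error path uniform: the caller can turn any `ValueError` into a tool-failure message for the model.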

Step 4 — Your app validates and executes

Never pass arguments straight to SQL or shell. Validate first:

python

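import re

def run_get_weather(args: dict) -> dict:
    # Models sometimes send the wrong type, so check before calling str methods.
    city = args.get("city", "")
    if not isinstance(city, str):
        raise ValueError("city must be a string")
    city = city.strip()
    if not city or len(city) > 80:
        raise ValueError("invalid city")
    # Allow only word characters, whitespace, hyphens, apostrophes, and periods.
    if not re.match(r"^[\w\s\-'.]+$", city):
        raise ValueError("city contains disallowed characters")
    units = args.get("units", "metric")
    if units not in ("metric", "imperial"):
        units = "metric"  # fall back to the default rather than failing
    # HTTP call with key from os.environ, never from the prompt.
    # fetch_weather_api is your own wrapper around the weather provider.
    return fetch_weather_api(city, units)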
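
Hand-written checks like these can also be driven by the schema you already registered. The sketch below covers only a small subset of JSON Schema (`required`, `type: string`, `enum`) and is meant as an illustration; `validate_args` and `WEATHER_PARAMETERS` are assumed names, and a full validator library would handle the rest of the specification.

```python
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
    },
    "required": ["city"],
}


def validate_args(args: dict, schema: dict) -> dict:
    """Check args against a subset of JSON Schema: required, string type, enum."""
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required parameter: {key}")
    for key, value in args.items():
        if key not in props:
            # Reject parameters the model invented.
            raise ValueError(f"unexpected parameter: {key}")
        spec = props[key]
        if spec.get("type") == "string" and not isinstance(value, str):
            raise ValueError(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return args
```

Unlike `run_get_weather`, this version rejects an invented enum value instead of silently falling back to a default. Which behavior is right depends on whether a wrong guess is harmless for that tool.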

Step 5 — Append tool result to conversation

You add a message the API understands as “tool output”:

text

Tool get_weather → {"temp_c": 18, "condition": "cloudy", "precip_mm": 2}

Step 6 — Model continues

It may call another tool (get_forecast) or produce the final packing advice using observed temperatures.

Example schema (weather)

json

{
  "name": "get_weather",
  "description": "Get current weather for a city. Use when user asks about temperature, rain, or what to wear.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name in English, e.g. Paris, Tokyo"
      },
      "units": {
        "type": "string",
        "enum": ["metric", "imperial"],
        "description": "Temperature units"
      }
    },
    "required": ["city"]
  }
}

Description tips:

Say when to use the tool (“Use when user asks about…”).
Say when not to (“Do not use for historical climate essays”).
Match parameter names to what your code expects.

Poor descriptions cause wrong tool selection, one of the most common agent bugs after bad prompts.
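
The schema above is written in a provider-neutral shape, and each API expects a slightly different wrapper around it. A sketch of two adapters, assuming the bare `name` / `description` / `parameters` layout used in this lesson (the function names are illustrative):

```python
def to_openai_tool(schema: dict) -> dict:
    """Wrap a bare schema the way OpenAI Chat Completions expects in `tools`."""
    return {"type": "function", "function": schema}


def to_anthropic_tool(schema: dict) -> dict:
    """Anthropic's Messages API calls the JSON Schema field `input_schema`."""
    return {
        "name": schema["name"],
        "description": schema["description"],
        "input_schema": schema["parameters"],
    }
```

Keeping one neutral definition and converting at the edge means a description fix or a new parameter only has to be made once.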

A second tool — geocoding

Agents often chain tools. Add:

json

{
  "name": "geocode_city",
  "description": "Convert city name to latitude/longitude. Use before map or distance tools if coords unknown.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": { "type": "string" }
    },
    "required": ["city"]
  }
}

Chain example: user says “distance from Eiffel Tower to Louvre” → geocode_city × 2 → get_route (a routing tool not defined in this lesson) → answer.

Your responsibilities (not the model’s)

Responsibility	Why
Validate arguments	Models typo city names, invent enum values
Auth & secrets	Keys in server env / vault only
Timeouts	`httpx.get(..., timeout=10)` — hung tools block the whole agent
Retries with backoff	429/503 from upstream — retry 2×, then return error to model
Sanitize outputs	Strip HTML, truncate huge JSON before re-prompting
Confirm destructive actions	`delete_account`, `charge_card` → human or explicit UI confirm
Idempotent tools	Same action twice must not duplicate — e.g. `create_ticket` with an idempotency key (a unique ID so retries are safe)

The model is a planner, not a trusted runtime.
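
The retry row in the table can be sketched as a small wrapper. This is one possible shape, not a prescribed one: `UpstreamError` is a hypothetical exception carrying the HTTP status, and the delays (0.5 s, then 1 s) are assumed values.

```python
import time


class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


RETRYABLE = {429, 503}


def call_with_retries(func, *, retries: int = 2, base_delay: float = 0.5, sleep=time.sleep):
    """Call func(); on a retryable status, back off and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except UpstreamError as exc:
            # Give up on non-retryable statuses or once the retry budget is spent.
            if exc.status not in RETRYABLE or attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # exponential backoff
```

When the final attempt still fails, the exception propagates so the caller can convert it into an error string for the model. Retrying a tool that creates something is only safe if that tool is idempotent, which is why the two rows sit next to each other.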

Formatting tool results for the model

Do:

text

get_weather(city="Paris") succeeded:
{"temp_c": 18, "condition": "cloudy", "precip_mm": 2}

Do not dump raw HTTP headers, 50 KB JSON, or stack traces into context.

On error:

text

get_weather(city="Pariss") failed: HTTP 404 — city not found. Try corrected spelling.

Clear errors help the model self-correct on the next iteration.
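
A formatter along these lines produces both shapes shown above. It is a sketch: `format_tool_result` is an illustrative name, and the 2000-character cap is an assumed budget to tune against your context window.

```python
import json
from typing import Optional

MAX_RESULT_CHARS = 2000  # assumed budget, not a provider limit


def format_tool_result(name: str, args: dict, result=None, error: Optional[str] = None) -> str:
    """Render a tool outcome as a short string the model can act on."""
    arg_text = ", ".join(f"{key}={json.dumps(value)}" for key, value in args.items())
    call = f"{name}({arg_text})"
    if error is not None:
        return f"{call} failed: {error}"
    body = json.dumps(result, ensure_ascii=False)
    if len(body) > MAX_RESULT_CHARS:
        # Truncate oversized payloads instead of flooding the context.
        body = body[:MAX_RESULT_CHARS] + " ...[truncated]"
    return f"{call} succeeded:\n{body}"
```

Echoing the arguments back in the result line lets the model see exactly which call failed, which matters when several calls were made in one turn.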

Minimal Python loop (conceptual)

python

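import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MAX_STEPS = 5      # assumed cap; tune for your agent

# Chat Completions expects each schema wrapped as
# {"type": "function", "function": {...}}.
tools = [WEATHER_SCHEMA, GEOCODE_SCHEMA]

def run_agent(user_query: str) -> str:
    messages = [{"role": "user", "content": user_query}]
    for step in range(MAX_STEPS):
        response = client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer

        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            name = call.function.name
            try:
                # Parsing sits inside the try so malformed JSON or an unknown
                # tool name is reported to the model instead of crashing the loop.
                args = json.loads(call.function.arguments)
                result = TOOL_REGISTRY[name](args)
                status = "succeeded"
            except Exception as e:
                result = {"error": str(e)}
                status = "failed"
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": f"{name} {status}: {json.dumps(result)}",
            })
    return "Max steps reached. Please try a simpler question."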

This is the control loop from Lesson 1, made concrete.

Parallel vs sequential tool calls

Some APIs let the model request multiple tools in one turn:

get_weather(Paris) and get_weather(Lyon) in the same turn: good for comparison queries.
Your app can run independent calls concurrently (for example with asyncio.gather) to save latency.

Dependencies must stay sequential: geocode before route if route needs coordinates.
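
The difference between independent and dependent calls can be sketched with asyncio. The two tools below are stubs standing in for real HTTP calls; their return values and the `plan_route` helper are illustrative assumptions.

```python
import asyncio


async def geocode_city(city: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    coords = {"Paris": (48.8566, 2.3522), "Lyon": (45.7640, 4.8357)}
    lat, lon = coords[city]
    return {"city": city, "lat": lat, "lon": lon}


async def get_route(origin: dict, destination: dict) -> dict:
    await asyncio.sleep(0.01)
    return {"from": origin["city"], "to": destination["city"]}


async def plan_route(a: str, b: str) -> dict:
    # Independent: neither geocode needs the other's output, so run both at once.
    origin, destination = await asyncio.gather(geocode_city(a), geocode_city(b))
    # Dependent: the route needs both coordinates, so it waits for the gather.
    return await get_route(origin, destination)


result = asyncio.run(plan_route("Paris", "Lyon"))
```

`asyncio.gather` returns results in the order the awaitables were passed, so `origin` and `destination` cannot be swapped by whichever request finishes first.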

Tool design checklist

Before shipping a tool:

Description explains when and when not
Required fields marked in schema
Server-side validation for every parameter
Timeout and rate limit on external API
Errors return actionable strings
Logged with trace_id (unique request ID for debugging), user_id, latency
Destructive ops behind confirmation
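
The logging item on the checklist might look like the wrapper below. This is a sketch under assumptions: `run_logged` is a hypothetical helper, the log format is arbitrary, and a real system would likely propagate an existing trace_id rather than mint one per call.

```python
import logging
import time
import uuid

logger = logging.getLogger("tools")


def run_logged(name: str, handler, args: dict, user_id: str) -> dict:
    """Run a tool handler and log trace_id, user_id, outcome, and latency."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    outcome = "succeeded"
    try:
        return {"trace_id": trace_id, "result": handler(args)}
    except Exception as exc:
        outcome = "failed"
        # Return an actionable string instead of leaking a stack trace.
        return {"trace_id": trace_id, "error": str(exc)}
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "tool=%s trace_id=%s user_id=%s outcome=%s latency_ms=%.1f",
            name, trace_id, user_id, outcome, latency_ms,
        )
```

The arguments are deliberately left out of the log line, since they may contain user data. Log them only after deciding what is safe to retain.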

Common mistakes

Mistake	Consequence
20 tools in one agent	Wrong tool picked — split agents (Lesson 6)
Vague `description`	Model calls calculator for weather
Returning secrets in tool output	Leak into logs and next prompts
No max steps	Infinite loop on confused model
Trusting model-generated SQL	Injection — use parameterized queries only
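
The last row is worth making concrete. In the sketch below the SQL text is fixed by your code and the model only supplies a value bound to a placeholder; the in-memory `places` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("Louvre", "Paris"), ("Eiffel Tower", "Paris"), ("Colosseum", "Rome")],
)


def search_places(args: dict) -> list[str]:
    """Look up places by city; the query shape never depends on model output."""
    city = str(args.get("city", ""))
    # The ? placeholder binds city as data, so quotes in it cannot alter the SQL.
    rows = conn.execute(
        "SELECT name FROM places WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]
```

An injection attempt such as `Paris' OR '1'='1` is treated as a literal city name and simply matches no rows.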

Connect to the travel project

Your project registers tools like get_weather, geocode_city, and search_places. Each follows this lesson’s lifecycle. The executor agent (Lesson 3) focuses on getting arguments right; this lesson covers the mechanics underneath.

What's next

Lesson 3 — Planning vs execution — ReAct loops and splitting planner vs executor roles.