TL;DR

Tool use (also called function calling) is what turns AI from a system that can only talk into a system that can act. Instead of just generating text responses, the AI can call APIs, query databases, send emails, create calendar events, and interact with virtually any external service. You define which tools the AI can use, the AI decides when and how to use them, and your code executes the actual operations. This is the foundation of AI agents.

Why it matters

Without tool use, AI is limited to what it already knows and what you paste into the conversation. Ask ChatGPT "What is the weather in Sydney right now?" and it will tell you it cannot access real-time information. Give it a weather API tool, and it will check the actual weather and tell you.

This matters because most useful tasks require interacting with the real world. Booking a flight requires checking availability and making a reservation. Answering a customer's question about their order requires looking up their order in a database. Scheduling a meeting requires checking calendars and sending invitations.

Tool use is the bridge between "AI that knows things" and "AI that does things." It is what makes AI assistants genuinely useful in workflows rather than just conversational curiosities. Every major AI provider -- OpenAI, Anthropic, Google -- now supports tool use as a core capability, and it is the foundation upon which AI agents are built.

How function calling works step by step

The best way to understand tool use is to walk through a real example. Say you are building a customer support assistant that can look up order status.

Step 1: You define the available tools. You tell the AI what tools exist, what they do, and what parameters they accept. This is like giving someone a toolbox and explaining what each tool is for:

{
  "name": "lookup_order",
  "description": "Look up a customer order by order ID and return its current status, shipping information, and expected delivery date",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "The order ID, e.g. ORD-12345"
      }
    },
    "required": ["order_id"]
  }
}

Step 2: The user asks a question. A customer says: "Where is my order ORD-78901?"

Step 3: The AI decides to use a tool. Instead of generating a text response, the AI recognizes it needs real-time order data and generates a tool call:

{
  "tool": "lookup_order",
  "arguments": {"order_id": "ORD-78901"}
}

Step 4: Your code executes the tool call. Your application receives this tool call, validates the parameters, calls your actual order database, and gets the result.
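In code, this dispatch step might look like the sketch below. `fetch_order_from_db` and the `ORD-NNNNN` format check are hypothetical stand-ins for your real backend and validation rules.

```python
import re

# Hypothetical stand-in for your real database query.
def fetch_order_from_db(order_id):
    return {
        "status": "shipped",
        "carrier": "Australia Post",
        "tracking": "AP123456789",
        "estimated_delivery": "2026-02-14",
    }

# Registry mapping tool names to the code that implements them.
TOOL_HANDLERS = {"lookup_order": fetch_order_from_db}

ORDER_ID_PATTERN = re.compile(r"^ORD-\d{5}$")  # matches IDs like ORD-12345

def execute_tool_call(tool_call):
    """Validate an AI-generated tool call, then run the matching handler."""
    name = tool_call["tool"]
    if name not in TOOL_HANDLERS:
        return {"error": f"unknown tool: {name}"}
    order_id = tool_call["arguments"].get("order_id", "")
    # Treat AI-generated parameters like untrusted form input.
    if not ORDER_ID_PATTERN.match(order_id):
        return {"error": "invalid order_id format"}
    return TOOL_HANDLERS[name](order_id)
```

Note that unknown tools and malformed parameters come back as error values rather than exceptions, so the rest of the conversation can continue.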

Step 5: You return the result to the AI. You send the database result back to the AI:

{
  "status": "shipped",
  "carrier": "Australia Post",
  "tracking": "AP123456789",
  "estimated_delivery": "2026-02-14"
}

Step 6: The AI generates a natural language response. Using the tool result, the AI responds to the customer: "Your order ORD-78901 has been shipped via Australia Post. The tracking number is AP123456789, and it is expected to arrive by February 14th."

The crucial thing to understand: the AI never actually executes the tool. It generates a request, and your code decides whether and how to execute it. This keeps you in control.
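Put together, the six steps form a loop that your application drives. The sketch below is provider-agnostic: `call_model` is a hypothetical stand-in for whichever SDK you use, and its canned replies only mimic the walkthrough above, but the control flow is the part that transfers.

```python
def call_model(messages, tools):
    """Hypothetical stand-in for a provider SDK call (OpenAI, Anthropic, Gemini).
    Returns either a tool call request or a final text answer."""
    last = messages[-1]
    if last["role"] == "user":
        # Step 3: the model decides it needs real-time order data.
        return {"type": "tool_call", "tool": "lookup_order",
                "arguments": {"order_id": "ORD-78901"}}
    # Step 6: after a tool result comes back, answer in natural language.
    result = last["content"]
    return {"type": "text",
            "content": f"Your order has been {result['status']} via {result['carrier']}."}

def lookup_order(order_id):
    # Stand-in for the real database query.
    return {"status": "shipped", "carrier": "Australia Post"}

TOOLS = {"lookup_order": lookup_order}

def run_turn(user_message, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):        # cap the loop so a confused model cannot spin forever
        reply = call_model(messages, TOOLS)
        if reply["type"] == "text":   # final natural-language answer
            return reply["content"]
        # Steps 4-5: execute the requested tool and feed the result back.
        result = TOOLS[reply["tool"]](**reply["arguments"])
        messages.append({"role": "tool", "content": result})
    return "Sorry, I could not complete that request."
```

The `max_steps` cap matters: your code, not the model, decides when the loop ends.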

How different providers handle tool use

All major providers support tool use, but the implementation details differ.

OpenAI calls it "function calling" and supports it in the Chat Completions API. You define functions in the API request, and the model can choose to call one or more functions in its response. OpenAI supports parallel function calling -- the model can request multiple tool calls simultaneously when appropriate.

Anthropic calls it "tool use" in the Claude API. Claude supports defining tools with JSON schemas, and the model generates tool use blocks within its response. Anthropic emphasizes Claude's ability to reason about when to use tools and when a text response is sufficient.

Google supports function calling in the Gemini API with a similar pattern. Define functions, the model decides when to call them, and you execute and return results.

The patterns are converging across providers. If you design your tool definitions well, switching between providers is mostly a matter of adapting the API format rather than rethinking your approach.
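For illustration, here is how the same `lookup_order` tool might be expressed in each provider's request format. These shapes reflect the public APIs at the time of writing; check each provider's current documentation before relying on them.

```python
# One JSON Schema for the tool's parameters, shared across providers.
PARAMS_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "description": "The order ID, e.g. ORD-12345"}
    },
    "required": ["order_id"],
}

# OpenAI Chat Completions: tools are wrapped in a {"type": "function"} envelope.
openai_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up a customer order by order ID",
        "parameters": PARAMS_SCHEMA,
    },
}

# Anthropic: the schema sits at the top level under "input_schema".
anthropic_tool = {
    "name": "lookup_order",
    "description": "Look up a customer order by order ID",
    "input_schema": PARAMS_SCHEMA,
}

# Gemini: function declarations are grouped under a tool object.
gemini_tool = {
    "function_declarations": [{
        "name": "lookup_order",
        "description": "Look up a customer order by order ID",
        "parameters": PARAMS_SCHEMA,
    }]
}
```

The schema itself is identical everywhere; only the envelope changes, which is why a well-designed tool ports easily.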

Practical implementation patterns

Single-tool lookup is the simplest pattern. The AI calls one tool to get information and then responds. The order lookup example above illustrates this.

Multi-tool workflows involve the AI calling several tools in sequence to complete a task. For example: "Schedule a meeting with Sarah next week" might require calling a contacts tool to find Sarah's email, a calendar tool to check available slots, and an email tool to send the invitation. The AI orchestrates the sequence.

Tool chaining happens when the output of one tool feeds into the input of another. "Find the cheapest flight from Sydney to London and book it" requires a search tool (find flights), a selection tool (pick the cheapest), and a booking tool (purchase the ticket), where each step depends on the previous result.

Parallel tool calls are useful when the AI needs information from multiple independent sources. "Compare the weather in Sydney and Melbourne" can call the weather API twice simultaneously rather than sequentially.
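A sketch of executing independent tool calls concurrently with Python's `concurrent.futures`; `get_weather` is a hypothetical stub standing in for a real weather API.

```python
from concurrent.futures import ThreadPoolExecutor

def get_weather(city):
    # Hypothetical stub; in production this would hit a weather API.
    fake_data = {"Sydney": 24, "Melbourne": 18}
    return {"city": city, "temp_c": fake_data[city]}

def execute_parallel(tool_calls):
    """Run independent tool calls concurrently instead of one after another."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(get_weather, call["arguments"]["city"])
                   for call in tool_calls]
        return [f.result() for f in futures]  # results in the same order as the calls

calls = [
    {"tool": "get_weather", "arguments": {"city": "Sydney"}},
    {"tool": "get_weather", "arguments": {"city": "Melbourne"}},
]
results = execute_parallel(calls)
```

This only helps when the calls are truly independent; chained calls, where one result feeds the next, must stay sequential.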

Iterative tool use is when the AI refines its approach based on results. If a database query returns no results, the AI might broaden the search parameters and try again, much like a human would.

Security considerations

Tool use introduces serious security considerations that you must address before deployment.

Never give AI unrestricted access. An AI with access to "run any SQL query" or "call any API endpoint" is a security incident waiting to happen. Define specific, narrow tools for specific tasks. The AI should be able to look up an order, not run arbitrary database queries.

Validate every parameter. The AI generates parameters based on user input, which means those parameters can contain anything. Validate data types, ranges, and formats before executing. An order ID should match your expected format. A date should be within a reasonable range. Treat AI-generated parameters with the same suspicion you would treat user input from a web form.
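A minimal validation sketch. The `ORD-NNNNN` pattern and the hypothetical `requested_date` parameter are illustrative; substitute your own formats and ranges.

```python
import re
from datetime import date, timedelta

ORDER_ID_RE = re.compile(r"^ORD-\d{5}$")  # the format promised in the tool description

def validate_args(args):
    """Check AI-generated arguments the way you would check a web form.
    Returns a list of problems; an empty list means the call is safe to execute."""
    errors = []
    order_id = args.get("order_id")
    if not isinstance(order_id, str) or not ORDER_ID_RE.match(order_id):
        errors.append("order_id must match ORD-NNNNN")
    # Hypothetical optional parameter: a requested delivery date must be
    # in the near future, not in the past or years away.
    if "requested_date" in args:
        try:
            requested = date.fromisoformat(args["requested_date"])
        except (TypeError, ValueError):
            errors.append("requested_date must be an ISO date (YYYY-MM-DD)")
        else:
            if not (date.today() <= requested <= date.today() + timedelta(days=90)):
                errors.append("requested_date must be within the next 90 days")
    return errors
```

Only run the tool when the error list comes back empty; otherwise return the errors to the model so it can correct itself or ask the user.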

Implement rate limiting. Without rate limits, a confused AI in a loop could call your tools thousands of times. Set per-minute and per-session limits on tool calls.
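A simple sliding-window limiter is often enough. This sketch counts recent calls per limiter instance; in production you would likely key one limiter per session.

```python
import time
from collections import deque

class ToolRateLimiter:
    """Allow at most max_calls tool executions per window_seconds."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # over budget: reject instead of executing
        self.calls.append(now)
        return True
```

When `allow` returns False, return an explicit "rate limit reached" result to the model rather than silently dropping the call.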

Use least-privilege access. The credentials your tools use should have the minimum permissions needed. A tool that reads order status should not have permission to delete orders.

Log everything. Record every tool call the AI makes -- what was called, with what parameters, what was returned, and in what context. This is essential for debugging, auditing, and detecting abuse.
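One way to structure those records, sketched with Python's standard logging module; the field names are illustrative.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("tool_calls")

def log_tool_call(session_id, tool, arguments, result):
    """Build one structured audit record per tool call and emit it as JSON."""
    record = {
        "call_id": str(uuid.uuid4()),   # unique ID to correlate with traces
        "session_id": session_id,       # which conversation this belongs to
        "timestamp": time.time(),
        "tool": tool,
        "arguments": arguments,         # what the model asked for
        "result": result,               # what your code returned
    }
    logger.info(json.dumps(record, default=str))
    return record
```

Structured JSON records are easy to query later, which is what makes them useful for audits and abuse detection rather than just debugging.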

Add human confirmation for high-stakes actions. Retrieving information is generally safe. Taking irreversible actions -- deleting data, sending money, publishing content -- should require human confirmation. Build a confirmation step into your workflow for any tool call that modifies state.
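A sketch of such a gate. The tool names and the `confirm` callable, which would surface a dialog to the user in a real application, are hypothetical.

```python
# Tools that modify state and therefore need a human sign-off first.
REQUIRES_CONFIRMATION = {"cancel_order", "issue_refund", "delete_draft"}

def dispatch(tool_call, handlers, confirm):
    """Run read-only tools immediately; gate state-changing tools on `confirm`,
    a callable that asks the human and returns True or False."""
    name = tool_call["tool"]
    if name in REQUIRES_CONFIRMATION and not confirm(tool_call):
        # Tell the model the user declined, so it can respond appropriately.
        return {"error": "cancelled", "detail": f"user declined {name}"}
    return handlers[name](**tool_call["arguments"])
```

Keeping the confirmation set explicit makes it easy to audit which tools can change state without a human in the loop.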

Building reliable tool-using agents

Making tool use work in demos is easy. Making it work reliably in production requires careful design.

Write clear, specific tool descriptions. The AI decides which tool to use and what parameters to pass based on your descriptions. Vague descriptions lead to wrong tool choices. "Get weather information for a specific city, including temperature in Celsius, conditions, and forecast" is much better than "Weather tool."

Handle errors gracefully. Tools fail -- APIs time out, databases return errors, services go down. Your system needs to tell the AI when a tool call failed and why, so it can inform the user or try an alternative approach. An AI that silently ignores failed tool calls will give users confusing or incorrect responses.
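One pattern is to wrap every handler so that failures come back as structured results the model can see, rather than exceptions that crash the turn. A minimal sketch:

```python
def safe_execute(handler, arguments):
    """Execute a tool call and always return something the model can reason about.
    A structured error lets the AI apologise or try another approach instead of
    silently producing a wrong answer."""
    try:
        return {"ok": True, "result": handler(**arguments)}
    except TimeoutError:
        return {"ok": False, "error": "the backend timed out; try again shortly"}
    except Exception as exc:  # catch-all: one failing tool must not crash the whole turn
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

The error strings are written for the model to read, since it is the model that decides what to tell the user next.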

Test with adversarial inputs. Users will ask for things that do not map cleanly to your tools. "Cancel my order but also change the delivery address on my other order" requires the AI to identify two separate actions and execute them correctly. Test with compound, ambiguous, and edge-case requests.

Set appropriate timeouts. If a tool call takes more than a few seconds, the AI's response will feel slow. Set timeouts on tool execution and have the AI inform the user if an operation is taking longer than expected.
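A sketch of enforcing a deadline with `concurrent.futures`. Note the caveat in the comments: Python cannot forcibly kill a thread, so a timed-out handler keeps running in the background and should be safe to abandon.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared pool; threads from timed-out calls keep running in the background,
# so handlers should be abandonable (e.g. read-only lookups).
_pool = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(handler, arguments, timeout_s=5.0):
    """Enforce a hard deadline so one slow backend cannot stall the whole reply."""
    future = _pool.submit(handler, **arguments)
    try:
        return {"ok": True, "result": future.result(timeout=timeout_s)}
    except FutureTimeout:
        # Surface the delay to the model so it can tell the user.
        return {"ok": False, "error": f"tool call exceeded {timeout_s} seconds"}

# Example: a fast tool succeeds, a slow one is cut off.
fast = run_with_timeout(lambda: {"status": "shipped"}, {}, timeout_s=1.0)
slow = run_with_timeout(lambda: time.sleep(0.5) or "late", {}, timeout_s=0.05)
```

For handlers with side effects, prefer timeouts at the I/O layer (e.g. HTTP client timeouts) so the underlying operation is actually cancelled.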

Common mistakes

Defining too many tools at once. If you give the AI 50 tools, it will struggle to choose the right one. Start with 3-5 well-defined tools and add more as needed. In practice, tool-selection accuracy degrades as the tool count grows, even when the API technically accepts many more.

Writing vague tool descriptions. The AI selects tools based on descriptions. If two tools have similar descriptions, the AI will frequently pick the wrong one. Be specific about when each tool should be used and what makes it different from similar tools.

Not handling the case where AI should not use a tool. Sometimes the best response is just text. If a user says "Thanks for your help," the AI should reply, not desperately search for a tool to call. Make sure your system allows the AI to respond without using tools.
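Most provider APIs expose this as a tool-choice setting. The shapes below reflect OpenAI's and Anthropic's request formats at the time of writing; consult the current documentation before relying on them.

```python
# OpenAI Chat Completions: "auto" lets the model answer in plain text or
# call a tool; "required" forces a tool call, "none" forbids one.
openai_kwargs = {"tool_choice": "auto"}

# Anthropic Messages API equivalent: tool_choice is an object.
anthropic_kwargs = {"tool_choice": {"type": "auto"}}
```

"auto" is the right default for assistants; forcing a tool call is mainly useful in structured pipelines where a tool response is always expected.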

Trusting AI-generated parameters without validation. The AI might generate a SQL injection in a query parameter, a negative number for a quantity, or a nonexistent user ID. Always validate before executing.

Skipping human-in-the-loop for consequential actions. The AI will occasionally misunderstand intent. "Delete my draft" could be misinterpreted if the user has multiple drafts. For any action that cannot be easily undone, add a confirmation step.

What's next?

Build on your tool use knowledge: