Tool Calling in Production: Protect Your APIs from the LLM

June 30, 2026 • 5 min read • By Daily Miranda Pardo

Your agent works perfectly in development. All tests pass. You deploy. Three days later you get a production error you should never see: an API threw an exception because it received a numeric field as a string. Or a required field was missing. Or the value was out of range.

The model generated an invalid tool call. And nothing stopped it before it reached your system.

The bug nobody tests in development

When you build an agent with tool calling — the mechanism by which an LLM decides to invoke an external function — your tests almost always cover the happy path: the model generates correct JSON, the tool runs, the result comes back.

What rarely gets tested:

The model generates "amount": "150" instead of "amount": 150 — number as string
It omits customerId because it wasn't relevant in the previous prompt turn
It invents a past date: "dueDate": "2024-03-01"
It passes a syntactically valid UUID that doesn't exist in your database

These aren't model bugs. They're statistical properties of LLMs: given enough samples, they will occur. In production, with thousands of real calls, the sample is always large enough.
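
The four failure modes above can be written down as fixtures for an unhappy-path test suite. The sketch below is one way that might look, not a prescribed setup: the fixture labels, the sample UUIDs, and the `looksStructurallyValid` helper are hypothetical, and the hand-written check only stands in for a real schema validator. It also shows that a shape check alone catches only the first two cases.

```typescript
// Hypothetical fixtures mirroring the four failure modes listed above.
type Fixture = { label: string; args: Record<string, unknown> };

const valid = {
  customerId: "3f2b8c1e-9d4a-4e6b-8a1f-0c2d3e4f5a6b",
  amount: 150,
  currency: "EUR",
  dueDate: "2030-01-31",
};

const fixtures: Fixture[] = [
  { label: "number as string", args: { ...valid, amount: "150" } },
  { label: "missing customerId", args: { amount: 150, currency: "EUR", dueDate: "2030-01-31" } },
  { label: "past date", args: { ...valid, dueDate: "2024-03-01" } },
  {
    label: "unknown but well-formed UUID",
    args: { ...valid, customerId: "00000000-0000-4000-8000-000000000000" },
  },
];

// Minimal shape check, standing in for a real schema validator.
function looksStructurallyValid(args: Record<string, unknown>): boolean {
  const dueDate = args.dueDate;
  return (
    typeof args.customerId === "string" &&
    typeof args.amount === "number" &&
    typeof args.currency === "string" &&
    typeof dueDate === "string" &&
    /^\d{4}-\d{2}-\d{2}$/.test(dueDate)
  );
}

// Labels of the fixtures that a shape check alone would let through.
const slipsThrough = fixtures
  .filter((f) => looksStructurallyValid(f.args))
  .map((f) => f.label);

console.log(slipsThrough);
```

The past date and the nonexistent UUID pass the shape check because nothing in the payload itself is malformed; catching them requires the clock and the database.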

The most common architectural mistake is having no layer between the model's output and the tool execution.

Three validation layers for production agents

The defense is built in three levels. Each one catches a different type of error.

Layer 1 — Schema validation with Zod

The first line of defense verifies that the JSON generated by the model satisfies the tool's contract. Never execute a tool call with raw model arguments:

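import { z } from 'zod';

// One possible definition of the error thrown when arguments break the contract.
class ToolValidationError extends Error {
  constructor(toolName: string, issues: unknown) {
    super(`Invalid arguments for ${toolName}: ${JSON.stringify(issues)}`);
    this.name = 'ToolValidationError';
  }
}

const SendInvoiceSchema = z.object({
  customerId: z.string().uuid(),
  amount: z.number().positive(),
  currency: z.enum(['EUR', 'USD', 'GBP']),
  // Checks the shape only; whether the date is real and in the future is a layer 3 concern.
  dueDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
});

type SendInvoiceArgs = z.infer<typeof SendInvoiceSchema>;

// `name` is used for error reporting; this validator is specific to send_invoice.
function validateToolCall(name: string, args: unknown): SendInvoiceArgs {
  const result = SendInvoiceSchema.safeParse(args);
  if (!result.success) {
    throw new ToolValidationError(name, result.error.format());
  }
  return result.data;
}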

If validation fails, you never reach the API. The error is handled internally — not as a production exception.
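
Two things can go wrong even before the schema runs: some provider APIs deliver tool arguments as a JSON-encoded string that may be truncated or malformed, and the model can name a tool that was never registered. The sketch below shows one way to handle both, assuming that string format. `RawToolCall`, `toolValidators`, and `prepareToolCall` are hypothetical names, and the inline check stands in for a schema-backed validator.

```typescript
// Hypothetical shape of a tool call as received from the model.
interface RawToolCall {
  name: string;
  arguments: string; // assumed to be a JSON-encoded string
}

type Validator = (args: unknown) => unknown;

// Registry of validators keyed by tool name. Each entry would normally
// wrap a schema; a hand-written check stands in here.
const toolValidators = new Map<string, Validator>([
  [
    "send_invoice",
    (args) => {
      if (typeof args !== "object" || args === null) {
        throw new Error("send_invoice: arguments must be an object");
      }
      return args;
    },
  ],
]);

function prepareToolCall(call: RawToolCall): unknown {
  const validator = toolValidators.get(call.name);
  if (!validator) {
    // The model named a tool that was never registered.
    throw new Error(`Unknown tool: ${call.name}`);
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments);
  } catch {
    // Truncated or malformed JSON is a validation failure, not a crash.
    throw new Error(`${call.name}: arguments are not valid JSON`);
  }
  return validator(parsed);
}
```

Treating bad JSON and unknown tool names as ordinary validation failures means they can be handled by the same error path as a schema mismatch.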

Layer 2 — Retry with error feedback to the model

Validation alone isn't enough if you simply terminate the process on failure. What makes an agent robust is that when validation fails, it returns the error to the model and gives it a chance to correct itself:

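// Minimal client contract assumed by this example.
interface LLMClient {
  correctToolCall(toolName: string, args: unknown, error: string): Promise<unknown>;
}

async function executeWithRetry<T>(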
  toolName: string,
  rawArgs: unknown,
  validator: (args: unknown) => T,
  executor: (args: T) => Promise<unknown>,
  llm: LLMClient,
  maxAttempts = 3
): Promise<unknown> {
  let currentArgs = rawArgs;
  let lastError: string | null = null;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const validArgs = validator(currentArgs);
      return await executor(validArgs);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      if (attempt + 1 >= maxAttempts) break;

      // Feed the error back to the model so it can correct the call
      currentArgs = await llm.correctToolCall(toolName, currentArgs, lastError);
    }
  }

  throw new Error(
    `Tool ${toolName} failed after ${maxAttempts} attempts. Last error: ${lastError}`
  );
}

The correctToolCall method sends the model a message like: "The tool call you generated for send_invoice failed with this error: amount must be a positive number, you passed a string. Please correct the arguments." The model generates a corrected version. Each correction costs one extra model round trip, so it adds latency and token cost (bounded by maxAttempts), but the end user never sees the failed attempt.
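
The text does not show how correctToolCall builds its message, so the following is only a sketch of one possibility. `buildCorrectionMessage` and its exact wording are assumptions; how the message is sent, and how the reply is parsed, depends on your LLM client.

```typescript
// Hypothetical helper: builds the message a correctToolCall implementation
// might send to the model. The name and wording are assumptions.
function buildCorrectionMessage(
  toolName: string,
  previousArgs: unknown,
  error: string
): string {
  return [
    `The tool call you generated for ${toolName} failed with this error: ${error}`,
    `Arguments you sent: ${JSON.stringify(previousArgs)}`,
    "Reply with corrected arguments as a single JSON object and nothing else.",
  ].join("\n");
}

const message = buildCorrectionMessage(
  "send_invoice",
  { amount: "150" },
  "amount must be a number"
);
console.log(message);
```

Echoing the previous arguments shows the model exactly what it sent, and asking for a bare JSON object keeps the reply easy to parse and validate again.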

Layer 3 — Business logic validation

There are errors that schema validation will never catch because they're syntactically valid but semantically wrong from a business perspective:

amount: 0.001 — a valid positive number, but an invoice for 0.1 cents is a mistake
dueDate: "2020-01-01" — valid format, but in the past
customerId: "valid-uuid-for-a-deactivated-customer" — correct UUID, inactive customer

These validations go inside the tool itself, before the actual operation runs:

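async function sendInvoice(args: SendInvoiceArgs): Promise<void> {
  if (args.amount < 1) {
    throw new BusinessValidationError(`Minimum invoice amount is 1.00 ${args.currency}`);
  }

  // "YYYY-MM-DD" parses as midnight UTC. The layer 1 regex only checks the
  // shape, so reject unparseable dates (e.g. 2026-13-45) here, and compare
  // against the start of today (UTC) so a due date of today is still accepted.
  const dueDate = new Date(args.dueDate);
  const startOfToday = new Date(new Date().toISOString().slice(0, 10));
  if (Number.isNaN(dueDate.getTime()) || dueDate < startOfToday) {
    throw new BusinessValidationError('Due date is invalid or in the past');
  }

  const customer = await db.customer.findUnique({ where: { id: args.customerId } });
  if (!customer?.active) {
    throw new BusinessValidationError(
      `Customer ${args.customerId} not found or inactive`
    );
  }

  await invoiceService.create(args);
}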

Business errors feed back to the model through the same retry pattern from layer 2.
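
One caveat when reusing the retry loop: as written, it feeds any thrown error back to the model, including failures the model cannot fix, such as a database timeout. The sketch below shows one way to separate the two cases. The text names `ToolValidationError` and `BusinessValidationError` without defining them, so the class definitions here are assumptions, and `isCorrectableByModel` is a hypothetical helper.

```typescript
// Assumed definitions: the text names these classes without showing them.
class ToolValidationError extends Error {
  constructor(toolName: string, issues: unknown) {
    super(`Invalid arguments for ${toolName}: ${JSON.stringify(issues)}`);
    this.name = "ToolValidationError";
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class BusinessValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BusinessValidationError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// True when the failure was caused by the arguments, so a corrected call
// from the model has a chance of succeeding.
function isCorrectableByModel(err: unknown): boolean {
  return err instanceof ToolValidationError || err instanceof BusinessValidationError;
}

// Inside the catch block of a retry loop, anything else would be rethrown:
//   if (!isCorrectableByModel(err)) throw err;
```

Rethrowing infrastructure errors immediately avoids spending model calls on problems that new arguments cannot solve, and keeps internal error details out of the prompt.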

What separates a prototype from a real production agent

A development agent works when the model is cooperative and the input is what you expect. A production agent works under adverse conditions: unexpected inputs, occasional hallucinations, process restarts.

The three layers — schema, retry with feedback, business validation — are not optional optimizations. They're the difference between a prototype and a system that can touch real money, customer data, or operational workflows without constant supervision.

The earlier post on stateful AI agents covers the persistent state problem. This is the sibling problem: what happens when the LLM generates an incorrect call. They're two distinct layers of the same robust architecture.

On every AI integration and automation agent project we build at DAILYMP, these three layers are baked in from the initial design. Adding them as a patch after the first visible production failure is more expensive, riskier, and harder to reason about.

If you have an agent already in production that has no defined behavior when the model generates an invalid tool call, it's worth a review before the first real incident:

Let's review your agent →