Tool Calling in Production: Protect Your APIs from the LLM
Your agent works perfectly in development. All tests pass. You deploy. Three days later you get a production error you should never see: an API threw an exception because it received a numeric field as a string. Or a required field was missing. Or the value was out of range.
The model generated an invalid tool call. And nothing stopped it before it reached your system.
The bug nobody tests in development
When you build an agent with tool calling — the mechanism by which an LLM decides to invoke an external function — your tests almost always cover the happy path: the model generates correct JSON, the tool runs, the result comes back.
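On that happy path the pieces look roughly like this. Below is a minimal sketch that assumes a simplified `ToolCall` shape; the field names are illustrative and are not any specific SDK's types. Some provider APIs do deliver the arguments as a JSON string written by the model, which is why the parse step returns `unknown` rather than a trusted type:

```typescript
// Illustrative shape of a tool call emitted by a model; not a real SDK type.
interface ToolCall {
  name: string;
  arguments: string; // JSON text generated by the model
}

// Parse into `unknown`: the text carries no guarantee that it matches the
// tool's contract, and the model can even emit malformed JSON.
function parseToolArguments(call: ToolCall): unknown {
  try {
    return JSON.parse(call.arguments);
  } catch {
    throw new Error(`Tool ${call.name}: arguments are not valid JSON`);
  }
}

// Happy path: the model happened to write exactly what the tool expects.
const call: ToolCall = {
  name: 'send_invoice',
  arguments: '{"customerId": "c-42", "amount": 150, "currency": "EUR"}',
};
const parsed = parseToolArguments(call);
```

The happy-path test passes because the model wrote correct JSON this time. The parse step itself guarantees nothing about the contents.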
What rarely gets tested:
- The model generates "amount": "150" instead of "amount": 150 (a number sent as a string)
- It omits customerId because it wasn't relevant in the previous prompt turn
- It invents a past date: "dueDate": "2024-03-01"
- It passes a syntactically valid UUID that doesn't exist in your database
These aren't model bugs. They're statistical properties of LLMs: given enough samples, they will occur. In production, with thousands of real calls, the sample is always large enough.
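The first of those failures is more dangerous than it looks, because TypeScript types are erased at runtime. Here is a minimal sketch with a hypothetical `InvoiceArgs` type and `totalWithFee` helper. The cast compiles, and the string slips through into arithmetic:

```typescript
// Hypothetical tool argument type: the compiler believes `amount` is a number.
interface InvoiceArgs {
  customerId: string;
  amount: number;
}

// A type assertion on parsed JSON performs no runtime check at all.
function parseToolArgs(json: string): InvoiceArgs {
  return JSON.parse(json) as InvoiceArgs;
}

function totalWithFee(args: InvoiceArgs, fee: number): number {
  return args.amount + fee; // if amount is really a string, + concatenates
}

// The model sent "amount": "150" instead of "amount": 150.
const fromModel = '{"customerId": "c-42", "amount": "150"}';
const total = totalWithFee(parseToolArgs(fromModel), 10);
console.log(total); // 15010, not 160
```

The code compiles cleanly and returns the string "15010" typed as a number. The wrong value then travels on toward your API.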
The most common architectural mistake is having no layer between the model's output and the tool execution.
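One way to close that gap structurally is to make validation part of tool registration, so the agent loop has no code path that calls a tool with raw arguments. This is a minimal sketch. `ToolRegistry`, `ToolDefinition`, and the hand-written validator are hypothetical names, and in a real agent the validator would be a schema check like the one in Layer 1. As a side effect, the registry also rejects tool names the model invents:

```typescript
// A validator returns typed arguments or throws. Hypothetical shape.
type Validator<T> = (raw: unknown) => T;

interface ToolDefinition<T> {
  validate: Validator<T>;
  execute: (args: T) => Promise<unknown>;
}

class ToolRegistry {
  private runners = new Map<string, (raw: unknown) => Promise<unknown>>();

  register<T>(name: string, tool: ToolDefinition<T>): void {
    // Validation is baked into the stored runner: execute() is unreachable
    // unless validate() succeeds first.
    this.runners.set(name, async (raw) => tool.execute(tool.validate(raw)));
  }

  async dispatch(name: string, rawArgs: unknown): Promise<unknown> {
    const run = this.runners.get(name);
    if (!run) throw new Error(`Unknown tool: ${name}`);
    return run(rawArgs);
  }
}

// Usage: a hand-written validator standing in for a real schema.
let executions = 0;
const registry = new ToolRegistry();
registry.register('send_invoice', {
  validate: (raw: unknown): { amount: number } => {
    const amount = (raw as { amount?: unknown } | null)?.amount;
    if (typeof amount !== 'number') throw new Error('amount must be a number');
    return { amount };
  },
  execute: async (args) => {
    executions++;
    return `invoiced ${args.amount}`;
  },
});

// The string amount is rejected before execute() ever runs.
registry
  .dispatch('send_invoice', { amount: '150' })
  .catch((err: Error) => console.log(`rejected: ${err.message}`));
```

Because the registry stores only the combined runner, the only available path is validate first, then execute.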
Three validation layers for production agents
The defense has three layers. Each one catches a different type of error.
Layer 1 — Schema validation with Zod
The first line of defense verifies that the JSON generated by the model satisfies the tool's contract. Never execute a tool call with raw model arguments:
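import { z } from 'zod';

// Thrown when the model's arguments break the tool's contract. The message is
// plain text, so the retry loop in layer 2 can feed it straight back to the model.
class ToolValidationError extends Error {
  constructor(toolName: string, error: z.ZodError) {
    const details = error.issues
      .map((issue) => `${issue.path.map(String).join('.') || '(root)'}: ${issue.message}`)
      .join('; ');
    super(`Invalid arguments for ${toolName}: ${details}`);
    this.name = 'ToolValidationError';
  }
}

const SendInvoiceSchema = z.object({
  customerId: z.string().uuid(),
  amount: z.number().positive(),
  currency: z.enum(['EUR', 'USD', 'GBP']),
  dueDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
});

type SendInvoiceArgs = z.infer<typeof SendInvoiceSchema>;

function validateToolCall(name: string, args: unknown): SendInvoiceArgs {
  const result = SendInvoiceSchema.safeParse(args);
  if (!result.success) {
    throw new ToolValidationError(name, result.error);
  }
  return result.data;
}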
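The regex only checks the shape of `dueDate`, so a model-invented value like "2024-02-30" still passes this layer. Below is a minimal sketch of a stricter predicate, `isCalendarDate`. It is a hypothetical helper, not part of Zod, and it round-trips the string through `Date`. Zod's `.refine(predicate, message)` can attach a check like this to the field:

```typescript
// True only for real calendar dates in strict YYYY-MM-DD form.
// The explicit UTC suffix avoids local-timezone shifts during parsing.
function isCalendarDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Depending on the engine, an impossible date either parses as NaN or rolls
  // over (e.g. Feb 30 to Mar 1); both cases fail this check.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

console.log(isCalendarDate('2024-02-29')); // true (leap year)
console.log(isCalendarDate('2023-02-29')); // false
console.log(isCalendarDate('2024-13-01')); // false
```

In the schema this could read `dueDate: z.string().refine(isCalendarDate, 'dueDate must be a real date in YYYY-MM-DD format')`. That gives the model a precise error to correct against.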
If validation fails, the call never reaches your API. The failure surfaces as a ToolValidationError that your agent code handles, instead of an exception thrown inside a production system.
Layer 2 — Retry with error feedback to the model
Validation alone isn't enough if you simply terminate the process on failure. What makes an agent robust is that when validation fails, it returns the error to the model and gives it a chance to correct itself:
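// Assumed interface: asks the model to regenerate the arguments of a failed call.
interface LLMClient {
  correctToolCall(toolName: string, args: unknown, error: string): Promise<unknown>;
}

async function executeWithRetry<T>(
  toolName: string,
  rawArgs: unknown,
  validator: (args: unknown) => T,
  executor: (args: T) => Promise<unknown>,
  llm: LLMClient,
  maxAttempts = 3
): Promise<unknown> {
  let currentArgs = rawArgs;
  let lastError: string | null = null;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const validArgs = validator(currentArgs);
      return await executor(validArgs);
    } catch (err) {
      // Only argument errors are worth sending back to the model
      // (ToolValidationError from layer 1, BusinessValidationError from layer 3).
      // Network failures, timeouts, or API errors are rethrown: asking the model
      // to "fix" valid arguments would not help and could repeat a side effect.
      if (!(err instanceof ToolValidationError || err instanceof BusinessValidationError)) {
        throw err;
      }
      lastError = err.message;
      if (attempt + 1 >= maxAttempts) break;
      // Feed the error back to the model so it can correct the call
      currentArgs = await llm.correctToolCall(toolName, currentArgs, lastError);
    }
  }

  throw new Error(
    `Tool ${toolName} failed after ${maxAttempts} attempts. Last error: ${lastError}`
  );
}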
The correctToolCall method sends the model a message like: "The tool call you generated for send_invoice failed with this error: amount must be a positive number, you passed a string. Please correct the arguments." The model then generates a fixed version. Each correction costs one extra model round trip, so it adds some latency, but the end user never sees the intermediate failures.
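The exact wording of that feedback is up to you. Below is a minimal sketch of how the message could be assembled, using a hypothetical `buildCorrectionMessage` helper. It echoes the tool name, the arguments the model produced, and the validation error, so the model can see exactly what to change:

```typescript
// Hypothetical helper: formats the feedback turn sent back to the model.
function buildCorrectionMessage(toolName: string, args: unknown, error: string): string {
  return [
    `The tool call you generated for ${toolName} failed with this error: ${error}`,
    `Arguments you sent: ${JSON.stringify(args)}`,
    'Please correct the arguments.',
  ].join('\n');
}

const message = buildCorrectionMessage(
  'send_invoice',
  { customerId: 'c-42', amount: '150', currency: 'EUR' },
  'amount: Expected number, received string'
);
console.log(message);
```

Including the previous arguments verbatim matters. Without them, the model has to reconstruct its own earlier output from context and may change fields that were already valid.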
Layer 3 — Business logic validation
There are errors that schema validation will never catch because they're syntactically valid but semantically wrong from a business perspective:
- amount: 0.001 is a valid positive number, but an invoice for 0.1 cents is a mistake
- dueDate: "2020-01-01" has a valid format, but it's in the past
- customerId: "valid-uuid-for-a-deactivated-customer" is a correct UUID, but the customer is inactive
These validations go inside the tool itself, before the actual operation runs:
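// Thrown when arguments are well-formed but break a business rule.
class BusinessValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BusinessValidationError';
  }
}

// `db` and `invoiceService` stand for your own data-access and invoicing code.
async function sendInvoice(args: SendInvoiceArgs): Promise<void> {
  if (args.amount < 1) {
    throw new BusinessValidationError(`Minimum invoice amount is 1.00 ${args.currency}`);
  }
  // Compare calendar dates as 'YYYY-MM-DD' strings in UTC, so a due date of
  // today is accepted at any time of day (new Date('YYYY-MM-DD') is midnight UTC).
  const today = new Date().toISOString().slice(0, 10);
  if (args.dueDate < today) {
    throw new BusinessValidationError('Due date cannot be in the past');
  }
  const customer = await db.customer.findUnique({ where: { id: args.customerId } });
  if (!customer?.active) {
    throw new BusinessValidationError(
      `Customer ${args.customerId} not found or inactive`
    );
  }
  await invoiceService.create(args);
}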
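A date check such as `new Date(args.dueDate) < new Date()` is easy to get subtly wrong. A date-only string like "2025-06-10" parses as midnight UTC, so later that same day it already compares as earlier than "now", and today's date gets rejected. Below is a minimal sketch of a day-level comparison using a hypothetical `isPastDate` helper. It takes the current time as a parameter so the rule can be tested with a fixed clock. Using UTC is an assumption; use your business timezone if it differs:

```typescript
// Compares at day granularity in UTC. For fixed-width 'YYYY-MM-DD' strings,
// lexicographic order is the same as chronological order.
function isPastDate(dueDate: string, now: Date = new Date()): boolean {
  const today = now.toISOString().slice(0, 10);
  return dueDate < today;
}

const noon = new Date('2025-06-10T12:00:00Z');

console.log(new Date('2025-06-10') < noon); // true: the naive check rejects today
console.log(isPastDate('2025-06-10', noon)); // false: today is allowed
console.log(isPastDate('2025-06-09', noon)); // true: yesterday is rejected
```

Injecting `now` also keeps business rules deterministic in tests, which matters for checks that would otherwise pass or fail depending on the time of day the suite runs.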
Business errors feed back to the model through the same retry pattern from layer 2.
What separates a prototype from a real production agent
A development agent works when the model is cooperative and the input is what you expect. A production agent works under adverse conditions: unexpected inputs, occasional hallucinations, process restarts.
The three layers — schema, retry with feedback, business validation — are not optional optimizations. They're the difference between a prototype and a system that can touch real money, customer data, or operational workflows without constant supervision.
The earlier post on stateful AI agents covers the persistent state problem. This is the sibling problem: what happens when the LLM generates an incorrect call. They're two distinct layers of the same robust architecture.
On every AI integration and automation agent project we build at DAILYMP, these three layers are baked in from the initial design. Adding them as a patch after the first visible production failure is more expensive, riskier, and harder to reason about.
If you already have an agent in production with no defined behavior for when the model generates an invalid tool call, review it before the first real incident.