Generative AI · Technical

Structured Outputs for LLMs: Schemas, Validation and Repair Loops

Amestris — Boutique AI & Technology Consultancy

LLMs are great at language. Production systems, however, run on structured data. If you want reliable automations—routing, ticket updates, workflows, agent tool calls—you need the model to produce machine-parseable outputs that can be validated and executed safely.

The pattern is consistent across most successful deployments: schema-first design, strict validation, and bounded repair loops.

Start schema-first

Define a schema before you write the prompt. Treat it like an API contract:

  • Use JSON Schema or typed function signatures. Make required fields explicit and restrict enums where possible (a minimal sketch follows this list).
  • Design for operations. Include IDs, timestamps and trace fields so you can audit and replay.
  • Version everything. Schema versions should be logged with every output (see change control).
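
As a concrete illustration, here is a minimal sketch of such a contract as a JSON Schema for a hypothetical ticket-routing decision. The field names, enum values and version string are illustrative assumptions, not a prescribed format.

  # A minimal, versioned JSON Schema for a hypothetical ticket-routing output.
  # Field names, enum values and the version string are illustrative assumptions.
  TICKET_ROUTING_SCHEMA = {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "title": "TicketRoutingDecision",
      "type": "object",
      "additionalProperties": False,  # the model cannot invent keys
      "required": ["schema_version", "ticket_id", "queue", "priority", "decided_at"],
      "properties": {
          "schema_version": {"const": "1.0"},  # logged with every output
          "ticket_id": {"type": "string"},  # ID for audit and replay
          "queue": {"enum": ["billing", "technical", "account", "escalation"]},
          "priority": {"enum": ["low", "medium", "high"]},
          "decided_at": {"type": "string", "format": "date-time"},  # trace field
          "rationale": {"type": "string", "maxLength": 500},  # optional, bounded free text
      },
  }

Because additionalProperties is false, an invented key fails validation instead of flowing downstream.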

The biggest anti-pattern is “free-form JSON”. If the model can invent keys, it will—and your downstream system will fail in ways that are hard to debug.

Validate like you would validate untrusted input

Model outputs should be treated as untrusted input, just like web form submissions:

  • Parse strictly. Use a JSON parser with no tolerance for trailing text.
  • Validate types and constraints. Reject wrong types, missing required fields, and out-of-range values.
  • Sanitise strings. Prevent injection into downstream tools (SQL, shell, email templates).
  • Apply allow-lists. If a tool should only act on specific resources, enforce that outside the model (see the sketch after this list).
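
A minimal sketch of those checks in Python, assuming the hypothetical schema above and the jsonschema library; the allow-list contents are illustrative.

  import json

  from jsonschema import Draft202012Validator

  # Allow-list enforced outside the model; deliberately narrower than the schema's enum.
  ALLOWED_QUEUES = {"billing", "technical", "account"}

  def parse_and_validate(raw_text: str, schema: dict) -> dict:
      """Treat model output as untrusted input: strict parse, schema check, allow-list."""
      # Strict parse: json.loads raises on malformed JSON and on trailing text.
      payload = json.loads(raw_text)

      # Schema validation: collect every violation so errors can be reported in full.
      errors = list(Draft202012Validator(schema).iter_errors(payload))
      if errors:
          raise ValueError([f"{list(e.path)}: {e.message}" for e in errors])

      # Allow-list: a schema-valid payload may still target resources this workflow cannot touch.
      if payload["queue"] not in ALLOWED_QUEUES:
          raise ValueError([f"queue '{payload['queue']}' is not permitted for this workflow"])

      return payload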

Validation is where safety becomes real. Prompts are guidance; validation is enforcement.

Repair loops: bounded, observable, and safe

Even with schemas, models sometimes produce invalid outputs. A repair loop is a controlled retry that gives the model the validation errors and asks it to produce a corrected payload. The key is to bound and observe it:

  • Bound retries. One or two retries are usually enough; beyond that you are masking deeper issues.
  • Return structured errors. Show the model the exact fields that failed validation, as in the sketch after this list.
  • Have a safe fallback. Escalate to a human, route to a narrower workflow, or return a refusal.
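
A sketch of a bounded repair loop, reusing the hypothetical parse_and_validate helper above; call_model stands in for whatever model client you use and is an assumption, not a specific API.

  import logging

  log = logging.getLogger("structured_output")

  MAX_REPAIR_ATTEMPTS = 2  # bounded: beyond one or two retries you are masking deeper issues

  def generate_with_repair(prompt: str, schema: dict, call_model) -> dict | None:
      """Ask for a payload, feed validation errors back a bounded number of times, then fall back."""
      raw = call_model(prompt)
      for attempt in range(MAX_REPAIR_ATTEMPTS + 1):
          try:
              payload = parse_and_validate(raw, schema)
              log.info("structured output accepted after %d repair attempt(s)", attempt)
              return payload
          except ValueError as err:
              log.warning("validation failed on attempt %d: %s", attempt, err)
              if attempt == MAX_REPAIR_ATTEMPTS:
                  break
              # Structured errors go back to the model, not just "try again".
              raw = call_model(
                  f"{prompt}\n\nYour previous reply failed validation:\n{err}\n"
                  "Return a corrected JSON object that satisfies the schema exactly."
              )
      # Safe fallback: escalate or route to a narrower workflow instead of guessing.
      log.error("structured output rejected after %d attempts; escalating", MAX_REPAIR_ATTEMPTS + 1)
      return None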

Repair loops should not be invisible. Log validation error types, retry counts and the final outcome. These become leading indicators for drift and regressions (see drift monitoring).

Tool calls: separate “decide” from “do”

For agentic workflows, the structured output often becomes a tool invocation. The safest architecture separates two steps:

  1. Decision. The model proposes a tool call with arguments.
  2. Authorisation + execution. A policy layer checks permissions, constraints and risk, then executes (see authorising tool use).

This prevents “prompt-only security” and gives you an auditable place to apply controls and approvals.
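
One way the separation might look in code, with hypothetical tool names, policy rules and callbacks; the point is that authorisation and execution live in ordinary, auditable code outside the model.

  # Hypothetical tool registry and policy layer: the model proposes, code authorises and executes.
  TOOL_POLICIES = {
      "update_ticket": {"requires_approval": False},
      "refund_payment": {"requires_approval": True, "max_amount": 100.00},
  }

  def authorise_and_execute(proposed_call: dict, executors: dict, request_approval) -> dict:
      """Step 2: check the model's proposed tool call against policy before anything runs."""
      tool = proposed_call.get("tool")
      args = proposed_call.get("arguments", {})

      policy = TOOL_POLICIES.get(tool)
      if policy is None:
          return {"status": "rejected", "reason": f"tool '{tool}' is not on the allow-list"}

      # Constraint and risk checks live here, in auditable code, not in the prompt.
      if tool == "refund_payment" and args.get("amount", 0) > policy["max_amount"]:
          return {"status": "rejected", "reason": "amount exceeds policy limit"}

      if policy["requires_approval"] and not request_approval(tool, args):
          return {"status": "pending_approval", "tool": tool, "arguments": args}

      result = executors[tool](**args)  # execution is the final, logged step
      return {"status": "executed", "tool": tool, "result": result}

Here executors maps tool names to the functions that actually perform the work, and request_approval is a placeholder for whatever human approval mechanism you already operate.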

Test structured outputs with real failure modes

Most teams only test the happy path. Reliability comes from deliberately testing the failures:

  • Missing fields, swapped types, invalid enums, and truncated JSON.
  • Adversarial text that tries to smuggle instructions into fields.
  • Ambiguous user requests that tempt the model to guess.

Use evaluation harnesses to automate these checks and catch regressions when models or prompts change (see evaluation loops).
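
A sketch of how those failure modes might be pinned down as automated tests, again assuming the hypothetical parse_and_validate helper and schema from the earlier sketches.

  import pytest

  # Deliberately broken payloads for the hypothetical ticket-routing schema.
  BAD_OUTPUTS = [
      '{"schema_version": "1.0", "ticket_id": "T-1"}',  # missing required fields
      '{"schema_version": "1.0", "ticket_id": 42, "queue": "billing",'
      ' "priority": "high", "decided_at": "2024-01-01T00:00:00Z"}',  # swapped type on ticket_id
      '{"schema_version": "1.0", "ticket_id": "T-1", "queue": "vip",'
      ' "priority": "high", "decided_at": "2024-01-01T00:00:00Z"}',  # invalid enum value
      '{"schema_version": "1.0", "ticket_id": "T-1", "queue": "bil',  # truncated JSON
      '{"schema_version": "1.0", "ticket_id": "T-1", "queue": "billing",'
      ' "priority": "low", "decided_at": "2024-01-01T00:00:00Z"} ship it',  # trailing text
  ]

  @pytest.mark.parametrize("raw", BAD_OUTPUTS)
  def test_invalid_outputs_are_rejected(raw):
      with pytest.raises(ValueError):
          parse_and_validate(raw, TICKET_ROUTING_SCHEMA)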

Structured output is one of the highest ROI improvements in enterprise AI systems: it makes automations safe, testable and operable—without requiring the model to be “perfect”.

Quick answers

What does this article cover?

How to get reliable JSON and tool arguments from LLMs using schemas, validation, retries and safe fallbacks.

Who is this for?

Engineering teams building agentic workflows and automations that need predictable, machine-parseable outputs.

If this topic is relevant to an initiative you are considering, Amestris can provide independent advice or architecture support. Contact hello@amestris.com.au.