Most AI teams struggle with observability because their logs are inconsistent. One service logs the model and token counts, another logs prompt versions, and a third logs nothing. When an incident happens, nobody can reconstruct what changed.
A telemetry schema fixes that by making the evidence predictable.
Design principles
- Minimise by default. Prefer structured metadata over raw prompts (see data minimisation).
- Version everything. Model, prompt template, policies, tools, and retrieval configs.
- Make joins easy. Use stable request IDs, session IDs, and trace IDs (see AI observability).
- Separate sensitive fields. Store content-bearing fields behind stronger controls and shorter retention (see retention and deletion). A schema sketch applying these principles follows this list.
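To make the principles concrete, here is a minimal envelope sketch in Python. Every field name is an illustrative assumption, not a standard; the point is that join keys and versions are first-class, and content lives elsewhere.

```python
from dataclasses import dataclass

# A minimal envelope sketch; the field names are illustrative assumptions,
# not a standard. Versions and join keys are first-class; content-bearing
# data is referenced, never embedded.
@dataclass(frozen=True)
class EventEnvelope:
    # Make joins easy: stable identifiers shared by every event type.
    request_id: str
    session_id: str
    trace_id: str
    # Version everything: pin the artefacts that shaped this request.
    model_version: str
    prompt_template_version: str
    policy_pack_version: str
    retrieval_config_version: str
    # Minimise by default and separate sensitive fields: log structured
    # metadata only; content lives in a separate store with stronger
    # controls and shorter retention, reachable via this pointer.
    sensitive_ref: str | None = None
```

Each event type then carries or references this envelope, so reconstructing a request across layers becomes a single-key join rather than log archaeology.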
Core event types
Start with a small set of events that map to system layers, sketched in code after the list:
- Request. Who/what/where: tenant, environment, workflow, intent, and policy pack.
- Retrieval. Sources queried, filters applied, hit counts, and freshness indicators (see retrieval quality).
- Generation. Model/provider, prompt version, token counts, latency, and refusal signals.
- Tool calls. Tool name, arguments summary, response status, retries, and idempotency keys (see tool authorisation).
- Outcome. User action, escalation, edits, task completion, and feedback (see usage analytics).
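One way to encode these layers is a small set of per-event payloads. The names and fields below are assumptions chosen to mirror the bullets above, not a published schema; each payload would be emitted alongside the shared envelope.

```python
from dataclasses import dataclass

# Hypothetical per-layer payloads mirroring the event types above.
@dataclass
class RequestEvent:
    tenant: str
    environment: str            # e.g. "prod" or "staging"
    workflow: str
    intent: str
    policy_pack: str

@dataclass
class RetrievalEvent:
    sources_queried: list[str]
    filters_applied: dict[str, str]
    hit_count: int
    max_source_age_days: int    # freshness indicator

@dataclass
class GenerationEvent:
    model: str
    provider: str
    prompt_version: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    refused: bool               # refusal signal

@dataclass
class ToolCallEvent:
    tool_name: str
    args_summary: str           # a summary, never the raw arguments
    response_status: str
    retries: int
    idempotency_key: str

@dataclass
class OutcomeEvent:
    user_action: str            # e.g. "accepted", "edited", "escalated"
    task_completed: bool
    feedback: str | None
```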
Fields that pay off
These are the fields you consistently wish you had during incidents, pulled together in the sample event after this list:
- Model/provider, region, and deployment route (see routing and failover).
- Prompt template version and policy prompt versions.
- Retrieval configuration and source identifiers.
- Latency by stage and total cost/tokens (see chargeback).
- Error category and reason codes (see error taxonomy).
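As a worked example, here is what a single generation event might look like with those fields populated. Every value is invented for illustration, including the model, provider, and version names.

```python
# Every value below is invented for illustration.
generation_event = {
    "model": "example-model-v2",
    "provider": "example-provider",
    "region": "eu-west-1",
    "deployment_route": "primary",            # vs. "failover"
    "prompt_template_version": "checkout-help@14",
    "policy_prompt_version": "safety-pack@7",
    "retrieval_config": "kb-default@3",
    "source_ids": ["kb:4821", "kb:5990"],
    "latency_ms": {"retrieval": 120, "generation": 840, "total": 1010},
    "input_tokens": 1250,
    "output_tokens": 310,
    "cost_usd": 0.0042,
    "error_category": None,                   # reason code when set
}
```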
What not to log
Logging everything creates compliance and security problems; a redaction sketch follows this list. Avoid:
- Always-on raw prompt storage.
- Full tool outputs that may include sensitive records.
- Identifiers that are not needed for decisions.
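A small sketch of the alternative to raw storage: summarise tool arguments before logging. The helper name summarise_args is hypothetical; the technique is to record shape, size, and a fingerprint rather than the payload itself.

```python
import hashlib
import json

# summarise_args is a hypothetical helper: record the shape, size, and a
# fingerprint of tool arguments, never the payload itself.
def summarise_args(args: dict) -> dict:
    """Return a loggable summary of tool arguments without the content."""
    raw = json.dumps(args, sort_keys=True).encode()
    return {
        "keys": sorted(args.keys()),                # which fields were sent
        "byte_size": len(raw),                      # how large the payload was
        "sha256": hashlib.sha256(raw).hexdigest(),  # fingerprint for correlation
    }

# Usage: log summarise_args(tool_args) in the tool-call event,
# not tool_args itself.
```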
Telemetry is data. Treat it with the same discipline you apply to production systems, audits, and controls (see compliance audits).