Most AI teams struggle with observability because their logs are inconsistent. One service logs the model and token counts, another logs prompt versions, and a third logs nothing. When an incident happens, nobody can reconstruct what changed.
A telemetry schema fixes that by making the evidence predictable.
Design principles
- Minimise by default. Prefer structured metadata over raw prompts (see data minimisation).
- Version everything. Model, prompt template, policies, tools, and retrieval configs.
- Make joins easy. Use stable request IDs, session IDs, and trace IDs (see AI observability).
- Separate sensitive fields. Store content-bearing fields behind stronger controls and shorter retention (see retention and deletion). A schema sketch applying these principles follows this list.
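To make the principles concrete, here is a minimal envelope sketch in Python. Every field name is an illustrative assumption, not a standard; the point is that join keys and versions are first-class, and content lives elsewhere.

```python
from dataclasses import dataclass

# A minimal envelope sketch; the field names are illustrative assumptions,
# not a standard. Versions and join keys are first-class; content-bearing
# data is referenced, never embedded.
@dataclass(frozen=True)
class EventEnvelope:
    # Make joins easy: stable identifiers shared by every event type.
    request_id: str
    session_id: str
    trace_id: str
    # Version everything: pin the artefacts that shaped this request.
    model_version: str
    prompt_template_version: str
    policy_pack_version: str
    retrieval_config_version: str
    # Minimise by default and separate sensitive fields: log structured
    # metadata only; content lives in a separate store with stronger
    # controls and shorter retention, reachable via this pointer.
    sensitive_ref: str | None = None
```

Each event type then carries or references this envelope, so reconstructing a request across layers becomes a single-key join rather than log archaeology.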
Core event types
Start with a small set of events that map to system layers, sketched in code after the list:
- Request. Who/what/where: tenant, environment, workflow, intent, and policy pack.
- Retrieval. Sources queried, filters applied, hit counts, and freshness indicators (see retrieval quality).
- Generation. Model/provider, prompt version, token counts, latency, and refusal signals.
- Tool calls. Tool name, arguments summary, response status, retries, and idempotency keys (see tool authorisation).
- Outcome. User action, escalation, edits, task completion, and feedback (see usage analytics).
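One way to encode these layers is a small set of per-event payloads. The names and fields below are assumptions chosen to mirror the bullets above, not a published schema; each payload would be emitted alongside the shared envelope.

```python
from dataclasses import dataclass

# Hypothetical per-layer payloads mirroring the event types above.
@dataclass
class RequestEvent:
    tenant: str
    environment: str            # e.g. "prod" or "staging"
    workflow: str
    intent: str
    policy_pack: str

@dataclass
class RetrievalEvent:
    sources_queried: list[str]
    filters_applied: dict[str, str]
    hit_count: int
    max_source_age_days: int    # freshness indicator

@dataclass
class GenerationEvent:
    model: str
    provider: str
    prompt_version: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    refused: bool               # refusal signal

@dataclass
class ToolCallEvent:
    tool_name: str
    args_summary: str           # a summary, never the raw arguments
    response_status: str
    retries: int
    idempotency_key: str

@dataclass
class OutcomeEvent:
    user_action: str            # e.g. "accepted", "edited", "escalated"
    task_completed: bool
    feedback: str | None
```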
Fields that pay off
These are the fields you consistently wish you had during incidents, pulled together in the sample event after this list:
- Model/provider, region, and deployment route (see routing and failover).
- Prompt template version and policy prompt versions.
- Retrieval configuration and source identifiers.
- Latency by stage and total cost/tokens (see chargeback).
- Error category and reason codes (see error taxonomy).
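As a worked example, here is what a single generation event might look like with those fields populated. Every value is invented for illustration, including the model, provider, and version names.

```python
# Every value below is invented for illustration.
generation_event = {
    "model": "example-model-v2",
    "provider": "example-provider",
    "region": "eu-west-1",
    "deployment_route": "primary",            # vs. "failover"
    "prompt_template_version": "checkout-help@14",
    "policy_prompt_version": "safety-pack@7",
    "retrieval_config": "kb-default@3",
    "source_ids": ["kb:4821", "kb:5990"],
    "latency_ms": {"retrieval": 120, "generation": 840, "total": 1010},
    "input_tokens": 1250,
    "output_tokens": 310,
    "cost_usd": 0.0042,
    "error_category": None,                   # reason code when set
}
```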
What not to log
Logging everything creates compliance and security problems; a redaction sketch follows this list. Avoid:
- Always-on raw prompt storage.
- Full tool outputs that may include sensitive records.
- Identifiers that are not needed for decisions.
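A small sketch of the alternative to raw storage: summarise tool arguments before logging. The helper name summarise_args is hypothetical; the technique is to record shape, size, and a fingerprint rather than the payload itself.

```python
import hashlib
import json

# summarise_args is a hypothetical helper: record the shape, size, and a
# fingerprint of tool arguments, never the payload itself.
def summarise_args(args: dict) -> dict:
    """Return a loggable summary of tool arguments without the content."""
    raw = json.dumps(args, sort_keys=True).encode()
    return {
        "keys": sorted(args.keys()),                # which fields were sent
        "byte_size": len(raw),                      # how large the payload was
        "sha256": hashlib.sha256(raw).hexdigest(),  # fingerprint for correlation
    }

# Usage: log summarise_args(tool_args) in the tool-call event,
# not tool_args itself.
```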
Telemetry is data. Treat it with the same discipline you apply to production systems, audits, and controls (see compliance audits).