Many AI regressions are not caused by the model. They are caused by drift: prompt templates changed without review, routing rules adjusted in production, or policy packs that have diverged across environments. Drift turns AI systems into moving targets and makes incidents harder to diagnose.
Define what counts as configuration
In LLM systems, configuration includes (a minimal schema is sketched after this list):
- Prompt template versions and safety prompts.
- Policy packs and output scanning thresholds.
- Routing rules and fallback models (see routing).
- Tool enablement and schemas (see tool authorisation).
- Retrieval configuration and source allowlists for RAG.
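To make the list concrete, here is a minimal sketch of that configuration surface as a single typed record. The class and field names (`LLMConfig`, `prompt_template_version`, and so on) are illustrative, not a prescribed schema:

```python
# A minimal sketch of the configuration surface as one typed record.
# All class and field names are illustrative, not a prescribed schema.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen so snapshots are immutable and comparable
class LLMConfig:
    prompt_template_version: str                 # e.g. "support-v14"
    safety_prompt_version: str                   # system/safety prompt revision
    policy_pack_version: str                     # output-scanning rules and thresholds
    routing_rules_version: str                   # primary route and fallback models
    enabled_tools: tuple[str, ...]               # tool allowlist, with schema versions
    retrieval_source_allowlist: tuple[str, ...]  # approved RAG sources

config = LLMConfig(
    prompt_template_version="support-v14",
    safety_prompt_version="safety-v3",
    policy_pack_version="policy-2024-06",
    routing_rules_version="routes-v9",
    enabled_tools=("search@2", "refund@1"),
    retrieval_source_allowlist=("docs.example.com",),
)
```

Freezing the record means two snapshots compare with `==` and hash deterministically, which the detection sketch later in this page relies on.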
Use registries as systems of record
Registries reduce drift by making the "official" versions explicit (a minimal registry is sketched after this list):
- Prompt registry. Version prompts and policies, and record release history (see prompt registry).
- Model registry. Track model routes, policy constraints, and deployment gates (see model registry).
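As a sketch of the system-of-record idea, the minimal in-memory registry below answers one question: what is the official version of an artifact in a given environment? The `Registry` class and its methods are assumptions for illustration, not the API of any particular registry product:

```python
# A minimal in-memory registry sketch: one place answers
# "what is the official version of this artifact in this environment?"
class Registry:
    def __init__(self) -> None:
        # (artifact name, environment) -> approved version
        self._records: dict[tuple[str, str], str] = {}

    def publish(self, name: str, env: str, version: str) -> None:
        """Record a version as the official one for an environment."""
        self._records[(name, env)] = version

    def official_version(self, name: str, env: str) -> str:
        """Return the single source of truth; fail loudly if unregistered."""
        try:
            return self._records[(name, env)]
        except KeyError:
            raise LookupError(f"no approved version of {name} in {env}") from None

registry = Registry()
registry.publish("prompt:support", "prod", "support-v14")
assert registry.official_version("prompt:support", "prod") == "support-v14"
```

The point of the single lookup is that every other system, including drift detection below, compares against this one answer rather than against each other.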
Detect drift continuously
Drift detection is a comparison problem: what is running versus what should be running. Practical patterns (the core comparison is sketched after this list):
- Config snapshots. Periodically snapshot runtime configuration and compare to registry state.
- Decision logging. Record applied versions per request and alert on unknown versions (see decision logging).
- Synthetic checks. Run golden queries and alert when behaviour shifts unexpectedly (see synthetic monitoring).
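The comparison itself fits in a few lines, assuming config snapshots arrive as plain dictionaries of artifact name to version; the field names and values below are illustrative:

```python
# A sketch of the comparison at the heart of drift detection: hash the
# running state for a cheap equality check, then diff keys to localise it.
import hashlib
import json

def fingerprint(config: dict) -> str:
    """Stable hash of a snapshot; sorted keys make it deterministic."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_drift(runtime: dict, registry: dict) -> list[str]:
    """Return the keys where the running config diverges from the registry."""
    return [key for key in registry if runtime.get(key) != registry[key]]

registry_state = {"prompt": "support-v14", "policy": "policy-2024-06"}
runtime_state = {"prompt": "support-v15", "policy": "policy-2024-06"}

# Cheap periodic check: compare fingerprints, then localise the drift.
if fingerprint(runtime_state) != fingerprint(registry_state):
    print(f"drift detected in: {detect_drift(runtime_state, registry_state)}")
    # -> drift detected in: ['prompt']
```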
Make drift prevention part of delivery
Prevention is cheaper than detection. Use these controls (a release-ring sketch follows the list):
- Feature flags and release rings for risky changes (see feature flags).
- Approvals for high-risk controls and tool changes (see approvals).
- Regression suites for prompt changes (see prompt regression testing).
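As one example of the release-ring pattern, the sketch below gates a risky change by ring membership so it reaches internal users before everyone else; the ring names and ordering are assumptions, not a standard:

```python
# A sketch of release rings: a risky change is enabled per ring, so it
# reaches internal users before the general population.
RINGS = {"internal": 0, "beta": 1, "general": 2}

def change_enabled(change_ring: str, user_ring: str) -> bool:
    """A change enabled at ring N is visible to ring N and anything inside it."""
    return RINGS[user_ring] <= RINGS[change_ring]

# A new prompt version enabled only up to the beta ring:
assert change_enabled("beta", "internal") is True
assert change_enabled("beta", "general") is False
```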
Respond to drift like an incident
When drift is detected, treat it as an incident: freeze risky changes, restore known-good versions, and document what changed and why (see incident response and change freeze).
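A minimal sketch of that response path, reusing the hypothetical `Registry` from earlier; the freeze here is only recorded, on the assumption that deploy tooling enforces it:

```python
# Treat drift as an incident: freeze further changes to the artifact,
# then restore the known-good version from the registry.
frozen: set[tuple[str, str]] = set()

def respond_to_drift(registry: "Registry", name: str, env: str, running: str) -> str:
    """Freeze the artifact and return the version to redeploy."""
    frozen.add((name, env))  # recorded so deploy tooling can honour the freeze
    known_good = registry.official_version(name, env)
    if running != known_good:
        # document what changed and why before restoring
        print(f"incident: {name}/{env} ran {running}; restoring {known_good}")
    return known_good
```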
Stable AI systems are not built by avoiding change. They are built by controlling change and making drift visible.