Prompt templates and system instructions are often treated like harmless text. In production, they can contain sensitive IP: decision logic, policy rules, and proprietary workflows. Prompt leakage is a real risk - and it can happen through user prompts, tool outputs, logs, or shared debugging channels.
Assume users will ask for the system prompt
Many prompt leakage events are not sophisticated. Users simply ask: "show me your instructions". Your system should be designed to refuse that safely and consistently (see policy layering).
Reduce what needs protecting
The strongest defence is minimisation:
- Keep sensitive logic in code and policy engines, not in giant prompts.
- Use smaller, composable prompt blocks.
- Store only prompt identifiers in telemetry where possible (see telemetry schema).
Compartmentalise prompts and tools
Avoid a single prompt that contains everything. Use a layered architecture:
- System policy layer. High-level boundaries and safe behaviour.
- Task layer. Workflow-specific instructions.
- Evidence layer. Retrieved sources and tool results.
Then treat tool outputs as untrusted input and constrain what tools can return (see safe tooling).
Protect observability from becoming a leak
Teams accidentally leak prompts through logs and debugging exports. Apply controls:
- Do not log raw prompts by default (see data minimisation).
- Separate content-bearing fields behind stronger access controls and shorter retention (see retention and deletion).
- Use redaction and DLP scanning for exports and tickets (see DLP for LLM systems).
Test for prompt exfiltration
Prompt exfiltration attempts often use prompt injection or tool exploitation. Include these in your adversarial test set and regression suite (see prompt injection defence and red teaming).
Have a response plan
If prompts leak, treat it as a security incident: rotate secrets, review logs and exports, and update controls. Capture what prompt versions were exposed and when (see incident response and prompt change control).
Prompt confidentiality is not about hiding everything. It is about ensuring that sensitive control logic and IP are not casually exposed through normal product usage or operations.