LLM systems create new data leakage paths: users paste sensitive information into prompts, tools return confidential records into the context window, and the model may echo sensitive details in outputs. Data loss prevention (DLP) for LLMs is the set of controls that keep those paths bounded.
Start with classification and allowed data rules
DLP is most effective when rules are explicit. Define what data is allowed to flow into prompts and tools, what must be redacted, and what must be blocked (see data classification).
Redact at capture time, not after the fact
Capture-time redaction prevents sensitive content from entering logs, caches, and vendor requests. Practical patterns:
- Field-level redaction for structured inputs.
- Pattern-based redaction for common identifiers (with careful tuning).
- Minimisation of context: prefer retrieval over payload stuffing (see data minimisation).
Scan outputs and tool results
Output scanning is a critical layer because leakage can happen after retrieval and tool use. Use policy layers that detect sensitive disclosures and unsafe content (see policy layering and tool authorisation).
Make logging safe and minimal
Teams often leak sensitive data via observability. Apply the same discipline to telemetry:
- Prefer structured metadata over raw prompts.
- Separate content-bearing fields behind stronger controls.
- Apply retention rules aligned to risk (see retention and deletion and telemetry schema).
Harden integrations and vendors
DLP controls often fail at integration boundaries. Require:
- Scoped secrets and least-privilege access (see secrets management).
- Clear vendor terms for data usage, retention, and region guarantees (see procurement and data residency).
- Segmentation across tenants and environments (see multi-tenancy).
Test leakage paths deliberately
Run adversarial tests that try to force disclosures: prompt injection, policy edge cases, and tool misuse. Convert findings into regression tests (see red teaming and regression testing).
DLP for LLMs is not one filter. It is a set of layers that reduce the chance of leaks and make incidents diagnosable when they occur.