Multi-tenant RAG is a high-leverage capability: one platform, many customers, shared improvements. It is also a high-risk capability. The biggest trust failure is cross-tenant leakage: a user receives content they are not entitled to see, often because of a subtle filter bug or stale permissions.
Tenant isolation in RAG is not a single mechanism. It is a system of controls across ingestion, indexing, retrieval, caching and observability.
Choose an isolation strategy: partitioning vs filtering
There are three common strategies, each with trade-offs:
- Physical partitioning. Separate indexes per tenant. Strong isolation, higher operational overhead.
- Logical partitioning. Shared index with strict tenant filters. Lower overhead, higher correctness demands.
- Hybrid. Dedicated indexes for sensitive segments (for example, enterprise tenants) and a shared, filtered index for lower-risk tenants.
The right choice depends on your threat model, scale and compliance constraints (see AI multi-tenancy).
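As a rough illustration of the hybrid option, the sketch below routes a tenant to either a dedicated or a shared index. The index names and the `resolve_index` helper are hypothetical and not tied to any particular vector store. It also assumes the tenant filter is still applied on every query as a second line of defence.

```python
from __future__ import annotations

# Hypothetical index names; real naming depends on your vector store.
SHARED_INDEX = "rag-shared"
DEDICATED_PREFIX = "rag-tenant-"


def resolve_index(tenant_id: str, dedicated_tenants: frozenset[str]) -> str:
    """Return the name of the index a tenant's queries should be sent to."""
    if not tenant_id:
        # Fail closed: never fall back to the shared index for an unknown caller.
        raise ValueError("tenant_id is required to route a query")
    if tenant_id in dedicated_tenants:
        return f"{DEDICATED_PREFIX}{tenant_id}"
    return SHARED_INDEX


dedicated = frozenset({"acme-bank"})
print(resolve_index("acme-bank", dedicated))   # rag-tenant-acme-bank
print(resolve_index("small-shop", dedicated))  # rag-shared
```

Failing closed on a missing tenant_id means a routing bug produces an error rather than a query against the shared index.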
Make entitlements first-class metadata
Tenant isolation depends on correct metadata. Capture and store:
- tenant_id and domain identifiers for every chunk.
- ACLs or group entitlements for every document.
- Source ownership and provenance for auditing.
Do not treat permissions as optional tags. Treat them as core data that drives retrieval (see RAG permissions design).
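One way to make these fields non-optional is to validate them when a chunk is created. The sketch below is a minimal example under assumptions: the `ChunkMetadata` class and its field names are illustrative, and a real schema would follow your document store and identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkMetadata:
    """Entitlement and provenance fields stored alongside every chunk."""

    tenant_id: str
    document_id: str
    allowed_groups: frozenset  # groups entitled to read the parent document
    source_system: str         # provenance, e.g. the connector name
    source_owner: str          # owner in the source system, for audits

    def __post_init__(self):
        # Reject chunks that would be unfilterable at retrieval time.
        if not self.tenant_id:
            raise ValueError("chunk is missing tenant_id")
        if not self.allowed_groups:
            raise ValueError("chunk has no ACL; refusing to index it")


meta = ChunkMetadata(
    tenant_id="acme",
    document_id="doc-1",
    allowed_groups=frozenset({"sales"}),
    source_system="wiki",
    source_owner="alice",
)
```

Rejecting a chunk with an empty ACL at ingestion is a deliberate choice here: a chunk with no entitlements cannot be filtered safely later.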
Enforce filters at retrieval time, not post-processing
A common bug pattern is to retrieve from the shared index, then filter results after ranking. This is risky because ineligible content may still influence the model or leak through citations whenever the post-filter is skipped or buggy. Prefer:
- Filters applied inside the retrieval query.
- Ranking performed only within eligible results.
- Retrieval logs that record which filters were applied.
For hybrid systems, apply tenant isolation first, then any topical or freshness ranking (see freshness evaluation).
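The ordering can be sketched with a toy in-memory retriever: eligibility is decided first, and only eligible chunks are scored. The `Chunk` type, the dot-product scoring and the `retrieve` function are assumptions for illustration. In a real vector store the same filter would be passed as part of the query itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    allowed_groups: frozenset
    vector: tuple


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(chunks, query_vector, tenant_id, user_groups, top_k=3):
    """Filter to eligible chunks first, then rank only within them."""
    groups = frozenset(user_groups)
    eligible = [
        c for c in chunks
        if c.tenant_id == tenant_id and c.allowed_groups & groups
    ]
    ranked = sorted(eligible, key=lambda c: dot(c.vector, query_vector), reverse=True)
    # Record which filters were applied so the retrieval log can include them.
    applied_filters = {"tenant_id": tenant_id, "groups": sorted(groups)}
    return ranked[:top_k], applied_filters


chunks = [
    Chunk("acme pricing", "acme", frozenset({"sales"}), (1.0, 0.0)),
    Chunk("globex pricing", "globex", frozenset({"sales"}), (5.0, 0.0)),
]
results, applied = retrieve(chunks, (1.0, 0.0), "acme", {"sales"})
```

In this example the globex chunk would score higher, but it is never scored because it is excluded before ranking.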
Synchronise ACL changes and deletes reliably
Isolation failures often come from stale permissions: a user was removed from a group, but the index still serves content. Your ingestion pipeline must handle:
- ACL sync. Permission updates propagate quickly and invalidate cached results.
- Deletes. Content deletions remove chunks and prevent resurrection (see deletion workflows).
- Connector hardening. Provenance and change detection per source system (see connector hardening).
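A minimal sketch of the sync logic follows, assuming each source event carries a monotonically increasing version number per document. That is an assumption about the connector, not a given. The `TenantIndex` class is hypothetical. It keeps the last applied version after a delete, which acts as a tombstone so a replayed older upsert cannot resurrect the content.

```python
class TenantIndex:
    """Toy ACL store that applies source events in version order."""

    def __init__(self):
        self.acl = {}           # document_id -> frozenset of entitled groups
        self.last_version = {}  # document_id -> highest version applied (kept after delete)
        self.acl_epoch = 0      # bumped on every accepted change; caches can key on it

    def _accept(self, document_id, version):
        # Ignore stale or replayed events.
        if version <= self.last_version.get(document_id, -1):
            return False
        self.last_version[document_id] = version
        self.acl_epoch += 1
        return True

    def upsert(self, document_id, groups, version):
        if not self._accept(document_id, version):
            return False
        self.acl[document_id] = frozenset(groups)
        return True

    def delete(self, document_id, version):
        if not self._accept(document_id, version):
            return False
        self.acl.pop(document_id, None)
        return True


index = TenantIndex()
index.upsert("doc-1", {"sales"}, version=1)
index.delete("doc-1", version=2)
replayed = index.upsert("doc-1", {"sales"}, version=1)  # stale event, rejected
```

Including `acl_epoch` in cache keys is one simple way to make every permission change invalidate cached results, at the cost of a coarse invalidation.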
Design caches with entitlement-aware keys
Caching is a classic place for leakage. If you cache answers or retrieval results without including entitlement context, you can serve one tenant's result to another. Cache keys should include:
- tenant_id
- user or group entitlement hash (where applicable)
- retrieval config version and filters
This is also part of safe LLM caching (see caching strategies).
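The key composition above might look like the following sketch. The function names and the payload layout are illustrative assumptions. Only `hashlib` and `json` from the Python standard library are used.

```python
import hashlib
import json


def entitlement_hash(groups):
    """Stable hash of a user's groups, independent of ordering."""
    canonical = json.dumps(sorted(set(groups)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(tenant_id, groups, query, config_version, filters):
    """Build a cache key that cannot collide across tenants or entitlements."""
    if not tenant_id:
        raise ValueError("refusing to build a cache key without tenant_id")
    payload = {
        "tenant_id": tenant_id,
        "entitlements": entitlement_hash(groups),
        "query": query,
        "config_version": config_version,
        "filters": filters,  # must be JSON-serialisable
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = cache_key("acme", ["sales", "emea"], "pricing policy", "v3", {"lang": "en"})
key_b = cache_key("globex", ["sales", "emea"], "pricing policy", "v3", {"lang": "en"})
```

Here `key_a` and `key_b` differ even though the query, groups and config are identical, because tenant_id is part of the hashed payload.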
Prove isolation with tests and red team scenarios
Isolation must be demonstrated, not assumed. Add validation:
- Automated tests that ensure cross-tenant retrieval returns zero results.
- Golden queries per tenant, run as a regression suite (see RAG benchmark harness).
- Red team scenarios that try to bypass filters with prompt injection and tool abuse (see red teaming).
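One way to automate the first check is a canary test: plant a unique marker string in each tenant's content and assert that no other tenant can retrieve it. The sketch below uses a toy keyword search and made-up canary values. The `search` function stands in for your real retrieval call.

```python
from itertools import permutations

# Toy corpus; the canary strings are made up for this example.
INDEX = [
    {"tenant_id": "acme", "text": "CANARY-acme-7f3a pricing policy"},
    {"tenant_id": "globex", "text": "CANARY-globex-91bc pricing policy"},
]
CANARIES = {"acme": "CANARY-acme-7f3a", "globex": "CANARY-globex-91bc"}


def search(index, tenant_id, query):
    """Stand-in retriever: tenant filter plus naive keyword match."""
    terms = query.lower().split()
    return [
        d for d in index
        if d["tenant_id"] == tenant_id and any(t in d["text"].lower() for t in terms)
    ]


def cross_tenant_leaks(search_fn, index, canaries):
    """Return (asker, owner, query) for every result that crosses tenants."""
    leaks = []
    for asker, owner in permutations(canaries, 2):
        for query in (canaries[owner], "pricing policy"):
            for doc in search_fn(index, asker, query):
                if doc["tenant_id"] != asker or canaries[owner] in doc["text"]:
                    leaks.append((asker, owner, query))
    return leaks


def test_no_cross_tenant_leakage():
    assert cross_tenant_leaks(search, INDEX, CANARIES) == []


test_no_cross_tenant_leakage()
```

The test function follows the pytest naming convention, so it can be collected by a test runner or called directly as shown.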
Make isolation observable and auditable
In production, you need visibility. Track:
- Retrieval events with tenant, user and applied filter fields.
- Policy blocks when content is rejected due to entitlement mismatch.
- Alerts for suspicious patterns (unexpected tenant IDs, filter gaps).
Connect this to identity and session security (see identity and session security).
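A retrieval audit event could be emitted along these lines. This is a sketch under assumptions: the event fields, the alert names and the `audit_retrieval` function are illustrative. It uses only the standard `logging` and `json` modules.

```python
import json
import logging

logger = logging.getLogger("rag.retrieval")


def audit_retrieval(tenant_id, user_id, applied_filters, result_tenant_ids):
    """Log one structured retrieval event and flag suspicious patterns."""
    alerts = []
    if applied_filters.get("tenant_id") != tenant_id:
        # The query ran without the expected tenant filter.
        alerts.append("filter_gap")
    if any(t != tenant_id for t in result_tenant_ids):
        # A result belongs to a different tenant than the caller.
        alerts.append("unexpected_tenant_id")
    event = {
        "tenant_id": tenant_id,
        "user_id": user_id,
        "applied_filters": applied_filters,
        "result_count": len(result_tenant_ids),
        "alerts": alerts,
    }
    level = logging.ERROR if alerts else logging.INFO
    logger.log(level, json.dumps(event, sort_keys=True))
    return event


ok = audit_retrieval("acme", "u-1", {"tenant_id": "acme"}, ["acme", "acme"])
bad = audit_retrieval("acme", "u-1", {}, ["acme", "globex"])
```

Logging alerts at a higher level than routine events makes it straightforward to route them to an alerting channel.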
Tenant isolation is not a feature you "add later". It is an architectural decision that touches the entire RAG pipeline. If you build it deliberately, you can scale multi-tenant RAG safely and confidently.