Basic RAG assumes a single retrieval step: embed the query, fetch the top-k chunks, generate an answer. That works for many questions, but it breaks down when queries are ambiguous, multi-part, or require joining multiple facts.
Query orchestration is the set of patterns that make RAG more deliberate: clarify intent, retrieve iteratively, and generate answers with an explicit plan.
Start by making the question explicit
Many failures are query mismatch: the user asks in one vocabulary, the corpus is organised in another. Use a lightweight query rewrite step that preserves intent and adds domain terms (see ranking and relevance). Log both the original and rewritten query for auditability (see observability).
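A minimal sketch of such a rewrite step, assuming a hypothetical domain glossary (`DOMAIN_TERMS`) that maps user abbreviations to corpus terminology; a production system would typically use an LLM call instead of a lookup table:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("query_rewrite")

# Hypothetical glossary: user vocabulary -> corpus terminology.
DOMAIN_TERMS = {
    "pto": "paid time off (PTO) leave policy",
    "wfh": "remote work / work from home policy",
}

def rewrite_query(original: str) -> str:
    """Expand abbreviations with corpus terms, preserving the original intent."""
    rewritten = original
    for abbrev, expansion in DOMAIN_TERMS.items():
        if abbrev in original.lower().split():
            rewritten = rewritten + " " + expansion
    # Log both forms so the rewrite step stays auditable.
    log.info("query_rewrite original=%r rewritten=%r", original, rewritten)
    return rewritten
```

The key property is that the rewrite only appends terms; it never replaces the user's wording, so the original intent survives even a bad glossary entry.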
Decompose multi-part questions
When a user asks a question with multiple constraints, split it into sub-questions:
- Definitions and policy context.
- Specific rules or thresholds.
- Exceptions and regional variants.
- Execution steps or required approvals.
Each sub-question can retrieve from a different part of the corpus, improving recall without flooding the context window.
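The four buckets above can be sketched as a template-based decomposition; in practice an LLM call would generate the sub-questions, but the shape of the output is the same (each sub-question carries a retrieval scope, which is an assumption of this sketch):

```python
from dataclasses import dataclass

@dataclass
class SubQuestion:
    text: str
    scope: str  # which part of the corpus to retrieve from

def decompose(question: str) -> list[SubQuestion]:
    """Split a multi-constraint question into scoped sub-questions."""
    return [
        SubQuestion(f"What terms and policy context define: {question}", scope="definitions"),
        SubQuestion(f"What specific rules or thresholds apply to: {question}", scope="rules"),
        SubQuestion(f"What exceptions or regional variants affect: {question}", scope="exceptions"),
        SubQuestion(f"What steps or approvals are required for: {question}", scope="procedures"),
    ]
```

Because each sub-question targets one scope, retrieval can run in parallel per scope and only the best evidence from each lands in the context window.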
Iterate retrieval instead of guessing
In high-stakes workflows, it is better to iterate than hallucinate:
- First pass. Retrieve broad candidates.
- Second pass. Retrieve within the best domain or entity scope.
- Third pass. Retrieve the specific section that answers the sub-question.
Track retrieval coverage and failure modes (see retrieval quality and common RAG failures).
Use an answer plan and require evidence
Before writing the final answer, build a short answer plan:
- What claims will be made?
- What sources support each claim?
- What information is missing?
Then generate an answer that includes citations for key claims (see citations and grounding).
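The plan can be as simple as a list of claims with their supporting sources; the structure below is a sketch, and the gate is the part that matters: a claim with no evidence either triggers another retrieval pass or is dropped, never asserted:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    sources: list[str] = field(default_factory=list)

def unsupported(claims: list[Claim]) -> list[str]:
    """Return claims that lack evidence; these must not reach the answer."""
    return [c.text for c in claims if not c.sources]

plan = [
    Claim("Expenses over $500 need VP approval.", sources=["policy.md#approvals"]),
    Claim("The threshold differs in the EU.", sources=[]),  # missing evidence
]
missing = unsupported(plan)
```

The supported claims carry their `sources` straight into the final answer as citations, so the evidence check and the citation step share one data structure.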
Make orchestration observable and controllable
Orchestration adds moving parts. Treat it like a system:
- Version the orchestration logic and retrieval configs.
- Capture decisions and reason codes (see decision logging).
- Use evaluation suites and golden queries to prevent regressions (see RAG evaluation and synthetic monitoring).
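A minimal sketch of versioning plus decision logging, assuming a content-hashed config and made-up reason codes; the point is that every logged decision carries the exact config version that produced it, so regressions can be correlated with config changes:

```python
import hashlib
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestration")

# Hypothetical retrieval config, versioned by content hash so any
# change shows up in the logs and in evaluation runs.
CONFIG = {"rewrite": True, "max_passes": 3, "top_k": 5}
CONFIG_VERSION = hashlib.sha256(
    json.dumps(CONFIG, sort_keys=True).encode()
).hexdigest()[:12]

def log_decision(step: str, reason_code: str, **detail) -> None:
    """Emit one structured log line per orchestration decision."""
    log.info(json.dumps({
        "config_version": CONFIG_VERSION,
        "step": step,
        "reason_code": reason_code,
        **detail,
    }))

log_decision("retrieval", "NARROWED_SCOPE", domain="hr", passes=2)
```

Structured JSON lines like these are what golden-query evaluation suites replay and diff against, turning orchestration changes into testable events rather than silent behaviour shifts.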
Query orchestration does not make a model smarter. It makes the retrieval and evidence process more reliable, which is usually what users actually need.