Once basic retrieval is working, the next quality ceiling is relevance. Users ask in messy language, knowledge bases contain duplicates, and “closest embedding” is not always “best answer.” The result is a familiar symptom: the right document exists, but the wrong one is surfaced.
RAG relevance improves fastest when you treat it like a search problem, not an LLM problem. The LLM can only be as good as the evidence you provide.
Use hybrid search as the default
Pure vector search misses exact keywords, IDs, and rare terms. Pure lexical search misses semantic matches. Hybrid search combines both:
- Lexical (BM25). Great for product codes, exact policy clauses, names.
- Vector search. Great for paraphrases and semantic similarity.
Most enterprise implementations benefit from hybrid retrieval followed by a reranking stage that refines the top candidates.
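A common way to combine the two result lists is reciprocal rank fusion (RRF), which needs only ranks, not comparable scores. The sketch below is minimal and assumes you already have ranked document IDs from a BM25 index and a vector store; the example ID lists are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used default from the original RRF paper.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked hits from a BM25 index and a vector store.
lexical_hits = ["policy-142", "policy-007", "faq-3"]
semantic_hits = ["faq-3", "policy-142", "guide-9"]
print(reciprocal_rank_fusion([lexical_hits, semantic_hits]))
```

Because RRF works on ranks alone, it sidesteps the problem of normalising BM25 scores against cosine similarities.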
Add a reranker (and measure the trade-off)
Rerankers are often the highest ROI improvement once recall is reasonable. They rescore the top N retrieved chunks against the query, typically with a cross-encoder, and reorder them by relevance. The trade-off is latency and cost, so budget reranking for higher-value intents and keep N bounded.
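As one concrete option, the sketch below uses the sentence-transformers library with a public MS MARCO cross-encoder checkpoint; the chunk shape (dicts with a `text` field) and the model choice are assumptions, so swap in whatever reranker and data model you actually deploy.

```python
from sentence_transformers import CrossEncoder

# A small public cross-encoder; replace with your production reranker.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, chunks, top_k=5):
    """Rescore the top-N retrieved chunks against the query and keep the best."""
    pairs = [(query, chunk["text"]) for chunk in chunks]
    scores = reranker.predict(pairs)          # one relevance score per pair
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

Keeping N small (20 to 50 candidates) and top_k smaller still is usually enough to get the quality benefit without blowing the latency budget.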
Query rewriting is part of the interface
Users don’t know your taxonomy. Query rewriting bridges that gap by transforming the user’s wording into the vocabulary your content actually uses. Common patterns:
- Synonym expansion. “Annual leave” vs “PTO” vs “vacation.”
- Contextual filters. Add region, business unit, product line, tenant constraints.
- Entity extraction. Pull IDs, account names, or policy numbers into structured fields.
When query rewriting is in place, log both the original and the rewritten query so you can audit and improve the rewriter over time (see observability).
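A minimal rule-based sketch of these patterns is below. The synonym map, filter fields, and log format are all illustrative, and many teams replace the expansion step with an LLM call; the point is the shape: expand terms, attach structured filters, and log both forms.

```python
import logging

logger = logging.getLogger("rag.query_rewrite")

# Hypothetical domain synonym map; in practice this comes from your taxonomy.
SYNONYMS = {
    "pto": ["annual leave", "vacation"],
    "annual leave": ["pto", "vacation"],
}

def rewrite_query(query: str, user_context: dict) -> dict:
    """Expand synonyms and attach structured filters from user context."""
    terms = [query]
    terms.extend(SYNONYMS.get(query.lower().strip(), []))

    rewritten = {
        # Illustrative lexical OR-expansion; adapt to your search engine's syntax.
        "query": " OR ".join(terms),
        "filters": {
            # Contextual filters: scope retrieval to the caller's slice of the KB.
            "region": user_context.get("region"),
            "business_unit": user_context.get("business_unit"),
        },
    }
    # Log both forms so the rewriter can be audited and improved over time.
    logger.info("query_rewrite original=%r rewritten=%r", query, rewritten)
    return rewritten
```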
Close the loop with feedback
Relevance systems improve with feedback loops. Lightweight options include:
- User feedback (“this was helpful”) tied to retrieved source IDs.
- Click-through on citations (when you show them).
- Escalation signals (user asks again, or asks for a human).
Use these signals to create new golden queries and to prioritise content fixes, not just model changes. For evaluation approaches, see RAG evaluation playbook and enterprise search with LLMs.
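A sketch of a minimal feedback record is below, assuming an append-only JSONL sink; the field names, signal values, and storage choice are illustrative. The important property is that every signal is tied back to the query and the source IDs that were actually retrieved, so the rows can later seed golden queries and point at content that needs fixing.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    """One relevance signal, tied back to the sources shown to the user."""
    query: str
    retrieved_source_ids: list
    signal: str  # e.g. "helpful", "citation_click", "escalated", "asked_again"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def record_feedback(event: FeedbackEvent, path: str = "feedback_events.jsonl"):
    # Append-only JSONL is enough to start; analytics can aggregate later.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

record_feedback(FeedbackEvent(
    query="how much annual leave do I get?",
    retrieved_source_ids=["policy-142", "faq-3"],
    signal="helpful",
))
```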
Strong ranking and relevance are what make RAG feel “smart.” Without them, even a strong model will look unreliable.