Build a Hybrid RAG System with FAISS, BM25, LangGraph and Claude Sonnet Model

Last Updated on June 22, 2026 by Editorial Team

Author(s): Alpha Iterations

Originally published on Towards AI.

Combine semantic search and keyword search into one powerful document Q&A app using the Claude Sonnet 4.6 API: a step-by-step tutorial

Hybrid Retrieval (Image by Alpha Iterations, Created using ChatGPT)

Introduction

With the rapid advancement of Large Language Models and vector embeddings, Retrieval-Augmented Generation (RAG) has become the go-to solution for querying unstructured documents. Upload a PDF, ask a question, get an answer. It feels like magic.

But sometimes, it is not enough. The silent failure mode of most RAG systems is not the LLM. It is the retrieval step.

Dense vector search is good at finding semantically similar text. It understands that “urban spending” and “city expenditure” mean the same thing. But ask it for a specific error code, a contract clause number, or a precise financial figure, and it can silently return the wrong chunks with high confidence.

On the other hand, keyword search like BM25 reliably nails exact matches. But it has no concept of meaning. “Automobile” and “car” are completely different strings to it, and a paraphrased question that shares no words with the document will leave it lost.

The uncomfortable truth is that neither retriever is universally better. Each dominates on a different class of queries. And in real-world documents like legal contracts, financial reports, and technical manuals, you will always have both kinds.

Hybrid RAG solves this by running both retrievers in parallel and fusing their results using Reciprocal Rank Fusion. You get the semantic understanding of vector search and the precision of keyword search, in a single ranked list, at near-zero extra cost.

In this article, we will build a complete Hybrid RAG system from scratch.
We will use FAISS for dense search, BM25 for keyword search, Reciprocal Rank Fusion to merge the two ranked lists into a single, better-ranked result, LangGraph for orchestration, and a Streamlit UI where you can toggle between retrieval modes and inspect every chunk and score behind each answer.

Real-world use cases this solves

Legal teams querying contracts for specific clause numbers (exact match) as well as intent (semantic)
Financial analysts asking about EBITDA definitions and quarterly revenue figures in earnings reports
Support engineers searching error codes in technical manuals while also asking about root-cause explanations
Research teams querying across dozens of papers for both exact citations and conceptual similarity

The complete end-to-end code is available in my GitHub repo: alphaiterations/agentic-ai-usecases, under beginner/hybrid-rag (github.com).

The Problem with Single-Mode Retrieval

Before jumping into code, it helps to understand why hybrid retrieval matters.

Dense vector search

Converts text into high-dimensional embeddings and finds the nearest neighbours by cosine similarity. It excels at handling paraphrases: ‘What is the profit margin?’ finds chunks that say ‘net income as a percentage of revenue’ even though none of those words overlap with the query. But it can silently skip a chunk that contains ERR_4021, typically because such a token was rare in the embedding model's training data and sits in an odd region of the embedding space.

BM25

Best Match 25 is a classical information retrieval algorithm based on term frequency and inverse document frequency. It scores documents based on how often the query words appear in them and how rare those words are across the whole corpus, with a normalization for document length. It nails exact matches, part numbers, named entities, and specific terminology.
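To make the BM25 scoring description concrete, here is a minimal pure-Python sketch of Okapi BM25. It is not the implementation from the repository: the whitespace tokenizer, the sample documents, the defaults k1 = 1.5 and b = 0.75, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (illustrative Okapi BM25 sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical three-document corpus.
docs = [
    "Restart the service when ERR_4021 appears in the log",
    "The car would not start after the update",
    "Automobile maintenance schedule for the fleet",
]

exact = bm25_scores("ERR_4021", docs)   # exact-token query
paraphrase = bm25_scores("car", docs)   # query with a synonym in docs[2]
```

In this toy corpus, the exact-token query gives a non-zero score only to the document that contains ERR_4021, and the query "car" leaves the "Automobile" document at zero because the two strings share no token.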
The weakness is that it has no semantic understanding at all, so ‘automobile’ and ‘car’ are completely different words to BM25.

Test Cases where Semantic Search & BM25 Fail (Image by Alpha Iterations)

Hybrid retrieval

Combines both signals. The merged ranked list tends to surface chunks that are simultaneously semantically relevant and lexically relevant, which is exactly what you want when your document contains a mix of technical terms and descriptive prose.

Hybrid RAG: Best of both. (Image by Alpha Iterations. Created using ChatGPT)

The question is: how do we decide which chunk to prioritize? RRF is the answer.

RRF (Reciprocal Rank Fusion)

RRF is a rank-based merging algorithm that combines multiple ranked lists into a single, unified ranking without caring about the raw score values from any individual retriever. Instead of asking “which chunk scored highest overall?”, it asks “which chunk appeared near the top of the most lists?”

RRF Steps. (Image by Alpha Iterations)

The formula is simple:

RRF score(d) = Σ 1 / (k + rank(d, list))

where k is a smoothing constant (typically 60) and rank(d, list) is the 1-indexed position of chunk d in a given retriever’s result list. The sum runs over every retriever that returned the chunk.

RRF Calculation. (Image by Alpha Iterations)

A few properties make RRF especially well-suited for hybrid retrieval:

Score-scale agnostic: Cosine similarity from FAISS sits in the range [-1, 1]. BM25 scores are unbounded and document-length-dependent. These two numbers are not comparable, so you cannot simply average them. RRF sidesteps the problem entirely by converting everything to ranks first.

Rewards cross-list agreement: A chunk that ranks 1st in BM25 and 2nd in vector search scores higher than a chunk that ranks 1st in only one list. With k = 60, that is 1/61 + 1/62 ≈ 0.0325 versus 1/61 ≈ 0.0164. The fusion step amplifies agreement, which is exactly the signal you want.
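The RRF formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the repository's code: the function name reciprocal_rank_fusion, the chunk IDs, and the two hit lists are hypothetical, and k = 60 follows the typical value quoted above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of chunk IDs into one list of (id, score).

    Each list contributes 1 / (k + rank) for every chunk it contains,
    where rank is the chunk's 1-indexed position in that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retriever outputs, best hit first.
vector_hits = ["chunk_b", "chunk_a", "chunk_c"]
bm25_hits = ["chunk_a", "chunk_d", "chunk_b"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.4f}")
```

Here chunk_a is 2nd in the vector list and 1st in the BM25 list, so it collects 1/62 + 1/61 and ends up first. chunk_b, which also appears in both lists, comes second, and the chunks returned by only one retriever trail behind.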
Robust to outliers: A single retriever that confidently returns a wrong chunk at rank 1 can only contribute 1 / (60 + 1) ≈ 0.016 to the RRF score. If the other retriever did not return that chunk at all, any chunk that both retrievers place among their top results will outrank it.

In practice, this means: when both retrievers agree on a chunk, it rises to the top. When only one retriever surfaces it, it still gets credit but not enough to dominate if another chunk had broader support.

System Architecture

Here is the full architecture of what we are going to build:

Fig 1: Architecture of Hybrid RAG (Image by Alpha Iterations)

Architecture note: Key design decision: FAISS and BM25 […]