In the world of information retrieval, no single method is perfect. Traditional keyword-based ranking methods like BM25 excel at precision and relevance for exact matches, while modern AI-powered approaches built on vector embeddings shine at semantic understanding. But what if you could combine both, getting keyword accuracy together with contextual intelligence?
Welcome to Hybrid Retrieval, the future of smarter, more accurate search systems.
What Is Hybrid Retrieval?
Hybrid retrieval combines lexical search (like BM25 or TF-IDF) with semantic search (based on vector embeddings).
- BM25 (Best Matching 25): A ranking function used by search engines to score documents based on keyword frequency and relevance. It’s great at finding documents containing the exact query terms.
- Embeddings: High-dimensional numerical representations of text produced by neural embedding models (often derived from large language models, LLMs). They capture meaning rather than just words, allowing search systems to find semantically similar content.
In hybrid retrieval, results from both methods are merged or re-ranked, giving users both precision and depth in their search results.
Why Hybrid Retrieval Works So Well
1. Complementary Strengths
- BM25 is perfect for exact matches, factual data, and keyword-based lookups.
- Embeddings excel in conceptual searches, synonyms, or paraphrased questions.
Together, they minimise each other’s weaknesses.
2. Improved Recall and Relevance
Hybrid systems retrieve a wider set of relevant documents, both those that match exactly and those that are contextually related.
3. Reduced Hallucinations
By grounding LLM responses in hybrid search results, you improve factual accuracy, crucial for RAG (Retrieval-Augmented Generation) pipelines.
How Hybrid Retrieval Works
Step 1: Lexical Search (BM25)
When a user submits a query, BM25 retrieves the top N documents that match the query terms.
Example: Searching “best JavaScript frameworks” will prioritise documents containing those exact words.
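To make this step concrete, here is a minimal sketch using the rank_bm25 package rather than a full search engine (the package choice and the toy documents are assumptions for illustration; the full example later uses Elasticsearch for this role):

```python
# Minimal BM25 sketch with the rank_bm25 package (illustrative documents).
from rank_bm25 import BM25Okapi

docs = [
    "Best JavaScript frameworks for frontend development",
    "A guide to Python web frameworks",
    "JavaScript testing libraries compared",
]
tokenized = [doc.lower().split() for doc in docs]
bm25 = BM25Okapi(tokenized)

query = "best javascript frameworks".split()
scores = bm25.get_scores(query)  # one lexical relevance score per document
print(sorted(zip(docs, scores), key=lambda x: x[1], reverse=True))
```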
Step 2: Semantic Search (Embeddings)
At the same time, the system computes a vector embedding for the query and retrieves similar document embeddings from a vector database (like Pinecone, Qdrant, or Weaviate).
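The semantic side can be sketched without a managed vector database by comparing embeddings directly with cosine similarity; the documents below are illustrative stand-ins for what the vector index would hold:

```python
# Minimal semantic-search sketch: cosine similarity over sentence embeddings
# plays the role of the vector database's nearest-neighbour lookup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

docs = [
    "Top front-end libraries for building web apps",
    "How to bake sourdough bread",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)
query_embedding = model.encode("best JavaScript frameworks", convert_to_tensor=True)

similarities = util.cos_sim(query_embedding, doc_embeddings)
print(similarities)  # the web-development document scores higher despite sharing few keywords
```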
Step 3: Merge and Rank
The results from both searches are combined using one of these strategies:
- Weighted Scoring: combine the two scores linearly, Final Score = α * BM25_score + β * Embedding_score.
- Reranking with LLMs: use an LLM to rerank the top combined results based on semantic context.
- Reciprocal Rank Fusion (RRF): a popular method that fuses rank positions (rather than raw scores) from both systems for balanced retrieval (see the sketch below).
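The example implementation below uses weighted scoring, so here is a small sketch of RRF for comparison; the document IDs are hypothetical. Because RRF works on rank positions only, it needs no score normalisation:

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch over two ranked lists of doc IDs.
def reciprocal_rank_fusion(ranked_lists, k=60):
    """ranked_lists: iterable of doc-ID lists, best first. k=60 is a common default."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]    # hypothetical BM25 order
vector_ranking = ["doc1", "doc9", "doc3"]  # hypothetical embedding order
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```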
Practical Implementation Example (Python)
Here’s a simple implementation using Elasticsearch (BM25) and Pinecone (embeddings); it assumes the classic pinecone-client initialisation style:
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import pinecone
# Initialize the lexical index, the embedding model, and the vector index
es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer('all-MiniLM-L6-v2')
pinecone.init(api_key="your_key", environment="us-east1-gcp")
index = pinecone.Index("hybrid-demo")

def hybrid_search(query, alpha=0.5):
    # Step 1: BM25 search over the "content" field
    bm25_results = es.search(index="docs", query={"match": {"content": query}}, size=5)

    # Step 2: embedding search against the vector index
    query_vector = model.encode(query).tolist()
    vector_results = index.query(vector=query_vector, top_k=5, include_metadata=True)

    # Step 3: weighted merge (note: raw BM25 and cosine scores are on different
    # scales, so in practice normalise them first; see the tips below)
    combined = {}
    for doc in bm25_results['hits']['hits']:
        combined[doc['_id']] = alpha * doc['_score']
    for match in vector_results['matches']:
        combined[match['id']] = combined.get(match['id'], 0) + (1 - alpha) * match['score']

    # Return the five highest-scoring document IDs with their fused scores
    sorted_results = sorted(combined.items(), key=lambda x: x[1], reverse=True)
    return sorted_results[:5]
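A call might then look like this (the query string is just an example). Because raw BM25 scores and cosine similarities sit on different scales, a production version would normalise them before fusing, as covered in the tips below:

```python
top_docs = hybrid_search("best JavaScript frameworks", alpha=0.5)
for doc_id, score in top_docs:
    print(doc_id, round(score, 3))
```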
When to Use Hybrid Retrieval
| Use Case | Why It Works |
|---|---|
| Enterprise Search | Mixes keyword precision with semantic understanding of documents |
| Chatbots / RAG Apps | Improves factual grounding and reduces irrelevant context |
| Knowledge Bases | Handles diverse query styles (exact vs conceptual) |
| E-commerce Search | Finds products that match both keywords and intent |
Challenges and Optimisation Tips
- Balancing weights: Fine-tune α and β to suit your domain.
- Performance: Dual retrieval is costlier; cache frequent queries.
- Normalisation: Scale scores to comparable ranges before fusion (see the sketch after this list).
- Latency: Use async or batched requests to maintain real-time performance.
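As a minimal sketch of the normalisation tip, min-max scaling brings both score sets into a shared 0–1 range before weighted fusion; the scores shown are hypothetical:

```python
# Min-max normalisation so BM25 and embedding scores share a 0-1 range.
def min_max_normalise(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

bm25_scores = {"doc1": 12.4, "doc2": 7.1}     # hypothetical raw BM25 scores
vector_scores = {"doc1": 0.83, "doc3": 0.65}  # hypothetical cosine similarities
print(min_max_normalise(bm25_scores), min_max_normalise(vector_scores))
```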
The Future of Hybrid Search
The next evolution of hybrid retrieval integrates contextual reranking models (such as ColBERT or Cross-Encoders) and multimodal embeddings for text, images, and audio — enabling even richer search experiences.
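As a rough sketch of that direction, a cross-encoder reranker from the sentence-transformers library can rescore the top hybrid results; the checkpoint name and candidate texts below are assumptions for illustration:

```python
# Cross-encoder reranking sketch: the model scores each (query, document) pair jointly.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "best JavaScript frameworks"
candidates = [
    "A comparison of React, Vue, and Svelte",
    "A recipe for banana bread",
]
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(reranked)
```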
Hybrid search isn’t just a trend. It’s becoming the standard for robust, real-world information systems where both precision and meaning matter.

Parvesh Sandila is a passionate web and mobile app developer from Jalandhar, Punjab, with over six years of experience. Holding a Master’s degree in Computer Applications (2017), he has also mentored over 100 students in coding. In 2019, Parvesh founded Owlbuddy.com, a platform that provides free, high-quality programming tutorials in languages like Java, Python, Kotlin, PHP, and Android. His mission is to make tech education accessible to all aspiring developers.
