Master Advanced Search with Nuclia: Ranking, Fusion and Reranking Explained

August 20, 2025 Agentic RAG, Data & AI

Previously published on Nuclia.com. Nuclia is now Progress Agentic RAG.

What looks like a simple query may involve more steps than one realizes at first. The Nuclia platform supports multiple search modes, rank fusion and reranking, and everything is exposed in the API so you can build a custom search experience that satisfies your needs. But how does this really work? How can one extract the full potential of Nuclia search?

This document is a deep dive into how Nuclia scoring, ranking and reranking work. We’ll show how it’s done and how you can customize it for a better search experience.

At the end, we’ll show an example of how to use the API to improve queries across multilingual datasets.

Overview

Users typically query their data through the /find or /ask endpoints. A common /find request may look like this:

{
    "query": "how does nucliadb search and rank?",
    "features": ["keyword", "semantic"],
    "reranker": "predict",
    "top_k": 20
}

The NucliaDB database will search among your data and find the best results using keyword and semantic search. Then it’ll use a cross-encoder to rerank the results and improve their order, and finally it’ll return the best 20 results. But how does the NucliaDB database choose which ones are best? How are keyword and semantic search combined? Which results are fed to the reranker? And for that matter, what is a reranker?
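For reference, here is a minimal sketch of sending that request over HTTP with Python. The base URL, knowledge box id, auth header and response handling are illustrative assumptions; check the API reference for the exact details of your deployment.

# Minimal /find call sketch; URL, knowledge box id and auth header are
# placeholders/assumptions, not verbatim Nuclia values.
import requests

KB_URL = "https://<your-zone>.nuclia.cloud/api/v1/kb/<your-kb-id>"  # assumed

payload = {
    "query": "how does nucliadb search and rank?",
    "features": ["keyword", "semantic"],
    "reranker": "predict",
    "top_k": 20,
}

response = requests.post(
    f"{KB_URL}/find",
    json=payload,
    headers={"X-NUCLIA-SERVICEACCOUNT": "Bearer <your-api-key>"},  # assumed
)
response.raise_for_status()
print(response.json())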

Concepts

Let’s first dive into the basic concepts and we’ll build up from there.

Scoring

Any information retrieval system, given a query and a corpus of data, will evaluate candidates, give a score to each document or paragraph and sort them by score. The Nuclia platform does the same thing, but with a twist: it supports multiple ways to evaluate candidates and score matches. In this blog, we’ll focus on two of them: keyword and semantic search. Each provides a different search experience. On one hand, keyword search is good for queries that match important words, but its multilingual performance isn’t particularly good. On the other hand, semantic search, with an appropriate model, is able to find content similar in meaning across languages and without matching words, but it may not be the best choice for keyword-like queries. The truth is there’s no single best option, since it depends on the use case. Usually, the combination of both is the key to finding the information you ask for.

So, here's how we combine them. Each search mode has a scoring mechanism and results in a ranked list of matches (paragraphs) sorted from best to worst. These scores make sense within their own search mode, but are hard to compare across search modes. Keyword search uses BM25 while semantic search uses a distance function between embeddings. Comparing scores directly is like comparing apples with oranges: both are fruits (scores), but choosing the best one is not trivial.
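To make this concrete, here are two hypothetical result lists for one query. The scores below are made up for illustration:

# Hypothetical ranked lists for one query. BM25 scores are unbounded and
# depend on corpus statistics, while cosine similarities are bounded, so
# sorting the union of both lists by raw score would be meaningless.
keyword_results = [   # (paragraph id, BM25 score)
    ("keyword_1", 12.7),
    ("keyword_2", 9.3),
    ("keyword_3", 4.1),
    ("keyword_4", 2.8),
]
semantic_results = [  # (paragraph id, cosine similarity)
    ("semantic_1", 0.83),
    ("semantic_2", 0.79),
    ("semantic_3", 0.64),
]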

Rank Fusion

A rank fusion algorithm merges multiple ranked lists into a single one. These algorithms are usually simple and fast, and fall into one of the following categories:

  • Score-based algorithms use each element’s score to produce a new unified score. Using one usually assumes the scores are comparable. Imagine, for example, having two lists of keyword search results, both scored with BM25: one could simply merge them and sort again by score, as sketched right after this list.
  • Rank-based algorithms assume the best apple is comparable to the best orange just because both are the best in their lists. This group of algorithms usually has some kind of boosting for matches that appear in multiple lists.
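Here is what score-based fusion looks like for that BM25 example, a minimal sketch assuming comparable scores:

# Score-based fusion sketch: with comparable scores (two BM25 lists),
# merging is just concatenating and re-sorting by score.
list_a = [("a_1", 12.7), ("a_2", 9.3), ("a_3", 4.1)]
list_b = [("b_1", 11.0), ("b_2", 4.2)]

merged = sorted(list_a + list_b, key=lambda pair: pair[1], reverse=True)
# [('a_1', 12.7), ('b_1', 11.0), ('a_2', 9.3), ('b_2', 4.2), ('a_3', 4.1)]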

Rank-based algorithms are really useful when the scores are unrelated, like BM25 and dot-product distance. A well-known algorithm in this category is Reciprocal Rank Fusion (RRF), which gives each element a new score depending on its rank and, if an element matches in multiple lists, adds its scores together. That way, we can merge unrelated scores and boost matches found by both search modes.

We use RRF out of the box, so let’s illustrate how RRF would merge the two sets of results from the previous section:
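Here is a minimal RRF sketch over those two lists, assuming the standard formula score(d) = Σ 1/(k + rank(d)) with the commonly used constant k = 60 (Nuclia lets you tune k, as we’ll see later):

from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each element earns 1 / (k + rank) per list
    it appears in; contributions from different lists are summed."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# keyword_4 and semantic_2 point to the same text block, so we give them a
# single id ("both") before fusing.
keyword = ["keyword_1", "keyword_2", "keyword_3", "both"]
semantic = ["semantic_1", "both", "semantic_3"]

for item, score in rrf([keyword, semantic]):
    print(f"{item}: {score:.5f}")
# "both" scores 1/64 + 1/62 ≈ 0.03175 and jumps to first place (both_1),
# above keyword_1 and semantic_1, which tie at 1/61 ≈ 0.01639.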

Notice how keyword_4 and semantic_2 are now both_1. We assumed both were pointing to the same text block, and RRF boosts this kind of multiple match, so its score is now greater than the rest.

In this example, unique elements with the same rank in their original lists have the same score. Thus keyword_1 and semantic_1, or keyword_3 and semantic_3, end up with equal scores. We could change this, boosting one search mode or the other, if we wanted to.

Reranking

Once we have a unified list of scored elements, there’s an optional reranking step where elements are revisited and reordered according to some criteria. Nuclia provides the option to use a cross-encoder model to rerank the results.

In simpler words, a reranker is a model that takes the query and the results, compares each result with the query and produces a new score depending on how well that result answers the query. This is an expensive process (compared with rank fusion), but it improves result quality a lot, especially since rank fusion has been “blindly” merging elements from keyword and semantic search.

Continuing with the example, given the merged results, we can use a reranker to change the final score and order of the results, so the most meaningful results for the query are shown first:
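To show what a cross-encoder does under the hood, here is a generic sketch using the open-source sentence-transformers library as a stand-in for Nuclia’s managed predict reranker. The model name and paragraph texts are illustrative assumptions:

# Generic cross-encoder reranking sketch; this is NOT Nuclia's internal
# reranker, just an open-source stand-in to illustrate the mechanism.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model

query = "how does nucliadb search and rank?"
# Fused results from the RRF step: id -> paragraph text (placeholders).
merged = {
    "both_1": "NucliaDB merges keyword and semantic matches with RRF...",
    "keyword_1": "NucliaDB scores keyword matches using BM25...",
    "semantic_1": "Embeddings let NucliaDB match meaning across languages...",
}

# The model scores every (query, paragraph) pair jointly: slower than rank
# fusion, but much more query-aware.
scores = model.predict([(query, text) for text in merged.values()])
reranked = sorted(zip(merged.keys(), scores), key=lambda pair: pair[1], reverse=True)
print(reranked)  # both_1 may well be demoted if another paragraph answers better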

In the example, both_1, which was first, wasn’t the best match for the query, and the reranker considered two other results to be better.

Deciding when to use a reranker heavily depends on the use case. If we are reranking only the top_k results, it is more useful on /find than on /ask requests, as /find results are usually presented to the user as a sorted list, while the LLM behind /ask will receive the results anyway. However, when reranking more elements than top_k, it’s useful in both cases, as reordering changes which matches are considered the best. We’ll see how to rerank more elements than top_k soon.

The full picture

Putting it all together, the whole example we’ve seen becomes the following flow:

A user performs a /find request; the system searches for keyword and semantic matches, performs rank fusion to merge all of them into a single list and finally reranks for better result quality. Then the system returns the results requested by the user.

Search Tuning in Nuclia

Let’s see how to use all these options to customize the search experience and get better results.

Starting with scoring/ranking, the features parameter on the search endpoints has the keyword and semantic options to trigger these search modes. Both are used by default, so no action is usually needed.

Tuning rank fusion can be trickier, as it depends on the dataset and the queries. The rank_fusion parameter in /find and /ask allows tuning it. RRF is the recommended algorithm and provides multiple options to customize the search experience, such as changing the k parameter or adding boosting/weights to certain search modes. The boosting parameter multiplies the RRF score by a constant, which is useful if, for some reason, we know keyword or semantic search will outperform the other.
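As an illustration, here is a weighted variant of the earlier rrf sketch that multiplies each list’s contribution by a constant, mirroring the boosting behavior described above:

# Weighted RRF sketch: each list's contribution is multiplied by a boost
# factor, mirroring the boosting option described above.
from collections import defaultdict

def boosted_rrf(weighted_lists, k=60):
    scores = defaultdict(float)
    for weight, results in weighted_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] += weight / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Doubling the semantic contribution, as in the multilingual example below.
keyword = ["keyword_1", "keyword_2", "keyword_3"]
semantic = ["semantic_1", "semantic_2", "semantic_3"]
print(boosted_rrf([(1.0, keyword), (2.0, semantic)]))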

Finally, the Nuclia cross-encoder reranker is available using the reranker=predict option in /find and /ask endpoints.

But wait, what about the number of results each step uses?

Defining windows

Sometimes, asking for 20 results and running all these steps with just 20 elements is not good enough. That’s why the Nuclia platform allows defining one window for rank fusion and another window for reranking out of the box.

A window here means using more results than top_k in some steps in order to improve search quality. In RRF, for example, a match appearing in multiple lists increases its score, so longer lists increase the chances of a multi-match and thus increase the scores of some results.

The same thing happens in reranking: choosing the top 20 out of 50 reranked results gives better results than simply reordering your already chosen 20.

Let’s see it in more detail. Imagine a rank_fusion window M, a reranker window N and a request for the top K results:

In the first step, retrieval is performed: each search mode produces a list of, at most, M results. Rank fusion combines those results into a list of, at most, 2M elements, which is cut to fit the rank fusion window M. Before reranking, we cut again to N, the reranker window. Those results are sent to the reranker model and returned in a different order. Finally, we remove the exceeding results and cut to K (our top_k parameter).
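In Python, reusing the rrf function from the rank fusion sketch above, the window logic looks roughly like this (the function boundaries are illustrative, not Nuclia internals):

# Sketch of the window logic: M = rank fusion window, N = reranker window,
# K = top_k. rerank_fn stands in for the cross-encoder model.
def find_with_windows(keyword, semantic, rerank_fn, M=80, N=50, K=20):
    keyword, semantic = keyword[:M], semantic[:M]  # each mode returns <= M
    fused = rrf([keyword, semantic])[:M]           # up to 2M, cut back to M
    reranked = rerank_fn(fused[:N])                # rerank the top N only
    return reranked[:K]                            # final answer: top K

# e.g. with the lists from before and a no-op "reranker":
print(find_with_windows(keyword, semantic, rerank_fn=lambda results: results))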

As we can see, increasing the window sizes has a cost, as we are processing more results for the same query, but it provides a better search experience and higher quality results. As always, there’s a tradeoff between quality and latency that each use case should evaluate to get the most out of the Nuclia platform.

Use Case Example: Search Across Languages

After all these details, let’s see how everything fits in a real-world example.

Imagine you have a multilingual knowledge base, and queries are usually done in a different language than your content. You’re quite sure keyword search won’t provide the best results, although it can be useful for names, abbreviations and words shared across languages. Semantic search, though, will bring the best matches. A combination of both will work best.

In the rank fusion step, as we are quite sure semantic search will usually be better, we’ll use RRF with semantic boosting. We’ll double the scores of semantic results compared with keyword ones, so the end result contains more matches coming from semantic search.

Finally, we’ll use a cross-encoder to rerank the final results and boost the best ones.

Considering windows, we decide to ask for the best 20 results, rerank the best 50 and use a rank fusion window of 80.

In an actual request, these are the parameters to use in /find or /ask:

{
    "features": ["semantic", "keyword"],
    "rank_fusion": {
        "name": "rrf",
        "boosting": {
            "semantic": 2
        },
        "window": 80
    },
    "reranker": {
        "name": "predict",
        "window": 50
    },
    "top_k": 20,
    ...
}

Conclusion

A quality search experience may involve more things than one can think of at first glance. The feature-rich and flexible Nuclia RAG system allows its users to find the data they are looking for, and to fine-tune parameters as needed to get the best out of their data. Understanding how our platform does retrieval can help you get better results.

Different search modes provide different views of your data and of how you find things in it. Classic keyword search finds results by relevant words, while semantic search can find meaning across languages and expressions.

To combine search modes, a rank fusion algorithm is needed. Prior knowledge of your data and of how it’ll be queried can improve the way you query it.

Reranking is an added step that trades off latency for quality.

And finally, consider how different windows play a role in the number of results evaluated and how they can improve your overall search experience.

What are you waiting for? You can now try it out by yourself!

Joan Antioni Riera