Understanding RAG III: Fusion Retrieval and Reranking


In previous installments of this series, we discussed the fundamentals of Retrieval-Augmented Generation (RAG), its significance in the context of Large Language Models (LLMs), and the classic retriever-generator system. In this third article, we delve into an enhanced approach for building RAG systems: fusion retrieval.

Before we dive deeper, let’s briefly revisit the fundamental RAG scheme previously outlined.

Basic RAG Scheme

The classic RAG framework involves an initial retrieval phase in which an information retrieval engine takes the user's query and encodes it into a numerical vector (an embedding). This vector is then used to search a large knowledge base for relevant documents. The retrieved documents augment the original query, and the augmented prompt is then sent to the LLM to generate an informed response.
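
To make this concrete, here is a minimal sketch of the retrieval phase, assuming a sentence-transformers embedding model and a small in-memory document list; the model name and document texts are illustrative only, and a production system would typically use a vector database rather than a Python list:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Tiny in-memory knowledge base; a production system would use a vector database.
documents = [
    "Jiuzhaigou National Park in China is known for its turquoise alpine lakes.",
    "Tokyo offers world-class museums, shopping districts, and nightlife.",
    "Taroko Gorge in Taiwan features marble cliffs and scenic hiking trails.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Embed the query and return the k most similar documents by cosine similarity."""
    query_vector = embedder.encode(query, normalize_embeddings=True)
    scores = doc_vectors @ query_vector  # cosine similarity on normalized vectors
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "top destinations for nature lovers in Asia"
context = "\n".join(retrieve(query))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"  # sent to the LLM
print(augmented_prompt)
```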

By implementing fusion retrieval techniques during this retrieval phase, the context added to the original query can be enriched and made more relevant, thus improving the quality of the final response generated by the LLM. Fusion retrieval capitalizes on insights gained from multiple retrieved documents, blending them into a richer, more focused context for the generator. It's important to note that classic RAG can also retrieve multiple documents, rather than a single one. So, what distinguishes fusion retrieval from the traditional method?

The primary distinction lies in how multiple retrieved documents are processed and integrated. In classic RAG, content from the retrieved documents is simply concatenated or summarized extractively and then provided as context to the LLM; no advanced fusion strategy is applied. In contrast, fusion retrieval uses specialized methods to amalgamate relevant information from multiple documents, enhancing the process either during augmentation or during generation.

Fusion Retrieval Explained

In the augmentation stage, fusion retrieval encompasses techniques that reorder, filter, or combine documents before they are presented to the generator. Two notable methods include reranking and aggregation:

  • Reranking: This involves scoring documents and arranging them by relevance before incorporating them alongside the user prompt into the model. By doing so, the system can prioritize the most pertinent documents for the task at hand.
  • Aggregation: This process merges the most relevant portions of information from each document into a cohesive context. Classic information retrieval techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or embedding operations are typically used here; see the sketch after this list.
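
As an illustration of the aggregation idea, the following sketch uses TF-IDF from scikit-learn to score individual sentences drawn from several retrieved documents against the query and then fuses the highest-scoring ones into a single context. The document texts and the naive sentence splitting are purely illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query = "top destinations for nature lovers in Asia"
retrieved_docs = [
    "Jiuzhaigou National Park is famous for turquoise lakes. Tokyo is a huge metropolis.",
    "Taroko Gorge offers marble cliffs and hiking trails. Street food is popular in Osaka.",
]

# Split each retrieved document into sentences (naive split, for illustration only).
sentences = [s.strip() for doc in retrieved_docs for s in doc.split(".") if s.strip()]

# Score every sentence against the query with TF-IDF cosine similarity.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([query] + sentences)
scores = cosine_similarity(matrix[0], matrix[1:]).flatten()

# Keep the top-scoring sentences from across all documents and fuse them
# into a single, more focused context for the generator.
top = scores.argsort()[::-1][:3]
fused_context = ". ".join(sentences[i] for i in sorted(top)) + "."
print(fused_context)
```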

In the generation stage, fusion methods allow the LLM (the generator) to process each retrieved document independently, combining their insights when producing the final response. This essentially blends the augmentation and generation stages of RAG. One prevalent technique in this space is Fusion-in-Decoder (FiD), which enables the model to independently process each retrieved document and then integrates their insights during response generation. For a deeper understanding of FiD, further reading of related literature is recommended.
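
FiD is typically built on a sequence-to-sequence model such as T5: each (query, passage) pair is encoded independently, the encoder outputs are concatenated, and the decoder attends over all of them jointly while generating the answer. The following is a rough conceptual sketch of that pattern using Hugging Face Transformers, not the original FiD implementation; the base checkpoint and passages are placeholders, and an off-the-shelf t5-small model is not fine-tuned for this, so the output is only illustrative:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-small")          # placeholder backbone
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "top destinations for nature lovers in Asia"
passages = [
    "Jiuzhaigou National Park is famous for turquoise lakes and waterfalls.",
    "Taroko Gorge in Taiwan offers marble cliffs and scenic hiking trails.",
]

# Encode each (question, passage) pair independently, as FiD does.
encoder = model.get_encoder()
hidden_states, masks = [], []
with torch.no_grad():
    for passage in passages:
        enc = tokenizer(f"question: {question} context: {passage}",
                        return_tensors="pt", truncation=True, max_length=256)
        out = encoder(input_ids=enc.input_ids, attention_mask=enc.attention_mask)
        hidden_states.append(out.last_hidden_state)
        masks.append(enc.attention_mask)

# Concatenate all encoded passages along the sequence dimension so the decoder
# can attend to every passage jointly while generating the final answer.
fused = torch.cat(hidden_states, dim=1)
fused_mask = torch.cat(masks, dim=1)

answer_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=fused_mask,
    max_new_tokens=64,
)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```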

How Reranking Works

Reranking serves as a straightforward yet effective method for consolidating information from multiple retrieved sources. Here’s how reranking functions:

During the reranking process, the initial collection of documents retrieved by the engine is reordered to better align with the user's query, improving relevance and the quality of the final output. The retrieved documents are passed to a component known as the ranker (or reranker), which rescores them based on criteria such as semantic similarity to the query or learned user preferences and then sorts them so that the most relevant documents appear at the top of the ranking list.

Methods such as weighted averaging or other scoring mechanisms are employed to prioritize documents, thus ensuring that content from the highest-ranked documents is more likely to contribute to the final context than that from lower-ranked documents.
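
One common way to implement this scoring step is with a cross-encoder reranker, which reads the query and a candidate document together and outputs a relevance score for the pair. The sketch below uses the CrossEncoder class from sentence-transformers; the model name and the candidate documents are illustrative assumptions:

```python
from sentence_transformers import CrossEncoder

query = "top destinations for nature lovers in Asia"
retrieved_docs = [
    "A guide to Tokyo's best shopping districts and nightlife.",
    "Jiuzhaigou National Park: turquoise lakes and alpine scenery in Sichuan.",
    "Hiking Taroko Gorge: trails, permits, and the best seasons to visit.",
]

# A cross-encoder scores each (query, document) pair jointly, which is usually
# more accurate than comparing independently computed embeddings.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model name
scores = reranker.predict([(query, doc) for doc in retrieved_docs])

# Sort the documents by descending relevance score before building the prompt.
reranked = [doc for _, doc in sorted(zip(scores, retrieved_docs), reverse=True)]
for rank, doc in enumerate(reranked, start=1):
    print(rank, doc)
```

Only the top-ranked documents (or a score-weighted selection of them) would then be passed on as context, so the most relevant content dominates the augmented prompt.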

To illustrate reranking, let's consider a scenario within the context of tourism in Eastern Asia. Suppose a traveler queries a RAG system for “top destinations for nature lovers in Asia.” An initial retrieval might yield a mix of general travel guides and articles about popular cities, along with some recommendations for national parks. However, a reranking model that incorporates user-specific preferences, such as previously liked activities or favored destinations, can reorder these results to highlight the most relevant content. The reranked results might prioritize serene national parks, unique hiking trails, and eco-friendly tours tailored to nature enthusiasts.

In essence, reranking reorganizes the retrieved documents based on additional relevance criteria, ensuring that content from the most pertinent documents is prioritized in the extraction process, thereby enhancing the quality of the generated responses.

