Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems
This repository contains the source code and implementation for the paper "Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems." The project introduces a framework that optimizes retrieval processes in Retrieval-Augmented Generation systems, enhancing both the quality and efficiency of information retrieval for Open-Domain Question Answering tasks.
Retrieval-Augmented Generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval to improve the relevance and accuracy of responses. However, traditional RAG systems often face issues such as low retrieval quality, irrelevant knowledge, and redundant retrievals. Our approach introduces a four-module synergy to tackle these limitations (a code sketch of the combined flow follows this list):
- Query Rewriter+: A rewriting module that generates more nuanced and multi-faceted queries, enhancing search coverage and clarifying user intent.
- Knowledge Filter: A module that filters out irrelevant information via a natural language inference (NLI) task, ensuring that only relevant knowledge is passed to the generator.
- Memory Knowledge Reservoir: A caching mechanism that speeds up retrieval for recurring queries by utilizing previously retrieved external knowledge.
- Retrieval Trigger: A calibration-based mechanism that determines when to initiate external knowledge retrieval based on the confidence level of existing information.
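The following minimal Python sketch shows how these four modules could interact at answer time. Every name here (`llm.rewrite`, `retriever.search`, and so on) is an illustrative placeholder, not the repository's actual API:

```python
# Illustrative sketch of the four-module synergy; all names below are
# hypothetical placeholders, not the actual interface of this repository.

def answer(question, llm, retriever, reservoir, is_relevant, confidence):
    # Retrieval Trigger: only retrieve externally when the calibrated
    # confidence in answering from parametric knowledge alone is low.
    if confidence(question) >= 0.8:  # threshold chosen for illustration
        return llm.generate(question, context=[])

    # Query Rewriter+: expand the question into clarified, multi-faceted queries.
    queries = llm.rewrite(question)

    passages = []
    for q in queries:
        # Memory Knowledge Reservoir: reuse cached knowledge for recurring queries.
        cached = reservoir.get(q)
        if cached is None:
            cached = retriever.search(q)  # e.g. an external web search
            reservoir.put(q, cached)
        passages.extend(cached)

    # Knowledge Filter: keep only passages an NLI model judges relevant.
    relevant = [p for p in passages if is_relevant(question, p)]
    return llm.generate(question, context=relevant)
```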
Our four-module synergy addresses these issues, improving response accuracy by 14% to 21% compared to directly querying the LLM and achieving roughly an 8%~12% improvement over the traditional RAG pipeline. Additionally, it reduces response time cost by 46% and external knowledge retrieval cost by 71% without compromising response quality.
The current limitations of RAG systems include the following:
- Information Plateau: A single query limits the scope of retrieval, leading to less comprehensive information.
- Ambiguity in Query Interpretation: Misaligned phrasing often results in unreliable responses.
- Irrelevant Knowledge: Excessive retrieval can bring irrelevant information, reducing response quality.
- Redundant Retrieval: Repeated questions result in inefficient use of computational resources.
The following datasets were used for our experiments:
- CAmbigNQ: A curated version of the AmbigNQ dataset with clarified questions, designed to address ambiguities.
- NQ (Natural Questions): A dataset of real-world search engine queries.
- PopQA: Focuses on less popular topics from Wikidata.
- AmbigNQ: Contains ambiguous questions transformed into closely related queries.
- 2WIKIMQA & HotPotQA: Datasets requiring logical reasoning and multi-hop question answering.
We provide demo datasets for Q&A and for fine-tuning Gemma-2B in Records. Our experiments highlight the following findings:
- Query Rewriting: Clarifying ambiguous questions significantly improves retrieval precision.
- Multi-Query Retrieval: Employing multiple, semantically varied queries enhances the amount of relevant information retrieved, overcoming the information plateau.
- Knowledge Filtering: The Knowledge Filter reduces noise from irrelevant data, increasing the accuracy and reliability of RAG systems (see the NLI sketch after this list).
- Efficiency: The use of the Memory Knowledge Reservoir accelerates repeated retrievals, reducing time cost by 46% at optimal configurations.
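To make the NLI-based filtering idea concrete, here is a minimal sketch of an entailment-based filter built on an off-the-shelf MNLI model from Hugging Face transformers; the model choice and threshold are our assumptions, not necessarily the configuration used in the paper:

```python
# Minimal NLI-based knowledge filter sketch. The model and threshold are
# illustrative assumptions, not the exact setup used in the paper.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def is_relevant(question: str, passage: str, threshold: float = 0.5) -> bool:
    # Treat the passage as the premise and the question as the hypothesis;
    # a high entailment score suggests the passage is relevant.
    scores = nli({"text": passage, "text_pair": question}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entail >= threshold

passages = ["Paris is the capital of France.", "Bananas are rich in potassium."]
kept = [p for p in passages if is_relevant("What is the capital of France?", p)]
print(kept)  # expected: only the Paris passage survives the filter
```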
- Clone the repository:

```bash
git clone https://github.com/Ancientshi/ERM4.git
```

- Install dependencies:

```bash
cd ERM4
pip install -r requirements.txt
```

- Download the demo datasets from https://drive.google.com/drive/folders/1UYkFJqfuNbJJZUad-psssL4uSn4ttuAY?usp=sharing and move them under ERM4.

- Run the demo (knowledge retrieved from Bing search is pre-prepared for ease of use):

```bash
cd shell
bash ERM4.sh
```

- Fine-tune Gemma-2B:

```bash
cd shell
bash instruct_fine_tune_gemma.sh
```

- Deploy the fine-tuned Gemma-2B rewriter as a Flask service to support API calls:

```bash
cd shell
bash infer_gemma_rewriter.sh
```
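Once the service is up, it can be called over HTTP. The sketch below is a hypothetical client: the port, route, and JSON fields are assumptions for illustration and may differ from the actual service started by infer_gemma_rewriter.sh:

```python
# Hypothetical client for the deployed rewriter service; the URL, route,
# and payload fields below are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:5000/rewrite",           # assumed host, port, and route
    json={"query": "who won the world cup"},   # assumed request schema
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. a list of rewritten queries
```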
The provided code includes a demo that illustrates how our four-module synergy works within a RAG system. The example retrieval process uses pre-fetched data from Bing searches to streamline the execution.
In the demo, we provide a set of pre-designed prompts for each query. It’s important to note that these prompts may influence the results to some degree. If you're interested in further experimentation, we encourage adjusting these prompts to suit the format and style of each specific dataset. The provided prompts are meant to serve as a reference, and customizing them may yield different retrieval outcomes.
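For instance, a query-rewriting prompt might look like the sketch below; this template is purely illustrative and is not the exact prompt shipped with the demo:

```python
# Illustrative query-rewriting prompt template; not the demo's actual prompt.
REWRITE_PROMPT = """You are a query rewriting assistant.
Rewrite the question below into {n} clarified, semantically diverse
search queries, one per line.

Question: {question}
Queries:"""

print(REWRITE_PROMPT.format(n=3, question="Where was the 2022 World Cup held?"))
```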
The source code is shared to promote advancements in the field and facilitate future research. However, we do not guarantee exact replication of the results reported in our paper. This is due to the rapid evolution of the RAG landscape, the dynamic nature of large language model (LLM) APIs, shifts in search engine behavior, and differences in LLM fine-tuning, all of which introduce considerable variance. Nevertheless, the findings and conclusions should align with those in our work.
We hope this guide helps you continue to explore RAG systems and contribute to the evolving discourse in this domain.
For any questions or contributions, please reach out to the project lead:
- Yunxiao Shi
Email: [email protected]
If you find our work useful and would like to reference it, please cite our paper as follows:
@incollection{Shi2024,
author = {Yunxiao Shi and Xing Zi and Zijing Shi and Haimin Zhang and Qiang Wu and Min Xu},
title = {Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems},
booktitle = {ECAI 2024},
publisher = {IOS Press},
year = {2024},
pages = {2258--2265},
doi = {10.3233/FAIA240748},
url = {https://ebooks.iospress.nl/doi/10.3233/FAIA240748}
}
This project is licensed under the CC BY-NC-SA 4.0 License.