Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems
This repository contains the source code and implementation for the paper "Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems." The project introduces a framework that optimizes retrieval processes in Retrieval-Augmented Generation systems, enhancing both the quality and efficiency of information retrieval for Open-Domain Question Answering tasks.
Retrieval-Augmented Generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval to improve the relevance and accuracy of responses. However, traditional RAG systems often face issues such as low retrieval quality, irrelevant knowledge, and redundant retrievals. Our approach introduces a four-module synergy to tackle these limitations (a code sketch of the combined flow follows this list):
- Query Rewriter+: A rewriting module that generates more nuanced and multi-faceted queries, enhancing search coverage and clarifying user intent.
- Knowledge Filter: A module that filters out irrelevant information via a natural language inference (NLI) task, ensuring that only relevant knowledge is passed to the generator.
- Memory Knowledge Reservoir: A caching mechanism that speeds up retrieval for recurring queries by utilizing previously retrieved external knowledge.
- Retrieval Trigger: A calibration-based mechanism that determines when to initiate external knowledge retrieval based on the confidence level of existing information.
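The following minimal Python sketch shows how these four modules could interact at answer time. Every name here (`llm.rewrite`, `retriever.search`, and so on) is an illustrative placeholder, not the repository's actual API:

```python
# Illustrative sketch of the four-module synergy; all names below are
# hypothetical placeholders, not the actual interface of this repository.

def answer(question, llm, retriever, reservoir, is_relevant, confidence):
    # Retrieval Trigger: only retrieve externally when the calibrated
    # confidence in answering from parametric knowledge alone is low.
    if confidence(question) >= 0.8:  # threshold chosen for illustration
        return llm.generate(question, context=[])

    # Query Rewriter+: expand the question into clarified, multi-faceted queries.
    queries = llm.rewrite(question)

    passages = []
    for q in queries:
        # Memory Knowledge Reservoir: reuse cached knowledge for recurring queries.
        cached = reservoir.get(q)
        if cached is None:
            cached = retriever.search(q)  # e.g. an external web search
            reservoir.put(q, cached)
        passages.extend(cached)

    # Knowledge Filter: keep only passages an NLI model judges relevant.
    relevant = [p for p in passages if is_relevant(question, p)]
    return llm.generate(question, context=relevant)
```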
Our four-module synergy addresses these issues, improving response accuracy by 14% to 21% compared to directly querying the LLM and achieving roughly an 8%~12% improvement over the traditional RAG pipeline. Additionally, it reduces response time cost by 46% and external knowledge retrieval cost by 71% without compromising response quality.
The current limitations of RAG systems include the following:
- Information Plateau: A single query limits the scope of retrieval, leading to less comprehensive information.
- Ambiguity in Query Interpretation: Misaligned phrasing often results in unreliable responses.
- Irrelevant Knowledge: Excessive retrieval can bring irrelevant information, reducing response quality.
- Redundant Retrieval: Repeated questions result in inefficient use of computational resources.
The following datasets were used for our experiments:
- CAmbigNQ: A curated version of the AmbigNQ dataset with clarified questions, designed to address ambiguities.
- NQ (Natural Questions): A dataset of real-world search engine queries.
- PopQA: Focuses on less popular topics from Wikidata.
- AmbigNQ: Contains ambiguous questions transformed into closely related queries.
- 2WIKIMQA & HotPotQA: Datasets requiring logical reasoning and multi-hop question answering.
We provide demo datasets for Q&A and for fine-tuning Gemma-2B in Records. Our experiments highlight the following findings:
- Query Rewriting: Clarifying ambiguous questions significantly improves retrieval precision.
- Multi-Query Retrieval: Employing multiple, semantically varied queries enhances the amount of relevant information retrieved, overcoming the information plateau.
- Knowledge Filtering: The Knowledge Filter reduces noise from irrelevant data, increasing the accuracy and reliability of RAG systems (see the NLI sketch after this list).
- Efficiency: The use of the Memory Knowledge Reservoir accelerates repeated retrievals, reducing time cost by 46% at optimal configurations.
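To make the NLI-based filtering idea concrete, here is a minimal sketch of an entailment-based filter built on an off-the-shelf MNLI model from Hugging Face transformers; the model choice and threshold are our assumptions, not necessarily the configuration used in the paper:

```python
# Minimal NLI-based knowledge filter sketch. The model and threshold are
# illustrative assumptions, not the exact setup used in the paper.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def is_relevant(question: str, passage: str, threshold: float = 0.5) -> bool:
    # Treat the passage as the premise and the question as the hypothesis;
    # a high entailment score suggests the passage is relevant.
    scores = nli({"text": passage, "text_pair": question}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entail >= threshold

passages = ["Paris is the capital of France.", "Bananas are rich in potassium."]
kept = [p for p in passages if is_relevant("What is the capital of France?", p)]
print(kept)  # expected: only the Paris passage survives the filter
```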
- Clone the repository:

```bash
git clone https://github.com/Ancientshi/ERM4.git
```

- Install dependencies:

```bash
cd ERM4
pip install -r requirements.txt
```

- Download the demo datasets from https://drive.google.com/drive/folders/1UYkFJqfuNbJJZUad-psssL4uSn4ttuAY?usp=sharing and move them under ERM4.

- Run the demo (knowledge retrieved from Bing search is pre-prepared for ease of use):

```bash
cd shell
bash ERM4.sh
```

- Fine-tune Gemma-2B:

```bash
cd shell
bash instruct_fine_tune_gemma.sh
```

- Deploy the fine-tuned Gemma-2B rewriter as a Flask service to support API calls:

```bash
cd shell
bash infer_gemma_rewriter.sh
```
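Once the service is up, it can be called over HTTP. The sketch below is a hypothetical client: the port, route, and JSON fields are assumptions for illustration and may differ from the actual service started by infer_gemma_rewriter.sh:

```python
# Hypothetical client for the deployed rewriter service; the URL, route,
# and payload fields below are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:5000/rewrite",           # assumed host, port, and route
    json={"query": "who won the world cup"},   # assumed request schema
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. a list of rewritten queries
```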
The provided code includes a demo that illustrates how our four-module synergy works within a RAG system. The example retrieval process uses pre-fetched data from Bing searches to streamline the execution.
In the demo, we provide a set of pre-designed prompts for each query. It’s important to note that these prompts may influence the results to some degree. If you're interested in further experimentation, we encourage adjusting these prompts to suit the format and style of each specific dataset. The provided prompts are meant to serve as a reference, and customizing them may yield different retrieval outcomes.
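For instance, a query-rewriting prompt might look like the sketch below; this template is purely illustrative and is not the exact prompt shipped with the demo:

```python
# Illustrative query-rewriting prompt template; not the demo's actual prompt.
REWRITE_PROMPT = """You are a query rewriting assistant.
Rewrite the question below into {n} clarified, semantically diverse
search queries, one per line.

Question: {question}
Queries:"""

print(REWRITE_PROMPT.format(n=3, question="Where was the 2022 World Cup held?"))
```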
The source code is shared to promote advancements in the field and facilitate future research. However, we do not guarantee exact replication of the results reported in our paper. This is due to the rapid evolution of the RAG landscape, the dynamic nature of large language model (LLM) APIs, shifts in search engine behavior, and differences in LLM fine-tuning, all of which introduce considerable variance. Nevertheless, the findings and conclusions should align with those in our work.
We hope this guide helps you continue to explore RAG systems and contribute to the evolving discourse in this domain.
For any questions or contributions, please reach out to the project lead:
- Yunxiao Shi
Email: [email protected]
If you find our work useful and would like to reference it, please cite our paper as follows:
@incollection{Shi2024,
author = {Yunxiao Shi and Xing Zi and Zijing Shi and Haimin Zhang and Qiang Wu and Min Xu},
title = {Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems},
booktitle = {ECAI 2024},
publisher = {IOS Press},
year = {2024},
pages = {2258--2265},
doi = {10.3233/FAIA240748},
url = {https://ebooks.iospress.nl/doi/10.3233/FAIA240748}
}
This project is licensed under the CC BY-NC-SA 4.0 License.