When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
Official repository for the paper "When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets", including code to reproduce the results and links to the data generated by the models.
This project presents a comprehensive study on generative query and document expansions across various methods, retrievers, and datasets. It aims to identify when these expansions fail and provide insights into improving information retrieval systems.
The generations from the models can be found at orionweller/llm-based-expansions-generations, organized by dataset and expansion type.
- Python 3.10
- conda
- OpenAI API key (for using OpenAI models)
- Together.ai or Anthropic API keys (if using their services)
- GPU (if using Llama for generation)
- pyserini (for BM25 results reproduction)
- Clone the repository:
    git clone https://github.com/orionw/LM-expansions.git
    cd LM-expansions
- Install the correct Python environment:
    conda env create --file=environment.yaml -y && conda activate expansions
- Download the local data:
    git clone https://huggingface.co/datasets/orionweller/llm-based-expansions-eval-datasets
  This dataset contains local data not available on Huggingface, such as scifact-refute and other datasets formatted in a common format. To reproduce the creation of scifact-refute, check out scripts/make_scifact_refute.py.
- Set up your environment variables (e.g., OPENAI_API_KEY) if using OpenAI models.
- Create or modify a prompt config. Examples are in prompt_configs/*. For instance:
    bash generate_expansions.sh scifact_refute prompt_configs/chatgpt_doc2query.jsonl
- Adjust parameters as needed:
  - num_examples: maximum number of instances to predict
  - temperature: controls the randomness of predictions
Note: If using Together.ai or Anthropic API keys, define them accordingly. For Llama generation, ensure you're using a GPU.
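Before launching a generation run, it can help to confirm that the API keys you need are actually set. The snippet below is a minimal sketch, not part of this repository: `missing_api_keys` is a hypothetical helper, and only `OPENAI_API_KEY` comes from the steps above. Add the variable names your Together.ai or Anthropic setup expects.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # OPENAI_API_KEY is the variable named in the setup steps. Extend this
    # list with whatever names your other providers expect (assumption).
    required = ["OPENAI_API_KEY"]
    missing = missing_api_keys(required)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Running this check before `generate_expansions.sh` catches a missing key before any API calls are attempted.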
- Run the model using the following command structure:
    bash rerank.sh <dataset name> <name of run> <shard id> <num shards> <query expansion path or "none"> <"none" if not using document expansions otherwise "replace" or "append" the query with the expansion> <document expansion path or "none"> <"none" if not using query expansions otherwise "replace" or "append" the query with the expansion> <model name> <number of queries to run> <number of docs to run>
  Example:
    bash rerank.sh "scifact_refute" "testing" 0 1 "none" "none" "llm-based-expansions-generations/scifact_refute/expansion_hyde_chatgpt64.jsonl" "replace" "contriever_msmarco" 10 100
- Results will be written to results/<dataset name>/<name of run>/<dataset name>-<name of run>-run.txt.
- Evaluate the results:
    bash evaluate.sh scifact_refute testing
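Because rerank.sh takes eleven positional arguments, assembling the command programmatically can reduce ordering mistakes. The sketch below is illustrative only: `rerank_command` and `results_path` are hypothetical helpers, not functions from this repository. The seven trailing arguments are passed through unchanged in the order of the command structure above, and the results path follows the pattern stated above.

```python
import shlex
from pathlib import PurePosixPath


def rerank_command(dataset, run_name, shard_id, num_shards, remaining_args):
    """Assemble the argument list for rerank.sh.

    `remaining_args` holds the seven trailing arguments (expansion paths,
    their modes, model name, number of queries, number of docs). They are
    passed through as given, so their order must match the README.
    """
    if len(remaining_args) != 7:
        raise ValueError(f"expected 7 trailing arguments, got {len(remaining_args)}")
    head = ["bash", "rerank.sh", dataset, run_name, str(shard_id), str(num_shards)]
    return head + [str(arg) for arg in remaining_args]


def results_path(dataset, run_name):
    """Path of the run file, following the pattern documented above."""
    return PurePosixPath("results") / dataset / run_name / f"{dataset}-{run_name}-run.txt"


if __name__ == "__main__":
    # Values copied from the README example.
    cmd = rerank_command(
        "scifact_refute", "testing", 0, 1,
        ["none", "none",
         "llm-based-expansions-generations/scifact_refute/expansion_hyde_chatgpt64.jsonl",
         "replace", "contriever_msmarco", 10, 100],
    )
    print(shlex.join(cmd))
    print(results_path("scifact_refute", "testing"))
```

The printed command can be pasted into a shell or handed to `subprocess.run(cmd, check=True)` from the repository root.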
To reproduce the top 1000 BM25 results:
- Install pyserini following their installation docs.
- Run the BM25 retrieval:
    bash make_bm25_run.sh <your folder> <your dataset name> <document id field> <document text fields> <query id field> <query text fields>
  Example:
    bash make_bm25_run.sh bm25 scifact_refute doc_id "title,text" query_id text
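The example above passes multiple document text fields as one comma-separated argument ("title,text"). A small wrapper can build that argument from a list. This is a sketch under assumptions: `bm25_command` is a hypothetical helper, and joining multiple query text fields with commas is inferred from the document-field example rather than confirmed by the script.

```python
import shlex


def bm25_command(folder, dataset, doc_id_field, doc_text_fields,
                 query_id_field, query_text_fields):
    """Build the make_bm25_run.sh argument list.

    Field lists are joined with commas, mirroring the "title,text"
    argument in the README example (assumed to apply to query fields too).
    """
    return [
        "bash", "make_bm25_run.sh", folder, dataset,
        doc_id_field, ",".join(doc_text_fields),
        query_id_field, ",".join(query_text_fields),
    ]


if __name__ == "__main__":
    # Values copied from the README example.
    cmd = bm25_command("bm25", "scifact_refute", "doc_id",
                       ["title", "text"], "query_id", ["text"])
    print(shlex.join(cmd))
```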
This project is licensed under the MIT License - see the LICENSE file for details.
If you found the code, data or paper useful, please cite:
@inproceedings{weller-etal-2024-generative,
title = "When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets",
author = "Weller, Orion and
Lo, Kyle and
Wadden, David and
Lawrie, Dawn and
Van Durme, Benjamin and
Cohan, Arman and
Soldaini, Luca",
booktitle = "Findings of the Association for Computational Linguistics: EACL 2024",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-eacl.134",
pages = "1987--2003",
}
This project also builds on many others (see the paper for a full list of references), including code from TART and InPars. Please check them and the others out!