This repository contains the source code for the paper "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering".
We introduce G-Retriever, a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning.
G-Retriever integrates the strengths of Graph Neural Networks (GNNs), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), and can be fine-tuned to enhance graph understanding via soft prompting.
[2024.09] PyG 2.6 now supports G-Retriever! 🎉 [Dataset][Model]
@article{he2024g,
title={G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering},
author={He, Xiaoxin and Tian, Yijun and Sun, Yifei and Chawla, Nitesh V and Laurent, Thomas and LeCun, Yann and Bresson, Xavier and Hooi, Bryan},
journal={arXiv preprint arXiv:2402.07630},
year={2024}
}
conda create --name g_retriever python=3.9 -y
conda activate g_retriever
# https://pytorch.org/get-started/locally/
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.version.cuda)"
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu118.html
pip install peft
pip install pandas
pip install ogb
pip install transformers
pip install wandb
pip install sentencepiece
pip install torch_geometric
pip install datasets
pip install pcst_fast
pip install gensim
pip install scipy==1.12
pip install protobuf
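For convenience, the individual `pip install` commands above can be collected into a single requirements file (only `scipy` is pinned, matching the commands above; the PyTorch and PyG wheel installs still need the explicit `-f` index URLs and are kept separate):

```
# requirements.txt (sketch mirroring the pip commands above)
peft
pandas
ogb
transformers
wandb
sentencepiece
torch_geometric
datasets
pcst_fast
gensim
scipy==1.12
protobuf
```

Install with `pip install -r requirements.txt` after the conda/PyTorch steps.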
- Go to Hugging Face: https://huggingface.co/meta-llama/Llama-2-7b-hf. You will need to share your contact information with Meta to access this model.
- Sign up for a Hugging Face account (if you don’t already have one).
- Generate an access token: https://huggingface.co/docs/hub/en/security-tokens.
- Add your token to the code file as follows:
from transformers import AutoModel
access_token = "hf_..."
model = AutoModel.from_pretrained("private/model", token=access_token)
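Hard-coding the token risks accidentally committing it. A minimal alternative sketch reads it from an environment variable instead (the `HF_TOKEN` variable name and the `get_hf_token` helper are conventions chosen here, not something the repository requires; newer `huggingface_hub` releases also pick up `HF_TOKEN` on their own):

```python
import os

def get_hf_token(fallback=None):
    """Return the Hugging Face token from the HF_TOKEN environment
    variable, falling back to an explicit value only for local testing."""
    return os.environ.get("HF_TOKEN", fallback)

# Example (assumed usage): pass the token when loading the gated model.
# model = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf", token=get_hf_token())
```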
# expla_graphs
python -m src.dataset.preprocess.expla_graphs
python -m src.dataset.expla_graphs
# scene_graphs (preprocessing may take a while)
python -m src.dataset.preprocess.scene_graphs
python -m src.dataset.scene_graphs
# webqsp
python -m src.dataset.preprocess.webqsp
python -m src.dataset.webqsp
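After preprocessing, a quick sanity check can confirm that each dataset was produced before launching training. This sketch assumes the outputs land in per-dataset directories under a `dataset/` root; that layout is an assumption, so adjust the path to wherever the preprocessing scripts write on your machine:

```python
import os

# Datasets produced by the preprocessing commands above.
DATASETS = ["expla_graphs", "scene_graphs", "webqsp"]

def missing_datasets(root="dataset"):
    """Return the dataset names whose output directory is absent under
    `root`. The `dataset` root is an assumed location; change it if the
    preprocessing scripts write elsewhere."""
    return [name for name in DATASETS
            if not os.path.isdir(os.path.join(root, name))]
```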
Replace the path to the LLM checkpoints in src/model/__init__.py, then run:
python inference.py --dataset scene_graphs --model_name inference_llm --llm_model_name 7b_chat
# prompt tuning
python train.py --dataset scene_graphs_baseline --model_name pt_llm
# G-Retriever
python train.py --dataset scene_graphs --model_name graph_llm
# finetune LLM with LoRA
python train.py --dataset scene_graphs_baseline --model_name llm --llm_frozen False
# G-Retriever with LoRA
python train.py --dataset scene_graphs --model_name graph_llm --llm_frozen False
Use run.sh to run the code and reproduce the published results in the main table.