This repository contains the codebase for the following manuscript:
GraphArena: Benchmarking Large Language Models on Graph Computational Problems
Authors: Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li
arXiv
To set up the required environment, please follow the steps below:
conda create -n GraphArena
conda activate GraphArena
conda install openai pandas numpy networkx pip
pip install pybind11
pip install rdkit ogb graph-walker
The dataset (dataset.zip) for benchmarking can be downloaded and unzipped directly from our Google Drive or OneDrive.
For those who prefer to prepare the dataset from scratch, download source.zip, unzip it, and execute the script run_dataset.sh.
Call the LLM API using the command below:
python benchmark_LLM_api.py --task $task --problem_num $problem_num --example_num $example_num --results $results --llm $llm --difficulty $difficulty --resume $resume --sleep $sleep
For example, to run GPT on the TSP task with 500 small-graph problems:
python benchmark_LLM_api.py --task TSP --problem_num 500 --llm gpt --difficulty easy
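The flags above can also be assembled programmatically when scripting many runs. A minimal sketch; the argument names mirror the command shown, while the helper function itself is hypothetical and not part of the repository:

```python
# Hypothetical helper: builds the CLI invocation for benchmark_LLM_api.py
# from the flags documented above. Defaults follow the example command.
def build_benchmark_cmd(task, llm, problem_num=500, difficulty="easy"):
    return [
        "python", "benchmark_LLM_api.py",
        "--task", task,
        "--problem_num", str(problem_num),
        "--llm", llm,
        "--difficulty", difficulty,
    ]

cmd = build_benchmark_cmd("TSP", "gpt")
print(" ".join(cmd))
```

Such a helper could feed `subprocess.run` to sweep over tasks and models in one driver script.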
To evaluate LLMs locally, use the command below:
python benchmark_LLM_local.py --llm llama8b/GraphWiz
For comprehensive benchmarking across all tasks, run run_benchmark.sh. Details about the command-line arguments are available in both benchmark_LLM.py and run_benchmark.sh.
Supported LLM models:
{
"gpt4": "gpt-4o",
"gpt": "gpt-3.5-turbo-0125",
"claude": "claude-3-haiku-20240307",
"mixtral": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"deepseek": "deepseek-chat",
"llama8b": "meta-llama/Llama-3-8b-chat-hf",
"llama": "meta-llama/Llama-3-70b-chat-hf",
"qwen7b": "qwen1.5-7b-chat",
"qwen": "qwen1.5-72b-chat",
"gemma": "gemma-7b-it"
}
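The mapping above pairs each `--llm` alias with its full model identifier. A small sketch of how such a lookup might work; the dict entries are copied from the listing, but the resolver function is an assumption for illustration:

```python
# Alias -> full model name, mirroring (a subset of) the table above.
MODEL_MAP = {
    "gpt4": "gpt-4o",
    "gpt": "gpt-3.5-turbo-0125",
    "claude": "claude-3-haiku-20240307",
    "llama8b": "meta-llama/Llama-3-8b-chat-hf",
}

def resolve_model(alias: str) -> str:
    # Hypothetical resolver: fail loudly on an unknown alias so a typo
    # in --llm is caught before any API calls are made.
    try:
        return MODEL_MAP[alias]
    except KeyError:
        raise ValueError(f"Unknown --llm alias: {alias}")

print(resolve_model("gpt"))
```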
To reproduce the results presented in our manuscript, please follow these steps:
- Unzip final_results.zip.
- Run the following scripts in sequence:
reproduce_table1.ipynb
reproduce_figure2.py
reproduce_figure3.py
reproduce_figure4.py
Please note that the plotting process may take a few minutes to complete.
The full dataset of problems and corresponding LLM responses is available in final_results/GraphArena_all.json. This JSON file organizes the data as follows:
{
"Task_name": [
{
"id": 0, // IDs range from 0-499 for small graphs (easy) to 500-999 for large graphs (hard)
"problem_text": "...",
"LLM responses": "..."
},
...
]
}
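Given this layout, each task's records can be split by difficulty using the id convention. A minimal sketch over a synthetic record; the field names follow the schema above, and the sample data is made up for illustration (the real data lives in final_results/GraphArena_all.json):

```python
import json

# Synthetic sample following the schema above; illustrative only.
sample = json.loads("""
{
  "TSP": [
    {"id": 0, "problem_text": "...", "LLM responses": "..."},
    {"id": 500, "problem_text": "...", "LLM responses": "..."}
  ]
}
""")

# ids 0-499 are small (easy) graphs, 500-999 are large (hard) ones.
easy = [r for r in sample["TSP"] if r["id"] < 500]
hard = [r for r in sample["TSP"] if r["id"] >= 500]
print(len(easy), len(hard))
```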
For more human-readable examples, please refer to examples.md.
The dataset is available under the CC BY-SA 4.0 License. The code repository is licensed under the BSD 2-Clause License.
This repository is maintained by Jianheng Tang ([email protected]). For long-term support and updates, Qifan Zhang ([email protected]) and Yuhan Li ([email protected]) are also key maintainers.