This repository contains the codebase for the following manuscript:
GraphArena: Benchmarking Large Language Models on Graph Computational Problems
Authors: Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li
arXiv
To set up the required environment, please follow the steps below:
conda create -n GraphArena
conda activate GraphArena
conda install openai pandas numpy networkx pip
pip install pybind11
pip install rdkit ogb graph-walker
The dataset (dataset.zip) for benchmarking can be downloaded and unzipped directly from our Google Drive or OneDrive.
For those who prefer to prepare the dataset from scratch, download source.zip, unzip it, and execute the script run_dataset.sh.
Call the LLM API using the command below:
python benchmark_LLM_api.py --task $task --problem_num $problem_num --example_num $example_num --results $results --llm $llm --difficulty $difficulty --resume $resume --sleep $sleep
For example, to run GPT on the TSP task with 500 small-graph problems:
python benchmark_LLM_api.py --task TSP --problem_num 500 --llm gpt --difficulty easy
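The flags above can also be assembled programmatically when scripting many runs. A minimal sketch; the argument names mirror the command shown, while the helper function itself is hypothetical and not part of the repository:

```python
# Hypothetical helper: builds the CLI invocation for benchmark_LLM_api.py
# from the flags documented above. Defaults follow the example command.
def build_benchmark_cmd(task, llm, problem_num=500, difficulty="easy"):
    return [
        "python", "benchmark_LLM_api.py",
        "--task", task,
        "--problem_num", str(problem_num),
        "--llm", llm,
        "--difficulty", difficulty,
    ]

cmd = build_benchmark_cmd("TSP", "gpt")
print(" ".join(cmd))
```

Such a helper could feed `subprocess.run` to sweep over tasks and models in one driver script.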
To evaluate LLMs locally, use the command below:
python benchmark_LLM_local.py --llm llama8b/GraphWiz
For comprehensive benchmarking across all tasks, run run_benchmark.sh. Details about the command-line arguments are available in both benchmark_LLM.py and run_benchmark.sh.
Supported LLM models:
{
"gpt4": "gpt-4o",
"gpt": "gpt-3.5-turbo-0125",
"claude": "claude-3-haiku-20240307",
"mixtral": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"deepseek": "deepseek-chat",
"llama8b": "meta-llama/Llama-3-8b-chat-hf",
"llama": "meta-llama/Llama-3-70b-chat-hf",
"qwen7b": "qwen1.5-7b-chat",
"qwen": "qwen1.5-72b-chat",
"gemma": "gemma-7b-it"
}
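The mapping above pairs each `--llm` alias with its full model identifier. A small sketch of how such a lookup might work; the dict entries are copied from the listing, but the resolver function is an assumption for illustration:

```python
# Alias -> full model name, mirroring (a subset of) the table above.
MODEL_MAP = {
    "gpt4": "gpt-4o",
    "gpt": "gpt-3.5-turbo-0125",
    "claude": "claude-3-haiku-20240307",
    "llama8b": "meta-llama/Llama-3-8b-chat-hf",
}

def resolve_model(alias: str) -> str:
    # Hypothetical resolver: fail loudly on an unknown alias so a typo
    # in --llm is caught before any API calls are made.
    try:
        return MODEL_MAP[alias]
    except KeyError:
        raise ValueError(f"Unknown --llm alias: {alias}")

print(resolve_model("gpt"))
```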
To reproduce the results presented in our manuscript, please follow these steps:
- Unzip final_results.zip.
- Run the following scripts in sequence:
reproduce_table1.ipynb
reproduce_figure2.py
reproduce_figure3.py
reproduce_figure4.py
Please note that the plotting process may take a few minutes to complete.
The full dataset of problems and corresponding LLM responses is available in final_results/GraphArena_all.json. This JSON file organizes the data as follows:
{
"Task_name": [
{
"id": 0, // IDs range from 0-499 for small graphs (easy) to 500-999 for large graphs (hard)
"problem_text": "...",
"LLM responses": "..."
},
...
]
}
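Given this layout, each task's records can be split by difficulty using the id convention. A minimal sketch over a synthetic record; the field names follow the schema above, and the sample data is made up for illustration (the real data lives in final_results/GraphArena_all.json):

```python
import json

# Synthetic sample following the schema above; illustrative only.
sample = json.loads("""
{
  "TSP": [
    {"id": 0, "problem_text": "...", "LLM responses": "..."},
    {"id": 500, "problem_text": "...", "LLM responses": "..."}
  ]
}
""")

# ids 0-499 are small (easy) graphs, 500-999 are large (hard) ones.
easy = [r for r in sample["TSP"] if r["id"] < 500]
hard = [r for r in sample["TSP"] if r["id"] >= 500]
print(len(easy), len(hard))
```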
For more human-readable examples, please refer to examples.md.
The dataset is available under the CC BY-SA 4.0 License. The code repository is licensed under the BSD 2-Clause License.
This repository is maintained by Jianheng Tang ([email protected]). For long-term support and updates, Qifan Zhang ([email protected]) and Yuhan Li ([email protected]) are also key maintainers.