Code and data for experiments on the evaluation of class membership relations in knowledge graphs using LLMs
Bradley P. Allen and Paul T. Groth
INtelligent Data Engineering Lab
University of Amsterdam, Amsterdam, The Netherlands
Class membership relations, which assign entities to a given class, are a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations: descriptions of a given entity and class are processed by a zero-shot chain-of-thought classifier that uses a natural-language intensional definition of the class (Figure 1). This repository contains the data and code used in an evaluation of this method.
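To make the idea of a zero-shot chain-of-thought classifier more concrete, here is a minimal, hypothetical sketch of how such a prompt might be assembled and its verdict parsed. It is not the prompt or code in classifier.py. The Concept and Entity types, build_prompt, parse_answer, and the "Answer: positive / Answer: negative" convention are all illustrative assumptions. "Zero-shot" means the prompt contains no labeled examples; "chain-of-thought" means the model is asked to reason before it commits to a verdict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """A KG class with a natural-language intensional definition (illustrative)."""
    label: str
    definition: str


@dataclass
class Entity:
    """A KG entity with a natural-language description (illustrative)."""
    label: str
    description: str


def build_prompt(concept: Concept, entity: Entity) -> str:
    """Assemble a zero-shot chain-of-thought prompt: no examples, reason first."""
    return (
        f"Definition of the class '{concept.label}': {concept.definition}\n"
        f"Description of the entity '{entity.label}': {entity.description}\n"
        "Reason step by step about whether the entity satisfies the definition. "
        "End with a final line that reads exactly 'Answer: positive' or 'Answer: negative'."
    )


def parse_answer(completion: str) -> Optional[bool]:
    """Map the last 'Answer:' line to True/False; None if missing or malformed."""
    for line in reversed(completion.strip().splitlines()):
        line = line.strip().lower()
        if line.startswith("answer:"):
            verdict = line.removeprefix("answer:").strip().rstrip(".")
            return {"positive": True, "negative": False}.get(verdict)
    return None
```

In use, the prompt would be sent to one of the LLMs and the completion passed to parse_answer; a None result flags a response that ignored the answer format. Keeping prompt construction separate from answer parsing leaves the sketch independent of any particular LLM API.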
Figure 1: A zero-shot chain-of-thought classifier applied to the class clgo:Romania international rugby union player and the entity clgr:Iosif Nemes from the CaLiGraph knowledge graph.
We evaluated the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and seven large language models. Using the gpt-4-0125-preview large language model, the method achieved a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors showed that 40.9% of errors were due to the knowledge graphs themselves: 16.0% due to missing relations and 24.9% due to incorrectly asserted relations.
The principal contributions of this work are 1) a formal approach to the design of a neurosymbolic knowledge engineering workflow integrating KGs and LLMs, and 2) experimental evidence that this method can assist knowledge engineers in addressing the correctness and completeness of KGs, potentially reducing the effort involved in knowledge acquisition and elicitation.
Allen, B.P. and Groth, P.T., 2024. Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models. arXiv preprint arXiv:2404.17000. Accepted to the European Semantic Web Conference Special Track on Large Language Models for Knowledge Engineering, Hersonissos, Crete, GR, May 2024.
License: MIT.
Requirements:
- Python 3.11 or higher.
- OPENAI_API_KEY and HUGGINGFACE_API_TOKEN environment variables set to your OpenAI API key and Hugging Face API token, respectively.
Installation:
$ git clone https://github.com/bradleypallen/evaluating-kg-class-memberships-using-llms.git
$ cd evaluating-kg-class-memberships-using-llms
$ python -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
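Before running the notebooks, it can help to confirm that the requirements are met. The following minimal sketch uses check_environment, a hypothetical helper that is not part of the repository. It checks the interpreter version and that both API variables are set to non-empty values; it does not verify that the keys themselves are valid.

```python
import os
import sys
from collections.abc import Mapping

# Variables named in the requirements above.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "HUGGINGFACE_API_TOKEN")


def check_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of setup problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):  # unset or empty string
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks OK"]:
        print(problem)
```

Run it inside the activated virtual environment so that it checks the same interpreter the notebooks will use.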
Repository contents:
- Classifier implementation: classifier.py
- Utilities for running experiments and displaying results: utils.py
- Notebooks for executing experiments
  - Wikidata: wikidata_experiment.ipynb
  - CaLiGraph: caligraph_experiment.ipynb
- Data sets
  - Wikidata: wikidata_classes.json
  - CaLiGraph: caligraph_classes.json
- Classification results
  - Wikidata
    - gemma-2b-it: gemma-2b-it-wikidata.json
    - gemma-7b-it: gemma-7b-it-wikidata.json
    - gpt-3.5-turbo: gpt-3.5-turbo-wikidata.json
    - gpt-4-0125-preview: gpt-4-0125-preview-wikidata.json
    - Llama-2-70b-chat-hf: Llama-2-70b-chat-hf-wikidata.json
    - Mistral-7B-Instruct-v0.2: Mistral-7B-Instruct-v0.2-wikidata.json
    - Mixtral-8x7B-Instruct-v0.1: Mixtral-8x7B-Instruct-v0.1-wikidata.json
  - CaLiGraph
    - gemma-2b-it: gemma-2b-it-caligraph.json
    - gemma-7b-it: gemma-7b-it-caligraph.json
    - gpt-3.5-turbo: gpt-3.5-turbo-caligraph.json
    - gpt-4-0125-preview: gpt-4-0125-preview-caligraph.json
    - Llama-2-70b-chat-hf: Llama-2-70b-chat-hf-caligraph.json
    - Mistral-7B-Instruct-v0.2: Mistral-7B-Instruct-v0.2-caligraph.json
    - Mixtral-8x7B-Instruct-v0.1: Mixtral-8x7B-Instruct-v0.1-caligraph.json
- Notebooks for displaying classifier performance
  - Wikidata: wikidata-classifier-performance.ipynb
  - CaLiGraph: caligraph-classifier-performance.ipynb
- Classifier errors of gpt-4-0125-preview: gpt-4-0125-preview-errors.ipynb
- Notebook for generating CSV files for import into a spreadsheet application in support of human annotation for error analysis: gpt-4-0125-preview-error-analysis-prep.ipynb
- Generated CSV files for import into spreadsheets for human annotation
  - Wikidata: wd_err.csv
  - CaLiGraph: cg_err.csv
- Spreadsheets and CSV files with human annotations
  - Wikidata: wd_err_annotated.numbers (Numbers), wd_err_annotated.csv (CSV)
  - CaLiGraph: cg_err_annotated.numbers (Numbers), cg_err_annotated.csv (CSV)
- Error analysis: gpt-4-0125-preview-error-analysis.ipynb
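The classification results files follow a <model>-<knowledge graph>.json naming pattern, giving fourteen files for seven models and two knowledge graphs. A small helper like the one below can report which of them are present, for example to check progress during a run or to confirm that the old files were deleted before a fresh one. It assumes the files live in the experiments/ subdirectory, as the instructions for running the experiments indicate; expected_result_files and missing_result_files are hypothetical names, not part of utils.py.

```python
from pathlib import Path

# Names as they appear in the result file names listed above.
MODELS = [
    "gemma-2b-it",
    "gemma-7b-it",
    "gpt-3.5-turbo",
    "gpt-4-0125-preview",
    "Llama-2-70b-chat-hf",
    "Mistral-7B-Instruct-v0.2",
    "Mixtral-8x7B-Instruct-v0.1",
]
KNOWLEDGE_GRAPHS = ["wikidata", "caligraph"]


def expected_result_files(results_dir: str = "experiments") -> list[Path]:
    """One '<model>-<kg>.json' path per (knowledge graph, model) pair."""
    return [
        Path(results_dir) / f"{model}-{kg}.json"
        for kg in KNOWLEDGE_GRAPHS
        for model in MODELS
    ]


def missing_result_files(results_dir: str = "experiments") -> list[Path]:
    """Expected result files that do not (yet) exist on disk."""
    return [path for path in expected_result_files(results_dir) if not path.exists()]
```

Calling print(missing_result_files()) from the repository root lists the runs that have not produced a results file yet.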
To reproduce the experiments:
- Delete the existing model-specific classification files in the /experiments subdirectory.
- Execute wikidata_experiment.ipynb and caligraph_experiment.ipynb to run each of the seven LLMs over the Wikidata and CaLiGraph data sets, respectively.
  - Occasionally, a run will throw an error, typically due to an API timeout or another service-related problem. In those instances, simply re-execute the notebook; processing will resume after the last model and class that were processed.
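The resume-on-re-execution behavior described above is a form of checkpointing. The minimal sketch below illustrates the pattern, assuming one JSON results file per model and a hypothetical classify function that runs an LLM over a single class. It is not the code in utils.py, only an illustration of why re-running a notebook picks up where the previous run stopped.

```python
import json
from collections.abc import Callable
from pathlib import Path


def run_resumable(
    classes: list[str],
    classify: Callable[[str], dict],
    results_path: Path,
) -> dict[str, dict]:
    """Classify each class in turn, checkpointing results after every class.

    classify is a hypothetical stand-in for running one LLM over one class;
    it may raise (e.g. on an API timeout), in which case earlier results are kept.
    """
    results: dict[str, dict] = {}
    if results_path.exists():
        results = json.loads(results_path.read_text(encoding="utf-8"))
    for cls in classes:
        if cls in results:
            continue  # finished in an earlier, interrupted run
        results[cls] = classify(cls)
        # Rewrite the checkpoint so a later failure loses at most one class.
        results_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return results
```

Writing the file after every class means an interruption loses at most the class in progress; the cost is rewriting the whole file each time, which is cheap for files of modest size.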
To view classifier performance:
- Execute wikidata-classifier-performance.ipynb and caligraph-classifier-performance.ipynb to view the performance statistics of each of the seven LLMs' classifications for Wikidata and CaLiGraph, respectively. This can be done while the experiments are still running, once the first model has processed the first class.
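For reference, a macro-averaged F1-score is the unweighted mean of per-label F1-scores, where F1 = 2TP / (2TP + FP + FN). The sketch below computes it over the two labels (member, non-member) of a binary classifier. Whether the performance notebooks average over labels in exactly this way is an assumption, and f1 and macro_f1 are hypothetical helpers rather than functions from utils.py.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts; 0.0 when the label never occurs or is predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: list[bool], y_pred: list[bool]) -> float:
    """Unweighted mean of the F1-scores for the positive and negative labels."""
    scores = []
    for label in (True, False):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

For binary labels where both classes occur, this should agree with scikit-learn's f1_score(y_true, y_pred, average="macro").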
To view classifier errors:
- Execute gpt-4-0125-preview-errors.ipynb to view the false positives and false negatives produced by gpt-4-0125-preview for each class in both Wikidata and CaLiGraph.
  - To view errors for another model, replace experiments/gpt-4-0125-preview-wikidata.json and experiments/gpt-4-0125-preview-caligraph.json in the calls to display_errors with the appropriate model classification results files.
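A false positive is an entity that the LLM classifies as a member of the class although the knowledge graph does not assert that membership; a false negative is the reverse. The sketch below separates the two, assuming a hypothetical record shape with entity, actual, and predicted fields. Check the results JSON files for the real field names before adapting it; split_errors is an illustrative helper, not the display_errors function in utils.py.

```python
from collections.abc import Iterable
from typing import TypedDict


# Hypothetical record shape; the real results files may use other field names.
class Classification(TypedDict):
    entity: str
    actual: bool     # membership asserted in the knowledge graph
    predicted: bool  # membership predicted by the LLM


def split_errors(records: Iterable[Classification]) -> tuple[list[str], list[str]]:
    """Return (false positives, false negatives) as lists of entity labels."""
    false_positives: list[str] = []
    false_negatives: list[str] = []
    for record in records:
        if record["predicted"] and not record["actual"]:
            false_positives.append(record["entity"])
        elif record["actual"] and not record["predicted"]:
            false_negatives.append(record["entity"])
    return false_positives, false_negatives
```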
To perform the error analysis:
- Execute gpt-4-0125-preview-error-analysis-prep.ipynb to generate the CSV files containing the classification errors for the Wikidata and CaLiGraph experiments.
  - To generate CSV files for another model, replace experiments/gpt-4-0125-preview-wikidata.json and experiments/gpt-4-0125-preview-caligraph.json in the calls to json.load with the appropriate model classification results files.
- Using a spreadsheet application (e.g., Excel or Numbers), import the generated CSV files and add four columns to the right with the headers "missing data", "missing relation", "incorrect relation", and "incorrect reasoning".
- Annotate each row by marking exactly one of the four new cells 'True' and the other three 'False':
  - If the error is due to missing data in the entity description, mark "missing data" 'True'.
  - If the error is due to a missing relation in the knowledge graph, mark "missing relation" 'True'.
  - If the error is due to an incorrect relation in the knowledge graph, mark "incorrect relation" 'True'.
  - If the error is due to incorrect reasoning by the LLM, mark "incorrect reasoning" 'True'.
- Export the annotated spreadsheets for Wikidata and CaLiGraph to error-analysis/wd_err_annotated.csv and error-analysis/cg_err_annotated.csv, respectively.
- Execute gpt-4-0125-preview-error-analysis.ipynb to view the results of the error analysis.
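Once the annotated CSV files have been exported, each error category's share is a count divided by the number of errors. The hypothetical tally_annotations helper below also enforces the rule that exactly one of the four columns is 'True' in each row. It assumes the exported files keep the four column headers exactly as given in the annotation steps and that the spreadsheet writes booleans as True/False in any letter case. It is an illustration, not the code in gpt-4-0125-preview-error-analysis.ipynb.

```python
import csv
from collections import Counter
from pathlib import Path

# Column headers added during annotation.
CATEGORIES = ["missing data", "missing relation", "incorrect relation", "incorrect reasoning"]


def tally_annotations(csv_path: Path) -> dict[str, float]:
    """Check one 'True' per row; return each category's share of errors in percent."""
    counts: Counter = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Data rows start on line 2 (line 1 is the header), assuming no multi-line cells.
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            marked = [c for c in CATEGORIES if (row.get(c) or "").strip().lower() == "true"]
            if len(marked) != 1:
                raise ValueError(f"line {line_no}: expected exactly one 'True', got {marked}")
            counts[marked[0]] += 1
    total = sum(counts.values())
    return {c: 100 * counts[c] / total for c in CATEGORIES} if total else {}
```

The share of errors attributable to the knowledge graph is then the sum of the "missing relation" and "incorrect relation" percentages, which is how the 40.9% in the summary decomposes into 16.0% and 24.9%.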