This repository contains code, data, prompts, and results for the (semi-)automatic pipeline of Ontology and Knowledge Graph Construction with four different Large Language Models (LLMs): Mixtral-8x22B-Instruct-v0.1, GPT-4o, GPT-3.5, and Gemini.
In this project, we explore the potential of Large Language Models (LLMs) for the (semi-)automatic construction of Knowledge Graphs (KGs), using several state-of-the-art (SOTA) LLMs. Our pipeline involves formulating competency questions (CQs), developing an ontology (TBox) based on these CQs, constructing KGs using the developed ontology, and evaluating the resulting KG with minimal to no involvement of human experts. We showcase the feasibility of our semi-automated pipeline by creating a KG on deep learning methodologies from scholarly publications.
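The four pipeline stages above can be sketched as a chain of functions. This is a minimal illustration only: the function names, signatures, and toy return values below are hypothetical and do not reflect the repository's actual API.

```python
# Illustrative sketch of the four pipeline stages; names are hypothetical.

def generate_cqs(domain: str) -> list[str]:
    """Stage 1: prompt an LLM for competency questions about the domain."""
    return [f"Which {domain} method does the publication use?"]

def build_ontology(cqs: list[str]) -> dict:
    """Stage 2: derive TBox classes/relations that can answer each CQ."""
    return {"classes": ["DeepLearningMethod"], "cqs": cqs}

def construct_kg(ontology: dict, publications: list[str]) -> list[tuple]:
    """Stage 3: populate the ontology with triples extracted per publication."""
    return [(pub, "usesMethod", ontology["classes"][0]) for pub in publications]

def evaluate_kg(kg: list[tuple]) -> float:
    """Stage 4: score the KG, e.g. by answering the original CQs via RAG."""
    return 1.0 if kg else 0.0

cqs = generate_cqs("deep learning")
kg = construct_kg(build_ontology(cqs), ["paper_01.pdf"])
score = evaluate_kg(kg)
```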
The repository encompasses various components, including code for data preprocessing, the prompts used for the LLMs, the datasets employed in the experiments, and the corresponding results obtained from the four selected SOTA LLMs.
- CQs/: Contains the competency questions generated by the LLM.
- Code/: Contains code files for data preprocessing, Competency questions, ontology and KG generation, and evaluation of KGs for four SOTA LLMs.
- Data/: Contains the selected publication PDFs and their metadata.
- Evaluation/: Contains human evaluations of the RAG-generated CQ answers.
- KG/: Contains an individual KG for each of the 30 publications, for each of the 4 SOTA LLMs.
- NER/: Contains the named entities extracted from all the competency question answers for the SOTA LLMs.
- NER_prompt/: Contains the prompts for all the selected publications, with the instructions passed to the different LLMs to extract named entities.
- Ontology/: Contains the ontologies generated by the four SOTA LLMs.
- RAG_CQ_Ans/: Contains the RAG-generated answers to the 28 queries for the selected publications.
- SOTAOntologies/: Contains the SOTA ontologies reused in the creation of the DLProv Ontology.
- Stats/: Contains all the statistics generated for this publication (ontology stats, KG individual counts, and others, as described in the manuscript).
Add the Hugging Face access token in helper_functions.py on the following line: `access_token_read = os.getenv('access_token_read_hf')`
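Since the token is read from the environment rather than hard-coded, export it before running, e.g. `export access_token_read_hf=hf_...` (the value is a placeholder for your own token). A minimal sketch of the lookup, with an optional sanity check that is not part of the repository's code:

```python
import os

# helper_functions.py reads the Hugging Face token from the environment:
access_token_read = os.getenv('access_token_read_hf')

# Optional sanity check (illustrative, not in the repo): warn early if the
# variable is missing, since gated model downloads would fail later.
if not access_token_read:
    print("Warning: access_token_read_hf is not set; "
          "gated Hugging Face models cannot be downloaded.")
```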
Install the prerequisites using the requirements.txt file (`pip install -r requirements.txt`).
To run the whole pipeline, submit (execute) your job using main.py and use config.ini to customise the variables and paths.
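A config.ini file of this kind is typically read with Python's `configparser`. The sketch below shows the general pattern; the section and key names (`paths`, `data_dir`, `llm`, `model_name`) are illustrative examples, not the repository's actual schema:

```python
import configparser

# Sketch of reading pipeline settings from config.ini; the section and
# key names below are hypothetical, not the repo's actual schema.
config = configparser.ConfigParser()
config.read('config.ini')  # returns an empty list if the file is absent

data_dir = config.get('paths', 'data_dir', fallback='Data/')
model = config.get('llm', 'model_name', fallback='gpt-4o')
print(f"Running pipeline with model={model}, data={data_dir}")
```

Using `fallback=` keeps the script runnable with sensible defaults even when a key is missing from the file.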
The repository is licensed under Apache License 2.0.
Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). From human experts to machines: An LLM supported approach to ontology and knowledge graph construction. arXiv preprint arXiv:2403.08345. https://doi.org/10.48550/arXiv.2403.08345
Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). Towards the Automation of Knowledge Graph Construction Using Large Language Models. In E. Vakaj, S. Iranmanesh, R. Stamartina, N. Mihindukulasooriya, S. Tiwari, F. Ortiz-Rodríguez & R. Mcgranaghan (Eds.), Proceedings of the 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation (part of 20th International Conference on Semantic Systems (SEMANTiCS 2024)), Vol. 3874, pp. 19-34. CEUR-WS. Available at: https://ceur-ws.org/Vol-3874/paper2.pdf