Schema Matching with Large-Language Models

This repository contains artefacts for the paper "Schema Matching with Large Language Models".

benchmark

The benchmark folder contains the ground truth of matches that we aim to find using LLMs. In the ground_truth.csv file, each line corresponds to one match. type defines the type of match (currently only one_to_one), source is the fully qualified name of the source attribute, relationship is currently always corresponds, and target is the fully qualified name of the target attribute.
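As a minimal sketch of how the ground truth could be read, the snippet below parses rows with the four columns described above into a set of (source, target) pairs. It assumes the file starts with a header row naming the columns; the sample rows and attribute names are hypothetical and only illustrate the format.

```python
import csv
import io

# Hypothetical sample rows for illustration only. The header row and the
# attribute names are assumptions, not copied from ground_truth.csv.
SAMPLE = """type,source,relationship,target
one_to_one,mimic.patients.subject_id,corresponds,omop.person.person_id
one_to_one,mimic.admissions.hadm_id,corresponds,omop.visit_occurrence.visit_occurrence_id
"""


def load_matches(handle):
    """Return the set of (source, target) pairs from a ground-truth CSV."""
    reader = csv.DictReader(handle)
    return {(row["source"], row["target"]) for row in reader}


# Parse the in-memory sample; a real file handle works the same way.
matches = load_matches(io.StringIO(SAMPLE))
```

To read the actual file, pass an open file handle instead, for example `open("benchmark/ground_truth.csv", newline="")`, assuming that path matches your checkout.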

results

The results folder contains our experimental results. gpt35_results.csv, gpt4_results.csv and baseline_results.csv contain the actual votes per attribute pair; all_decisions_df.csv integrates the three files after majority voting has been applied.
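The majority-voting step can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the vote labels ("yes", "no", "unknown"), the attribute names, and the choice to return None when no label has a strict majority are all hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label with more than half of the votes, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority; ties and pluralities yield None (assumption).
    return label if count > len(votes) / 2 else None


# Hypothetical votes per (source, target) attribute pair.
votes_per_pair = {
    ("src.table_a.col_1", "tgt.table_x.col_9"): ["yes", "yes", "no"],
    ("src.table_a.col_2", "tgt.table_x.col_9"): ["no", "unknown", "no"],
    ("src.table_b.col_3", "tgt.table_y.col_4"): ["yes", "no"],
}

# One decision per attribute pair.
decisions = {pair: majority_vote(v) for pair, v in votes_per_pair.items()}
```

Requiring a strict majority keeps tied pairs undecided rather than letting vote order pick a winner; a different tie-breaking rule may be more appropriate depending on how the votes were collected.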

schema_documentations

The schema_documentations folder contains the relation and attribute descriptions we used for our experiments. The MIMIC descriptions are scraped from the MIMIC-IV documentation; the OMOP descriptions stem from the OMOP CDM documentation. Refer to the corresponding documentation sites for more information and for their publication licenses.

templates

The templates folder contains the prompt templates, in JSON format, that we used to generate the prompts for our experiments.
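To illustrate how a JSON prompt template might be turned into a prompt, here is a minimal sketch. The template structure (the "system" and "user" keys), the `$source` and `$target` placeholders, and the attribute names are hypothetical; the actual templates in this folder may be organised differently.

```python
import json
from string import Template

# Hypothetical template; the keys and placeholders are assumptions and do
# not reflect the structure of the files in the templates folder.
TEMPLATE_JSON = """
{
  "system": "You are a schema matching assistant.",
  "user": "Does the attribute $source correspond to the attribute $target? Answer yes or no."
}
"""

template = json.loads(TEMPLATE_JSON)

# Fill the placeholders with one (hypothetical) attribute pair.
prompt = Template(template["user"]).substitute(
    source="mimic.patients.subject_id",
    target="omop.person.person_id",
)
```

`Template.substitute` raises a `KeyError` if a placeholder is left unfilled, which helps catch incomplete prompts before they are sent to a model.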

Requirements

Dependencies are listed in requirements.txt. Install them via pip install -r requirements.txt.
