FM-Matcher demonstrates the use of Large Language Models for Schema Matching. More information can be found in our publication. The tool uses the OpenAI Python SDK to communicate with the OpenAI API. Models from other providers are currently not supported out of the box.
We use (and therefore recommend) Poetry to install FM-Matcher:
poetry install
Alternatively, you can inspect the requirements.txt file and install the tool manually via pip.
You can also run FM-Matcher in a container. This repository provides a Dockerfile based on a Python slim image. You can build an image with Podman, for example, like this:
podman build -t fm_matcher .
On Linux and in the containerized setting, FM-Matcher is configured through environment variables. Other operating systems have not been tested, but you can change the default configuration in utils/config.py if needed.
OPENAI_API_KEY
: REQUIRED. The OpenAI API key to use. There is no default; you have to create an OpenAI API key yourself.
QUERY_OPENAI
: Set this to False to generate a random result instead of prompting the LLM. Useful for testing and development. Default: True
OPENAI_MODEL
: The OpenAI model to use. Default: gpt-4o-mini-2024-07-18
OPENAI_N
: The number of answers requested from the model per prompt. Default: 3
OPENAI_TEMPERATURE
: The model's temperature setting. Default: 1.0
OPENAI_TIMEOUT
: Timeout of the OpenAI API calls. API queries are retried using tenacity, but we still recommend testing before setting this significantly lower. Default: 60
TEMPLATE_DIR
: Directory where the prompt templates are stored. The templates are filled with the schema information from FM-Matcher and sent to OpenAI. Default: resources/prompt_templates
PARALLEL_OPENAI_REQUESTS
: Maximum number of parallel requests sent asynchronously to OpenAI. Lower this value if you encounter RateLimitErrors. Default: 5
SQLITE_PATH
: Path to an SQLite database file used for caching results. Set this to "" to disable caching. Default: dev.sqlite3
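To illustrate how the variables above can map to typed settings with their documented defaults, here is a minimal sketch. It is not the contents of utils/config.py: MatcherConfig and load_config are hypothetical names, and the rule that only "true", "1", "yes" or "on" count as true for QUERY_OPENAI is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean variable such as QUERY_OPENAI (assumed truthy spellings)."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes", "on"}


@dataclass(frozen=True)
class MatcherConfig:  # hypothetical container for the documented settings
    openai_api_key: str
    query_openai: bool
    openai_model: str
    openai_n: int
    openai_temperature: float
    openai_timeout: float
    template_dir: str
    parallel_openai_requests: int
    sqlite_path: str  # an empty string disables the SQLite cache

    @property
    def cache_enabled(self) -> bool:
        return self.sqlite_path != ""


def load_config(env: Optional[Mapping[str, str]] = None) -> MatcherConfig:
    """Read settings from the environment, falling back to the README defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # OPENAI_API_KEY has no default and is required.
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return MatcherConfig(
        openai_api_key=api_key,
        query_openai=_env_bool(env, "QUERY_OPENAI", True),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini-2024-07-18"),
        openai_n=int(env.get("OPENAI_N", "3")),
        openai_temperature=float(env.get("OPENAI_TEMPERATURE", "1.0")),
        openai_timeout=float(env.get("OPENAI_TIMEOUT", "60")),
        template_dir=env.get("TEMPLATE_DIR", "resources/prompt_templates"),
        parallel_openai_requests=int(env.get("PARALLEL_OPENAI_REQUESTS", "5")),
        sqlite_path=env.get("SQLITE_PATH", "dev.sqlite3"),
    )
```

Passing an explicit mapping (for example load_config({"OPENAI_API_KEY": "..."})) keeps the loader testable without touching the real process environment.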
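A limit such as PARALLEL_OPENAI_REQUESTS is commonly enforced with an asyncio.Semaphore. The sketch below shows that general technique only and assumes nothing about FM-Matcher's internals: fake_llm_request and run_bounded are hypothetical, and the fake request merely sleeps in place of a real OpenAI call while recording how many calls overlap.

```python
import asyncio
import random


async def fake_llm_request(prompt: str, state: dict) -> str:
    """Stand-in for one OpenAI call; tracks how many calls run at once."""
    state["active"] += 1
    state["max_active"] = max(state["max_active"], state["active"])
    try:
        await asyncio.sleep(random.uniform(0.01, 0.03))  # simulated latency
        return f"answer to {prompt}"
    finally:
        state["active"] -= 1


async def run_bounded(prompts, limit: int, state: dict) -> list:
    """Send all prompts concurrently, but never more than `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:  # waits while `limit` requests are in flight
            return await fake_llm_request(prompt, state)

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))


if __name__ == "__main__":
    state = {"active": 0, "max_active": 0}
    prompts = [f"p{i}" for i in range(20)]
    results = asyncio.run(run_bounded(prompts, limit=5, state=state))
    print(len(results), state["max_active"])
```

Lowering the limit reduces how many requests hit the API at the same moment, which is why reducing such a setting can help with rate-limit errors.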
Run FM-Matcher as you would run any Streamlit application:
poetry run streamlit run main.py
Assuming you have built the image as shown above, you can start a container like this:
podman run -d --name fm_matcher -e OPENAI_API_KEY="mySuperSecretApiKey" -p 8501:8501 fm_matcher