GERD is developed as an experimental library to investigate how large language models (LLMs) can be used to generate and analyze (sets of) documents.
This project was initially forked from Llama-2-Open-Source-LLM-CPU-Inference by Kenneth Leung.
If you just want to try it out, you can clone the project and install its dependencies with pip:
```sh
git clone https://github.com/caretech-owl/gerd.git
cd gerd
pip install -e ".[full]"
python examples/hello.py
```
For more information on development, have a look at DEV.md. If you want to try GERD in your browser, head over to Binder. Note that running LLMs on the CPU (and especially on limited virtual machines like Binder) takes some time. If you are in a hurry, you might be better off cloning the repo and running the examples or notebooks locally.
Follow the quickstart but execute gradio with the qa_frontend instead of the example file. When the server is done loading, open http://127.0.0.1:7860 in your browser.
```sh
gradio gerd/frontends/qa_frontend.py
# Some Llama.cpp output
# ...
# * Running on local URL: http://127.0.0.1:7860
```
Click the 'Click to Upload' button and search for the GRASCCO document named Caja.txt, which is located in the tests/data/grascco folder, and upload it into the vector store. Next, you can query information from the document, for instance Wie heißt der Patient? (What is the patient called?).
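Under the hood, this kind of document QA embeds text chunks into vectors and stores them in a FAISS index so that the chunks most similar to a question can be retrieved as context for the LLM. The following is a minimal, library-agnostic sketch of that retrieval step using the tools listed below; it is not GERD's actual implementation, and the sample chunks are made up:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
import faiss  # pip install faiss-cpu

# Embed a few (made-up) document chunks into 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "Der Patient wurde am 12.03. stationär aufgenommen.",
    "Die Entlassung erfolgte in gutem Allgemeinzustand.",
]
embeddings = model.encode(chunks).astype("float32")

# Store the vectors in a flat L2 FAISS index (the 'vector store').
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Embed the question and retrieve the most similar chunk.
query = model.encode(["Wie heißt der Patient?"]).astype("float32")
distances, ids = index.search(query, 1)
print(chunks[ids[0][0]])  # the most relevant chunk, used as context for the LLM
```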
Prompt chaining is a prompt engineering approach that makes a large language model 'reflect' on its own answer. Check examples/chaining.py for an illustration, and have a look at how chaining is configured and used with GERD. You can find the config at config/gen_chaining.yml.
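Conceptually, a chain feeds the model's first answer back into follow-up prompts, for example to critique and then revise it. Below is a minimal, library-agnostic sketch of this idea; the `generate` function is a hypothetical stand-in for any LLM call and not part of GERD's API:

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an arbitrary LLM call (not part of GERD)."""
    raise NotImplementedError("plug your model call in here")

question = "What type of mammal lays the biggest eggs?"

# Step 1: ask for an initial answer.
answer = generate(f"Answer briefly and truthfully: {question}")

# Step 2: let the model critique its own answer.
critique = generate(
    f"Question: {question}\nAnswer: {answer}\n"
    "List any factual errors in this answer."
)

# Step 3: ask for a revised answer that takes the critique into account.
print(generate(
    f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
    "Write a corrected, brief answer."
))
```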
```sh
python examples/chaining.py
# ...
====== Resolved prompt =====
system: You are a helpful assistant. Please answer the following question in a truthful and brief manner.
user: What type of mammal lays the biggest eggs?
# ...
Result: Based on the given information, the largest egg-laying mammal is the blue whale, which can lay up to 100 million eggs per year. However, the other assertions provided do not align with this information.
```
As you can see, the answer does not make much sense with the default model, which is rather small. Give it a try with meta-llama/Llama-3.2-3B. To use this model, you need to log in with the Hugging Face CLI and accept the Meta Community License Agreement.
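Logging in works with the standard Hugging Face CLI; this assumes you already have a Hugging Face account and an access token:

```sh
pip install -U "huggingface_hub[cli]"
huggingface-cli login  # paste your access token when prompted
```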
- LangChain: Framework for developing applications powered by language models
- C Transformers: Python bindings for Transformer models implemented in C/C++ with the GGML library
- FAISS: Open-source library for efficient similarity search and clustering of dense vectors
- Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search
- Poetry: Tool for dependency management and Python packaging
- /assets: Images relevant to the project
- /config: Configuration files for LLM applications
- /examples: Examples that demonstrate the different usage scenarios
- /gerd: Code related to GERD
- /images: Images for the documentation
- /models: Binary files of GGML quantized LLM models (e.g., Llama-2-7B-Chat)
- /prompts: Plain text prompt files
- /templates: Prompt files as jinja2 templates
- /tests: Unit tests for GERD
- /vectorstore: FAISS vector store for documents
- pyproject.toml: TOML file that specifies which versions of the dependencies are used (Poetry)