
Implement AI chat endpoint #539

Merged 45 commits into gramps-project:master on Sep 27, 2024

Conversation

DavidMStraub
Member

@DavidMStraub DavidMStraub commented Aug 27, 2024

This is a work in progress implementing an LLM-based AI chat assistant endpoint for Gramps Web, based on the RAG (retrieval augmented generation) technique.

Short summary of the architecture:

  • gramps_webapi.api.search.text_semantic converts Gramps objects to text for semantic indexing
  • v1.0 of Sifts provides the search index for vector embeddings, enabling semantic search
    • The new semantic search index is managed via a --semantic switch on the CLI commands and a semantic boolean query argument on the index endpoints
    • Vector embeddings are computed purely locally via Sentence Transformers
  • A new /api/chat/ endpoint
    • Conducts a semantic search on the query
    • Feeds the top 10 search results, along with the query, to an LLM
    • Returns the LLM's response
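The endpoint flow above (semantic search, then prompt assembly, then LLM call) can be sketched roughly as follows. All names here (`build_rag_prompt`, `chat`, `semantic_search`) are illustrative assumptions, not the actual `gramps_webapi` internals; the LLM call assumes an openai-python-style client.

```python
def build_rag_prompt(question, search_results, max_results=10):
    """Assemble the LLM prompt from the top semantic-search hits."""
    context = "\n\n".join(search_results[:max_results])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def chat(question, semantic_search, llm_client, model="gpt-4o-mini"):
    """Hypothetical RAG chat flow: search, build prompt, ask the LLM."""
    # 1. Semantic search over the vector-embedding index.
    hits = semantic_search(question)
    # 2. Feed the top hits plus the query to the LLM.
    prompt = build_rag_prompt(question, hits)
    response = llm_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # 3. Return the model's answer.
    return response.choices[0].message.content
```

Since the prompt assembly is a pure function, it can be unit-tested without a running model; only the final step needs a live (or local, e.g. Ollama) OpenAI-compatible endpoint.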

Configuration options:

  • VECTOR_EMBEDDING_MODEL: the sentence transformers model to use for vector embeddings. E.g. intfloat/multilingual-e5-small
  • LLM_BASE_URL: the base URL for openai-python. If empty, the OpenAI API is used; any OpenAI-compatible API works as well, e.g. Ollama running locally
  • LLM_MODEL: the model to use for chat. E.g., gpt-4o-mini for OpenAI or tinyllama for Ollama.
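As a hypothetical example, the three options above could be set like this for a fully local setup with Ollama (the exact configuration mechanism and any variable prefix depend on your Gramps Web deployment):

```shell
# Local Sentence Transformers model for vector embeddings
export VECTOR_EMBEDDING_MODEL="intfloat/multilingual-e5-small"
# Point openai-python at a local OpenAI-compatible server (here: Ollama's default port)
export LLM_BASE_URL="http://localhost:11434/v1"
# Chat model served by that endpoint
export LLM_MODEL="tinyllama"
```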

Guiding principles

  • Fully local if desired, optionally leveraging APIs
  • Disabled by default (resource consumption, privacy concerns, ...)

What's missing?

  • documentation, obviously
  • more unit tests
  • optional quota system per tree
  • limiting chat endpoint to user groups?
  • I'm still working on the text_semantic module to cover all object types and to tweak it for better semantic search recall and precision, which are currently not as good as I would like
  • A config option to limit context length rather than retrieving a fixed number of objects
  • Possibly a better way to handle long notes
  • Adding a default embedding model to the default docker image

I also plan to

  • write a blog post with details about the architecture
  • submit a PR to the frontend repo with a chat UI very soon

Note that this is meant to be an initial implementation laying the foundations; this can be made more powerful later on by introducing function calling etc.

@DavidMStraub
Member Author

Docs: gramps-project/gramps-web-docs#18

@emyoulation

This is a work in progress implementing an LLM-based AI chat assistant endpoint for Gramps Web, based on the RAG (retrieval augmented generation) technique.

Please introduce acronyms.
"This is a work in progress implementing a Large Language Model (LLM) based artificial intelligence (AI) chat assistant endpoint for Gramps Web, utilizing the Retrieval-Augmented Generation (RAG) technique."

@DavidMStraub
Member Author

Almost done. I expect to finish within the next 7 days.

@DavidMStraub DavidMStraub marked this pull request as ready for review September 14, 2024 12:11
@DavidMStraub
Member Author

DavidMStraub commented Sep 14, 2024

  • Update API docs for /metadata/


@RahulVadisetty91 RahulVadisetty91 left a comment


After reviewing the gramps_webapi/main.py script:

The second definition overwrites the first. It seems the second function is the correct one since it includes the semantic option. Remove the first one to avoid overwriting.
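The shadowing behavior the review describes can be shown with a minimal sketch (the function name and bodies are made up for illustration, not taken from gramps_webapi):

```python
def create_search_index(tree):
    return "full-text index"


# A later definition with the same name silently replaces the one above;
# Python raises no error, and only this version is callable afterwards.
def create_search_index(tree, semantic=False):
    return "semantic index" if semantic else "full-text index"


print(create_search_index("my_tree", semantic=True))  # prints "semantic index"
```

This is why the duplicate should be removed: keeping both definitions leaves dead code and makes it easy to edit the wrong one.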

@DavidMStraub DavidMStraub merged commit 82cfa86 into gramps-project:master Sep 27, 2024
2 checks passed

3 participants