Semantic caching is an in-memory database that supports semantic search (vector search). It can be used in many applications, such as RAG (Retrieval-Augmented Generation), database assistants, and more. Designing high-performance applications that use LLMs means dealing with issues such as response time and avoiding repeated calls for the same or similar input. Semantic caching helps save time and computational resources in such applications. Tiny Semantic Caching is a project that uses Ollama and vector search in DuckDB to implement a complete semantic caching cycle.
- Install all prerequisite software required for this project.
- Install the requirements:
poetry install
- Copy the contents of .env.example into a .env file, or rename .env.example to .env.
- Get an embedding model from Ollama, for example nomic-embed-text:
ollama pull nomic-embed-text
Make sure to update the model name and embedding size in the .env file if you use a different embedding model.
- To test the project locally:
## use this directly
poetry run uvicorn main:app --reload
## or use this to activate the environment first
poetry shell
## then test the API
uvicorn main:app --reload
Use the following URL to test the functionalities: http://localhost:8000/docs
- If there are no issues locally, use the docker-compose file to build and run the containers:
## build the images
docker-compose build
## run the docker-compose file
docker-compose up -d
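The note above about keeping the model name and embedding size in sync with the .env file can be checked with a short script before starting the API. This is a minimal sketch, not the project's own code: it assumes Ollama is listening on its default port 11434 and uses Ollama's `/api/embeddings` endpoint. The helper names (`build_embedding_request`, `fetch_embedding`, `check_dimension`) are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request that asks Ollama to embed `text` with `model`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str, model: str = "nomic-embed-text") -> list:
    """Send the request and return the embedding vector (requires a running Ollama)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as response:
        return json.loads(response.read())["embedding"]


def check_dimension(vector: list, expected_size: int) -> bool:
    """Return True when the vector length matches the embedding size set in .env."""
    return len(vector) == expected_size
```

With Ollama running, `check_dimension(fetch_embedding("hello"), expected_size)` should return `True` when `expected_size` is the embedding size configured in .env. If it returns `False`, the model name and embedding size are out of sync.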
There are four functionalities:
- vectorize (GET): converts the passed text to a vector using the embedding model.
- insertion (POST): inserts data and its embeddings into the caching database.
- search (POST): searches for similar or identical text based on the passed text. The text is vectorized, the caching database is searched, and finally the text is inserted into the cache.
- refresh (DELETE): refreshes the database by clearing all records from it.
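The four operations above can be sketched as a small in-memory class. This is only an illustration of the caching cycle, not the project's implementation: `toy_vectorize` is a hypothetical stand-in for the Ollama embedding model, a Python list stands in for the DuckDB table, and the `threshold` parameter and all names are assumptions.

```python
import math
import zlib
from dataclasses import dataclass, field


def toy_vectorize(text: str, size: int = 16) -> list:
    """Stand-in for the embedding model: a hashed bag-of-words vector."""
    vector = [0.0] * size
    for word in text.lower().split():
        # crc32 gives a stable bucket across runs (unlike the built-in hash()).
        vector[zlib.crc32(word.encode("utf-8")) % size] += 1.0
    return vector


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TinyCache:
    threshold: float = 0.9  # assumed similarity cut-off for a cache hit
    rows: list = field(default_factory=list)  # (text, vector) pairs

    def insert(self, text: str) -> None:
        """insertion: store the text together with its embedding."""
        self.rows.append((text, toy_vectorize(text)))

    def search(self, text: str):
        """search: vectorize, look up the closest entry, then insert the text."""
        query = toy_vectorize(text)
        best = max(self.rows, key=lambda row: cosine(query, row[1]), default=None)
        hit = best[0] if best and cosine(query, best[1]) >= self.threshold else None
        self.insert(text)  # last step, following the description above
        return hit

    def refresh(self) -> None:
        """refresh: clear all records."""
        self.rows.clear()
```

A search on an empty cache returns `None` and leaves one record behind, so repeating the same query afterwards returns the cached text. In the real project the vector comes from the embedding model and the lookup runs in DuckDB.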
- Feel free to update the scripts based on your needs, then run the docker-compose file.
- To use the image directly, without any changes:
## go to scripts directory
cd scripts
## run the docker compose file
docker-compose up -d