This project helps researchers find answers in a set of research papers using a customized RAG pipeline and a powerful LLM, all offline and free of cost.
For more details, please check out the blog post about this project.
- Download some research papers from arXiv
- Use LlamaIndex to load, chunk, embed, and store these documents in a Qdrant database (see the sketch after this list)
- Expose a FastAPI endpoint that receives a query/question, searches the stored documents, and finds the best-matching chunks
- Feed these relevant chunks into an LLM as context
- Generate an easy-to-understand answer and return it as an API response along with the cited sources
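Below is a minimal sketch of the ingest-and-retrieve steps using LlamaIndex and Qdrant. The collection name, the `./papers` directory, the embedding model, and the top-k value are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch of the ingest + retrieve flow (names, paths, and models are assumptions).
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# Use a local embedding model so the pipeline stays offline (model choice is an assumption).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Point LlamaIndex at the Qdrant instance started with Docker below.
client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="papers")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load, chunk, embed, and store the downloaded papers.
documents = SimpleDirectoryReader("./papers").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# At query time: retrieve the best-matching chunks to feed the LLM as context.
retriever = index.as_retriever(similarity_top_k=3)
for node in retriever.retrieve("What is retrieval-augmented generation?"):
    print(node.score, node.node.metadata.get("file_name", ""))
```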
docker run -p 6333:6333 -v ~/qdrant_storage:/qdrant/storage:z qdrant/qdrant
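Once the container is up, you can quickly confirm it is reachable from Python with the `qdrant-client` package:

```python
# Quick connectivity check against the Qdrant container started above.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # empty on a fresh storage volume
```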
python rag/data.py --query "LLM" --max 10 --ingest
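As a rough idea of what the download step can look like, here is a sketch using the `arxiv` package; this is an assumption about how `rag/data.py` behaves, and the real script's flags and paths may differ.

```python
# Sketch of the paper-download step with the `arxiv` package
# (an assumption; the actual rag/data.py may work differently).
import arxiv

search = arxiv.Search(query="LLM", max_results=10)
for result in arxiv.Client().results(search):
    # Save each PDF locally so LlamaIndex can ingest it afterwards.
    result.download_pdf(dirpath="./papers")
```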
Follow this article for more information on how to run Hugging Face models locally with Ollama.
Create model from Modelfile
ollama create research_assistant -f ollama/Modelfile
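For orientation, a Modelfile typically pins a base model, generation parameters, and a system prompt; the values below are illustrative assumptions, not the contents of the project's actual ollama/Modelfile.

```
# Illustrative Modelfile (the project's ollama/Modelfile may differ).
FROM llama3
PARAMETER temperature 0.2
SYSTEM "You are a research assistant. Answer using only the provided paper excerpts and cite your sources."
```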
Start the model server
ollama run research_assistant
By default, Ollama runs on http://localhost:11434
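You can sanity-check the model server by sending a completion request to Ollama's HTTP API; the prompt below is just an example.

```python
# Ask the local Ollama server (default port 11434) for a single, non-streamed completion.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "research_assistant",
        "prompt": "In one sentence, what is retrieval-augmented generation?",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```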
uvicorn app:app --reload
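With the API running, you can query it from any HTTP client. The route name `/query` and the `question` parameter below are assumptions about `app.py`, so adjust them to the actual endpoint.

```python
# Example call to the running FastAPI app (route and payload shape are assumptions).
import requests

resp = requests.get(
    "http://127.0.0.1:8000/query",
    params={"question": "What are the main limitations of current LLMs?"},
    timeout=120,
)
print(resp.json())
```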