MojoQA is a RAG (Retrieval Augmented Generation) based LLM application that can answer queries about the Mojo programming language.
Introduced in 2023, Mojo is a new programming language that combines Python syntax with systems programming and metaprogramming features, aiming to bridge the gap between research and production.
Let's see what a Llama 2 7B model knows about the advantages of Mojo over Python!
As you can see, the response is neither accurate nor particularly helpful.
Now let's see how the MojoQA bot performs.
Now that looks better!!
To build the MojoQA bot, I extracted the official Mojo documentation and created a vector store of the corresponding embeddings. To answer a query, the most similar documents are retrieved and passed to the LLM as context. Check out the high-level overview diagram below.
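To make the flow concrete, here is a minimal sketch of the retrieve-then-generate step. The embedding model (sentence-transformers) and vector store (FAISS) are illustrative assumptions, not necessarily what this project uses internally:

```python
# Minimal retrieve-then-generate sketch. Library choices here
# (sentence-transformers, FAISS) are assumptions for illustration.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Mojo supports compile-time metaprogramming.",
    "Mojo is designed to be a superset of Python syntax.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

# Inner product on normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "What are the advantages of Mojo over Python?"
q_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)

# Stitch the retrieved chunks into the prompt as context.
context = "\n".join(docs[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
# `prompt` is then passed to the LLM (see the model-loading sketch further below).
```

Grounding the prompt in retrieved documentation is what lets the bot answer from the Mojo docs rather than from the model's pretraining.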
pip install -r requirements.txt
Please refer to this link for installation instructions for llama-cpp with OpenBLAS / cuBLAS / CLBlast support. The dependency in the requirements.txt file is for a CPU-only installation.
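For example, at the time of writing a cuBLAS-enabled build of llama-cpp-python could be installed roughly as follows; verify the current flags against the llama-cpp-python documentation before using this:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python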
pip install -e .
For this project, I used a 4-bit quantized Llama 2 7B model (via llama-cpp). For better text generation results, prefer the chat variant of the model. Download the model and place it in the ./models directory of this project. You can use other models too.
Models used in this project:
4-bit quantized Llama-2-7B-Chat
To use a different model, update the model path parameters in the config/config.yaml and mojoqa/config/conf.py files.
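As a rough sketch, loading and querying the quantized model with llama-cpp-python looks like the following; the model filename is illustrative, so substitute the file you actually downloaded and keep the path in sync with config/config.yaml:

```python
from llama_cpp import Llama

# Illustrative filename; use the quantized model file you placed in ./models.
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

response = llm(
    "Q: What are the advantages of Mojo over Python? A:",
    max_tokens=256,
    stop=["Q:"],  # stop before the model starts a new question
)
print(response["choices"][0]["text"])
```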
python ./scripts/main.py
streamlit run ./streamlit/app.py