This application implements a GPU-accelerated Retrieval-Augmented Generation (RAG) based Question-Answering system using NVIDIA Inference Microservices (NIMs) and the LlamaIndex framework. It allows users to upload documents, process them, and then ask questions about the content.
- Document loading and processing
- Vector storage and retrieval using Milvus
- Question-answering capabilities using NIMs
- Interactive chat interface built with Gradio
1. Clone this repository:

   ```bash
   git clone https://github.com/jayrodge/llm-assistant-cloud-app.git
   cd llm-assistant-cloud-app
   ```
2. Create a virtual environment (using Python 3.9 as an example):

   - Using `venv`:

     ```bash
     python3.9 -m venv venv
     source venv/bin/activate
     ```

   - Using `conda`:

     ```bash
     conda create -n llm-assistant-env python=3.9
     conda activate llm-assistant-env
     ```
3. Install the required Python libraries using the requirements.txt file:

   ```bash
   pip install -r requirements.txt
   ```
4. Set up your NVIDIA API Key:

   - Sign up for an NVIDIA API Key on build.nvidia.com if you haven't already.
   - Set the API key as an environment variable:

     ```bash
     export NVIDIA_API_KEY='your-api-key-here'
     ```

   - Alternatively, you can directly edit the script and add your API key to the line:

     ```python
     os.environ["NVIDIA_API_KEY"] = 'nvapi-XXXXXXXXXXXXXXXXXXXXXX'  # Add NVIDIA API Key
     ```
5. Run the script:

   ```bash
   python app.py
   ```
6. Open the provided URL in your web browser to access the Gradio interface.
7. Use the interface to:

   - Upload document files
   - Load and process the documents
   - Ask questions about the loaded documents
- **Document Loading**: Users can upload multiple document files through the Gradio interface.
- **Document Processing**: The application uses LlamaIndex to read the uploaded documents and split them into chunks.
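The app relies on LlamaIndex's splitters for this step; the standalone sketch below just illustrates the idea behind chunking — each document is cut into overlapping windows so text falling on a boundary still appears whole in at least one chunk. The function name and sizes here are illustrative, not the app's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap, so
    content cut at a chunk boundary is repeated at the start of the
    next chunk and stays retrievable."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text("word " * 300, chunk_size=100, overlap=20)
print(len(chunks))  # 1500 characters, 80-char stride -> 19 chunks
```

LlamaIndex's sentence-aware splitters additionally avoid cutting mid-sentence, but the overlap principle is the same.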
- **Embedding and Indexing**: The processed chunks are embedded using NVIDIA's embedding model and stored in a Milvus vector database.
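To show what the vector store contributes, here is a toy in-memory stand-in: it stores (embedding, text) pairs and ranks stored chunks by cosine similarity to a query embedding. Milvus performs the same retrieval at scale using approximate nearest-neighbor indexes; the class and the tiny 2-dimensional vectors below are purely illustrative:

```python
import math

class TinyVectorStore:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self._items: list = []  # (embedding, text) pairs

    def add(self, embedding: list, text: str) -> None:
        self._items.append((embedding, text))

    @staticmethod
    def _cosine(a: list, b: list) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def query(self, embedding: list, top_k: int = 2) -> list:
        # Rank all stored chunks by similarity to the query embedding
        ranked = sorted(self._items,
                        key=lambda it: self._cosine(embedding, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:top_k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "chunk about GPUs")
store.add([0.0, 1.0], "chunk about CPUs")
store.add([0.9, 0.1], "chunk about CUDA")
print(store.query([1.0, 0.0], top_k=2))  # the two GPU-adjacent chunks
```

In the real app, the query embedding comes from the same NVIDIA embedding model used at indexing time, so questions and chunks live in the same vector space.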
- **Question Answering**: Users can ask questions through the chat interface. The application uses a NIM running Llama 3 70B Instruct, hosted in the cloud, to generate responses based on the relevant information retrieved from the indexed documents.
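NIMs expose an OpenAI-compatible chat completions API, so a question plus its retrieved context can be sent as a plain HTTP request. The sketch below is not the app's code: the endpoint URL, model id, and prompt wording are assumptions based on NVIDIA's hosted catalog and should be verified on build.nvidia.com:

```python
import json
import os
import urllib.request

# Assumed values -- confirm both against build.nvidia.com for your account.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "meta/llama3-70b-instruct"

def build_payload(question: str, context: str) -> dict:
    """Assemble a chat request that grounds the model's answer in the
    retrieved document chunks (the 'G' in RAG)."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context:\n" + context},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,
    }

payload = build_payload("What does the app do?", "retrieved chunk text here")

api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:  # only call the hosted service when a key is configured
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

In the actual app, LlamaIndex's query engine performs this retrieve-then-generate step for you.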
You can customize various aspects of the application:
- Change the chunk size for text splitting
- Use different NVIDIA or open-source models for embedding or language modeling
- Adjust the number of similar documents retrieved for each query
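In a LlamaIndex-based app these knobs typically map to a handful of settings. The fragment below is a sketch, not the app's actual code: the import paths assume llama-index ≥ 0.10 with the NVIDIA integration packages (`llama-index-embeddings-nvidia`, `llama-index-llms-nvidia`) installed, and the model names are examples:

```python
# Sketch only -- import paths and model names are assumptions, not the
# app's verified configuration.
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA

# Chunk size for text splitting
Settings.text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Swap in different NVIDIA (or other) models for embedding / generation
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA")
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")

# Number of similar chunks retrieved per query (index construction elided):
# query_engine = index.as_query_engine(similarity_top_k=4)
```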
If you encounter any issues:
- Ensure your NVIDIA API Key is correctly set.
- Check that all required libraries are installed correctly.
- Verify that the Milvus database is properly initialized.
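For the first item, a quick sanity check can catch a missing or malformed key before launching the app. The `nvapi-` prefix matches the placeholder shown in the setup step; adjust the check if your key format differs:

```python
import os

def check_nvidia_key(key: str) -> str:
    """Classify an NVIDIA_API_KEY value for quick troubleshooting.
    Keys from build.nvidia.com start with 'nvapi-' (see setup step)."""
    if not key:
        return "missing"
    if not key.startswith("nvapi-"):
        return "unexpected-format"
    return "ok"

status = check_nvidia_key(os.environ.get("NVIDIA_API_KEY", ""))
print(f"NVIDIA_API_KEY status: {status}")
```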