This template is licensed under Apache 2.0 and contains the following components:
This reference project shows how to use Meta's Llama2 LLM to do Q&A over information that the Llama2 model has not been trained on and therefore cannot answer out of the box. The project contains the following files:

- `Llama_Qdrant_RAG.ipynb`: loads a PDF, converts it to embeddings, stores the embeddings in a local Qdrant vector store, defines a prompt, downloads and caches the Llama2 model, then constructs a RetrievalQA chain and calls the model to get a response (see the sketch after this list). The notebook also contains instructions on tailoring it to your own data files.
- `model.py`: deploys the model as a Domino Model API so it can be called programmatically from the application. You must run `Llama_Qdrant_RAG.ipynb` to initialise the Qdrant vector store first. It has a `generate` function that should be used as the Model API function; follow the instructions in our documentation to deploy it.
- `app.sh`: the shell script needed to run the chat app.
- `API_streamlit_app.py`: Streamlit app code for the Q&A chatbot. This app requires the model to be deployed as a Domino Model API and the URL and access token updated to reference it.
- `sample_data/MLOps_whitepaper.pdf`: a Domino MLOps whitepaper that can be used as example data for the flow described above.
- `images/domino_banner.png` and `images/domino_logo.png`: images used in the application.
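The notebook's end-to-end flow can be summarised with a sketch like the one below. It is illustrative only: the loader, splitter settings, embedding model, collection name, and Llama2 model ID are assumptions, so check `Llama_Qdrant_RAG.ipynb` for the exact components it uses.

```python
# Illustrative sketch of the notebook's flow (file paths, model IDs and
# collection names are placeholders -- see Llama_Qdrant_RAG.ipynb for the real values).
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Qdrant
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# 1. Load the PDF and split it into chunks.
docs = PyPDFLoader("sample_data/MLOps_whitepaper.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and persist them in a local Qdrant collection.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(chunks, embeddings, path="qdrant_db", collection_name="docs")

# 3. Define a prompt that grounds the answer in the retrieved context.
prompt = PromptTemplate(
    template="Use the context to answer the question.\n"
             "Context: {context}\nQuestion: {question}\nAnswer:",
    input_variables=["context", "question"],
)

# 4. Download/cache Llama2, wrap it in a RetrievalQA chain and ask a question.
llm = HuggingFacePipeline.from_model_id(model_id="meta-llama/Llama-2-7b-chat-hf", task="text-generation")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt},
)
print(qa_chain.run("What is MLOps?"))
```

The `generate` function in `model.py` wraps the same retrieval-and-generation step so that the question can be passed in as the Model API payload.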
This project requires the following compute environment to be present. Please ensure the "Automatically make compatible with Domino" checkbox is selected when creating the environment.
You must set your Workspace volume size to 20GB before running the code to ensure that there is enough space to store the model.
Note: you must run `Llama_Qdrant_RAG.ipynb` to initialise the Qdrant vector database before deploying the Model API. You must also copy the deployed model's URL and access token into `API_streamlit_app.py` before deploying the App.
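For reference, calling the deployed Model API from the Streamlit app typically looks like the sketch below; the URL, access token, response field, and the `prompt` keyword are placeholders that must match your own deployment and the signature of `generate`.

```python
# Hypothetical sketch of the Model API call made by API_streamlit_app.py --
# replace the URL and token with the values from your deployed Model API.
import requests
import streamlit as st

MODEL_API_URL = "https://<your-domino-host>/models/<model-id>/latest/model"
MODEL_API_TOKEN = "<your-model-access-token>"

question = st.text_input("Ask a question about the document")
if question:
    response = requests.post(
        MODEL_API_URL,
        auth=(MODEL_API_TOKEN, MODEL_API_TOKEN),   # the token is passed as basic-auth credentials
        json={"data": {"prompt": question}},       # the key must match generate()'s parameter name
    )
    st.write(response.json().get("result"))        # display the answer returned by the model
```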
`quay.io/domino/pre-release-environments:project-hub-gpu.main.latest`

Add the following to your Dockerfile instructions:

```
RUN pip install qdrant_client streamlit_chat pypdf
```
Pluggable Workspace Tools:

```yaml
jupyterlab:
  title: "JupyterLab"
  iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
  start: [ "/opt/domino/workspaces/jupyterlab/start" ]
  httpProxy:
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
    port: 8888
    rewrite: false
    requireSubdomain: false
vscode:
  title: "vscode"
  iconUrl: "/assets/images/workspace-logos/vscode.svg"
  start: [ "/opt/domino/workspaces/vscode/start" ]
  httpProxy:
    port: 8888
    requireSubdomain: false
```
Please change the value of `start` according to your Domino version.
Use the GPU k8s hardware tier for the Workspace and the Model API. The App can be deployed using a Small k8s hardware tier.