License

This template is licensed under Apache 2.0 and contains the following components:

Llama2 custom Q&A Reference Project

This reference project shows how to use Meta's Llama2 LLM for Q&A over information that the model was not trained on and therefore cannot answer out of the box. The project contains the following files:

  • Llama_Qdrant_RAG.ipynb : This notebook loads a PDF, converts it to embeddings, stores the embeddings in a local Qdrant vector store, defines a prompt, downloads and caches the Llama2 model, then constructs a RetrievalQA chain and calls the model to get a response. It also contains instructions on tailoring the flow to your own data files.

  • model.py : This file deploys the model as a Domino Model API so that it can be called programmatically from the application. You must run Llama_Qdrant_RAG.ipynb first to initialise the Qdrant vector store. The file has a generate function that should be used as the Model API function. Follow the instructions in our documentation to deploy this.

  • app.sh : The shell script needed to run the chat app

  • API_streamlit_app.py : Streamlit app code for the Q&A chatbot. This app requires the model to be deployed as a Domino Model API, and the URL and access token in the app must be updated to reference it.

  • sample_data/MLOps_whitepaper.pdf : A Domino MLOps whitepaper that can be used as example data for the flow described above.

  • images/domino_banner.png and images/domino_logo.png : Images used in the application.
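The retrieval flow that the notebook follows (split the document, embed the pieces, find the pieces closest to the question, then build a prompt around them) can be illustrated with a small standard-library sketch. This is not the notebook's code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for the Qdrant vector store, and the function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of up to chunk_size words."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the model to answer from the context only."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    "Model monitoring tracks drift in production.",
    "Bananas are yellow fruit.",
    "Feature stores share features across teams.",
]
question = "How do we track drift in production?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
print(prompt)
```

In the actual project the embedding model, the Qdrant store and the RetrievalQA chain replace `embed`, the list of chunks and `build_prompt` respectively; the sketch only shows how the pieces relate.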

Setup instructions

This project requires the following compute environments to be present. Please ensure the "Automatically make compatible with Domino" checkbox is selected while creating the environment.

You must set your Workspace volume size to 20GB before running the code to ensure that there is enough space to store the model.

Note: You must run Llama_Qdrant_RAG.ipynb to initialise the Qdrant vector database before deploying the model. Then deploy the model and copy its URL and access token into API_streamlit_app.py before deploying the app.
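As a rough illustration of how the model URL and access token are used, the sketch below builds a request to a deployed Model API with the standard library. It assumes the request shape commonly shown in Domino's Model API examples (a JSON body with a top-level "data" object, and the access token supplied as both username and password for basic authentication). The parameter name `prompt` and the function names are hypothetical; use whatever arguments the generate function in model.py actually accepts.

```python
import base64
import json
import urllib.request


def build_model_request(model_url, access_token, question):
    """Build a POST request for a Domino Model API (assumed request shape)."""
    # "prompt" is a hypothetical argument name; match it to generate() in model.py.
    payload = json.dumps({"data": {"prompt": question}}).encode("utf-8")
    # Assumption: the access token is sent as both username and password.
    credentials = base64.b64encode(
        f"{access_token}:{access_token}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        model_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def ask(model_url, access_token, question, timeout=60):
    """Send the question and return the "result" field of the JSON response, if any."""
    request = build_model_request(model_url, access_token, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body.get("result")
```

Calling `ask(...)` with the URL and token copied from the deployed model's overview page should return whatever the generate function returns; the exact response fields are an assumption and may differ between Domino versions.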

Environment Requirements

quay.io/domino/pre-release-environments:project-hub-gpu.main.latest

Add the following to your Dockerfile instructions: RUN pip install qdrant_client streamlit_chat pypdf
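Once the environment has been built, a quick check like the one below can confirm that the extra packages are importable inside a workspace. It is a minimal sketch that assumes the import names match the package names in the pip command above; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed to match the packages installed by the pip command.
REQUIRED_MODULES = ["qdrant_client", "streamlit_chat", "pypdf"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```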

Pluggable Workspace Tools

jupyterlab:
  title: "JupyterLab"
  iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
  start: [ "/opt/domino/workspaces/jupyterlab/start" ]
  httpProxy:
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
    port: 8888
    rewrite: false
    requireSubdomain: false
vscode:
  title: "vscode"
  iconUrl: "/assets/images/workspace-logos/vscode.svg"
  start: [ "/opt/domino/workspaces/vscode/start" ]
  httpProxy:
    port: 8888
    requireSubdomain: false

Please change the value in start according to your Domino version.

Hardware Requirements

Use the GPU k8s hardware tier for the Workspace and the Model API. The App can be deployed using a Small k8s hardware tier.