Architecting Scalable AI RAG Systems

How to deploy a flask RAG application on Azure, specifically Azure Container App service- A serverless engine.

Embeddings storage

QdrantDB(Container)

Chat history storage

Redis(Container)

API server

Python Flask

LLM

OpenAI - gpt-3.5-turbo

In the python flask project we have-

Added Qdrant as a vector db store
Methods that have been enhanced with inclusion of Qdrant APIs(Initialize, search_results, generate methods)
Redis implementation to store the chat history, which will be used as context
Basic chat UI in HTML, CSS to communicate with the bot

Running this code on your local machine

Pre requistes to be installed

Docker
Azure CLI
Python 3.9
Create a .env file under src folder with below data

OPENAI_API_KEY=""

QDRANT_HOST="localhost"

QDRANT_PORT="6333"

REDIS_HOST="localhost"

REDIS_PORT="6379"

REDIS_PASSWORD="None"

On CMD line

Create and activate your python environment for this project
Install the required packages pip install -r requirements.txt
Run Qdrant docker container docker run -d -p 6333:6333 qdrant/qdrant
Run Redis docker container docker run --name redis -d -p 6379:6379 redis
Run flask server and check locally if all works(from src folder). Access the app from localhost:5000/openai. python flask_rag_llm_openai_hf.py

Azure deployment

Prerequisites

Provison a suitable Azure container app service environment, based on scale, redundancy and region of choice
Providon a azure container registry. You could also use a docker public image, unless you want it to be available for everyone else to use.

Login to Azure on your command line and push your image to registry

az login

az account set --subscription <your specific subscription>

az acr login -n <registry name created on Azure>

cd to correct folder having dockerfile, and build the image

docker build . -t speakerscornerregistry.azurecr.io/openai

takes about 100s-200s

Then push the image to registry with tag

docker push <registry name created on Azure>.azurecr.io/openai:latest

On container options page pick the specific registry, image and tag of the image just pushed. Key in your environment variables manually while creating the conatiner app, i.e. your openAi key and its value.
On bindings create and select your two sidecars qdrant and redis
In the network tab while creating the ACA check "Enable ingress", and "Traffic from anywhere" to allow receiving traffic from internet when you open the container app link. Update the target port to 5000 as this is the port where the flask api is being served.
Once you hit Create. You can check status of you deployment under "Revisons and replicas" menu.
Once the app is ready, you can access the bot page. The link would be similar to below https://{container app name}.{random word}.{region}.azurecontainerapps.io/openai

Substitutes for Serverless GPU for inferencing purpose

Replicate
Runpod
Huggingface

This code release is being done as part of speakers corner session that was conducted on 16th April 2024 https://www.landing.ciklum.com/sc-architecting-scalable-ai

Ciklum

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
llm-flask-rag-azure		llm-flask-rag-azure
.gitignore		.gitignore
README.md		README.md
group.png		group.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Architecting Scalable AI RAG Systems

Embeddings storage

Chat history storage

API server

LLM

In the python flask project we have-

Running this code on your local machine

Pre requistes to be installed

On CMD line

Azure deployment

Substitutes for Serverless GPU for inferencing purpose

About

Releases

Packages

Languages

saikumaru/speakers-corner-llm-rag-application

Folders and files

Latest commit

History

Repository files navigation

Architecting Scalable AI RAG Systems

Embeddings storage

Chat history storage

API server

LLM

In the python flask project we have-

Running this code on your local machine

Pre requistes to be installed

On CMD line

Azure deployment

Substitutes for Serverless GPU for inferencing purpose

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages