llm-flask-rag-aws

Hi, pleased to meet you all.

Today we will deploy the open-source Llama-2-13B LLM (any other HF LLM works too) on an AWS EC2 instance with a 24 GB GPU, and we will provision the infrastructure with a CloudFormation YAML script.

We will package, build, and run everything from a single docker-compose file, which connects the GenAI RAG application to both the open-source, GPU-powered LLM served by TGI and the proprietary OpenAI LLM API, so the two can be compared.

This code release is part of the Speakers' Corner session held on 16 April 2024: https://www.landing.ciklum.com/sc-architecting-scalable-ai

So, let's start

Commands to run LLM inference instance and Flask GenAI RAG app on AWS

Create AWS CloudFormation stack with all AWS infrastructure

[Image: how to create the stack using the console]
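As an alternative to the console, the stack can also be created from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials. The stack name `llm-stack` and the template path `ec2-deploy/template.yaml` are hypothetical placeholders (point it at the actual YAML file in `ec2-deploy`), and `--capabilities CAPABILITY_IAM` is only needed if the template creates IAM resources. If the template declares parameters, they would have to be passed with `--parameters` as well.

```shell
# Hypothetical names: adjust to your stack and to the real template file.
STACK_NAME="${STACK_NAME:-llm-stack}"
TEMPLATE_FILE="${TEMPLATE_FILE:-ec2-deploy/template.yaml}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the stack, then block until CloudFormation reports completion.
create_stack() {
  run aws cloudformation create-stack \
    --stack-name "$STACK_NAME" \
    --template-body "file://$TEMPLATE_FILE" \
    --capabilities CAPABILITY_IAM &&
  run aws cloudformation wait stack-create-complete \
    --stack-name "$STACK_NAME"
}

# Usage:
#   DRY_RUN=1 create_stack   # preview the commands
#   create_stack             # create the stack and wait
```

Previewing with `DRY_RUN=1` first makes it easy to check the stack name and template path before anything is billed.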

Download EC2 key pair

[Image: how to download the EC2 key pair and put it in the project root]

Change permissions of EC2 key

chmod 400 llm-key.pem

Change EC2 Public IP format

The EC2 public DNS name uses "-" instead of "." in the IP address, so convert the IP before connecting over SSH

PUBLIC_IP=X.XXX.XXX.XXX
PUBLIC_IP=$(echo "$PUBLIC_IP" | sed 's/\./-/g')
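The conversion above can be wrapped in a small helper that also builds the full public DNS name. This is a sketch for bash or any POSIX shell; the function names are illustrative and not part of the repository. Note that the `compute-1.amazonaws.com` suffix used in the commands below applies to the us-east-1 region, while other regions use `<region>.compute.amazonaws.com`. The IP in the example is a documentation address, not a real instance.

```shell
# Convert a dotted IPv4 address to the dashed form used in EC2 public DNS names.
ip_to_dashed() {
  echo "$1" | sed 's/\./-/g'
}

# Build the EC2 public DNS name for an IP and region (default: us-east-1).
ec2_public_dns() {
  ip="$1"
  region="${2:-us-east-1}"
  if [ "$region" = "us-east-1" ]; then
    echo "ec2-$(ip_to_dashed "$ip").compute-1.amazonaws.com"
  else
    echo "ec2-$(ip_to_dashed "$ip").$region.compute.amazonaws.com"
  fi
}

ec2_public_dns 203.0.113.10            # ec2-203-0-113-10.compute-1.amazonaws.com
ec2_public_dns 203.0.113.10 eu-west-1  # ec2-203-0-113-10.eu-west-1.compute.amazonaws.com
```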

SSH Into EC2 Instance using .pem key

ssh -i llm-key.pem ec2-user@ec2-${PUBLIC_IP}.compute-1.amazonaws.com

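Copy all the files from local PC to remote (git alternative)

This avoids dealing with git credentials during the demo. Run the command from the parent directory of the project. Because the key sits in the project root, it is copied to the instance along with the other files.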

scp -r -i llm-flask-rag-aws/llm-key.pem llm-flask-rag-aws ec2-user@ec2-${PUBLIC_IP}.compute-1.amazonaws.com:~/

Install Docker-compose

sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
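Before running the stack, it can help to confirm that the required tools are actually on the PATH. A small sketch; `require_cmd` is a hypothetical helper name, not something shipped with Docker or this repository.

```shell
# Return 0 if the given command is available on PATH, 1 otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example (on the EC2 instance):
#   require_cmd docker && require_cmd docker-compose && docker-compose version
```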

Run docker-compose which includes LLM inference instance and Flask GenAI RAG app

cd llm-flask-rag-aws
docker-compose build
docker-compose up

Now you can start chatting with both models (OpenAI and HF Llama-2-13B) from the browser. Use the instance's original dotted public IP here, not the dashed form used for SSH:

http://{PUBLIC_IP}:5000/hf
http://{PUBLIC_IP}:5000/openai
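Loading a 13B model can take a while, so the pages may not respond immediately after `docker-compose up`. The sketch below polls the two URLs above with `curl` until they answer. It assumes that both routes return a successful HTTP status to a plain GET request, which the original does not state; the helper names, retry count, and sleep interval are illustrative.

```shell
# Build the chat URL for a given dotted public IP and backend ("hf" or "openai").
chat_url() {
  echo "http://$1:5000/$2"
}

# Poll a URL until it responds successfully or the attempts run out.
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "not reachable: $url" >&2
  return 1
}

# Example, using the dotted public IP (documentation address shown here):
#   wait_for_http "$(chat_url 203.0.113.10 hf)"
#   wait_for_http "$(chat_url 203.0.113.10 openai)"
```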

TGI overrides and sets MAX_TOTAL_TOKENS automatically for Flash Attention models

huggingface/text-generation-inference#653

Do not forget to clean up all AWS CloudFormation resources

[Image: how to delete the AWS CloudFormation stack]
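The stack can also be deleted from the AWS CLI instead of the console. A minimal sketch, assuming the AWS CLI is configured; the stack name `llm-stack` is a hypothetical placeholder and must match the name the stack was created with.

```shell
# Hypothetical stack name: use the one you created the stack with.
STACK_NAME="${STACK_NAME:-llm-stack}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete the stack, then block until CloudFormation reports the deletion is done.
delete_stack() {
  run aws cloudformation delete-stack --stack-name "$STACK_NAME" &&
  run aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME"
}

# Usage:
#   DRY_RUN=1 delete_stack   # preview the commands
#   delete_stack             # delete the stack and wait
```

Waiting for `stack-delete-complete` confirms that the GPU instance is really gone and no longer incurs charges.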
