Serving ML models under high load with Cortex and TrafficSplitter.
With this template, you can deploy real-time recommender systems behind a multi-armed bandit and balance traffic between them. No knowledge of Kubernetes or autoscaling is needed: it all works out of the box.
At the heart of this project is the open-source Cortex project and its TrafficSplitter feature. You only need to prepare the models; this project does the rest. Cheers!
Here is an example multi-armed bandit with two models behind it: the first returns only positive random numbers, the second returns only negative ones.
A simple `executor.py` is also provided. It lets you send requests to the models and give feedback on their responses.
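To illustrate what a multi-armed bandit does with that feedback, here is a minimal epsilon-greedy sketch. It is not the code from this repository: the class name `EpsilonGreedyBandit`, the arm names, and the simulated reward rates are all assumptions made for the example.

```python
import random


class EpsilonGreedyBandit:
    """Toy epsilon-greedy bandit over named arms (hypothetical helper)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


bandit = EpsilonGreedyBandit(["model-a", "model-b"], epsilon=0.1, seed=0)
for _ in range(1000):
    arm = bandit.select()
    # Assumed feedback: pretend model-a is rewarded 70% of the time, model-b 30%.
    success_rate = 0.7 if arm == "model-a" else 0.3
    reward = 1.0 if bandit.rng.random() < success_rate else 0.0
    bandit.update(arm, reward)
```

In a real setup the reward would come from user feedback rather than a simulated coin flip, and the learned values would be used to adjust how much traffic each model receives.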
```bash
git clone https://github.com/puhoshville/cortex-multiarmed-bandit.git
```
Please make sure that you have the AWS CLI installed.
For more information, see the AWS CLI documentation.
Cortex must be installed explicitly through pip, not as the Go binary:
# install the CLI
pip install cortex
For up-to-date information, see https://docs.cortex.dev
We have two separate files: `model_a.py` and `model_b.py`. Model A returns random positive numbers, Model B returns random negative numbers.
This means we can always tell which model produced a response, which will help us later.
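The idea can be sketched as follows. These functions are hypothetical stand-ins, not the contents of `model_a.py` and `model_b.py`; the integer range of 1 to 100 is an assumption.

```python
import random


def predict_model_a(payload):
    # Hypothetical stand-in for Model A: always a positive integer.
    return random.randint(1, 100)


def predict_model_b(payload):
    # Hypothetical stand-in for Model B: always a negative integer.
    return -random.randint(1, 100)


def identify_model(response):
    # The sign of the response reveals which model produced it.
    return "model-a" if response > 0 else "model-b"
```

Because neither stand-in ever returns zero, the sign check is unambiguous.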
For each model we have to build a separate Docker image. For this purpose we use Docker's multi-stage builds.
These images share the same base but serve different models.
To build the images:
docker build . --target model-a -t cortex-bandit:model-a
docker build . --target model-b -t cortex-bandit:model-b
To make sure an image works correctly, we can run it locally:
docker run --rm -it -p 8080:8080 cortex-bandit:model-a
And send a request:
curl -X POST -H "Content-Type: application/json" -d '{"msg": "hello world"}' localhost:8080
We will see something like this:
$ curl -X POST -H "Content-Type: application/json" -d '{"msg": "hello world"}' localhost:8080
78
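The same request can be sent from Python using only the standard library. This is a sketch rather than part of the repository; the helper names `build_request` and `send` are made up for the example, and `send` only works while the container from the previous step is running.

```python
import json
import urllib.request


def build_request(url, payload):
    # Build the same JSON POST request as the curl command above.
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url, payload, timeout=5):
    # Send the request and return the raw response body as text.
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, with the container listening on port 8080:
#   print(send("http://localhost:8080", {"msg": "hello world"}))
```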
- Make sure that the AWS CLI is installed
- Log in to AWS ECR
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com
- Create repository (needed only once)
aws ecr create-repository --repository-name cortex-bandit
- Tag images
docker tag cortex-bandit:model-a <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
docker tag cortex-bandit:model-b <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
- Push it
docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
Specify the image URIs `<AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a` and `<AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b` in `cortex.yaml`.
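If you would rather generate these URIs than type them by hand, a small helper like the one below can do it. The function name `ecr_image_uri` is hypothetical; the region, repository, and tags are the ones used in the commands above, and the account ID is left as a placeholder.

```python
def ecr_image_uri(account_id, region, repository, tag):
    # ECR image URIs follow <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


ACCOUNT_ID = "<AWS_ACCOUNT_ID>"  # placeholder: substitute your own account ID
uris = [
    ecr_image_uri(ACCOUNT_ID, "us-east-2", "cortex-bandit", tag)
    for tag in ("model-a", "model-b")
]
```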
If you are on an Apple M1 machine, use these commands to build and push the Docker images instead:
docker buildx build --platform linux/amd64 . --target model-a --push -t <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-a
docker buildx build --platform linux/amd64 . --target model-b --push -t <AWS_ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/cortex-bandit:model-b
In `cluster.yaml` you can find a simple Kubernetes cluster configuration, which includes 1 or 2 instances of the `t3.large` type.
cortex cluster up cluster.yaml
Be patient! It can take a while.
For more information about cluster configuration, see the Cortex documentation at https://docs.cortex.dev
Specify your Docker image URIs in `cortex.yaml`.
After that, you can run this command:
cortex deploy cortex.yaml
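Once deployed, the traffic splitter routes each request to one of the models according to configured weights. The sketch below only illustrates the concept of weighted routing; it is not how Cortex implements it, and the API names and the 70/30 weights are assumptions rather than values taken from `cortex.yaml`.

```python
import random
from collections import Counter


def pick_api(weights, rng=random):
    # Choose an API name with probability proportional to its weight.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical weights: the names and numbers are placeholders for illustration.
weights = {"model-a": 70, "model-b": 30}
rng = random.Random(42)
hits = Counter(pick_api(weights, rng) for _ in range(10_000))
```

Over many requests, the share of traffic each model receives approaches its share of the total weight.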