This repo containerizes BART into a serving container using FastAPI. Both CPU and GPU inference are supported.

The model license can be found here.
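For context, a Vertex AI-compatible serving container exposes health and predict routes whose paths and port arrive via `AIP_*` environment variables. The sketch below shows what such a FastAPI app can look like; the checkpoint, request schema, and file layout are assumptions for illustration, not necessarily this repo's exact code.

```python
# Minimal sketch of a Vertex AI-compatible FastAPI server for BART.
# Launch with e.g.: uvicorn app:app --host 0.0.0.0 --port $AIP_HTTP_PORT
import os

from fastapi import FastAPI, Request
from transformers import pipeline

# Vertex AI injects these variables into the container at runtime.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

app = FastAPI()

# facebook/bart-large-cnn is an assumed checkpoint; substitute whichever
# BART variant the repo actually serves.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

@app.get(HEALTH_ROUTE)
def health():
    return {"status": "healthy"}

@app.post(PREDICT_ROUTE)
async def predict(request: Request):
    body = await request.json()
    # Vertex AI sends {"instances": [...]} and expects {"predictions": [...]}.
    texts = [instance["text"] for instance in body["instances"]]
    return {"predictions": summarizer(texts)}
```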
- Clone the repo if you haven't, then navigate to the `bart` folder.
- Build the container. Don't forget to change `project_id` to yours.

```bash
docker build . -t gcr.io/{project_id}/bart:latest
```
- Run the container. No GPU is needed for this model.

```bash
docker run --rm -p 80:8080 -e AIP_HEALTH_ROUTE=/health -e AIP_HTTP_PORT=8080 -e AIP_PREDICT_ROUTE=/predict gcr.io/{project_id}/bart:latest
```
- Make predictions against the local container:

```bash
python test_container.py
```

For the remaining steps you'll need to have enabled Vertex AI and authenticated with a service account that has the Vertex AI Admin or Editor role.
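If you'd rather poke the container by hand, a request along these lines should work (a sketch assuming the Vertex AI `instances` schema; the real `test_container.py` may differ):

```python
# Hypothetical smoke test against the locally running container.
# The docker run command above maps host port 80 to the container's 8080.
import requests

payload = {"instances": [{"text": "Long article text to summarize..."}]}
response = requests.post("http://localhost:80/predict", json=payload)
response.raise_for_status()
print(response.json())
```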
- Push the image:

```bash
gcloud auth configure-docker
docker push gcr.io/{project_id}/bart:latest
```
- Deploy to a Vertex AI endpoint:

```bash
python ../gcp_deploy.py --image-uri gcr.io/{project_id}/bart:latest --accelerator-count 0 --model-name bart --endpoint-name bart-endpoint --endpoint-deployed-name bart-deployed-name
```
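Under the hood, a deploy script like this typically uploads the image as a Vertex AI Model and deploys it to an Endpoint. A rough SDK equivalent is sketched below (region and machine type are assumptions; `gcp_deploy.py` may do this differently):

```python
# Illustrative upload-and-deploy flow with the Vertex AI SDK.
from google.cloud import aiplatform

aiplatform.init(project="{project_id}", location="us-central1")  # region assumed

model = aiplatform.Model.upload(
    display_name="bart",
    serving_container_image_uri="gcr.io/{project_id}/bart:latest",
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8080],
)

endpoint = aiplatform.Endpoint.create(display_name="bart-endpoint")

# --accelerator-count 0 corresponds to a CPU-only deployment.
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="bart-deployed-name",
    machine_type="n1-standard-4",  # assumed; size to your workload
)
```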
- Test the endpoint:

```bash
python generate_request_vertex.py
```
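The request itself presumably looks something like the following (endpoint lookup and payload shape are assumptions):

```python
# Hypothetical online prediction against the deployed endpoint.
from google.cloud import aiplatform

aiplatform.init(project="{project_id}", location="us-central1")  # region assumed

# Find the endpoint created by the deploy step via its display name.
endpoint = aiplatform.Endpoint.list(filter='display_name="bart-endpoint"')[0]

response = endpoint.predict(instances=[{"text": "Long article text to summarize..."}])
print(response.predictions)
```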