Uses Azure Container Instances (ACI) to run the compute worker Docker container in a serverless environment:
Adds support for NVIDIA GPUs
Adds support for real-time detailed results
Note: this will create a /tmp/codalab directory.
mkdir -p /tmp/codalab && docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/codalab:/tmp/codalab \
-d \
--name compute_worker \
--env BROKER_URL=<queue broker url> \
--restart unless-stopped \
--log-opt max-size=50m \
--log-opt max-file=3 \
codalab/competitions-v1-compute-worker:latest
Edit .env_sample and save it as .env.
Make sure the temp directory you select has been created, and pass it in this command:
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/codalab:/tmp/codalab \
-d \
--name compute_worker \
--env-file .env \
--restart unless-stopped \
--log-opt max-size=50m \
--log-opt max-file=3 \
codalab/competitions-v1-compute-worker:latest
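The preparation steps before this command can be sketched as a small shell helper. This is only a minimal sketch and not part of the project: the function name `prepare_worker_env` is hypothetical, and it assumes `.env_sample` sits in the current directory and that the temp directory is `/tmp/codalab` unless another path is passed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prepares the files and
# directory that the docker run command above expects.
prepare_worker_env() {
    temp_dir="${1:-/tmp/codalab}"   # directory mounted into the worker with -v

    # Copy the sample only if .env does not exist yet, so that an already
    # edited .env is never overwritten.
    if [ ! -f .env ]; then
        cp .env_sample .env || return 1
    fi

    # Create the temp directory (no error if it already exists).
    mkdir -p "$temp_dir"
}
```

After calling `prepare_worker_env /tmp/codalab`, edit `.env` by hand and then start the worker with the command above. If a different directory is chosen, the `-v` argument has to be changed to match.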
Make sure that you have nvidia-container-toolkit set up; this also involves updating to Docker 19.03 or later and installing NVIDIA drivers.
Edit .env_sample and save it as .env. Make sure to uncomment USE_GPU=True.
Then make sure the temp directory you select has been created, and pass it in this command:
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/codalab:/tmp/codalab \
-d \
--name compute_worker \
--env-file .env \
--restart unless-stopped \
--log-opt max-size=50m \
--log-opt max-file=3 \
--gpus all \
codalab/competitions-v1-compute-worker:latest
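The GPU prerequisites can be checked before starting the worker. The following is a hedged sketch, not something shipped with the image: `version_ge` and `check_gpu_prereqs` are hypothetical names, the version comparison assumes `sort -V` from GNU coreutils, and the presence of `nvidia-smi` is used as a stand-in for "NVIDIA drivers are installed". It does not verify nvidia-container-toolkit itself.

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical pre-flight check: reports the first missing prerequisite on
# stderr and returns non-zero, or prints a confirmation on success.
check_gpu_prereqs() {
    docker_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -z "$docker_version" ] || ! version_ge "$docker_version" "19.03"; then
        echo "Docker 19.03 or newer is required (found: ${docker_version:-none})" >&2
        return 1
    fi
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "NVIDIA drivers not found (nvidia-smi is missing)" >&2
        return 1
    fi
    echo "Docker $docker_version and NVIDIA drivers detected"
}
```

Running `check_gpu_prereqs` on the host before the `docker run --gpus all` command gives an early, readable failure instead of an error from Docker at container start.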
$ docker logs -f compute_worker
$ docker kill compute_worker
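Note that `docker kill` stops the container but does not remove it, so the name `compute_worker` stays taken and a later `docker run --name compute_worker` fails with a name conflict. A minimal sketch of a cleanup helper follows; `remove_worker` and the `DOCKER` override variable are hypothetical conveniences, not part of the project.

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo) to preview the command
# instead of executing it.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: force-remove the worker container, which stops it if
# it is still running and frees its name for the next docker run.
remove_worker() {
    name="${1:-compute_worker}"
    $DOCKER rm -f "$name"
}
```

Calling `remove_worker` and then repeating one of the `docker run` commands above starts a fresh worker under the same name.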
To rebuild the image:
docker build -t competitions-v1-compute-worker .
To update the published image:
docker build -t codalab/competitions-v1-compute-worker:latest .
docker push codalab/competitions-v1-compute-worker
Default False, does not pass the --gpus all flag
Note: Also requires Docker v19.03 or greater, nvidia-container-toolkit, and NVIDIA drivers.
Default /tmp/codalab
Default /tmp/cache
Default socket.gethostname()
Default False
Sometimes it may be useful to pause the compute worker and return instead of finishing a submission. This leaves the submission in a state where it hasn't been cleaned up yet and you can attempt to re-run it manually.