Describe the bug
After following the docker-compose-triton-gpu.yml instructions for the PyTorch example, the server fails to spin up. The service fails with the following error:
model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
To fix this issue I had to update my Nvidia drivers to 510.
Okay just for clarity...
Originally, my Nvidia driver version was incompatible with the Triton server. To figure this out I ran the Nvidia Triton image directly with Docker:
docker run -it --gpus=all nvcr.io/nvidia/tritonserver:22.02-py3
If you get the following error:
This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE.
You'll need to update your Nvidia drivers.
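The same check can be done without pulling the image. The following is a minimal sketch, assuming `nvidia-smi` is installed on the host and `sort -V` is available; the minimum version 510.39 is copied from the error message above, and the helper names `version_ge` and `check_driver` are hypothetical:

```shell
# Minimum driver taken from the error message above (tritonserver:22.02-py3).
REQUIRED_DRIVER="510.39"

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which is assumed to be available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints a verdict for the installed driver version passed as $1.
check_driver() {
    if version_ge "$1" "$REQUIRED_DRIVER"; then
        echo "OK: driver $1 >= $REQUIRED_DRIVER"
    else
        echo "TOO OLD: driver $1 < $REQUIRED_DRIVER"
    fi
}

# Query the host driver only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    check_driver "$installed"
else
    echo "nvidia-smi not found; cannot determine the driver version"
fi
```

With the driver from the error message, `check_driver 470.103.01` reports the driver as too old. Adjust `REQUIRED_DRIVER` to whatever the Triton image in use asks for.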
To fix this issue I updated the drivers on my host OS, i.e.:
sudo apt install nvidia-driver-510 -y
sudo reboot
Then it worked. The docker-compose logs from the clearml-serving-triton container (i.e. the output of docker-compose -f docker/docker-compose-triton-gpu.yml logs -f) did not make the cause clear. It might be good to surface the driver mismatch as an explicit error in the logs.
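Until the logs say so explicitly, one workaround is to filter them for the two messages quoted in this thread. This is only a sketch: the helper name `find_driver_errors` is hypothetical, the compose file path is the one used above, and it is an assumption that the container-level driver warning shows up in the compose logs at all:

```shell
# Filters stdin down to lines that point at a driver/CUDA mismatch.
# Both patterns are copied from the error messages quoted in this issue.
find_driver_errors() {
    grep -E 'compiled with an unsupported toolchain|built for NVIDIA Driver Release'
}

# Only query the stack when docker-compose and the compose file are present.
COMPOSE_FILE="docker/docker-compose-triton-gpu.yml"
if command -v docker-compose >/dev/null 2>&1 && [ -f "$COMPOSE_FILE" ]; then
    docker-compose -f "$COMPOSE_FILE" logs --no-color 2>&1 | find_driver_errors \
        || echo "no driver-related errors found in the logs"
fi
```

Any matching line is a strong hint that the host driver is older than the Triton image expects.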
Thanks @rg314 !
This is exactly the fix.
BTW, note that we just released v1.0.0. The Triton version is now 22.04, so there is no need to change the Nvidia drivers (v510+),
but based on Nvidia's release notes, the next version of Triton (22.05) will need another driver bump.
To Reproduce
Steps to reproduce the behavior:
clearml-serving/examples/pytorch/readme.md (line 1 in 827905c)
Expected behavior
The service spins up without the model_repository_manager.cc:1152 error message.
Screenshots
n/a
Desktop (please complete the following information):
docker --version & docker-compose --version
[1] 1611412
Docker version 20.10.16, build aa7e414
docker-compose version 1.29.2, build 5becea4c
[1]+ Done docker --version
Additional context
See similar issue here: triton-inference-server/server#3877