How to reproduce

1. Enable Node Auto-Provisioning with L4 GPU capacity.
2. Try to create a pod with an nvidia.com/gpu resource request (a sample manifest is sketched below).
3. The pod will be stuck in the PodInitializing state.
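For reference, a minimal reproduction sketch, assuming a GKE cluster where NAP is allowed to provision nvidia-l4 accelerators; the cluster name, resource limits, pod name, and image tag below are illustrative placeholders rather than values taken from this report:

```
# Step 1 (sketch): allow Node Auto-Provisioning to create nodes with L4 GPUs.
gcloud container clusters update my-cluster \
  --enable-autoprovisioning \
  --max-cpu 64 --max-memory 256 \
  --max-accelerator type=nvidia-l4,count=4

# Step 2 (sketch): request one GPU so NAP provisions an L4 node.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-repro
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF

# Step 3: the pod never becomes Ready; it stays stuck while the new node
# reports no allocatable nvidia.com/gpu.
kubectl get pod gpu-repro -o wide
```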
Analysis
When Node Auto-Provisioning creates GPU nodes in GKE, it runs the nvidia-gpu-device-plugin pod on those nodes. Only after this pod finishes initializing successfully does the node expose allocatable nvidia.com/gpu resources. Here, however, the pod is stuck in its init containers, so no pod with a GPU request can be scheduled onto the node.
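As a quick check (the node name below is a placeholder), the auto-provisioned node shows no allocatable nvidia.com/gpu entry while the plugin pod is stuck:

```
# GPU nodes created by NAP carry the cloud.google.com/gke-accelerator label.
kubectl get nodes -l cloud.google.com/gke-accelerator

# Allocatable nvidia.com/gpu only appears once the device plugin is running.
kubectl describe node <gpu-node-name> | grep -i "nvidia.com/gpu"
```

The device-plugin pod in kube-system, meanwhile, never gets past its init containers: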
```
$ kubectl -n kube-system get pod
kube-system   nvidia-gpu-device-plugin-small-cos-4fpl4   0/2   Init:0/2   0   16m

$ kubectl -n kube-system logs nvidia-gpu-device-plugin-small-cos-hzvds -c nvidia-driver-installer
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   598  100   598    0     0   273k      0 --:--:-- --:--:-- --:--:--  291k
GPU driver auto installation is disabled.
Waiting for GPU driver libraries to be available.
```
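The last two log lines suggest the driver-installer init container is waiting for GPU driver libraries that are never installed on this auto-provisioned node. One way to inspect how the node was provisioned (label names can vary by GKE version, so treat this as a sketch; the node name is a placeholder) is to look at its GPU-related labels:

```
# Show all labels on the NAP-created GPU node, including the accelerator type
# and, on newer GKE versions, any GPU driver installation setting.
kubectl get node <gpu-node-name> --show-labels

# Narrow the output down to GPU/driver related entries.
kubectl describe node <gpu-node-name> | grep -i -e nvidia -e accelerator -e driver
```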