Add nvidia-driver-installer for CoreOS Container Linux #54
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
I signed it! |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. |
a1e6b20 to 889ffb0
CLAs look good, thanks! |
889ffb0 to 0a93fb2
@lsjostro I'm trying this out right now. This only addresses the driver parts though, right? It doesn't seem to install libnvidia-container or nvidia-container-runtime, correct? |
@discordianfish correct. We are using Google's own nvidia k8s device plugin, which doesn't require a custom docker runtime, instead of the official nvidia plugin. Works really well for us. Currently we have coreos nvidia installer Dockerfile hosted here and docker image here |
@lsjostro Ah thanks for that hint! So this installer container and the k8s device plugin should be enough? I thought it's needed since my pods still fail to schedule (Insufficient nvidia.com/gpu). I'm using the same k8s device plugin manifest. Guess I've been down the wrong route. When dropping the
The k8s plugin is running fine though:
Well, guess I have to dig deeper. Looks like the installer worked fine. Thanks a lot for that! |
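When pods fail to schedule with "Insufficient nvidia.com/gpu", one thing worth checking is whether any node actually advertises that resource. A minimal sketch, assuming the output of `kubectl get nodes -o json` has been captured as a string; the helper name `gpu_allocatable` is hypothetical and not part of this PR:

```python
import json

# Extended resource name quoted in the scheduling error above.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(nodes_json):
    """Return {node name: allocatable GPU count} from `kubectl get nodes -o json` output.

    Nodes that do not advertise the resource are reported as 0.
    """
    nodes = json.loads(nodes_json)
    result = {}
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes serializes resource quantities as strings, e.g. "1".
        result[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return result
```

If every node reports 0, the device plugin has not registered any GPUs with the kubelet, even though its pod may be running.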
I think with the approach of bind-mounting the shared libs from the host, one has to run ldconfig to update the ld cache before starting an application. The official digits container isn't doing that. |
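A rough sketch of that step, for illustration only: refresh the ld cache, then confirm the driver libraries are visible before launching the application. It assumes it runs as root inside the container and that the bind-mounted library directory is already on the linker's search path (for example via a file in /etc/ld.so.conf.d). The helper names and the default `libcuda.so.1` check are assumptions, not something this PR ships:

```python
import subprocess


def libs_in_cache(ldconfig_output):
    """Parse `ldconfig -p` output into {library name: resolved path}."""
    libs = {}
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue  # skip the summary header line
        left, path = line.split("=>", 1)
        name = left.strip().split(" ", 1)[0]
        # Keep the first entry when a library is listed for several ABIs.
        libs.setdefault(name, path.strip())
    return libs


def missing_libs(cache, required):
    """Return the required library names that are absent from the cache."""
    return [lib for lib in required if lib not in cache]


def refresh_and_check(required=("libcuda.so.1",)):
    """Rebuild the ld cache, then report which required libs are still missing.

    Assumption: the caller has permission to run ldconfig (typically root).
    """
    subprocess.run(["ldconfig"], check=True)
    out = subprocess.run(
        ["ldconfig", "-p"], check=True, capture_output=True, text=True
    ).stdout
    return missing_libs(libs_in_cache(out), required)
```

An entrypoint could call `refresh_and_check()` and exit early with a clear message if the returned list is non-empty, instead of letting the application fail later on a missing shared object.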
make sure you add example here |
Yep, figured that out. Maybe the default should be changed, /home/kubernetes is an odd choice. |
I'd recommend having a single daemonset perform driver installation and run the device plugin, so that sharing of driver artifacts is controlled via a single config file. On the other hand, we at Google do not have the bandwidth to set up CI and maintain installations for additional OSes. I would like to not merge this until we identify a maintenance plan. |
@vishh thanks for the feedback! I totally understand that! We’ll host it here in the meantime: https://github.com/shelmangroup/coreos-gpu-installer |
@lsjostro Thanks for the patch. The NVIDIA drivers are installed on the V100. However, I am having difficulty getting the runtime set to NVIDIA. Because of this, I am not able to run nvidia-docker or use nvidia-smi inside the Kubernetes pod that needs GPU acceleration. Could you help me out? CoreOS version: 2079.3.0. Also, I guess the docker tag in daemonset.yml has to be "master", since "latest" is not being pulled. |
@amoghkashyap sure! Mind copying the issue to https://github.com/shelmangroup/coreos-gpu-installer ? |
This PR adds nvidia driver installer for CoreOS Container Linux.