Update CUDA docs to use k3s suggested method #1430

Merged: 4 commits, Apr 15, 2024
@dbreyfogle (Contributor) commented Apr 15, 2024

What

This PR updates the existing documentation for how to run CUDA workloads on k3d.

Why

It's mentioned in this issue that the documentation is outdated and leads to errors unless the config files are edited. The issue creator suggested some edits to the config files to work around the errors. Some of them are implemented in this PR, but there is also a way to get k3s to automatically detect the NVIDIA container runtime without the need for a custom config.toml file. This should result in a more robust solution that will smoothly handle upgrades. This is also pointed out in a comment on the same issue.

The k3s documentation outlines how this can be set up, and this PR basically just implements those suggestions and cleans up sections that became obsolete. The NVIDIA container runtime should be automatically detected as long as you add a RuntimeClass definition to your cluster and deploy Pods that explicitly request the appropriate runtime by setting runtimeClassName: nvidia in the Pod spec.
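
For illustration, here is a minimal sketch of what the RuntimeClass definition and a Pod requesting it could look like. The Pod name, container name, and image are placeholders chosen for this example, not taken from the updated docs, and the `nvidia.com/gpu` limit assumes the NVIDIA device plugin is already deployed in the cluster.

```yaml
# RuntimeClass that maps the name "nvidia" to the containerd handler "nvidia".
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
# Example Pod (hypothetical name and image) that opts in to the NVIDIA runtime.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-example
spec:
  restartPolicy: OnFailure
  # Explicitly request the NVIDIA runtime; without this the default runtime is used.
  runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:nbody
      args: ["nbody", "-gpu", "-benchmark"]
      resources:
        limits:
          # Assumes the NVIDIA device plugin advertises this resource.
          nvidia.com/gpu: 1
```

Both manifests can be applied with `kubectl apply -f <file>`. The Pod only runs with GPU access if the node image includes the NVIDIA container runtime.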

Implications

No internal changes here.

Using this new method, you have to explicitly set runtimeClassName: nvidia in the Pod spec for a GPU setup, which you did not have to do before. However, this is the method suggested in the k3s docs.

It also appears that this method will work in WSL2. I have run basic tests on my local setup (Debian and WSL2) and did not run into any issues.

Fixes #658
Fixes #1108

@iwilltry42 (Member) commented

Hi @dbreyfogle, thanks a lot for taking the time and writing this up!
I cannot do extensive testing of this right now, but given that the current version of the documentation is non-functional, I'm pretty sure that we'll get more feedback on this by just putting this out there.

Thank you 🙏

@dbreyfogle

I've put up a pull request to add @dbreyfogle! 🎉


Successfully merging this pull request may close these issues:

How to use GPU with k3d
[ENHANCEMENT] Docs: Update CUDA Guide