Update CUDA docs to use k3s suggested method #1430

dbreyfogle · 2024-04-15T00:18:27Z

What

This PR updates the existing documentation for how to run CUDA workloads on k3d.

Why

As mentioned in this issue, the documentation is outdated, and following it leads to errors unless the config files are edited. The issue creator suggested some edits to the config files to work around the errors. Some of them are implemented in this PR, but there is also a way to get k3s to automatically detect the NVIDIA container runtime without the need for a custom config.toml file. This should result in a more robust solution that handles upgrades smoothly. This is also pointed out in a comment on the same issue.

The k3s documentation outlines how this can be set up, and this PR implements those suggestions and cleans up sections that became obsolete. The NVIDIA container runtime should be detected automatically as long as you add a RuntimeClass definition to your cluster and deploy Pods that explicitly request the appropriate runtime by setting runtimeClassName: nvidia in the Pod spec.
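For reference, a minimal sketch of the RuntimeClass definition mentioned above. It assumes the runtime handler that k3s registers for the NVIDIA container runtime is named nvidia, which matches the runtimeClassName used in this PR. The exact manifest in the updated docs may differ.

```yaml
# Minimal RuntimeClass sketch. Assumes k3s detected the NVIDIA container
# runtime and registered it in containerd under the handler name "nvidia".
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia      # the name Pods refer to via runtimeClassName
handler: nvidia     # assumed containerd runtime handler name
```

Applying this once per cluster (for example with kubectl apply -f) should be enough. Pods then opt in individually through their spec.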

Implications

No internal changes here.

With this new method, you have to explicitly set runtimeClassName: nvidia in the Pod spec for a GPU setup, which was not required before. However, this is the method suggested in the k3s docs.
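As an illustration, a Pod that opts into the NVIDIA runtime might look roughly like the sketch below. The Pod name, container name, and image are placeholders and not taken from this PR. The nvidia.com/gpu resource limit additionally assumes that the NVIDIA device plugin is running in the cluster.

```yaml
# Hypothetical GPU Pod sketch; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test                       # placeholder name
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia              # must match the RuntimeClass name
  containers:
    - name: cuda-test                   # placeholder name
      image: example.com/cuda-sample:latest   # placeholder; use any CUDA-enabled image
      resources:
        limits:
          nvidia.com/gpu: 1             # assumes the NVIDIA device plugin is deployed
```

Without the runtimeClassName line, the Pod would be scheduled with the default runtime and would likely not see the GPU.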

It also appears that this method will work in WSL2. I have run basic tests on my local setup (Debian and WSL2) and did not run into any issues.

Fixes #658
Fixes #1108

iwilltry42 · 2024-04-15T05:10:51Z

Hi @dbreyfogle, thanks a lot for taking the time and writing this up!
I cannot do extensive testing of this right now, but given that the current version of the documentation is non-functional, I'm pretty sure that we'll get more feedback on this by just putting this out there.

Thank you 🙏

allcontributors · 2024-04-15T05:45:00Z

@dbreyfogle

I've put up a pull request to add @dbreyfogle! 🎉

allcontributors · 2024-04-17T06:32:14Z

@dbreyfogle

I could not determine your intention.

Basic usage: @all-contributors please add @Someone for code, doc and infra

For other usages see the documentation

dbreyfogle added 4 commits April 14, 2024 15:35

update to v0.15.0-rc.2

710535a

add RuntimeClass as suggested in k3s docs

c958f17

Update to use nvidia container toolkit. Custom config.toml not needed

d1a58bf

update cuda docs

dff7547

dbreyfogle mentioned this pull request Apr 15, 2024

[ENHANCEMENT] Docs: Update CUDA Guide #658

Closed

iwilltry42 self-requested a review April 15, 2024 05:08

iwilltry42 approved these changes Apr 15, 2024

View reviewed changes

iwilltry42 merged commit e9babb7 into k3d-io:main Apr 15, 2024

iwilltry42 mentioned this pull request Apr 15, 2024

How to use GPU with k3d #1108

Closed

allcontributors bot mentioned this pull request Apr 15, 2024

docs: add dbreyfogle as a contributor for doc #1431

Merged
