feat: build nvidia open source kernel module #220

Merged
merged 6 commits into from
Aug 7, 2024

p5
Member

@p5 p5 commented Jul 22, 2024

Enable the open kernel module builds.
This should not affect anything downstream. It just gets us ready for the switch in the next Nvidia driver release.

Please note: I have not yet tried booting into an image with this driver.

@p5 p5 marked this pull request as ready for review July 22, 2024 09:12
@p5 p5 requested a review from castrojo as a code owner July 22, 2024 09:12
Member Author

I copied this Containerfile because I didn't want this PR to rewrite the workflow logic too. If we wanted to use the same Containerfile, we would need to rework the GHA jobs to supply additional build args only for Nvidia.

castrojo
castrojo previously approved these changes Jul 22, 2024
@p5 p5 enabled auto-merge July 22, 2024 22:10
@dylanmtaylor
Contributor

dylanmtaylor commented Jul 22, 2024

"For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms.

For newer GPUs from the Turing, Ampere, Ada Lovelace, or Hopper architectures, NVIDIA recommends switching to the open-source GPU kernel modules."
https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/

I think we should invert this -- default to nvidia-open with nvidia-closed being the edge case for older GPUs as that is what Nvidia is pushing for.

This wouldn't affect this PR per se, as the default choice would be set in the downstream image builds.


akmods --force --kernels "${KERNEL_VERSION}" --kmod "nvidia"

modinfo /usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia{,-drm,-modeset,-peermem,-uvm}.ko.xz > /dev/null || \
(cat /var/cache/akmods/nvidia/${NVIDIA_AKMOD_VERSION}-for-${KERNEL_VERSION}.failed.log && exit 1)

# View license information
modinfo -l /usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia{,-drm,-modeset,-peermem,-uvm}.ko.xz
Member

While this lists the licenses, can we add a conditional check to make sure that the correctly licensed kmod was built for the given input?

Member

Yes please

Contributor

It looks like we requested to do a check here but we didn't?
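
A minimal sketch of what such a check could look like, not what the PR actually shipped. The function names and the `open`/`closed` flavor argument are hypothetical, and the license strings ("Dual MIT/GPL" for the open module, "NVIDIA" for the proprietary one) are assumptions that should be confirmed against real `modinfo -l` output. It checks only the core `nvidia` module, since the helper modules may report other licenses.

```shell
# Hypothetical helper: map a build flavor to the license we expect
# `modinfo -l` to report for the core nvidia module.
# ASSUMPTION: these license strings must be verified against a real build.
expected_license() {
    case "$1" in
        open)   echo "Dual MIT/GPL" ;;
        closed) echo "NVIDIA" ;;
        *)      return 1 ;;
    esac
}

# Succeeds only if the reported license ($2) matches the flavor ($1).
license_matches() {
    local expected
    expected="$(expected_license "$1")" || return 2
    [ "$2" = "${expected}" ]
}

# Reads the license from a built module ($2) and fails on a mismatch.
verify_kmod_license() {
    local flavor="$1"
    local module="$2"
    local actual
    actual="$(modinfo -l "${module}")" || return 1
    if ! license_matches "${flavor}" "${actual}"; then
        echo "license mismatch for ${module}: got '${actual}'" >&2
        return 1
    fi
}
```

In the build script this could be called as `verify_kmod_license open "/usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia.ko.xz" || exit 1`, right after the existing `modinfo -l` listing.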

@@ -0,0 +1,63 @@
###
### Containerfile.nvidia - used to build ONLY NVIDIA kmods
Member

I think we can just symlink this instead if it's just going to be an exact copy of the Nvidia Containerfile.

Member

Right now it's separate, but it could be an if in the Containerfile.

Containerfile.nvidia Outdated Show resolved Hide resolved
KyleGospo
KyleGospo previously approved these changes Aug 7, 2024
castrojo
castrojo previously approved these changes Aug 7, 2024
@KyleGospo KyleGospo dismissed stale reviews from castrojo and themself via 3dbeaf2 August 7, 2024 22:52
@KyleGospo KyleGospo disabled auto-merge August 7, 2024 22:53
Containerfile.nvidia-open Outdated Show resolved Hide resolved
@p5
Member Author

p5 commented Aug 7, 2024

LGTM! (Can't approve my own PR)

@KyleGospo KyleGospo added this pull request to the merge queue Aug 7, 2024
Merged via the queue into main with commit ab12663 Aug 7, 2024
42 checks passed
@KyleGospo KyleGospo deleted the enable-nvidia-open-gpu-builds branch August 7, 2024 23:51
6 participants