packages/nixos: pin nvidia driver #955

Merged
msanft merged 1 commit into main from msanft/podvm-img/pin-nvidia-driver on Nov 4, 2024

Conversation

Contributor

@msanft msanft commented Oct 28, 2024

The `production` nvidia driver in nixpkgs got updated to a version that's incompatible with H100s. This pins the nvidia driver to the specific version needed for H100s by using the upstream `mkDriver` abstraction that builds a driver compatible with the used kernel from exact driver sources.
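
For readers unfamiliar with the pattern, pinning through `mkDriver` might look roughly like the sketch below. This is an illustration only, not the contents of `packages/nixos/gpu.nix`: the version string is a placeholder (the thread does not state the pinned release), the hashes use `lib.fakeHash` as stand-ins, and the exact set of hash attributes `mkDriver` expects may differ between nixpkgs revisions.

```nix
# Hypothetical NixOS module fragment, assuming a standard nixpkgs module setup.
{ config, lib, ... }:

{
  # Going through config.boot.kernelPackages builds the driver against the
  # kernel this system actually uses, instead of following the moving
  # `production` alias.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
    # Placeholder: substitute the exact driver release known to work.
    version = "<driver-version>";

    # Placeholder hashes: with a fake hash, the build fails and reports the
    # expected value, which can then be pasted in here.
    sha256_64bit = lib.fakeHash;
    openSha256 = lib.fakeHash;
    settingsSha256 = lib.fakeHash;
    persistencedSha256 = lib.fakeHash;
  };
}
```

Because the version and source hashes are fixed in the expression, a later nixpkgs bump that moves `production` would not change the driver that gets built.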

@msanft msanft added the no changelog PRs not listed in the release notes label Oct 28, 2024
@msanft msanft requested a review from katexochen as a code owner October 28, 2024 12:07
@msanft msanft requested a review from Freax13 October 28, 2024 12:07
Contributor

Freax13 commented Oct 28, 2024

> The production nvidia driver in nixpkgs [...] [is] incompatible with H100s.

How do you know that? Are there any issues for this?

Contributor Author

msanft commented Oct 28, 2024

> How do you know that? Are there any issues for this?

Things broke when we updated. This might not be true for use with H100s in general, but at least for the CC case. Maybe @derpsteb knows more about the exact driver compatibilities.

Contributor

@Freax13 Freax13 left a comment

> Things broke when we updated. This might not be true for use with H100s in general, but at least for the CC case. Maybe @derpsteb knows more about the exact driver compatibilities.

IIRC @derpsteb is on vacation for the next couple of weeks. I'm fine downgrading the version for now and delaying investigating whether or not/how we can upgrade.

packages/nixos/gpu.nix Outdated Show resolved Hide resolved
Member

@katexochen katexochen left a comment

lgtm beside #955 (comment)

The `production` nvidia driver in nixpkgs got updated to a version that's incompatible with H100s. This pins the nvidia driver to the specific version needed for H100s by using the upstream `mkDriver` abstraction that builds a driver compatible with the used kernel from exact driver sources.
@msanft msanft force-pushed the msanft/podvm-img/pin-nvidia-driver branch from 75a2d1b to 496accb Compare November 4, 2024 10:43
@msanft msanft merged commit 65d123f into main Nov 4, 2024
9 of 10 checks passed
@msanft msanft deleted the msanft/podvm-img/pin-nvidia-driver branch November 4, 2024 10:43
Labels
no changelog PRs not listed in the release notes

3 participants