Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nvidia: various fixes #337289

Merged
merged 3 commits into from
Aug 25, 2024
Merged

Conversation

Kiskae
Copy link
Contributor

@Kiskae Kiskae commented Aug 25, 2024

Description of changes

Cleaning up my backlog

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: changelog 8.has: module (update) This PR changes an existing module in `nixos/` labels Aug 25, 2024
@Kiskae Kiskae requested review from NickCao and kanashimia August 25, 2024 16:32
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 101-500 labels Aug 25, 2024
@Kiskae Kiskae force-pushed the nvidia/fixes_2024_08_25 branch from d9dc6df to ef3b6ad Compare August 25, 2024 19:51
@Kiskae Kiskae requested a review from NickCao August 25, 2024 19:52
Copy link
Member

@NickCao NickCao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, one question is: do cfg.gsp.enable require additional kernel parameters on non-open drivers?

@Kiskae
Copy link
Contributor Author

Kiskae commented Aug 25, 2024

Overall LGTM, one question is: do cfg.gsp.enable require additional kernel parameters on non-open drivers?

Nope, per #313440:

The GSP firmware is now used by default on all GPUs which support it. It can be disabled by setting the kernel module parameter NVreg_EnableGpuFirmware=0.

Copy link
Member

@NickCao NickCao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested evaluating various permutations of the config.

@NickCao NickCao merged commit 2a7a221 into NixOS:master Aug 25, 2024
23 checks passed
@Kiskae Kiskae deleted the nvidia/fixes_2024_08_25 branch August 25, 2024 21:45
@jcaesar
Copy link
Contributor

jcaesar commented Aug 28, 2024

Is it possible that this is causing breakage, by removing the default value for hardware.nvidia.open?

I'm seeing

       … while evaluating the option `hardware.nvidia.gsp.enable':
       … while evaluating the option `hardware.nvidia.open':
       error: The option `hardware.nvidia.open' is used but not defined.

(See nix eval --show-trace github:jcaesar/sysflake/943e73289543eba136640505343ef4cfcacd6b8f#nixosConfigurations.korsika.config.system.build.toplevel for the full error I'm seeing on my flake.)

Re-adding that makes the error go away:

--- a/nixos/modules/hardware/video/nvidia.nix
+++ b/nixos/modules/hardware/video/nvidia.nix
@@ -256,6 +256,7 @@ in

       open = lib.mkOption {
         example = true;
+        default = false;
         description = "Whether to enable the open source NVIDIA kernel module.";
         type = lib.types.bool;
         defaultText = lib.literalExpression ''

Was removing the default intentional? If not, I'll do a PR.

(For now, I'll just set config.hardware.nvidia.open = false; in my flake. It's a bit odd that I seem to be the only one?)

@Kiskae
Copy link
Contributor Author

Kiskae commented Aug 28, 2024

Yes, that breakage was intentional. Users need to explicitly pick which version of the driver they want since nvidia recommends the open one, but that is incompatible with older drivers. (16xx and earlier if I recall)

@superherointj
Copy link
Contributor

superherointj commented Aug 29, 2024

[...] It's a bit odd that I seem to be the only one?)

Nope.

image

@Mindavi
Copy link
Contributor

Mindavi commented Aug 29, 2024

Heh, I guess I expected a nicer error message. Anyway, I quickly found it in the release notes so maybe that's good enough.

@CertainLach
Copy link
Member

CertainLach commented Aug 29, 2024

Given that proprietary drivers will go away some time, wouldn't it better to set default value of nvidia.open to nvidia.package.preferOpen (This flag does not yet exists, it should be added and set to true for recent versions) depending on stateVersion?
Thus making all the new installations default to open-source driver for new gpus, and remain closed for older gpus

config.nvidia.open = mkDefault (config.system.stateVersion > "24.XX" && config.nvidia.package.preferOpen);

@Mindavi
Copy link
Contributor

Mindavi commented Aug 29, 2024

I think I´d prefer the build failure over introducing stateVersion switches for this. Doing a new installation doesn't mean that your PC is new.

@CertainLach
Copy link
Member

Doing a new installation doesn't mean that your PC is new.

That's why I propose to add preferOpen flag to the package, only newest drivers, which have the same supported hardware list (If I recall correctly) will default to open versions.

@necauqua
Copy link
Contributor

necauqua commented Aug 29, 2024

Found this because of the option used but not defined error, would've preferred if it had a nicer message (like such breaking changes often do), something like "blah-blah, nvidia opensourced driver, for newer cards <the condition of 'newer'>, you'd want to set this new mandatory option to true" instead of (as it also sometimes happens) searching the nixos github for the context - could've prevented a couple of issues already created for this

eg-ayoub added a commit to eg-ayoub/nixos-hardware that referenced this pull request Aug 30, 2024
This fixes error induced intentionally, see NixOS/nixpkgs#337289 (comment)

Signed-off-by: Ayoub Nasr <[email protected]>
open = lib.mkOption {
example = true;
description = "Whether to enable the open source NVIDIA kernel module.";
type = lib.types.bool;
Copy link
Member

@Mic92 Mic92 Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this affect quite a few people, we might want to introduce some guiding message here. If we also accept "unset" here, then an assertion could reproduce the changelog above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see how people will get confused here: NixOS/nixos-hardware#1092 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we also accept "unset" here, then an assertion could reproduce the changelog above.

That's the original version of this pr, accepting null. But then we would have to cast nullOr types.bool to types.bool on every place referencing cfg.open, or adding a pretty ugly isOpen indirection. I would rather prefer improving the error message for all mandatory options as in #338362

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this affect quite a few people, we might want to introduce some guiding message here. If we also accept "unset" here, then an assertion could reproduce the changelog above.

Yep, just helped someone that updated their flake and got this issue. Would love to see a PR to make this message cleaner, at least for the short term.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/hardware-nvidia-open-is-used-but-not-defined-error-when-updating-nixos-flake-config/51359/7

getchoo added a commit to getchoo-contrib/nixpkgs that referenced this pull request Sep 16, 2024
This requirement was introduced in
NixOS#337289 as a way to make sure users
"explicitly pick which version of the driver they want since nvidia
recommends the open one, but that is incompatible with older drivers".
This is reasonable, however the user isn't informed in any real way
aside from the upcoming release notes

This has caused a
[good](NixOS#337289 (comment))
[amount](NixOS#337289 (comment))
[of](NixOS#338196)
[confusion](NixOS/nixos-hardware#1092) amongst
users. By introducing this assertion and using a new `useOpenModules`
local variable, we can have the same behavior but display a proper error
message to hopefully clear things up until we can safely make this a
default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 101-500
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants