OpenCL doesn't work with Nvidia #325378

Closed
diniamo opened this issue Jul 7, 2024 · 15 comments
Labels
0.kind: bug Something is broken

Comments

@diniamo
Contributor

diniamo commented Jul 7, 2024

Describe the bug

OpenCL apps don't work with the Nvidia driver, even though the library files are clearly present in /run/opengl-driver.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Own an Nvidia GPU
  2. Try to run an OpenCL app, e.g. clinfo
  3. Error appears saying that there are no platforms

Expected behavior

A platform is returned by the driver, and the program works correctly.

Additional context

I have tried adding ocl-icd to hardware.graphics.extraPackages, but no luck.

Notify maintainers

@Kiskae @NickCao

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.36, NixOS, 24.11 (Vicuna), 24.11.20240703.9f4128e`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Lix, like Nix) 2.90.0-beta.1`
 - nixpkgs: `/nix/store/dk2rpyb6ndvfbf19bkb2plcz5y3k8i5v-source`

Note: I am on the LTS kernel, because wlroots compositors and Hyprland don't work on latest+Nvidia for some reason.



@diniamo diniamo added the 0.kind: bug Something is broken label Jul 7, 2024
@NickCao
Member

NickCao commented Jul 8, 2024

Could you try disabling the GSP firmware as suggested in #324252 (comment)

@diniamo
Contributor Author

diniamo commented Jul 8, 2024

No luck, clinfo still outputs 0 platforms. Here is my /proc/cmdline for reference:

❯ cat /proc/cmdline
init=/nix/store/kqd6ca16w7gyrm77vzqjrds0li0sjz0y-nixos-system-diniamo-PC-24.11.20240703.9f4128e/init nvidia.NVreg_UsePageAttributeTable=1 nvidia.NVreg_InitializeSystemMemoryAllocations=0 nvidia.NVreg_EnableStreamMemOPs=1 nvidia.NVreg_RegistryDwords=__REGISTRYDWORDS nvidia.NVreg_EnableGpuFirmware=0 quiet splash systemd.unified_cgroup_hierarchy=0 loglevel=4 nvidia-drm.modeset=1 nvidia.NVreg_PreserveVideoMemoryAllocations=1 vt.default_red=0x24,0xed,0xa6,0xee,0x8a,0xc6,0x8b,0xca,0x49,0xed,0xa6,0xee,0x8a,0xc6,0x8b,0xf4 vt.default_grn=0x27,0x87,0xda,0xd4,0xad,0xa0,0xd5,0xd3,0x4d,0x87,0xda,0xd4,0xad,0xa0,0xd5,0xdb vt.default_blu=0x3a,0x96,0x95,0x9f,0xf4,0xf6,0xca,0xf5,0x64,0x96,0x95,0x9f,0xf4,0xf6,0xca,0xd6
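
With a command line this long it is easy to misread whether a given parameter is really there. A minimal sketch of a check, assuming a POSIX shell; the function name has_kernel_param and its optional file argument are made up for illustration:

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a complete, space-separated entry on the
# kernel command line. CMDLINE_FILE defaults to /proc/cmdline; the second
# argument (hypothetical) only exists so a saved copy can be checked too.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then an exact, fixed-string, whole-line match.
    tr ' ' '\n' < "$file" | grep -Fxq -- "$param"
}

# Example:
#   has_kernel_param nvidia.NVreg_EnableGpuFirmware=0 && echo "parameter is set"
```

This only shows that the parameter was passed at boot, not that the driver honoured it.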

@Kiskae
Contributor

Kiskae commented Jul 8, 2024

That's quite a lot of custom parameters for the nvidia kernel module...

Let's validate some things:

  • ls -la /run/opengl-driver/etc/OpenCL/vendors/
    • contents of nvidia.icd, if it exists
  • __RM_ENABLE_VERBOSE_OUTPUT=1 clinfo, to see whether the driver is having trouble
  • LD_DEBUG=libs clinfo, which is very noisy but should show clinfo loading the nvidia driver if it is working
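
The first check above could be scripted roughly as follows. This is only a sketch: check_icd_dir is a made-up helper name, and it assumes each .icd file holds an absolute library path on its first line (an ICD file may also contain a bare library name, which this sketch would report as missing).

```shell
# check_icd_dir [DIR]
# For each *.icd file in DIR, print the library it points to and whether
# that library exists. Succeeds only if at least one .icd file was found
# and every referenced library is present.
check_icd_dir() {
    dir=${1:-/run/opengl-driver/etc/OpenCL/vendors}
    found=0
    bad=0
    for icd in "$dir"/*.icd; do
        [ -f "$icd" ] || continue        # the glob matched nothing
        found=1
        lib=$(head -n 1 "$icd")          # first line names the vendor library
        if [ -e "$lib" ]; then
            echo "ok: $icd -> $lib"
        else
            echo "missing: $icd -> $lib"
            bad=1
        fi
    done
    [ "$found" -eq 1 ] && [ "$bad" -eq 0 ]
}

# Example:
#   check_icd_dir || echo "ICD registration looks broken"
```

A passing check only means the ICD loader can find the vendor library; the driver can still fail to report a platform afterwards.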

@diniamo
Contributor Author

diniamo commented Jul 8, 2024

❯ cat /run/opengl-driver/etc/OpenCL/vendors/nvidia.icd
/nix/store/fgh9qwggvjlwqdyyyx60zgx7hybww2py-nvidia-x11-555.58.02-6.6.36/lib/libnvidia-opencl.so.1

Setting __RM_ENABLE_VERBOSE_OUTPUT=1 doesn't change the output:

Number of platforms                               0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

Aha, the last command's output includes this:

    145072:	/nix/store/fgh9qwggvjlwqdyyyx60zgx7hybww2py-nvidia-x11-555.58.02-6.6.36/lib/libnvidia-opencl.so.1: error: symbol lookup error: undefined symbol: clIcdGetPlatformIDsKHR (fatal)

@Kiskae
Contributor

Kiskae commented Jul 8, 2024

I get that error as well, but after that it loads libcuda and works as intended.
Is there any indication that it tries to load libcuda?
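
Since the LD_DEBUG output is very noisy, a small filter can answer that question directly. A sketch only; loads_libcuda is a hypothetical helper name, and it simply looks for any mention of libcuda.so in the loader trace:

```shell
# loads_libcuda
# Reads LD_DEBUG=libs output on stdin and succeeds if any line mentions
# libcuda.so, i.e. the dynamic loader at least tried to find or load it.
loads_libcuda() {
    grep -q 'libcuda\.so'
}

# Example (LD_DEBUG writes to stderr, so swap it into the pipe):
#   LD_DEBUG=libs clinfo 2>&1 >/dev/null | loads_libcuda \
#       && echo "loader touched libcuda" \
#       || echo "no sign of libcuda"
```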

@diniamo
Contributor Author

diniamo commented Jul 8, 2024

No, I'm pretty sure that is the last line. Here is the full output in case I missed anything: http://0x0.st/XBhc.txt

@diniamo
Contributor Author

diniamo commented Jul 14, 2024

Looks like

boot.kernelParams = ["nvidia.NVreg_PreserveVideoMemoryAllocations=1"];

solves the issue.

Thanks to elFarto/nvidia-vaapi-driver#299 (comment)

Should I close?

@Kiskae
Contributor

Kiskae commented Jul 14, 2024

> Looks like
>
> boot.kernelParams = ["nvidia.NVreg_PreserveVideoMemoryAllocations=1"];
>
> solves the issue.
>
> Thanks to elFarto/nvidia-vaapi-driver#299 (comment)
>
> Should I close?

If you are no longer experiencing the issue, you can close it.

That kernel parameter is already added by the hardware.nvidia.powerManagement.enable option:

++ lib.optional cfg.powerManagement.enable "nvidia.NVreg_PreserveVideoMemoryAllocations=1"

As a personal curiosity, does this issue only happen if you suspended the machine? Or does it happen after a clean boot?

@diniamo
Contributor Author

diniamo commented Jul 14, 2024

Huhh?? But I had that enabled.

@diniamo
Contributor Author

diniamo commented Jul 14, 2024

Now it works without. I have no clue what fixed it then. The only other thing I did recently is update nixpkgs.

@Kiskae
Contributor

Kiskae commented Jul 14, 2024

> Now it works without. I have no clue what fixed it then. The only other thing I did recently is update nixpkgs.

Oh, you're on nixos-unstable. Is it possible you did an upgrade that updated the nvidia driver on a running system? Because that definitely isn't supported by Nvidia.

EDIT: since you changed the kernel parameter, it might be the full restart that fixed it.

@diniamo
Contributor Author

diniamo commented Jul 14, 2024

No, nothing to do with that.

The kernel parameter has always been set, since I had power management enabled. The reason I assume the update fixed it is because I recently did an update which I hadn't done in a long time, and I haven't done anything else related that could have fixed it.

Well either way, I'm closing.

@sagikazarmark
Member

For the record, this is what solved the problem for me in the end:

  boot.kernelParams = [
    # Required to make OpenCL (and DaVinci Resolve) work
    # https://github.com/NixOS/nixpkgs/issues/325378#issuecomment-2212732797
    # https://github.com/NixOS/nixpkgs/issues/324252#issuecomment-2205385051
    "nvidia.NVreg_EnableGpuFirmware=0"
  ];

  hardware.nvidia = {
    powerManagement = {
      enable = true;
      finegrained = true;
    };

    # Required to make OpenCL (and DaVinci Resolve) work
    open = false;

    # Required for Wayland
    modesetting.enable = true;
  };

@diniamo
Contributor Author

diniamo commented Oct 30, 2024

Yes, open = false solves it. I've been doing that for a while, but forgot to mention it here.

@sagikazarmark
Member

I believe I needed the combination of open = false and the nvidia.NVreg_EnableGpuFirmware=0 kernel param.
