failing to load driver nvml from wsl2 #51

Open
tgruben opened this issue Nov 10, 2023 · 3 comments

tgruben commented Nov 10, 2023

I am just getting started with Rust development, so I may just need some guidance. I have written a simple program that dumps the details of the driver it loads. When I run it on my Ubuntu 22.04 LTS notebook, the output is as expected:

Device 0: "NVIDIA GeForce GTX 1060 with Max-Q Design"
Memory Info MemoryInfo { free: 6220742656, total: 6442450944, used: 221708288 }
Clock Info 405
Num Cores 1280

and nvidia-smi reports

nvidia-smi
Fri Nov 10 09:29:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8               5W /  60W |    139MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1513      G   /usr/lib/xorg/Xorg                           81MiB |
|    0   N/A  N/A      2378      G   /usr/bin/gnome-shell                         55MiB |
+---------------------------------------------------------------------------------------+

Now, when I compile and run my app on my Windows 11 WSL2 Ubuntu 22.04 LTS instance, I get:

Error: DriverNotLoaded

However, when I run nvidia-smi, it reports a driver. Since nvidia-smi worked, I assumed the driver was accessible.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:07:00.0 Off |                  N/A |
| 43%   25C    P8               7W / 170W |    500MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       333      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

My question: are there any additional paths or environment variables that I need to set up to get this to work, or is it simply an issue with WSL2?

Just for fun, I'm including the code because it's very basic:

use nvml_wrapper::enum_wrappers::device::Clock;
use nvml_wrapper::error::NvmlError;
use nvml_wrapper::Nvml;

fn main() -> Result<(), NvmlError> {
    // Load the NVML shared library and initialize it.
    let nvml = Nvml::init()?;
    let device_count = nvml.device_count()?;
    // Print a few basic details for every device NVML reports.
    for di in 0..device_count {
        let device = nvml.device_by_index(di)?;
        println!("Device {}: {:?}", di, device.name()?);
        println!("Memory Info {:?}", device.memory_info()?);
        println!("Clock Info {:?}", device.clock_info(Clock::Memory)?);
        println!("Num Cores {:?}", device.num_cores()?);
    }
    Ok(())
}

Cldfire (Owner) commented Feb 10, 2024

Hello! I do apologize for the delay in replying here 😅 but I hope your time with Rust has been fun so far!

I'm not entirely sure why you're getting the DriverNotLoaded error in this scenario. I get the following error when running the basic_usage example in this repository via cargo run --example basic_usage in WSL2:

Error: LibloadingError(DlOpen { desc: "libnvidia-ml.so: cannot open shared object file: No such file or directory" })

and, if we strace nvidia-smi, we see:

openat(AT_FDCWD, "/usr/lib/wsl/drivers/nv_dispi.inf_amd64_31dab972145ae5a9/libnvidia-ml.so.1", O_RDONLY|O_CLOEXEC) = 4
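
To check which NVML file names actually exist in a directory such as the one strace reports, a small std-only sketch like the following can help. The function name `nvml_candidates` and the command-line argument are illustrative assumptions, not part of nvml-wrapper; the directory to inspect should come from your own strace output.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every file in `dir` whose name starts with "libnvidia-ml.so",
/// sorted for stable output. This only lists names; it does not load anything.
fn nvml_candidates(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_nvml = path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.starts_with("libnvidia-ml.so"));
        if is_nvml {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}

fn main() {
    // Assumption: the directory to inspect is passed as the first argument,
    // for example the driver directory that shows up in your strace output.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    match nvml_candidates(Path::new(&dir)) {
        Ok(paths) => {
            for p in paths {
                println!("{}", p.display());
            }
        }
        Err(e) => eprintln!("could not read {}: {}", dir, e),
    }
}
```

If the listing shows only a versioned name (for example one ending in `.so.1`), a loader that asks for the unversioned name will not find it.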

By default, nvml-wrapper looks for libnvidia-ml.so on Linux systems, but in WSL the library is named libnvidia-ml.so.1. This name mismatch prevents nvml-wrapper from finding the library, and we can fix that by initializing NVML like so:

    let nvml = Nvml::builder()
        .lib_path("libnvidia-ml.so.1".as_ref())
        .init()?;
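
If the same binary has to run both on native Linux and in WSL2, one option is to try several library names in order. The sketch below shows only that fallback logic and uses the standard library alone; `load_first` and the stand-in loader are hypothetical names for illustration. In a real program the closure body would presumably be the builder call shown above, with `name` in place of the hard-coded string.

```rust
/// Tries each candidate name in order and returns the first one the loader
/// accepts, together with the loaded value. If every candidate fails, all
/// failures are returned so the caller can report them.
fn load_first<'a, T, E>(
    candidates: &[&'a str],
    mut load: impl FnMut(&str) -> Result<T, E>,
) -> Result<(&'a str, T), Vec<(&'a str, E)>> {
    let mut failures = Vec::new();
    for &name in candidates {
        match load(name) {
            Ok(lib) => return Ok((name, lib)),
            Err(err) => failures.push((name, err)),
        }
    }
    Err(failures)
}

fn main() {
    // Stand-in loader (hypothetical): pretends that only the versioned name
    // exists, as in the WSL2 case described in this thread. With nvml-wrapper
    // the closure would wrap `Nvml::builder().lib_path(name.as_ref()).init()`.
    let fake_load = |name: &str| -> Result<String, String> {
        if name == "libnvidia-ml.so.1" {
            Ok(format!("loaded {}", name))
        } else {
            Err(format!("{}: cannot open shared object file", name))
        }
    };

    let candidates = ["libnvidia-ml.so", "libnvidia-ml.so.1"];
    match load_first(&candidates, fake_load) {
        Ok((name, _lib)) => println!("using {}", name),
        Err(failures) => eprintln!("no candidate loaded: {:?}", failures),
    }
}
```

Keeping every failure, rather than only the last one, makes it easier to see whether a name was missing or present but unloadable.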

Then I got a bit farther, but I hit another error:

Error: NotSupported

I was able to work around that by removing:

let cuda_cores = device.num_cores()?;

from the basic_usage example, and then I was able to get the following output in WSL:

Your NVIDIA GeForce RTX 3080 (architecture: Ampere, CUDA cores: ) is currently sitting at 59 °C with a graphics clock of 1800 MHz and a memory clock of 9501 MHz. Memory usage is 1.82 GB out of an available 10.74 GB. Right now the device is connected via a PCIe gen 4 x16 interface with a transfer rate of 16 GT/s per lane; the max your hardware supports is PCIe gen 4 x16 at a transfer rate of 16 GT/s per lane.

This device is not on a multi-GPU board.

System CUDA version: 12.3
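
Instead of deleting the unsupported query, another option is to treat NotSupported as "no value" and keep going. The following is a minimal sketch of that pattern using only the standard library; `optional` and `QueryError` are hypothetical stand-ins, and only the two variant names are taken from the errors quoted in this thread. With nvml-wrapper, the predicate would presumably match on the crate's own error type instead.

```rust
/// Turns an "unsupported on this platform" error into `None`, while passing
/// every other error through unchanged.
fn optional<T, E>(
    result: Result<T, E>,
    is_unsupported: impl Fn(&E) -> bool,
) -> Result<Option<T>, E> {
    match result {
        Ok(value) => Ok(Some(value)),
        Err(err) if is_unsupported(&err) => Ok(None),
        Err(err) => Err(err),
    }
}

// Hypothetical stand-in for the library's error type.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum QueryError {
    NotSupported,
    DriverNotLoaded,
}

fn main() -> Result<(), QueryError> {
    // Stand-in for a core-count query failing the way it did under WSL2.
    let cores: Result<u32, QueryError> = Err(QueryError::NotSupported);

    // NotSupported becomes None; any other error still aborts via `?`.
    let cores = optional(cores, |e| *e == QueryError::NotSupported)?;
    match cores {
        Some(n) => println!("CUDA cores: {}", n),
        None => println!("CUDA cores: not reported on this platform"),
    }
    Ok(())
}
```

This keeps the rest of the report intact on platforms where a single query is unavailable, while real failures such as a missing driver still surface as errors.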

Let me know if that helps!

@dmitryduev

Hi! In the official Go bindings for NVML, they use libnvidia-ml.so.1: https://github.com/NVIDIA/go-nvml/blob/0e815c71ca6e8184387d8b502b2ef2d2722165b9/pkg/nvml/lib.go#L30. I think it's the same in pynvml bindings.
Maybe change the default in Nvml::init() as well?

@dmitryduev

Put up a PR: #63
