
Building fails when nvidia-smi is from a newer CUDA version than nvcc #8

FlippingBinary opened this issue Aug 14, 2024 · 0 comments

This issue is similar to #4, but slightly different. In that issue, nvidia-smi wasn't present at all, so autodetection of the CUDA compute capability level failed.

This issue occurs when nvidia-smi comes from a newer CUDA version than nvcc does.

In my Windows 11 environment, nvcc --list-gpu-code reports:

sm_35
sm_37
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87

As you can see, the highest version is 87, and 89 is missing. But nvidia-smi --query-gpu=compute_cap --format=csv reports:

compute_cap
8.9

This apparently corresponds to 89, so the build process panics.
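For context on the panic: nvidia-smi prints the compute capability as a major.minor pair, and the build compares it against nvcc's integer sm_XX codes, so 8.9 becomes 89. A rough sketch of that mapping (my own illustration, not the crate's actual parsing code):

fn parse_compute_cap(nvidia_smi_output: &str) -> Option<usize> {
    // Skip the "compute_cap" CSV header and blank lines, keep the first data row.
    let line = nvidia_smi_output
        .lines()
        .map(str::trim)
        .find(|l| !l.is_empty() && *l != "compute_cap")?;
    // "8.9" -> "89" -> 89
    line.replace('.', "").parse().ok()
}

fn main() {
    assert_eq!(parse_compute_cap("compute_cap\n8.9\n"), Some(89));
}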

The environment became out of sync accidentally because I installed CUDA 11.7 quite a while ago and recently installed (or partially installed) 12.5.

As I understand it, there are two ways of fixing this in my environment. Either I can clean up my installation so both tools report matching versions, or I can set the CUDA_COMPUTE_CAP environment variable to 87 (or one of the other listed codes) for the build.
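For the second workaround, my understanding is that an explicit CUDA_COMPUTE_CAP takes precedence over autodetection, roughly like the sketch below; the function name and shape are illustrative, not the crate's exact internals:

// Illustrative sketch only: an explicit CUDA_COMPUTE_CAP (e.g. set to 87 for
// the build) is used verbatim, otherwise the value detected via nvidia-smi
// is used.
fn compute_cap_override(detected_from_nvidia_smi: usize) -> usize {
    match std::env::var("CUDA_COMPUTE_CAP") {
        Ok(value) => value
            .trim()
            .parse()
            .expect("CUDA_COMPUTE_CAP should be an integer such as 87"),
        Err(_) => detected_from_nvidia_smi,
    }
}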

However, I noticed that a candle developer left a comment on a related issue saying they expected the version conflict to automatically resolve by dropping down to the highest supported version instead of the one reported by nvidia-smi. That seems like reasonable behavior to me.

Perhaps this code (bindgen_cuda/src/lib.rs, lines 526 to 535 at a6b0c89):

if !supported_nvcc_codes.contains(&compute_cap) {
    panic!(
        "nvcc cannot target gpu arch {compute_cap}. Available nvcc targets are {supported_nvcc_codes:?}."
    );
}
if compute_cap > max_nvcc_code {
    panic!(
        "CUDA compute cap {compute_cap} is higher than the highest gpu code from nvcc {max_nvcc_code}"
    );
}

Could be changed to:

if compute_cap > max_nvcc_code {
    if std::env::var("CUDA_COMPUTE_CAP").is_err() {
        // The environment variable was not set, so assume the user wants it to
        // "just work", but warn them anyway.
        println!(
            "cargo:warning=CUDA compute cap {compute_cap} is higher than the highest gpu code {max_nvcc_code} from nvcc. Using {max_nvcc_code}."
        );
        compute_cap = max_nvcc_code;
    } else {
        // The environment variable was set, so assume the user wants to know
        // when their requested version isn't available.
        panic!(
            "CUDA compute cap {compute_cap} is higher than the highest gpu code from nvcc {max_nvcc_code}"
        );
    }
}
if !supported_nvcc_codes.contains(&compute_cap) {
    panic!(
        "nvcc cannot target gpu arch {compute_cap}. Available nvcc targets are {supported_nvcc_codes:?}."
    );
}

(please forgive minor errors. I typed this in the browser)
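To make the expected behavior concrete, here is a small self-contained sketch of the proposed fallback, using plain values instead of bindgen_cuda's internals (names are illustrative):

fn resolve_compute_cap(
    requested: usize,
    max_nvcc_code: usize,
    env_override_set: bool,
) -> Result<usize, String> {
    if requested > max_nvcc_code {
        if env_override_set {
            // An explicit request that nvcc cannot satisfy should still fail loudly.
            return Err(format!(
                "CUDA compute cap {requested} is higher than the highest gpu code from nvcc {max_nvcc_code}"
            ));
        }
        // No explicit request: fall back to the newest arch nvcc supports.
        return Ok(max_nvcc_code);
    }
    Ok(requested)
}

fn main() {
    // nvidia-smi reports 89, nvcc only goes up to 87, no override set: build with 87.
    assert_eq!(resolve_compute_cap(89, 87, false), Ok(87));
    // An explicit CUDA_COMPUTE_CAP=89 still fails.
    assert!(resolve_compute_cap(89, 87, true).is_err());
}

With that behavior, my mismatched install would build against sm_87 with a warning, while an explicit CUDA_COMPUTE_CAP=89 would still fail.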

Does this sound like a reasonable solution?
