
Building fails when nvidia-smi is from a newer CUDA version than nvcc #8

FlippingBinary opened this issue Aug 14, 2024 · 0 comments

This issue is similar to #4, but slightly different. In that issue, nvidia-smi wasn't present at all, so autodetection of the CUDA compute capability level failed.

This issue occurs when nvidia-smi comes from a newer CUDA version than nvcc does.

In my Windows 11 environment, nvcc --list-gpu-code reports:

sm_35
sm_37
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87

As you can see, the highest version is 87, and 89 is missing. But nvidia-smi --query-gpu=compute_cap --format=csv reports:

compute_cap
8.9

This apparently corresponds to 89, so the build process panics.
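For context on the panic: nvidia-smi prints the compute capability as a major.minor pair, and the build compares it against nvcc's integer sm_XX codes, so 8.9 becomes 89. A rough sketch of that mapping (my own illustration, not the crate's actual parsing code):

fn parse_compute_cap(nvidia_smi_output: &str) -> Option<usize> {
    // Skip the "compute_cap" CSV header and blank lines, keep the first data row.
    let line = nvidia_smi_output
        .lines()
        .map(str::trim)
        .find(|l| !l.is_empty() && *l != "compute_cap")?;
    // "8.9" -> "89" -> 89
    line.replace('.', "").parse().ok()
}

fn main() {
    assert_eq!(parse_compute_cap("compute_cap\n8.9\n"), Some(89));
}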

The environment became out of sync accidentally because I installed CUDA 11.7 quite a while ago and recently installed (or partially installed) 12.5.

As I understand it, there are two ways of fixing this in my environment. Either I can clean up my installation so both tools report matching versions, or I can set the CUDA_COMPUTE_CAP environment variable to 87 (or one of the other listed codes) for the build.
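For the second workaround, my understanding is that an explicit CUDA_COMPUTE_CAP takes precedence over autodetection, roughly like the sketch below; the function name and shape are illustrative, not the crate's exact internals:

// Illustrative sketch only: an explicit CUDA_COMPUTE_CAP (e.g. set to 87 for
// the build) is used verbatim, otherwise the value detected via nvidia-smi
// is used.
fn compute_cap_override(detected_from_nvidia_smi: usize) -> usize {
    match std::env::var("CUDA_COMPUTE_CAP") {
        Ok(value) => value
            .trim()
            .parse()
            .expect("CUDA_COMPUTE_CAP should be an integer such as 87"),
        Err(_) => detected_from_nvidia_smi,
    }
}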

However, I noticed that a candle developer left a comment on a related issue saying they expected the version conflict to automatically resolve by dropping down to the highest supported version instead of the one reported by nvidia-smi. That seems like reasonable behavior to me.

Perhaps this code (bindgen_cuda/src/lib.rs, lines 526 to 535 at a6b0c89):

if !supported_nvcc_codes.contains(&compute_cap) {
    panic!(
        "nvcc cannot target gpu arch {compute_cap}. Available nvcc targets are {supported_nvcc_codes:?}."
    );
}
if compute_cap > max_nvcc_code {
    panic!(
        "CUDA compute cap {compute_cap} is higher than the highest gpu code from nvcc {max_nvcc_code}"
    );
}

Could be changed to:

if compute_cap > max_nvcc_code {
    if std::env::var("CUDA_COMPUTE_CAP").is_err() {
        // The environment variable was not set, so assume the user wants it to
        // "just work", but warn them anyway.
        println!(
            "cargo:warning=CUDA compute cap {compute_cap} is higher than the highest gpu code {max_nvcc_code} from nvcc. Using {max_nvcc_code}."
        );
        compute_cap = max_nvcc_code;
    } else {
        // The environment variable was set, so assume the user wants to know
        // when their requested version isn't available.
        panic!(
            "CUDA compute cap {compute_cap} is higher than the highest gpu code from nvcc {max_nvcc_code}"
        );
    }
}
if !supported_nvcc_codes.contains(&compute_cap) {
    panic!(
        "nvcc cannot target gpu arch {compute_cap}. Available nvcc targets are {supported_nvcc_codes:?}."
    );
}

(please forgive minor errors. I typed this in the browser)
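To make the expected behavior concrete, here is a small self-contained sketch of the proposed fallback, using plain values instead of bindgen_cuda's internals (names are illustrative):

fn resolve_compute_cap(
    requested: usize,
    max_nvcc_code: usize,
    env_override_set: bool,
) -> Result<usize, String> {
    if requested > max_nvcc_code {
        if env_override_set {
            // An explicit request that nvcc cannot satisfy should still fail loudly.
            return Err(format!(
                "CUDA compute cap {requested} is higher than the highest gpu code from nvcc {max_nvcc_code}"
            ));
        }
        // No explicit request: fall back to the newest arch nvcc supports.
        return Ok(max_nvcc_code);
    }
    Ok(requested)
}

fn main() {
    // nvidia-smi reports 89, nvcc only goes up to 87, no override set: build with 87.
    assert_eq!(resolve_compute_cap(89, 87, false), Ok(87));
    // An explicit CUDA_COMPUTE_CAP=89 still fails.
    assert!(resolve_compute_cap(89, 87, true).is_err());
}

With that behavior, my mismatched install would build against sm_87 with a warning, while an explicit CUDA_COMPUTE_CAP=89 would still fail.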

Does this sound like a reasonable solution?
