
Relax cuda-python run dependencies #66

Closed
1 task done
bdice opened this issue Jan 30, 2024 · 8 comments · Fixed by #67
Labels: bug (Something isn't working)

Comments

@bdice
Contributor

bdice commented Jan 30, 2024

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

I faced an issue with cuda-python, where the run dependencies on cuda-cudart and cuda-nvrtc resulted in an undesired environment solve that crossed channel priorities. See rapidsai/integration#695.

The current run dependencies for linux-64/cuda-python-12.3.0-py38hd750a71_0.conda are:

__glibc >=2.17,<3.0.a0
cuda-cudart >=12.3.101,<13.0a0
cuda-nvrtc >=12.3.107,<13.0a0
cuda-version >=12,<13.0a0
libgcc-ng >=12
libstdcxx-ng >=12
python >=3.8,<3.9.0a0
python_abi 3.8.* *_cp38

However, it doesn't seem that the cuda-python package needs the run dependencies on cuda-cudart or cuda-nvrtc. They're not dynamically linked, and they're not explicitly listed as run requirements. They come from the run-exports of dependencies in the host section:

host:
- cuda-cudart-dev
- cuda-nvrtc-dev

From discussions with @jakirkham, the cuda-python package should be able to leverage CUDA Enhanced Compatibility, meaning that we should be able to add ignore_run_exports_from for dev packages like cuda-cudart-dev.
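
For illustration, a minimal sketch of what that recipe change could look like (using conda-build's ignore_run_exports_from key; whether cuda-nvrtc-dev should also be listed is discussed below):

build:
  ignore_run_exports_from:
    # drop the run-exports injected by these host dependencies
    - cuda-cudart-dev
    - cuda-nvrtc-dev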

Installed packages

n/a

Environment info

n/a
bdice added the bug label on Jan 30, 2024
@leofang
Member

leofang commented Jan 30, 2024

However, it doesn't seem that the cuda-python package needs the run dependencies on cuda-cudart or cuda-nvrtc. They're not dynamically linked, and they're not explicitly listed as run requirements.

CUDA Python reimplements the CUDA runtime, so not depending on cuda-cudart is fine. But it dynamically links to NVRTC, so we still need to keep the latter in the run section.

@jakirkham
Member

But it dynamically links to NVRTC, so we still need to keep the latter in the run section.

Looking at the build logs, we see the following on CI:

WARNING (cuda-python): dso library package conda-forge/linux-64::cuda-nvrtc==12.3.107=hd3aeb46_0 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (cuda-python): dso library package conda-forge/linux-64::cuda-cudart==12.3.101=hd3aeb46_0 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)

We are also seeing this dlopen logic for nvrtc.

Though please let us know if we are still missing something.

@leofang
Member

leofang commented Jan 31, 2024

I am not sure how that would be different from what I said above. Are you suggesting NVRTC should be listed in run_constrained rather than run? If so, I am fine with it.

@jakirkham
Member

It doesn't appear to be dynamically linking to nvrtc; it is using dlopen.

cuda-nvrtc can still be in run, but it doesn't need a pin (it is not dynamically linked).
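
As a rough sketch (illustrative, not necessarily the exact change), the run section could then be:

requirements:
  run:
    # keep nvrtc available at runtime, but without a version pin,
    # since it is dlopened by SONAME rather than dynamically linked
    - cuda-nvrtc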

@leofang
Member

leofang commented Jan 31, 2024

OK, sorry for using the wrong terminology if we want to be pedantic here: it's "runtime linked", not "dynamically linked". But that doesn't change the fact that cuda-nvrtc needs to be at least added to run_constrained using pin_compatible; can we agree on this? Otherwise, if there's any breaking API change in NVRTC (however unlikely), the Python binding would not work and all sorts of weird errors could happen.

The same was done in CuPy v13, where everything is also technically runtime-linked (through the Python import system instead of dlopen), and we have seen and agreed with the treatment there.

@jakirkham
Member

jakirkham commented Jan 31, 2024

Dynamic linking and dlopen have important differences that affect the behavior of the resulting binary. I think it is important for us to distinguish the two.

If you have not already, could you please read over the code linked above ( #66 (comment) )? The dlopen logic includes NVRTC's SONAME there. If NVRTC were to break its API/ABI, it would change the SONAME (meaning the new library wouldn't be dlopened in this case).

@leofang
Member

leofang commented Jan 31, 2024

If you have not already, could you please read over the code linked above ( #66 (comment) )? The dlopen logic includes NVRTC's SONAME there.

Sorry, but I have a you-know-which project that implements exactly the same approach (after studying CUDA Python carefully), so you can assume my utter familiarity with the code. I know what I was talking about (and I can tell you where CUDA Python can improve) 🙂

The point is, in order to implement runtime linking through dlopen/dlsym, we need to make strong assumptions about ABI compatibility (all function signatures must not change, all struct sizes and layouts must not change, etc.). I am against lowering our guard only in CUDA Python while keeping a high bar for safety considerations elsewhere. The standard treatment in conda-forge would have been

run_constrained:
  - {{ pin_compatible('cuda-nvrtc', max_pin='x') }}

at the very least. What is the issue that prevents us from implementing this change?
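
(For reference, with cuda-nvrtc 12.3.107 in host, my understanding of pin_compatible's defaults is that this renders to roughly:

run_constrained:
  - cuda-nvrtc >=12.3.107,<13

i.e. anything within the same major version is allowed.)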

If NVRTC were to break its API/ABI, it would change the SONAME (meaning the new library wouldn't be dlopened in this case).

This is not necessarily true. I've seen multiple CUDA API/ABI incidents in the past, in both the CUDA runtime and the libraries. There is no one-size-fits-all sanity check that could capture all of these accidents, and what we do on the packaging side is the last line of defense. What I am asking is simply that we proceed with caution.

@jakirkham
Member

If you have not already, could you please read over the code linked above ( #66 (comment) )? The dlopen logic includes NVRTC's SONAME there.

Sorry, but I have a you-know-which project that implements exactly the same approach (after studying CUDA Python carefully), so you can assume my utter familiarity with the code. I know what I was talking about (and I can tell you where CUDA Python can improve) 🙂

Sorry if that came off as blunt. I wanted to be sure we were looking at the same thing, as it felt like we were talking past each other.

What is the issue that prevents us from implementing this change?

I have included you in an offline discussion where this came up. At this point I think it is better to discuss there for a bit before continuing here.

rapids-bot bot pushed a commit to rapidsai/docker that referenced this issue Feb 1, 2024
Fixes attempted in rapidsai/integration#695 & rapidsai/integration#697 don't play well with CEC (CUDA Enhanced Compatibility) at large, but this repo can pin `cuda-python` more restrictively.

This PR pins `cuda-python` to the CUDA `major.minor.*` version.
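
Illustratively (not the literal diff), for CUDA 12.3 that pin is a conda match spec along the lines of:

cuda-python =12.3.*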

See also conda-forge/cuda-python-feedstock#66

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #624