Adding cuda-cudart #21723
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipes/cuda-cudart:
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
Co-authored-by: jakirkham <[email protected]>
Also, if the |
The headers weren't being properly copied. This is fixed now, |
@conda-forge/core, this is ready for review! This is a runtime package needed by the compiler ( |
It looks like the pkg-config files aren't currently being installed and that they have a hardcoded path of |
Great catch, thanks! The pkgconfigs are getting installed now and |
Co-authored-by: Keith Kraus <[email protected]>
recipes/cuda-cudart/meta.yaml (outdated)
build:
  number: 0
  skip: true  # [osx]
Windows CI appears to be failing because of overlinking.
conda_build.exceptions.OverLinkingError: overlinking check failed
[' ERROR (cuda-cudart,Library/bin/cudart64_12.dll): $RPATH/api-ms-win-core-libraryloader-l1-2-0.dll not found in packages, sysroot(s) nor the missing_dso_whitelist.\n.. is this binary repackaging?', ' ERROR (cuda-cudart,Library/bin/cudart64_12.dll): $RPATH/api-ms-win-security-systemfunctions-l1-1-0.dll not found in packages, sysroot(s) nor the missing_dso_whitelist.\n.. is this binary repackaging?']
##[error]Cmd.exe exited with code '1'.
Seems related to the problems mentioned in previous PRs here:
- Add libcusolver, libcusparse, and libnvjitlink recipes #21924 (comment)
- Add libcusolver, libcusparse, and libnvjitlink recipes #21924 (comment)
- Add windows-terminal #21913
The solution in #21924 was disabling the overlinking check. Is this what we should do, or is there a different approach?
Suggested change:
build:
  number: 0
  skip: true  # [osx]
  error_overlinking: false  # [win]
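For reference, the error text above names a second knob besides error_overlinking: the missing_dso_whitelist. A minimal sketch of that alternative follows, assuming the two API-set DLLs are provided by Windows itself and are safe to whitelist. The thread did not adopt or verify this, and the glob patterns are an assumption rather than something taken from the recipe.

```yaml
build:
  number: 0
  skip: true  # [osx]
  # Assumption: whitelist only the two DLLs reported by the overlinking check,
  # instead of turning the check off for the whole package.
  missing_dso_whitelist:  # [win]
    - "*/api-ms-win-core-libraryloader-l1-2-0.dll"  # [win]
    - "*/api-ms-win-security-systemfunctions-l1-1-0.dll"  # [win]
```

Compared with error_overlinking: false, this would keep the check active for any other library that goes missing later.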
At one point the recipes in the aforementioned PR had error_overlinking (3ed95cf), but it seems to have been reverted later (96f6ad0). Am not seeing error_overlinking in the final changes in that PR or in the recipes. So think Leo came up with another solution, but it is not obvious from the commit history of the PR what it was. Might be worth studying the libnvjitlink recipe more closely (since this is where it was needed and later dropped).
Hmm. I looked over the diff between those commits and compared the requirements of those packages to cuda-cudart, but I'm not seeing much that points to the original cause or solution.
Yeah, it wasn't obvious to me how this was fixed after a cursory look, though I assumed I had just missed something.
Let's try merging in main (as we discussed offline) and see if the issue persists. Did this below (01063be).
Still seeing the same issue after updating the branch. This Microsoft support thread looks relevant
So I do see api-ms-win-core-libraryloader-*.dll in the vc package, but the SONAME(?) appears different. Maybe we are getting the wrong version?
As for api-ms-win-security-systemfunctions-*.dll, it does not appear to be in that list. Unclear whether it should be (possibly in a newer version?).
My impression was that after I followed the libcublas approach and added the compiler to build, I no longer saw warnings about the DLLs, so I assumed it was safe to undo the workaround and gave it a shot.
Another issue was that I noticed the internal recipe had a wrong cusparse/nvjitlink version, mismatching the CUDA 12.0 one, so I fixed it, and that change might also explain it. Not sure which is the real key.
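As a rough illustration of "adding the compiler to build", the fragment below uses the standard conda-build compiler() Jinja function. Which compilers the libcublas recipe actually lists is not shown in this thread, so the choice of "c" and "cxx" here is an assumption.

```yaml
requirements:
  build:
    # Assumption: C and C++ compilers. The thread only says "compiler",
    # without naming the languages used in the libcublas recipe.
    - {{ compiler("c") }}
    - {{ compiler("cxx") }}
```

On Windows the compiler packages bring the vc runtime into the build, which is one plausible reason the DLL warnings changed, though the thread leaves the real cause open.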
Thanks Leo! 🙏 Really appreciate you taking time to chime in here 🙂
Yeah, the compilers part seems like the key. Did look at that. Though this recipe already seems to include these for multiple packages.
That said, I do see noarch: generic in a few places where we are adding compilers and doing Linux/Windows specific things. Maybe these are holdovers from before we started doing architecture specific things. Not sure if they are causing this problem, but would expect them to cause some issues. So we may want to drop them. Have outlined them in review comments below.
Discussed this overlinking issue with @jakirkham / @adibbley. We agreed to skip Windows for now, and then work on fixing this problem in the feedstock. That way Linux packages can be unblocked and we can continue on cuda-nvcc while we investigate the Windows overlinking further.
Suggested change (replacing skip: true  # [osx]):
build:
  number: 0
  skip: true  # [osx or win]
Raised feedstock issue ( conda-forge/cuda-cudart-feedstock#1 ) for further discussion/investigation.
Noticed noarch: generic is showing up in packages that now have architecture specific steps (Windows/Linux). Maybe these were holdovers from before the architecture changes. In any event, think these should now be dropped. It's possible noarch: generic is causing other issues we are seeing, as alluded to in a different comment.
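A sketch of the pattern being flagged, using a hypothetical output name (the actual outputs of this recipe are not shown in the thread). noarch: generic declares an output platform-independent, which conflicts with compilers and platform selectors in the same output.

```yaml
outputs:
  - name: example-output        # hypothetical name, for illustration only
    build:
      noarch: generic           # candidate to drop: the output is no longer platform-independent
    requirements:
      build:
        - {{ compiler("c") }}   # compilers imply an architecture-specific build
    files:
      - lib/example.so*         # [linux]  (hypothetical path)
      - Library/bin/example*.dll  # [win]  (hypothetical path)
```

Dropping the noarch: generic line would leave the output as an ordinary per-platform package, matching the selectors it already uses.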
Co-authored-by: Bradley Dice <[email protected]>
Fix cuda-cudart tests.
JFYI we ran into an issue with Windows packages missing runtime libraries they needed from the system ( #21723 (comment) ). So we have disabled Windows builds for now. This will need to be followed up on in the feedstock.
I am not the best person to comment on the internals here. I assume you all have that covered. I don't see a ton of documentation of the different parts. This is definitely something that should go in the docs or in a comment in the recipe. Otherwise, LGTM!
Thanks Matt! 🙏 We have a doc that we have been working on for tracking how we are updating these libraries and what was needed. Recently we did discuss where we want this information to live long term. Happy to discuss how we might include that info in the conda-forge docs. Raised issue ( conda-forge/conda-forge.github.io#1927 ) for that discussion.
Planning on merging EOD tomorrow if no comments.
Thanks all! 🙏 Let's follow up on anything else in the feedstock 🙂