Adding cuda-cudart #21723

Merged
merged 45 commits into from
Apr 12, 2023
adibbley
Contributor

@adibbley adibbley commented Jan 13, 2023

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/cuda-cudart) and found some lint.

Here's what I've got...

For recipes/cuda-cudart:

  • The recipe must have some tests.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/cuda-cudart) and found some lint.

Here's what I've got...

For recipes/cuda-cudart:

  • The top level meta keys are in an unexpected order. Expecting ['package', 'source', 'build', 'requirements', 'test', 'outputs', 'about', 'extra'].

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/cuda-cudart) and found it was in an excellent condition.

@jakirkham jakirkham mentioned this pull request Jan 14, 2023
49 tasks
@jakirkham jakirkham mentioned this pull request Jan 14, 2023
10 tasks
@kkraus14
Contributor

Also, if the cuda-driver-dev and cuda-cudart-dev packages don't ship the cuda.h and cuda_runtime.h headers respectively, what packages will?

@adibbley
Contributor Author

> Also, if the cuda-driver-dev and cuda-cudart-dev packages don't ship the cuda.h and cuda_runtime.h headers respectively, what packages will?

The headers weren't being copied properly. This is fixed now: cuda.h and cuda_runtime.h are both in cuda-cudart-dev.

@adibbley adibbley marked this pull request as ready for review January 18, 2023 16:10
@jakirkham
Member

@conda-forge/core, this is ready for review! This is a runtime package needed by the compiler (cuda-nvcc)

@kkraus14
Contributor

It looks like the pkg-config files aren't currently being installed and that they have a hardcoded path of /usr/local/cuda-12.0 in them. Should we be installing them and having them correctly point into the conda-build host environment?

@adibbley
Contributor Author

> It looks like the pkg-config files aren't currently being installed and that they have a hardcoded path of /usr/local/cuda-12.0 in them. Should we be installing them and having them correctly point into the conda-build host environment?

Great catch, thanks! The pkg-config files are getting installed now, and post-link.sh should set the path correctly.
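
For readers who want to picture the prefix fix-up described above, here is a minimal Python sketch. It is not the recipe's post-link.sh, whose contents are not shown in this thread; the function name `rewrite_pc_prefix` and the sample .pc text are made up for illustration. It only shows the general idea of replacing the hardcoded /usr/local/cuda-12.0 prefix with the environment prefix.

```python
import re


def rewrite_pc_prefix(pc_text: str, old_prefix: str, new_prefix: str) -> str:
    """Replace a hardcoded install prefix in pkg-config file text.

    Hypothetical helper: the real recipe does this in post-link.sh.
    """
    # Match the old prefix only when it ends a path component, so that
    # /usr/local/cuda-12.0 is not rewritten inside /usr/local/cuda-12.0.1.
    pattern = re.compile(re.escape(old_prefix) + r"(?=/|\s|$)")
    # A function replacement avoids backslash-escape surprises if the
    # new prefix is a Windows-style path.
    return pattern.sub(lambda _match: new_prefix, pc_text)


# Made-up sample content, not copied from the actual cudart .pc file.
sample = "cudaroot=/usr/local/cuda-12.0\nlibdir=${cudaroot}/lib\n"
print(rewrite_pc_prefix(sample, "/usr/local/cuda-12.0", "/opt/conda/envs/demo"))
```

In a real post-link step the new prefix would presumably come from the environment the package is installed into rather than a literal string.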

recipes/cuda-cudart/meta.yaml
Comment on lines 26 to 28
build:
  number: 0
  skip: true  # [osx]
Contributor

Windows CI appears to be failing because of overlinking.

conda_build.exceptions.OverLinkingError: overlinking check failed 
['  ERROR (cuda-cudart,Library/bin/cudart64_12.dll): $RPATH/api-ms-win-core-libraryloader-l1-2-0.dll not found in packages, sysroot(s) nor the missing_dso_whitelist.\n.. is this binary repackaging?', '  ERROR (cuda-cudart,Library/bin/cudart64_12.dll): $RPATH/api-ms-win-security-systemfunctions-l1-1-0.dll not found in packages, sysroot(s) nor the missing_dso_whitelist.\n.. is this binary repackaging?']
##[error]Cmd.exe exited with code '1'.

Seems related to the problems mentioned in previous PRs here:

The solution in #21924 was disabling the overlinking check. Is this what we should do, or is there a different approach?

Suggested change
- build:
-   number: 0
-   skip: true  # [osx]
+ build:
+   number: 0
+   skip: true  # [osx]
+   error_overlinking: false  # [win]
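
As an aid for reading the overlinking failure quoted above, the following Python sketch pulls the package, binary, and missing DLL out of each error line. The regular expression is written against the message format shown in this comment only; it is an assumption that other conda-build versions format the message the same way, and `missing_dsos` is a hypothetical helper name.

```python
import re

# Pattern based solely on the error text quoted in this thread.
MISSING_DSO = re.compile(
    r"ERROR \((?P<package>[^,]+),(?P<binary>[^)]+)\): "
    r"\$RPATH/(?P<dso>\S+) not found"
)


def missing_dsos(log_text: str) -> list[tuple[str, str, str]]:
    """Return (package, binary, missing_dso) for each overlinking error."""
    return [
        (m["package"], m["binary"], m["dso"])
        for m in MISSING_DSO.finditer(log_text)
    ]


log = (
    "  ERROR (cuda-cudart,Library/bin/cudart64_12.dll): "
    "$RPATH/api-ms-win-core-libraryloader-l1-2-0.dll not found in packages, "
    "sysroot(s) nor the missing_dso_whitelist.\n"
    "  ERROR (cuda-cudart,Library/bin/cudart64_12.dll): "
    "$RPATH/api-ms-win-security-systemfunctions-l1-1-0.dll not found in packages, "
    "sysroot(s) nor the missing_dso_whitelist.\n"
)
for package, binary, dso in missing_dsos(log):
    print(f"{package}: {binary} needs {dso}")
```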

Member

At one point the recipes in the aforementioned PR had error_overlinking ( 3ed95cf ), but it seems to have been reverted later ( 96f6ad0 ). Am not seeing error_overlinking in the final changes in that PR or in the recipes. So think Leo came up with another solution, but it is not obvious from the PR's commit history what it was. Might be worth studying the libnvjitlink recipe more closely (since this is where it was needed and later dropped).

Contributor

Hmm. I looked over the diff between those commits and compared the requirements of those packages to cuda-cudart but I'm not seeing much that points to the original cause or solution.

Member

@jakirkham jakirkham Apr 7, 2023

Yeah it wasn't obvious to me how this was fixed after a cursory look. Though assumed I had just missed something

Let's try merging in main (as we discussed offline) and see if the issue persists. Did this below ( 01063be )

Member

Still seeing the same issue after updating the branch. This Microsoft support thread looks relevant

Member

So I do see api-ms-win-core-libraryloader-*.dll in the vc package, but the SONAME(?) appears different. Maybe we are getting the wrong version?

With api-ms-win-security-systemfunctions-*.dll, it does not appear to be in that list. Unclear as to whether it should be (possibly in a newer version?)
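
The kind of comparison described here can be sketched in Python as a glob match of the missing DLL names against a set of patterns. This is only an illustration: the pattern list below is a stand-in, not the actual contents of the vc package, and `split_by_patterns` is a hypothetical helper. Note that a glob like this ignores the version suffix, which is exactly the part suspected to differ, so an exact-name comparison would be the stricter check.

```python
from fnmatch import fnmatchcase


def split_by_patterns(dll_names, patterns):
    """Split DLL names into those matching any glob pattern and the rest.

    Matching is done on lowercased names, since Windows file names are
    case-insensitive.
    """
    covered, uncovered = [], []
    for name in dll_names:
        matched = any(fnmatchcase(name.lower(), p.lower()) for p in patterns)
        (covered if matched else uncovered).append(name)
    return covered, uncovered


# DLL names taken from the error message earlier in this thread.
missing = [
    "api-ms-win-core-libraryloader-l1-2-0.dll",
    "api-ms-win-security-systemfunctions-l1-1-0.dll",
]
# Stand-in pattern list (assumption), mirroring what the comment reports seeing.
patterns = ["api-ms-win-core-libraryloader-*.dll"]

covered, uncovered = split_by_patterns(missing, patterns)
print("covered:", covered)
print("uncovered:", uncovered)
```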

Member

My impression was that after I followed the libcublas approach and added the compiler to build, I no longer saw warnings on the DLLs, so I assumed it was safe to undo the WAR (workaround) and gave it a shot.

Another issue was that I noticed the internal recipe had a wrong cusparse/nvjitlink version, mismatching the CUDA 12.0 one, so I fixed it, and that change might also explain it. Not sure which is the real key.

Member

Thanks Leo! 🙏 Really appreciate you taking time to chime in here 🙂

Yeah the compilers part seems like the key. Did look at that. Though this recipe already seems to include these for multiple packages.

That said, I do see noarch: generic in a few places where we are adding compilers and doing Linux/Windows specific things. Maybe these are holdovers from before we started doing architecture specific things. Not sure if they are causing this problem, but would expect them to cause some issues. So we may want to drop them. Have outlined them in review comments below

Contributor

Discussed this overlinking issue with @jakirkham / @adibbley. We agreed to skip Windows for now, and then work on fixing this problem in the feedstock. That way Linux packages can be unblocked and we can continue on cuda-nvcc while we investigate the Windows overlinking further.

Suggested change
- build:
-   number: 0
-   skip: true  # [osx]
+ build:
+   number: 0
+   skip: true  # [osx or win]
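
To make the effect of the `# [osx or win]` selector concrete, here is a simplified Python model of how such a selector decides whether a line applies. conda-build treats selectors as Python expressions evaluated against platform variables; this sketch models only three of those variables (`linux`, `osx`, `win`), and `line_applies` is a hypothetical helper, not a conda-build function.

```python
import re

# A trailing "# [expr]" comment on a recipe line.
SELECTOR = re.compile(r"#\s*\[(?P<expr>[^\]]+)\]\s*$")


def line_applies(line: str, platform: str) -> bool:
    """Return True if the line is kept on the given platform."""
    match = SELECTOR.search(line)
    if match is None:
        return True  # no selector: the line applies everywhere
    # Simplified namespace (assumption): conda-build defines many more names.
    namespace = {name: platform == name for name in ("linux", "osx", "win")}
    # Selector expressions come from a trusted recipe; builtins are disabled.
    return bool(eval(match["expr"], {"__builtins__": {}}, namespace))


line = "skip: true  # [osx or win]"
for platform in ("linux", "osx", "win"):
    print(platform, "skipped" if line_applies(line, platform) else "built")
```

Under this model the suggested change leaves only the Linux builds enabled, which matches the stated goal of unblocking Linux packages.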

Member

Raised feedstock issue ( conda-forge/cuda-cudart-feedstock#1 ) for further discussion/investigation.

Member

@jakirkham jakirkham left a comment

Noticed noarch: generic is showing up in packages that now have architecture specific steps (Windows/Linux). Maybe these were holdovers before the architecture changes. In any event, think these should now be dropped. It's possible noarch: generic is causing other issues we are seeing as alluded to in a different comment
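
One rough way to spot the pattern described in this review comment is a text scan of the recipe for outputs that declare `noarch: generic` and also carry platform selectors. The sketch below is a heuristic under stated assumptions: it treats every `- name:` line as the start of an output, does not parse YAML or Jinja, and uses a made-up recipe fragment with placeholder output names rather than the actual cuda-cudart recipe.

```python
import re

OUTPUT_START = re.compile(r"^\s*-\s*name:\s*(\S+)")
NOARCH = re.compile(r"^\s*noarch:\s*generic\b")
PLATFORM_SELECTOR = re.compile(r"#\s*\[[^\]]*\b(?:linux|win|osx)\b[^\]]*\]")


def noarch_with_selectors(recipe_text: str) -> list[str]:
    """List outputs that combine noarch: generic with platform selectors."""
    blocks = []  # each entry: [name, has_noarch, has_selector]
    for line in recipe_text.splitlines():
        start = OUTPUT_START.match(line)
        if start:
            blocks.append([start.group(1), False, False])
        elif blocks:
            if NOARCH.search(line):
                blocks[-1][1] = True
            if PLATFORM_SELECTOR.search(line):
                blocks[-1][2] = True
    return [name for name, has_noarch, has_selector in blocks
            if has_noarch and has_selector]


# Made-up fragment with placeholder names (not the real recipe).
recipe = """\
outputs:
  - name: pkg-a
    build:
      noarch: generic
    requirements:
      build:
        - {{ compiler("c") }}  # [linux]
  - name: pkg-b
    build:
      noarch: generic
"""
print(noarch_with_selectors(recipe))
```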

@jakirkham
Member

JFYI we ran into an issue with Windows packages missing runtime libraries they needed from the system ( #21723 (comment) ). So we have disabled Windows builds for now. This will need to be followed up on in the feedstock

Member

@beckermr beckermr left a comment

I am not the best person to comment on the internals here. I assume you all have that covered. I don't see a ton of documentation of the different parts. This is definitely something that should go in the docs or in a comment in the recipe. Otherwise, LGTM!

@jakirkham
Member

Thanks Matt! 🙏

We have a doc that we have been working on for tracking how we are updating these libraries and what was needed. Recently we did discuss where we want this information to live long term. Happy to discuss how we might include that info in the conda-forge docs. Raised issue ( conda-forge/conda-forge.github.io#1927 ) for that discussion

@jakirkham
Member

Planning on merging EOD tomorrow if no comments

@jakirkham jakirkham merged commit 92896e2 into conda-forge:main Apr 12, 2023
@jakirkham
Member

Thanks all! 🙏

Let's follow up on anything else in the feedstock 🙂

10 participants