-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CuPy #9959
Add CuPy #9959
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
@conda-forge/core, this is ready for review! 😄 |
@jakirkham this is awesome! Thanks for working on it. Could you point me to where |
Of course! Happy to help 😄
It needs to be. 😅 This is on my todo list. ( conda-forge/conda-forge.github.io#901 )
Yeah it is pretty new. We plan to advertise it a bit more. Stay tuned. The short answer is we are using NVIDIA-based Docker images that already have the toolchain to do the build.
Agreed! On a different note, would you be interested in helping maintain the CuPy package? 😉 |
Can you change the docker image and make tests pass? We can revert the docker image change before merging. (This is what we did for openmm) |
3607d94
to
a1784b5
Compare
Sure I've pushed a commit to build the package on CUDA 9.2. |
@@ -7,7 +7,7 @@ jobs: | |||
matrix: | |||
linux: | |||
CONFIG: azure-linux-64-comp7 | |||
IMAGE_NAME: condaforge/linux-anvil-comp7 | |||
IMAGE_NAME: condaforge/linux-anvil-cuda:9.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to not have to change this file in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'll have to use docker on docker to do that. That would also help with testing CentOS7 packages that you wanted to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question: would it imply that all CUDA packages built this way for conda-forge, and the - {{ compiler("cuda") }}
flag, will be pinned at CUDA 9.2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is why we need docs. 😅
So currently conda-forge builds against 3 CUDA versions. As some recipes have a CPU-only option, we support 1 CPU version + 3 CUDA versions in total. These can be seen here (None
is the CPU version). These versions map one-to-one to these Docker images where the build occurs. Each package then gains a dependency on the cudatoolkit
package with the same major minor version.
Now what does this mean for CuPy. This means it will be built against all of the CUDA versions, but no CPU versions. Each build of CuPy will pin to a major minor version of CUDA (so 9.2, 10.0, and 10.1). Users can then constrain which build they get by selecting the version of cudatoolkit
they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the detailed explanation and the links, @jakirkham! These are very useful to know for building CUDA packages, not just CuPy.
Yes, it's be great to have a central place documenting all these. I've been trying to follow the progress of cudatoolkit for a while, but I completely missed these new developments
Do you know which CUDA 10.1 exactly is in the Docker image? You probably have noticed this new issue: cupy/cupy#2569. CuPy doesn't support 10.1 Update 2 currently.
btw, Nvidia is awful in their versioning -- why not just call it 10.1.2, or even simpler 10.3 (there's already an Update 1, which should have been 10.2)... This versioning semantics might need to be taken care in conda-forge's CUDA stack...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad that helps. Thanks for the feedback. At some point it would be good if you can share your thoughts on what has been confusing and useful as well as what would be valuable to have in the docs. Have been collecting feedback from others in this doc issue ( conda-forge/conda-forge.github.io#901 ). 🙂
Sorry we have not done a great job in publicizing it. Have been working through rough edges with individual contributors to try and smooth out the overall experience. That said, it would be useful to write up (even an incomplete) blogpost about the state of things from a high-level. Thanks for the nudge. 😉
FWICT the image is using CUDA 10.1 Update 2 (nvcc
output below). Thanks for letting me know. IIUC that just affects master
and the latest RC right (asked on the issue too)? In either case will follow that issue closely.
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Good point. Have also been thinking about this. Could you please raise an issue about more detailed versioning on the nvcc
feedstock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK will do. Also listing version strings here for future ref:
10.1: 10.1.105
10.1 Update 1: 10.1.168
10.1 Update 2: 10.1.243
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ef3bd6f
to
c599579
Compare
b8a3061
to
a10b883
Compare
recipes/cupy/meta.yaml
Outdated
run: | ||
- python | ||
- setuptools | ||
- {{ pin_compatible("cudnn") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I misunderstood: why just pinned cudnn
but not nccl
? There are incompatibilities even within the v2 series, and I am not sure if the built cupy.cuda.nccl
module would be happy with a different nccl version discovered at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the CuPy team bundles a specific NCCL version in the CuPy wheel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah not pinning NCCL is dangerous. The NCCL version is determined at built time:
https://github.com/cupy/cupy/blob/b9db374f4d0ccee358a998d7b283248f77cef351/cupy/cuda/cupy_nccl.h#L131-L133
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the plan is to pin that globally conda-forge/conda-forge-pinning-feedstock#309 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @CJ-Wright. I wasn't aware of that discussion.
But still, don't we need a {{ pin_compatible("nccl") }}
here? Once a CuPy package is built, it is supposed to be pinned to the precise NCCL version it was built against. I feel the global pin might be a bit fragile when a newer NCCL comes out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the compiler might warn that the symbol is unused, but I don't think it will just get rid of the symbol altogether. There may be flags to enable this kind of elimination, but am pretty sure it is opt-in. I could be wrong about this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyways we should probably move this discussion to CuPy issue ( cupy/cupy#2568 ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well currently we are using the latest major minor version of NCCL with CuPy. Now that we are aware of this issue, I think we just need to stay on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until that issue is resolved, shouldn't you pin exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NCCL issue? We are using the latest major minor version of NCCL and the current release of CuPy does too. So they are not adding any symbols at the moment that will clash with NCCL. Also they've acknowledge the issue upstream and seem willing to work on it. We certainly need to keep an eye on things as new releases come out and adjust our strategy accordingly. That said, I think we are doing all we can with this issue ATM.
@conda-forge/core, would it be possible to get another review of this? I think all comments from the last round were addressed. 🙂 |
See #9959 (comment). |
(responding in that thread :) |
Oh, almost forgot:
I am more than happy to help if needed. I think I know CuPy internal enough. But, I am a bit hesitating to be listed as a maintainer. Last time I worked on a package that has non-Python dependencies conda-forge/pyqt-feedstock#61 it was super painful. I explored in the dark without understanding how conda-build really works (lots of things either undocumented or have special treatments that it's impossible for a new comer to know), nor was I able to successfully debug myself (lack of resource and knowledge) when seeing issues in CI. @isuruf eventually took over and helped finish it (thanks again btw). The time investment is a bit daunting from that experience. |
I think the Qt stack is one of the hardest pieces of our ecosystem. So props for diving in the deep end. Most of the tricky bits that I found from building CuPy were not specific to CuPy, but were more general GPU building things that have been upstreamed into the tooling in conda-forge. At this point, I would argue that building CuPy is not harder than building your average Python C extension (not PyQt 😉). Should add I'm happy to help here too. Am using CuPy regularly in my day-to-day and expect that to continue. So will feel some pressure to keep things working here. Though I think it would be good if some people that are not me become familiar with the CuPy recipe and updating it. Am happy to handhold as we move in that direction. In the end, I think you and other CuPy developers know way more about how CuPy works than I do. So it will be good to transition to having you and other CuPy developers directing how the package looks going forward. We've found this model successful with several packages. 🙂 |
FYI, the CuPy team pins at NCCL 2.4.2.1 for CUDA 9.2: |
It would be good to coordinate, although we also need to think of other downstream consumers of |
We don't currently have a build of NCCL 2.4.2.1. It could be added though. |
Do you know why they chose NCCL 2.4.2.1? |
This sounds against a global pin that was planned...
Wasn't explained here: cupy/cupy-release-tools#29 Probably it was the latest version at that time? |
@kmaehashi would you have time to comment / review on the NCCL pinning issue and the overall CuPy recipe here? Would be great to have an extra pair of eyes, especially someone from the CuPy/Chainer team! |
This is the reason to have a global pin, so that all consumers of |
Oh I see. So keep the pain at the coordination stage. |
Yes, this is also why we're so interested in the exact level of pinning needed. If the pin is very strict then we will need to rebuild all consumers every time a new bug fix release is produced, which could be a lot of overhead for maintainers. |
Changes the image to use the CUDA 9.2 Docker image and constrain the recipe to only build for CUDA 9.2.
a10b883
to
d82b766
Compare
Note this cupy/cupy#2568 (comment):
|
@conda-forge/core, I think things are basically ready to go. Can revert the CI hack once we are ready to merge. What do people think? |
Oh sorry. I thought I posted that comment, but I guess that didn't happen. Thanks for raising @leofang. Currently we are just using the latest version of cuDNN at all times. So this shouldn't cause the same problem. |
This reverts commit d82b766.
Have gone ahead and done this. Can readd if further iteration is needed. |
Seems like this is failing on linux |
Right that's because we don't have GPU builds on staged-recipes as discussed earlier in the thread. Also as noted I've removed the hack that enabled one GPU build. If you are interested in the CI status of the recipe, please look one commit prior. 🙂 |
@jakirkham add me to the maintainer. I'd feel less nervous if I could work with someone knowledgable like you 🙂 |
@jakirkham btw see my new question in #9959 (comment):
|
81741d3
to
2d954f3
Compare
Thanks Leo! Would be great to have you. Added you in the latest commit. 🙂
Ah sorry! Missed that one before. Have responded in ( #9959 (comment) ). |
Thanks all for the reviews and feedback here! Am really happy with how much nicer this recipe turned out as a result. 🎉 😄 |
|
||
build: | ||
number: 0 | ||
skip: True # [not (linux64 and cuda_compiler_version != "None")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using De Morgan's Laws, this skip statement could be made easier to read:
[(not linux64) or cuda_compiler_version == "None"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A PR to the feedstock would be welcome. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closes #8638
Based off PR ( #8638 ), this adds the
cupy
package. Tidied the PR a bit. Also dropped some changes tostaged-recipes
that are not needed.Checklist
url
) rather than a repo (e.g.git_url
) is used in your recipe (see here for more details)