-
Notifications
You must be signed in to change notification settings - Fork 23k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump dlpack.h to latest version #65047
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 4ef92f4 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages: pytorch_xla_linux_bionic_py3_6_clang9_build (1/2)Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I wonder if we should add the supported version somewhere in the documentation?
Right now that doesn't seem so useful, because it is not possible to tell from the DLPack version itself whether there's an issue or not. Right now it should be possible to mix one library which is on version
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too, thanks @t-vi!
The one CI failure seems unrelated.
Codecov Report
@@ Coverage Diff @@
## master #65047 +/- ##
==========================================
+ Coverage 66.37% 66.39% +0.01%
==========================================
Files 739 725 -14
Lines 94299 93457 -842
==========================================
- Hits 62595 62047 -548
+ Misses 31704 31410 -294 |
@mruberry would you be able to land this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stamped! Thank you @t-vi, @rgommers, @emcastillo!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@mruberry was there a hiccup with landing this? I'd like to work on a follow-up PR, that's easier to do if this is in so I don't have to do "manual stacking". |
Yes, sorry, there's a mystery internal issue we need to debug. I'll try to get someone on it soon. |
caffe2/python/pybind_state_dlpack.h
Outdated
@@ -28,7 +28,7 @@ class DLPackWrapper { | |||
: tensor(tensor), device_option(device_option) {} | |||
|
|||
py::object data() { | |||
DLContext tensor_context; | |||
DLDevice tensor_context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal error is
caffe2/python/pybind_state_dlpack.h:31:5: error: unknown type name 'DLDevice'; did you mean 'Device'?
DLDevice tensor_context;
which I don't understand.
@t-vi, what would happen if we didn't update Caffe2 and just updated ATen/dlpack.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My interest is exclusively with PyTorch /ATen, so I neither know nor could I say whether any particular outcome was good or bad. I was more concerned that you might have internal interfaces between them and caffe2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind creating a PyTorch-exclusive version of this PR I can try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the changes in caffe2/ (and rebased to master from a few days ago).
CI Flow Status⚛️ CI FlowRuleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands: # ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow For more information, please take a look at the CI Flow Wiki. |
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Looks like the PyTorch-only version worked! |
Thank you for merging @mruberry! |
This is causing some internal build failures that weren't immediately detected, unlanding and reopening while we investigate. |
So what is the perspective? If the PR is doomed to rot, I would prefer closing it. |
I don't think we're at "doomed to rot" yet, but I understand and appreciate your perspective. I (or someone else with internal FB access) is going to have to do some digging for a bit. Let's give that a chance to happen. |
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
So here's what I think is happening. Within Facebook there are multiple systems that depend on DLPack. This includes PyTorch, Caffe2 and TVM. Somehow when we're updating PyTorch's DLPack header I think the issue we're hitting is that PyTorch files are still being compiled against the dlpack headers found in those other frameworks. Here's the DLPack header used by the internal version of TVM:
So when we see errors like
I think that's because DLDevice isn't in the above header! This is just a guess because what the heck C++-based build systems. Maybe we should switch to Rust. @t-vi is the v60 version of dlpack backwards compatible? I could try updating all our internal dlpack headers to resolve this issue. Alternatively I could look at updating all the dlpack headers to v60 and modifying the files that depend on them. |
I have no idea. It's not API compatible, as we have seen. I think it might be ABI compatible as it's mostly renaming things. @tqchen , my apologies for dragging you into this, but I wonder if we could benefit from your wisdom here w.r.t. upgrade paths for projects using dlpack. |
It should be ABI compatible back to at least DLPack
Hmm, that does |
To add to that: this was ensured not only in PR reviews, but at least CuPy went through this upgrade cycle and didn't have any problems. There are no ABI breakage issues on the DLPack repo. And the release note for v0.2 says it's compatible with v0.1: dmlc/dlpack#23 (comment) |
Sorry for being late to the thread @rgommers is right that the change is ABI compatible, so the same compiled DLPack symbol should be compatible to every other versions after v0.2. ABI compatibility means that different frameworks can use different versions of DLPack but still correctly exchange |
Given the complications inside facebook, I don't think I am the best person to propose this change, so I would have an inclination to close this PR. |
Either of these should work. They're also very similar - the former solution is basically a partial version of the latter. Is it possible to update to v60 for all these in sync internally @mruberry? If the changes land sequentially, then it seems like you're going to have failures somewhere at some point. |
@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Thank you @mruberry ! |
Thanks for your patience, @t-vi, and sorry this took so long. I had to update dlpack.h throughout Meta. |
Summary: Fixes pytorch/pytorch#64995 Pull Request resolved: pytorch/pytorch#65047 Reviewed By: VitalyFedyunin Differential Revision: D32468916 Pulled By: mruberry fbshipit-source-id: 3e0a17a3a264a77956ea7b795bd472c6fc79566c (cherry picked from commit bd480b9)
Summary: Fixes pytorch/pytorch#64995 Pull Request resolved: pytorch/pytorch#65047 Reviewed By: VitalyFedyunin Differential Revision: D32468916 Pulled By: mruberry fbshipit-source-id: 3e0a17a3a264a77956ea7b795bd472c6fc79566c (cherry picked from commit bd480b9)
Summary: Fixes pytorch/pytorch#64995 Pull Request resolved: pytorch/pytorch#65047 Reviewed By: VitalyFedyunin Differential Revision: D32468916 Pulled By: mruberry fbshipit-source-id: 3e0a17a3a264a77956ea7b795bd472c6fc79566c (cherry picked from commit bd480b9)
Summary: Fixes pytorch/pytorch#64995 Pull Request resolved: pytorch/pytorch#65047 Reviewed By: VitalyFedyunin Differential Revision: D32468916 Pulled By: mruberry fbshipit-source-id: 3e0a17a3a264a77956ea7b795bd472c6fc79566c (cherry picked from commit bd480b9)
Fixes #64995