fix(DPT,Depth-Anything) torch.export #34103
Conversation
Force-pushed from aa7d562 to 2e15b57
Very nice, thanks for unlocking more models for torch.export, this is very valuable!

The same comment as for the Mask2Former PR: it would be great to have this PR tested, and please push a run-slow commit to trigger all tests at the end!
Force-pushed from 2e15b57 to 4c65b67
I've added Depth-Anything to this PR; I'm not entirely sure if I've triggered the run_slow test for it and DPT correctly. Happy to split it off into a separate PR. I also ran into an issue with ZoeDepth not working because of the BEiT backbone. I suspect that will take more time to properly address, so I added a skipped test to ZoeDepth, but I can also remove that test entirely and add it in a WIP PR.
Thanks for the update! Regarding ZoeDepth, I suggest either fixing the model export or excluding this model from the PR. A skipped test is not the best solution, because it might stay stuck in that state for a very long time 😄

To trigger multiple models' slow tests you can list them as follows: [run_slow] depth_anything, dpt, zoedepth
I have to add some of the model changes because of the copy-consistency check, but I'll remove the ReLU change and the torch.export test! Thanks for the heads-up on slow tests.
Force-pushed from 4c65b67 to bc0633a
@qubvel could you approve the slow workflow?
I'm not 100% sure that this is part of the CI, but the contributing guide asks you to run repo-consistency (Line 217 in bc0633a), which throws an error. So basically I have to include those shared changes: https://github.com/huggingface/transformers/pull/34103/files#diff-02337c86e3fba49173cf2cb6fa1595ed168db19726938aec925b8b010a3b6a8c

The current crux of ZoeDepth is that the BEiT model, the backbone of all the HF Hub models for ZoeDepth, isn't compatible. That issue still has to be addressed, and I haven't had time to do so yet.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The following also has checks on the output of slices, and those checks seem to work.
I went ahead and made a PR to try to address this issue: #34518
Ahh, ok, it's because the model modules contain "Copied from" statements and these parts are synced across models. No worries then!
Force-pushed from fd1b352 to c7ddd0f
Thanks for updating it!
cc @guangy10
```python
fused_hidden_state = None
for hidden_state, layer in zip(hidden_states, self.layers):
    if fused_hidden_state is None:
        # first layer only uses the last hidden_state
        fused_hidden_state = layer(hidden_state)
    else:
        fused_hidden_state = layer(fused_hidden_state, hidden_state)
```
Comment for final review: this change is included in the ZoeDepth model because of the "Copied from" statement. It doesn't unlock torch.export for that model, but it will be useful if we decide to enable it.
LGTM! Thanks for the contribution.
Extend the script in pytorch/executorch#6509 to support lowering the exported model (with the simplest recipe).
Any luck on the compiler pass? Also, do you think this gates ExecuTorch support?
@philkuz Sorry, I haven't got a chance to write the pass yet.

Right, it's ExecuTorch specific, i.e. all tensors need to be contiguous. BTW, do you happen to know where the channel_last tensor may come from in eager mode? If so, we can fix it here; otherwise, having a separate PR for ExecuTorch is fine. Please note that unlike a compiled artifact, the exported program is just an intermediate representation; it should typically only be used as the entry point for further optimizations, e.g. in ExecuTorch.
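(As an aside, a quick way to check for this in eager mode with standard PyTorch memory-format APIs; this snippet is illustrative and not part of the PR:)

```python
import torch

# A channels_last tensor is dense but not contiguous in the default sense
t = torch.randn(1, 3, 8, 8).to(memory_format=torch.channels_last)
print(t.is_contiguous())                                   # False
print(t.is_contiguous(memory_format=torch.channels_last))  # True

# ExecuTorch expects contiguous tensors, so such a layout would need to be
# rewritten before (or during) lowering:
t = t.contiguous()
print(t.is_contiguous())                                   # True
```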
I did a very quick scan for the channel_last tensor, and I believe it's in DINOv2 (the backbone), which is not part of this particular modeling code. I think we should move it to another PR.
Any other blockers for merging this PR?
@guangy10 No blockers IMO; waiting for @ArthurZucker's review, he has quite a few in line.
Great contribution! Thanks all for iterating 🤗
Super good in general as slicing does not go well with compile either!
Sorry for the delay @guangy10, we were on a company-wide offsite! 🌴
What does this PR do?
Small modification of the DPT modeling code to remove a new object creation in a `forward()` method of a Module. This object creation makes the model incompatible with `torch.export`, which is a key part of preparing a model to run on a variety of hardware backends through projects such as ExecuTorch (related issue: #32253).

Motivation
torch.export allows you to export PyTorch models into standardized model representations that are intended to be optimized and run efficiently using frameworks such as TensorRT or ExecuTorch.
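As an illustration, the export entry point looks like this (a minimal sketch, assuming the Intel/dpt-large checkpoint and PyTorch 2.x; the input shape is just a plausible example):

```python
import torch
from transformers import DPTForDepthEstimation

model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").eval()
pixel_values = torch.randn(1, 3, 384, 384)  # dummy image batch

# torch.export traces forward() into a self-contained ExportedProgram IR,
# which downstream toolchains (ExecuTorch, TensorRT, ...) can consume.
exported = torch.export.export(model, (pixel_values,))
print(exported)
```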
The Bug
The key issue was the slice on `self.layers`:

transformers/src/transformers/models/dpt/modeling_dpt.py, Line 696 in 617b212

`self.layers[1:]` creates a new `ModuleList()` each time this line is executed:
https://github.com/pytorch/pytorch/blob/69bcf1035e7f06f2eefd8986d000cc980e9ebd37/torch/nn/modules/container.py#L330
The model tracer in `torch.export` monkey-patches `nn.Module` constructors during evaluation of the `forward()` pass, so the original DPT modeling code raises an error during tracing.

The Solution
PyTorch recommends that users update the modeling code. My team and I figured this could be helpful to the broader community, especially in a future where Export to ExecuTorch becomes more widely available: #32253

This also removes an unnecessary creation of a new module list as a bonus.
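The shape of the change is roughly as follows (a sketch based on the snippet quoted earlier in this thread, not the exact diff):

```python
# Before (sketch): the slice builds a new ModuleList inside forward()
fused_hidden_state = self.layers[0](hidden_states[0])
for hidden_state, layer in zip(hidden_states[1:], self.layers[1:]):
    fused_hidden_state = layer(fused_hidden_state, hidden_state)

# After: iterate over the existing ModuleList without slicing
fused_hidden_state = None
for hidden_state, layer in zip(hidden_states, self.layers):
    if fused_hidden_state is None:
        # first layer only uses the last hidden_state
        fused_hidden_state = layer(hidden_state)
    else:
        fused_hidden_state = layer(fused_hidden_state, hidden_state)
```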
Tests
I ensured that `tests/models/dpt/test_modeling_dpt.py` passes, which appears to test a portion of the outputs. I also verified that the entire output of the model matched before and after my changes, using a comparison script.
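(The original script is not preserved here; a minimal sketch of such a before/after comparison, assuming the Intel/dpt-large checkpoint, could look like this, run once on each branch:)

```python
import sys
import torch
from transformers import DPTForDepthEstimation

torch.manual_seed(0)  # same dummy input on both branches
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").eval()
pixel_values = torch.randn(1, 3, 384, 384)

with torch.no_grad():
    depth = model(pixel_values).predicted_depth

tag = sys.argv[1]  # "before" on main, "after" on the PR branch
torch.save(depth, f"depth_{tag}.pt")

# After running on both branches:
#   torch.equal(torch.load("depth_before.pt"), torch.load("depth_after.pt"))
```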
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts, @qubvel