could not support gelu? #776
It seems the issue is that NVIDIA's PyTorch build includes the unmerged pytorch/pytorch#61439 pull request. This means that if you train your model (or at least convert it to TorchScript) with "regular" (non-NVIDIA) PyTorch and then try to use it in NVIDIA's other products, it does not work. Hopefully NVIDIA will either release a PyTorch build that is compatible with regular PyTorch, or the author of that pull request will make sure it doesn't break "regular" PyTorch. For now, though, using NVIDIA's PyTorch for everything will probably solve your issue. Hope that helps.
Many thanks, I'll try to use the same docker for both training and inference.
@daeing Yes. The gelu implementation changed on master during 1.10 development, I think (a bool parameter was added and then removed). So please use the same docker for inference. The 21.11 docker container had the bool parameter, but the next docker container release, 21.12, should have gelu without the bool parameter (matching regular PyTorch).
@peri044 21.12 was just released, and it doesn't seem to be fixed. This looks like a major issue, because since 21.09 (at least) some models (including BERT models) are not compatible with "regular" PyTorch. Every issue I've seen raised about this is answered only with "try the new release" (in another post someone suggested 21.11, and you suggested 21.12), and it isn't even verified that the fix actually landed. Is there any way to re-release 21.12, or ship a 21.12.1, to fix this? We have been running into this issue for a while now and made many hacks to work around it, but it really breaks our automated pipeline and prevents us from releasing new models as often as we would like. Note: I only tested this based on the PyTorch code in NVIDIA's PyTorch 21.12 image. I assume NVIDIA compiles the code in this image and uses it in all their other products. If my assumption is incorrect, it may be fixed in NVIDIA's other 21.12 products, but I don't think that is the case.
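The signature mismatch described above can be checked directly on any given build: scripting a bare `nn.GELU` and printing its graph shows which `aten::gelu` overload that PyTorch build serializes. This is a diagnostic sketch, not an official compatibility test:

```python
import torch
import torch.nn as nn

# Diagnostic sketch: script a bare GELU module and inspect the graph.
# A build carrying the unmerged signature change will show an extra
# "approximate" argument on the aten::gelu call; stock PyTorch of that
# era shows aten::gelu(Tensor self) with no extra argument.
scripted = torch.jit.script(nn.GELU())
print("torch version:", torch.__version__)
print(scripted.graph)  # look at the aten::gelu(...) call signature
```

Running this inside both containers (and comparing against stock PyTorch) would confirm whether two environments serialize the same operator signature before you commit to a train/inference pairing.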
This issue has not seen activity for 90 days. Remove the stale label or comment, or it will be closed in 10 days.
I used the docker you suggested (nvcr.io/nvidia/pytorch:21.11-py3) to test Torch-TensorRT, but I could not convert my PyTorch model to a TorchScript model; it seems gelu is not supported. When I use the pytorch-20.12-py3 docker instead, the conversion works fine.
```
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError:
Arguments for call are not valid.
The following variants are available:

  aten::gelu(Tensor self, bool approximate) -> (Tensor):
  Argument approximate not provided.

  aten::gelu.out(Tensor self, bool approximate, *, Tensor(a!) out) -> (Tensor(a!)):
  Argument approximate not provided.

The original call is:
tools/pytorch2torchscript.py(123): pytorch2libtorch
tools/pytorch2torchscript.py(186):
Serialized File "code/torch/torch/nn/modules/activation.py", line 27
    def forward(self: torch.torch.nn.modules.activation.GELU,
        argument_1: Tensor) -> Tensor:
      return torch.gelu(argument_1)
             ~~~~~~~~~~ <--- HERE
```
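One workaround for errors like the one above (a sketch, not an official fix; `PortableGELU` is a name invented here) is to avoid serializing the `aten::gelu` operator entirely by computing gelu from `erf` in a drop-in module, so the scripted graph contains only ops whose signatures were stable across these builds:

```python
import math
import torch
import torch.nn as nn

class PortableGELU(nn.Module):
    """Drop-in replacement for nn.GELU (a workaround sketch): computes
    gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))) directly, so the scripted
    graph avoids the aten::gelu operator whose signature differed
    between the NVIDIA containers and stock PyTorch."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

# Swap the modules in before scripting, then serialize as usual.
model = nn.Sequential(nn.Linear(8, 8), PortableGELU())
scripted = torch.jit.script(model)
out = scripted(torch.randn(2, 8))
```

This trades a little speed (the fused gelu kernel is not used) for a TorchScript artifact that loads in either environment.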