Export with ONNX Simplifier with --grid error #2558
Comments
👋 Hello @antlamon, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at [email protected].
Requirements
Python 3.8 or later with all requirements.txt dependencies installed, including $ pip install -r requirements.txt
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit. |
@antlamon thanks for the bug report. We don't generally provide support for code customizations or for external packages not in requirements.txt. If an external package is causing an error you may also want to raise an issue with the package authors. |
I would like to add that, without any modifications to export.py, the --grid option also results in an unusable .onnx file when run on the yolov5s.pt model. It works fine without the --grid option. While the script was running, the following warning was produced (by TorchScript, not ONNX though):
Attempting to run an inference session results in
|
I hit this error when exporting the ONNX model using torch==1.8.1 with torchvision==0.9.1. When I export using torch==1.7.1, loading the ONNX model works fine in both torch==1.7.1 and torch==1.8.1. |
Thank you for pointing that out. That indeed was the issue |
Will there be a fix, since torchvision==0.8.2 (required by torch 1.7.1) doesn't exist for windows? |
I am also getting the same error; downgrading the torch and torchvision versions didn't fix it for me. |
When I downgrade the PyTorch version and export with --dynamic --grid, I can load the model, but it fails when doing inference on a (1, 3, 1088, 1920) tensor with this:
2021-04-15 20:26:39.079097920 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Add node. Name:'Add_945' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:487 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 34 by 60
Traceback (most recent call last):
It does work with just --dynamic. I've seen some approaches where people re-implement the last layer with onnx. I guess that's probably the best approach for now. |
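The message in that log is a broadcasting failure in an elementwise Add. ONNX elementwise operators follow NumPy-style broadcasting, where two axes are compatible only if they are equal or one of them is 1. Below is a minimal pure-Python sketch of that rule. The helper `broadcast_shape` and the 20x20 "stale" grid (what a 640x640 export would give at stride 32) are illustrative assumptions; the thread does not show the actual shapes inside the failing graph.

```python
def broadcast_shape(a, b):
    """Return the broadcast result of two shapes, or raise ValueError.

    Hypothetical helper: mirrors the NumPy-style rule, it is not an
    onnxruntime API.
    """
    # Right-align the shapes by left-padding the shorter one with 1s.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"cannot broadcast {da} by {db}")
    return tuple(out)


stride = 32
ny, nx = 1088 // stride, 1920 // stride  # cells along height and width

pred = (1, 3, ny, nx, 2)        # xy slice of one detection layer's output
fresh_grid = (1, 1, ny, nx, 2)  # grid rebuilt for this input size
stale_grid = (1, 1, 20, 20, 2)  # assumed grid frozen at export time

print("fresh grid:", broadcast_shape(pred, fresh_grid))
try:
    broadcast_shape(pred, stale_grid)
except ValueError as err:
    print("stale grid:", err)
```

A grid whose spatial axes match the feature map broadcasts cleanly over the anchor axis; any grid with a different, non-1 spatial size cannot be added to it.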
@antlamon @Lucashsmello @thestonehead @timstokman we've integrated onnx-simplifier into export.py now in ONNX Simplifier PR #2815 and verified it's passing CI on all operating systems. I'm not sure if this resolves the original issue, but hopefully it's a step in the right direction. |
@glenn-jocher Unfortunately, the problem still persists: I am using the docker image (version v5.0) and |
@glenn-jocher I'm seeing the same issues, both --grid and --dynamic don't work with the simplifier. --grid export only seems to work in a few cases, even without the simplifier. I made a pull request for the "--dynamic" export issue: #2856 |
@timstokman thanks for the PR, I'll take a look over there! |
To give a reproduction of the grid export issue now that the PR is merged:
Without --simplify the model simply can't be loaded by the runtime. It looks like the last layer has incompatible dimensions when exported. |
@timstokman hmm, so the onnx runtime only succeeds with a --simplify model, but --simplify fails when --grid is also used? |
@glenn-jocher They both fail:
The root cause seems to be in how the last layer is exported: some sort of tensor dimension mismatch. |
@glenn-jocher I managed to make simplify with grid work by rolling back PyTorch to 1.8. (1.9, used in the latest docker image, did not work; I don't know what happens if it is installed on the host OS rather than in docker.) Perhaps it's the ONNX version that causes the issue? In the older yolov5 image (v4.0) it is 1.7.0 AFAIR |
@piotlinski I used the latest version, and the one you suggested. With pytorch 1.8 it works with the default options, but as soon as you use --dynamic or --img-size it stops working. With the latest version, it doesn't work at all. |
@timstokman interesting, I tried PyTorch 1.8 and can set img-size (did not try dynamic though). I use the older version, where the simplifier is always run. (The log says YOLOv5 v4.0, but I manually checked out a newer commit.)
EDIT: with
|
@piotlinski Update to the latest yolo version to fix the error with |
@timstokman no error with
when running with
|
Looks like the docker image also has different versions of onnx and onnx-simplifier. Maybe the requirements.txt of the yolo project needs to start pinning a few versions for this to work reliably. @piotlinski Can you actually do inference with the exported model? |
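Since the results in this thread differ between the docker image and local installs, it helps to post the exact package versions with each report. A small standard-library sketch that lists them follows; the package list is an assumption about which distributions are relevant here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distributions that appear to matter for the export path (assumed list).
PACKAGES = ["torch", "torchvision", "onnx", "onnx-simplifier", "onnxruntime"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in both environments gives a like-for-like comparison and a starting point for any version pins in requirements.txt.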
@timstokman the ones exported earlier (without the |
I changed this line in
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * torch.tensor(self.anchor_grid[i].tolist()).float()  # wh |
I found that the cause of the bug is the inconsistent behavior of the [i] indexing in PyTorch and ONNX. The shape of
|
btw, the exported ONNX model cannot be converted to a TensorRT engine because subscript assignments generate unsupported ScatterND nodes. I rewrote the code to avoid generating ScatterND:
# y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# z.append(y.view(bs, -1, self.no))
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(bs, self.na, 1, 1, 2)  # wh
rest = y[..., 4:]
yy = torch.cat((xy, wh, rest), -1)
z.append(yy.view(bs, -1, self.no)) |
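The rewrite above replaces slice assignment with separately computed pieces that are concatenated, which should leave the numbers unchanged. A quick way to convince yourself is the NumPy sketch below. NumPy stands in for torch here, and the random arrays and small sizes are assumptions standing in for `x[i].sigmoid()`, `self.grid[i]` and `self.anchor_grid[i]`.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, na, ny, nx, no = 1, 3, 4, 5, 85  # small illustrative sizes
stride = 32.0

y = rng.random((bs, na, ny, nx, no), dtype=np.float32)         # ~ x[i].sigmoid()
grid = rng.random((1, 1, ny, nx, 2), dtype=np.float32)         # ~ self.grid[i]
anchor_grid = rng.random((1, na, 1, 1, 2), dtype=np.float32)   # ~ self.anchor_grid[i]

# Original style: write the decoded boxes back into slices of y.
inplace = y.copy()
inplace[..., 0:2] = (inplace[..., 0:2] * 2.0 - 0.5 + grid) * stride
inplace[..., 2:4] = (inplace[..., 2:4] * 2) ** 2 * anchor_grid

# Rewritten style: compute each piece separately, then concatenate.
xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid
rest = y[..., 4:]
joined = np.concatenate((xy, wh, rest), axis=-1)

print("same values:", np.allclose(inplace, joined))
print("flattened shape:", joined.reshape(bs, -1, no).shape)
```

Only the way the result is assembled changes, so the exported graph needs a Concat instead of scatter-style writes.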
@jylink Tried your code. Exporting works fine now, but when I try to use dynamic axes it still seems to fail when running the model:
Here I tried a tensor of 1088x1920x3 as input (stride 32 padded) for an image that was originally 1080x1920x3. When using a fully padded tensor, 1920x1920x3, the predict layer does seem to work correctly, so this is a big improvement. I suggest you create a pull request for it. Personally I still can't use --grid exports; dynamic axes give me an almost 2x speed improvement and help with CUDA memory usage. |
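"Stride 32 padded" means each spatial side is rounded up to the next multiple of the model's largest stride. A minimal sketch of that arithmetic, with `pad_to_stride` as a hypothetical helper name:

```python
def pad_to_stride(size, stride=32):
    """Round size up to the nearest multiple of stride."""
    return ((size + stride - 1) // stride) * stride


height, width = 1080, 1920
padded = (pad_to_stride(height), pad_to_stride(width))
print("padded size:", padded)
print("grid cells at stride 32:", padded[0] // 32, "x", padded[1] // 32)
```

For the 1080x1920 image only the height changes, because 1920 is already a multiple of 32.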
@timstokman Hi, I found that the
# models/yolo.py
class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export
    dynamic = False  # <--NEW
    ...
            if not self.training:  # inference
                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:  # <--NEW
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
                y = x[i].sigmoid()
                xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(bs, self.na, 1, 1, 2)  # wh
                rest = y[..., 4:]
                y_ = torch.cat((xy, wh, rest), -1)
                z.append(y_.view(bs, -1, self.no))
# models/export.py
model.model[-1].export = not opt.grid  # set Detect() layer grid export
model.model[-1].dynamic = opt.dynamic  # <--NEW
for _ in range(2):
    y = model(img)  # dry runs
Test:
# gen onnx
!python models/export.py --img 352 608 --batch 1 --dynamic --grid --simplify --weights weights/best.pt
# onnxruntime
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession('weights/best.onnx')
input_name = sess.get_inputs()[0].name
output_name = [output.name for output in sess.get_outputs()]
for i in range(-5, 5):
    # vary the height, then the width, in steps of 32 pixels
    input = np.random.rand(1, 3, 608 + 32 * i, 608).astype(np.float32)
    pred = sess.run(output_name, {input_name: input})
    input = np.random.rand(1, 3, 608, 608 + 32 * i).astype(np.float32)
    pred = sess.run(output_name, {input_name: input}) |
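The key line in the patch is the grid check: the cell-offset grid is rebuilt whenever its spatial shape no longer matches the feature map. The NumPy sketch below walks through that logic in eager mode for the same heights as the test loop. `make_grid` is a NumPy stand-in modelled on `Detect._make_grid`, the initial 20x20 grid is an assumed leftover from a 640x640 dry run, and nothing here exercises ONNX tracing itself.

```python
import numpy as np


def make_grid(nx, ny):
    """NumPy stand-in for the grid builder: (1, 1, ny, nx, 2) array of (x, y) offsets."""
    yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    return np.stack((xv, yv), axis=2).reshape(1, 1, ny, nx, 2).astype(np.float32)


def decode_xy(raw_xy, grid, stride):
    """Same xy formula as the Detect layer: cell offset plus scaled prediction."""
    return (raw_xy * 2.0 - 0.5 + grid) * stride


stride, na = 32, 3
cached = make_grid(20, 20)  # assumed grid left over from a 640x640 dry run

shapes = []
for i in range(-5, 5):
    h, w = 608 + 32 * i, 608
    ny, nx = h // stride, w // stride
    # A constant 0.5 "prediction" decodes to the centre of every cell.
    raw_xy = np.full((1, na, ny, nx, 2), 0.5, dtype=np.float32)
    if cached.shape[2:4] != raw_xy.shape[2:4]:  # same comparison as the patch
        cached = make_grid(nx, ny)
    xy = decode_xy(raw_xy, cached, stride)
    shapes.append(xy.shape)

print(shapes[0], shapes[-1])
print("last cell centre (x, y):", xy[0, 0, -1, -1])
```

Without the rebuild, the first iteration would try to add a 20x20 grid to a 14x19 feature map and fail to broadcast.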
Yes, that fixes all the issues for me. Outputs seem exactly the same, with and without dynamic, and it works for different image sizes. Guess I can throw away my own numpy implementation of the detect layer. It also fixes the framework version compatibility issues. To me, the implementation seems good. Pull request time? @glenn-jocher Looks like this fixes the remaining options with "--grid". |
PR #2982 |
@antlamon @tommy2is @timstokman good news 😃! Your original issue may now have been fixed ✅ in merged PR #2982 by @jylink. To receive this update you can:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
so the bug is in |
use torch=1.7.1 instead; see ultralytics/yolov5#2558, which may be related
🐛 Bug
A model exported to ONNX with the --grid parameter cannot be used by onnxruntime or simplified by onnx-simplifier.
A Mul node triggers a shape inference error: Incompatible dimensions
To Reproduce
Replace the ONNX export in export.py with this code and run it with the command
python3 models/export.py --grid
Output:
Expected behavior
Any yolov5 model exported as ONNX should be valid