
[Bug Report] ONNX export failed on adaptive_avg_pool2d at tensorrt micro bench. #352

LeiWang1999 opened this issue May 7, 2022 · 1 comment

@LeiWang1999

I am currently working in the superbench/superbench:v0.4.0-cuda11.1.1 Docker container to run benchmarks.

To benchmark different models with TensorRT, I customized superbenchmark/examples/benchmarks/tensorrt_inference_performance.py as below:

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Micro benchmark example for TensorRT inference performance.

Commands to run:
    python3 examples/benchmarks/tensorrt_inference_performance.py <batch_size> <model> <precision>
"""
import sys

from superbench.benchmarks import BenchmarkRegistry, Platform
from superbench.common.utils import logger

if __name__ == '__main__':
    # Command-line arguments: batch size, model name, and precision.
    batch = int(sys.argv[1])
    model = sys.argv[2]
    precision = sys.argv[3]
    parameters = '--batch_size {0} --pytorch_models {1} --precision {2} --seq_length 8 --iterations 105'.format(
        batch, model, precision
    )

    context = BenchmarkRegistry.create_benchmark_context('tensorrt-inference', platform=Platform.CUDA, parameters=parameters)
    benchmark = BenchmarkRegistry.launch_benchmark(context)
    if benchmark:
        logger.info(
            'benchmark: {}, return code: {}, result: {}'.format(
                benchmark.name, benchmark.return_code, benchmark.result
            )
        )

Execution:

nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt

Log:

root@616b67a69ab7:/opt/superbench# nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `add_node_names' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `add_node_names` argument will be ignored.
warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `do_constant_folding' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `do_constant_folding` argument will be ignored.
warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py:182: UserWarning: ONNX export failed on adaptive_avg_pool2d because input size not accessible not supported
warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")
[2022-05-06 12:33:25,995 616b67a69ab7:18330][micro_base.py:167][INFO] Execute command - round: 0, benchmark: tensorrt-inference, command: /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99.
[2022-05-06 12:33:40,844 616b67a69ab7:18330][micro_base.py:176][ERROR] Microbenchmark execution failed - round: 0, benchmark: tensorrt-inference, error message: &&&& RUNNING TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
[05/06/2022-12:33:26] [I] === Model Options ===
[05/06/2022-12:33:26] [I] Format: ONNX
[05/06/2022-12:33:26] [I] Model: /root/.cache/torch/hub/onnx/vgg11.onnx
[05/06/2022-12:33:26] [I] Output:
[05/06/2022-12:33:26] [I] === Build Options ===
[05/06/2022-12:33:26] [I] Max batch: explicit
[05/06/2022-12:33:26] [I] Workspace: 8192 MiB
[05/06/2022-12:33:26] [I] minTiming: 1
[05/06/2022-12:33:26] [I] avgTiming: 8
[05/06/2022-12:33:26] [I] Precision: FP32
[05/06/2022-12:33:26] [I] Calibration:
[05/06/2022-12:33:26] [I] Refit: Disabled
[05/06/2022-12:33:26] [I] Safe mode: Disabled
[05/06/2022-12:33:26] [I] Save engine:
[05/06/2022-12:33:26] [I] Load engine:
[05/06/2022-12:33:26] [I] Builder Cache: Enabled
[05/06/2022-12:33:26] [I] NVTX verbosity: 0
[05/06/2022-12:33:26] [I] Tactic sources: Using default tactic sources
[05/06/2022-12:33:26] [I] Input(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Output(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Input build shape: input=1x3x224x224+1x3x224x224+1x3x224x224
[05/06/2022-12:33:26] [I] Input calibration shapes: model
[05/06/2022-12:33:26] [I] === System Options ===
[05/06/2022-12:33:26] [I] Device: 0
[05/06/2022-12:33:26] [I] DLACore:
[05/06/2022-12:33:26] [I] Plugins:
[05/06/2022-12:33:26] [I] === Inference Options ===
[05/06/2022-12:33:26] [I] Batch: Explicit
[05/06/2022-12:33:26] [I] Input inference shape: input=1x3x224x224
[05/06/2022-12:33:26] [I] Iterations: 105
[05/06/2022-12:33:26] [I] Duration: 3s (+ 200ms warm up)
[05/06/2022-12:33:26] [I] Sleep time: 0ms
[05/06/2022-12:33:26] [I] Streams: 1
[05/06/2022-12:33:26] [I] ExposeDMA: Disabled
[05/06/2022-12:33:26] [I] Data transfers: Enabled
[05/06/2022-12:33:26] [I] Spin-wait: Disabled
[05/06/2022-12:33:26] [I] Multithreading: Disabled
[05/06/2022-12:33:26] [I] CUDA Graph: Disabled
[05/06/2022-12:33:26] [I] Separate profiling: Disabled
[05/06/2022-12:33:26] [I] Skip inference: Disabled
[05/06/2022-12:33:26] [I] Inputs:
[05/06/2022-12:33:26] [I] === Reporting Options ===
[05/06/2022-12:33:26] [I] Verbose: Disabled
[05/06/2022-12:33:26] [I] Averages: 10 inferences
[05/06/2022-12:33:26] [I] Percentile: 99
[05/06/2022-12:33:26] [I] Dump refittable layers:Disabled
[05/06/2022-12:33:26] [I] Dump output: Disabled
[05/06/2022-12:33:26] [I] Profile: Disabled
[05/06/2022-12:33:26] [I] Export timing to JSON file:
[05/06/2022-12:33:26] [I] Export output to JSON file:
[05/06/2022-12:33:26] [I] Export profile to JSON file:
[05/06/2022-12:33:26] [I]
[05/06/2022-12:33:26] [I] === Device Information ===
[05/06/2022-12:33:26] [I] Selected Device: NVIDIA Tesla V100-PCIE-16GB
[05/06/2022-12:33:26] [I] Compute Capability: 7.0
[05/06/2022-12:33:26] [I] SMs: 80
[05/06/2022-12:33:26] [I] Compute Clock Rate: 1.38 GHz
[05/06/2022-12:33:26] [I] Device Global Memory: 16160 MiB
[05/06/2022-12:33:26] [I] Shared Memory per SM: 96 KiB
[05/06/2022-12:33:26] [I] Memory Bus Width: 4096 bits (ECC enabled)
[05/06/2022-12:33:26] [I] Memory Clock Rate: 0.877 GHz
[05/06/2022-12:33:26] [I]
----------------------------------------------------------------
Input filename: /root/.cache/torch/hub/onnx/vgg11.onnx
ONNX IR version: 0.0.6
Opset version: 10
Producer name: pytorch
Producer version: 1.8
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[05/06/2022-12:33:40] [W] [TRT] /workspace/TensorRT/parsers/onnx/onnx2trt_utils.cpp:218: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/ModelImporter.cpp:139: No importer registered for op: adaptive_avg_pool2d. Attempting to import as plugin.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3716: Searching for plugin: adaptive_avg_pool2d, plugin_version: 1, plugin_namespace:
[05/06/2022-12:33:40] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin adaptive_avg_pool2d version 1
While parsing node number 22 [adaptive_avg_pool2d]:
ERROR: /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3718 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[05/06/2022-12:33:40] [E] Failed to parse onnx file
[05/06/2022-12:33:40] [E] Parsing model failed
[05/06/2022-12:33:40] [E] Engine creation failed
[05/06/2022-12:33:40] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
.
[2022-05-06 12:33:40,844 616b67a69ab7:18330][tensorrt_inference_performance.py:23][INFO] benchmark: tensorrt-inference, return code: 32, result: {'return_code': [32]}

It seems that the TensorRT ONNX importer cannot handle the adaptive_avg_pool2d op? Judging from the export warnings above, torch.onnx fell back to an ATen-style node for this layer (the exporter reports "input size not accessible"), and trtexec has neither a built-in importer nor a plugin named adaptive_avg_pool2d.
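
A quick way to confirm this (a minimal sketch; the path below is the cached export location from the log above) is to list the distinct op types in the exported graph with the onnx package, since adaptive_avg_pool2d is not part of the standard ONNX opset:

import onnx

# List the distinct op types in the exported graph; an adaptive_avg_pool2d
# entry here is an ATen fallback node, not a standard ONNX op, so trtexec
# can only try (and fail) to resolve it as a plugin.
model = onnx.load('/root/.cache/torch/hub/onnx/vgg11.onnx')
print(sorted({node.op_type for node in model.graph.node}))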

Could someone please take a look?

@LeiWang1999
Author

I compared the vgg11 graph generated by superbench (left) with a VGG network manually converted from a .pth checkpoint (right).

I guess adaptive_avg_pool2d should be converted into a global average pool (ONNX GlobalAveragePool) so that it can be imported by the TensorRT ONNX importer.

(screenshot: side-by-side comparison of the two exported ONNX graphs)
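
As a possible export-side workaround (a sketch only, assuming the fixed 1x3x224x224 input used in the benchmark; avgpool is the pooling attribute in torchvision's VGG implementation): with a 224x224 input, the feature map reaching avgpool is already 7x7, so the adaptive pool can be swapped for a static op that opset 10 expresses natively before exporting:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg11(pretrained=True).eval()

# With a fixed 224x224 input, the feature map entering avgpool is already
# 7x7, so AdaptiveAvgPool2d((7, 7)) is effectively a no-op here and can be
# replaced by a static pool that exports as a plain ONNX AveragePool.
model.avgpool = nn.AvgPool2d(kernel_size=1)

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, 'vgg11.onnx',
    opset_version=10,
    input_names=['input'], output_names=['output'],
)

The resulting graph should then parse in trtexec without the plugin lookup.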
