I am currently working in the superbench/superbench:v0.4.0-cuda11.1.1 Docker container to measure benchmarks.
To get benchmarks for different models with TensorRT, I customized `superbenchmark/examples/benchmarks/tensorrt_inference_performance.py` as below (I also dropped an unused `statistics` import):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Micro benchmark example for TensorRT inference performance.

Commands to run:
    python3 examples/benchmarks/tensorrt_inference_performance.py <batch_size> <model> <precision>
"""

import sys

from superbench.benchmarks import BenchmarkRegistry, Platform
from superbench.common.utils import logger

if __name__ == '__main__':
    # Read batch size, model name, and precision from the command line.
    batch = int(sys.argv[1])
    model = sys.argv[2]
    precision = sys.argv[3]
    parameters = '--batch_size {0} --pytorch_models {1} --precision {2} --seq_length 8 --iterations 105'.format(
        batch, model, precision
    )
    context = BenchmarkRegistry.create_benchmark_context(
        'tensorrt-inference', platform=Platform.CUDA, parameters=parameters
    )
    benchmark = BenchmarkRegistry.launch_benchmark(context)
    if benchmark:
        logger.info(
            'benchmark: {}, return code: {}, result: {}'.format(
                benchmark.name, benchmark.return_code, benchmark.result
            )
        )
```
Execution:

```bash
nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt
```
Log:

```
root@616b67a69ab7:/opt/superbench# nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `add_node_names' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `add_node_names` argument will be ignored.
  warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `do_constant_folding' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `do_constant_folding` argument will be ignored.
  warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py:182: UserWarning: ONNX export failed on adaptive_avg_pool2d because input size not accessible not supported
  warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")
[2022-05-06 12:33:25,995 616b67a69ab7:18330][micro_base.py:167][INFO] Execute command - round: 0, benchmark: tensorrt-inference, command: /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99.
[2022-05-06 12:33:40,844 616b67a69ab7:18330][micro_base.py:176][ERROR] Microbenchmark execution failed - round: 0, benchmark: tensorrt-inference, error message:
&&&& RUNNING TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
[05/06/2022-12:33:26] [I] === Model Options ===
[05/06/2022-12:33:26] [I] Format: ONNX
[05/06/2022-12:33:26] [I] Model: /root/.cache/torch/hub/onnx/vgg11.onnx
[05/06/2022-12:33:26] [I] Output:
[05/06/2022-12:33:26] [I] === Build Options ===
[05/06/2022-12:33:26] [I] Max batch: explicit
[05/06/2022-12:33:26] [I] Workspace: 8192 MiB
[05/06/2022-12:33:26] [I] minTiming: 1
[05/06/2022-12:33:26] [I] avgTiming: 8
[05/06/2022-12:33:26] [I] Precision: FP32
[05/06/2022-12:33:26] [I] Calibration:
[05/06/2022-12:33:26] [I] Refit: Disabled
[05/06/2022-12:33:26] [I] Safe mode: Disabled
[05/06/2022-12:33:26] [I] Save engine:
[05/06/2022-12:33:26] [I] Load engine:
[05/06/2022-12:33:26] [I] Builder Cache: Enabled
[05/06/2022-12:33:26] [I] NVTX verbosity: 0
[05/06/2022-12:33:26] [I] Tactic sources: Using default tactic sources
[05/06/2022-12:33:26] [I] Input(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Output(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Input build shape: input=1x3x224x224+1x3x224x224+1x3x224x224
[05/06/2022-12:33:26] [I] Input calibration shapes: model
[05/06/2022-12:33:26] [I] === System Options ===
[05/06/2022-12:33:26] [I] Device: 0
[05/06/2022-12:33:26] [I] DLACore:
[05/06/2022-12:33:26] [I] Plugins:
[05/06/2022-12:33:26] [I] === Inference Options ===
[05/06/2022-12:33:26] [I] Batch: Explicit
[05/06/2022-12:33:26] [I] Input inference shape: input=1x3x224x224
[05/06/2022-12:33:26] [I] Iterations: 105
[05/06/2022-12:33:26] [I] Duration: 3s (+ 200ms warm up)
[05/06/2022-12:33:26] [I] Sleep time: 0ms
[05/06/2022-12:33:26] [I] Streams: 1
[05/06/2022-12:33:26] [I] ExposeDMA: Disabled
[05/06/2022-12:33:26] [I] Data transfers: Enabled
[05/06/2022-12:33:26] [I] Spin-wait: Disabled
[05/06/2022-12:33:26] [I] Multithreading: Disabled
[05/06/2022-12:33:26] [I] CUDA Graph: Disabled
[05/06/2022-12:33:26] [I] Separate profiling: Disabled
[05/06/2022-12:33:26] [I] Skip inference: Disabled
[05/06/2022-12:33:26] [I] Inputs:
[05/06/2022-12:33:26] [I] === Reporting Options ===
[05/06/2022-12:33:26] [I] Verbose: Disabled
[05/06/2022-12:33:26] [I] Averages: 10 inferences
[05/06/2022-12:33:26] [I] Percentile: 99
[05/06/2022-12:33:26] [I] Dump refittable layers:Disabled
[05/06/2022-12:33:26] [I] Dump output: Disabled
[05/06/2022-12:33:26] [I] Profile: Disabled
[05/06/2022-12:33:26] [I] Export timing to JSON file:
[05/06/2022-12:33:26] [I] Export output to JSON file:
[05/06/2022-12:33:26] [I] Export profile to JSON file:
[05/06/2022-12:33:26] [I]
[05/06/2022-12:33:26] [I] === Device Information ===
[05/06/2022-12:33:26] [I] Selected Device: NVIDIA Tesla V100-PCIE-16GB
[05/06/2022-12:33:26] [I] Compute Capability: 7.0
[05/06/2022-12:33:26] [I] SMs: 80
[05/06/2022-12:33:26] [I] Compute Clock Rate: 1.38 GHz
[05/06/2022-12:33:26] [I] Device Global Memory: 16160 MiB
[05/06/2022-12:33:26] [I] Shared Memory per SM: 96 KiB
[05/06/2022-12:33:26] [I] Memory Bus Width: 4096 bits (ECC enabled)
[05/06/2022-12:33:26] [I] Memory Clock Rate: 0.877 GHz
[05/06/2022-12:33:26] [I] ----------------------------------------------------------------
Input filename:   /root/.cache/torch/hub/onnx/vgg11.onnx
ONNX IR version:  0.0.6
Opset version:    10
Producer name:    pytorch
Producer version: 1.8
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[05/06/2022-12:33:40] [W] [TRT] /workspace/TensorRT/parsers/onnx/onnx2trt_utils.cpp:218: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/ModelImporter.cpp:139: No importer registered for op: adaptive_avg_pool2d. Attempting to import as plugin.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3716: Searching for plugin: adaptive_avg_pool2d, plugin_version: 1, plugin_namespace:
[05/06/2022-12:33:40] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin adaptive_avg_pool2d version 1
While parsing node number 22 [adaptive_avg_pool2d]:
ERROR: /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3718 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[05/06/2022-12:33:40] [E] Failed to parse onnx file
[05/06/2022-12:33:40] [E] Parsing model failed
[05/06/2022-12:33:40] [E] Engine creation failed
[05/06/2022-12:33:40] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
[2022-05-06 12:33:40,844 616b67a69ab7:18330][tensorrt_inference_performance.py:23][INFO] benchmark: tensorrt-inference, return code: 32, result: {'return_code': [32]}
```
It seems that the TensorRT ONNX importer does not support the `adaptive_avg_pool2d` op? Please cc.
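For reference, here is a minimal sketch of how one could confirm which op types the exporter actually emitted (the model path is the one from the log above):

```python
import onnx

# Load the ONNX file that trtexec failed to parse and list its op types.
model = onnx.load('/root/.cache/torch/hub/onnx/vgg11.onnx')
print(sorted({node.op_type for node in model.graph.node}))
# An 'adaptive_avg_pool2d' entry means the exporter fell back to a
# non-standard ATen op, for which TensorRT registers no importer.
```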
I compared the vgg11 ONNX model generated by SuperBench (left) with a VGG net manually converted from a .pth checkpoint (right).
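In case the graph screenshots don't render here, the same comparison can be reproduced programmatically. A rough sketch, where `vgg11_manual.onnx` is a hypothetical path standing in for the manually converted model:

```python
import onnx
from itertools import zip_longest

def op_types(path):
    """Return the ordered list of op types in an ONNX graph."""
    return [node.op_type for node in onnx.load(path).graph.node]

left = op_types('/root/.cache/torch/hub/onnx/vgg11.onnx')  # SuperBench export
right = op_types('vgg11_manual.onnx')  # hypothetical manual export
for i, (l, r) in enumerate(zip_longest(left, right)):
    if l != r:
        print(f'node {i}: {l} (superbench) vs {r} (manual)')
```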
I guess the `adaptive_avg_pool2d` node should be converted into a `global_avg_pool` (ONNX `GlobalAveragePool`) so that it can be imported by the TensorRT ONNX importer.
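As one possible workaround on the export side (a sketch, not necessarily the fix SuperBench itself would adopt), the torchvision VGG's `avgpool` attribute, which is `AdaptiveAvgPool2d((7, 7))`, can be swapped out before export. This assumes a fixed 224x224 input, for which the feature map entering `avgpool` is already 7x7:

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.vgg11(pretrained=False).eval()

# With 224x224 inputs the map entering avgpool is already 7x7, so the
# adaptive pool is a no-op; a fixed 1x1 AvgPool2d (or nn.Identity) exports
# to a standard ONNX AveragePool node that TensorRT can import.
model.avgpool = nn.AvgPool2d(kernel_size=1, stride=1)

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, 'vgg11_fixed.onnx',  # illustrative output path
    opset_version=10,
    input_names=['input'], output_names=['output'],
)
```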