
Inconsistent results between TensorRT and ONNX inference for ReduceMax operator #3467

Closed
hongliyu0716 opened this issue Nov 19, 2023 · 2 comments
Assignees: zerollzeng
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release), triaged (Issue has been triaged by maintainers)

Comments

@hongliyu0716

Description

I am encountering an issue when using TensorRT to load an ONNX model that contains a ReduceMax operator. The inference results obtained from TensorRT are inconsistent with the results obtained from running the same model using ONNX runtime.
The model structure is as below:
[screenshot: ReduceMax ONNX graph, with inputs `data` and `axes` and output `reduced`]

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: NVIDIA GeForce MX330

NVIDIA Driver Version: 470.182.03

CUDA Version: 11.4

CUDNN Version: 8.9.5

Operating System: Ubuntu 18.04

Python Version (if applicable): 3.8

Relevant Files

Model link: https://github.com/hongliyu0716/onnx_model/blob/main/ReduceMax.onnx

Steps To Reproduce

  1. Download the model
  2. Commands or scripts:
polygraphy run ReduceMax.onnx --onnxrt --trt --workspace 256M --save-engine test.plan --fp16 --verbose

The verbose polygraphy output is as follows:

[I] onnxrt-runner-N0-11/18/23-22:37:07  | Activating and starting inference
[V] Loaded Module: onnxruntime | Version: 1.16.1 | Path: ['/home/hll/anaconda3/envs/trt/lib/python3.8/site-packages/onnxruntime']
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[V] Loaded Module: numpy | Version: 1.21.6 | Path: ['/home/hll/anaconda3/envs/trt/lib/python3.8/site-packages/numpy']
[V] Loading inputs from data loader
[V] Generating data using numpy seed: 1
[V] Input tensor: data | Generating input data in range: [0.0, 1.0]
[V] Input tensor: axes | Generating input data in range: [0, 1]
[I] onnxrt-runner-N0-11/18/23-22:37:07 
    ---- Inference Input(s) ----
    {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int64, shape=(1,)]}
[V] onnxrt-runner-N0-11/18/23-22:37:07  | Input metadata is: {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int64, shape=(1,)]}
[I] onnxrt-runner-N0-11/18/23-22:37:07 
    ---- Inference Output(s) ----
    {reduced [dtype=float32, shape=(3, 2)]}
[I] onnxrt-runner-N0-11/18/23-22:37:07  | Completed 1 iteration(s) in 0.7496 ms | Average inference time: 0.7496 ms.
[I] trt-runner-N0-11/18/23-22:37:07     | Activating and starting inference
[V] [MemUsageChange] Init CUDA: CPU +134, GPU +0, now: CPU 150, GPU 591 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +226, GPU +38, now: CPU 451, GPU 629 (MiB)
[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[V] ----------------------------------------------------------------
[V] Input filename:   /home/hll/workplace/trt/onnx/reproduce/ReduceMax.onnx
[V] ONNX IR version:  0.0.7
[V] Opset version:    18
[V] Producer name:    
[V] Producer version: 
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[V]     Setting TensorRT Optimization Profiles
[V]     Input tensor: data (dtype=DataType.FLOAT, shape=(3, 2, 2)) | Setting input tensor shapes to: (min=[3, 2, 2], opt=[3, 2, 2], max=[3, 2, 2])
[V]     Input tensor: axes (dtype=DataType.INT32, shape=(1,)) | Setting input tensor shapes to: (min=[1], opt=[1], max=[1])
[I]     Configuring with profiles: [Profile().add('data', min=[3, 2, 2], opt=[3, 2, 2], max=[3, 2, 2]).add('axes', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 256.00 MiB, TACTIC_DRAM: 2002.62 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[W] Unused Input: axes
[W] [RemoveDeadLayers] Input Tensor axes is unused or used only at compile-time, but is not being removed.
[V] Graph optimization time: 0.000251676 seconds.
[V] Global timing cache in use. Profiling results in this builder pass will be stored.
[V] Detected 2 inputs and 1 output network tensors.
[V] Total Host Persistent Memory: 0
[V] Total Device Persistent Memory: 0
[V] Total Scratch Memory: 0
[V] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[V] Total Activation Memory: 0
[V] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[I] Finished engine building in 0.062 seconds
[V] Loaded engine size: 0 MiB
[V] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] Found candidate CUDA libraries: ['/home/hll/CUDA/cuda-11.4/lib64/libcudart.so', '/home/hll/CUDA/cuda-11.4/lib64/libcudart.so.11.4.148', '/home/hll/CUDA/cuda-11.4/lib64/libcudart.so.11.0']
[W] Input tensor: axes | Buffer dtype (int64) does not match expected input dtype (int32), attempting to cast. 
[I] trt-runner-N0-11/18/23-22:37:07    
    ---- Inference Input(s) ----
    {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int32, shape=(1,)]}
[V] trt-runner-N0-11/18/23-22:37:07     | Input metadata is: {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int32, shape=(1,)]}
[I] trt-runner-N0-11/18/23-22:37:07    
    ---- Inference Output(s) ----
    {reduced [dtype=float32, shape=()]}
[I] trt-runner-N0-11/18/23-22:37:07     | Completed 1 iteration(s) in 0.4239 ms | Average inference time: 0.4239 ms.
[V] Successfully ran: ['onnxrt-runner-N0-11/18/23-22:37:07', 'trt-runner-N0-11/18/23-22:37:07']
[I] Accuracy Comparison | onnxrt-runner-N0-11/18/23-22:37:07 vs. trt-runner-N0-11/18/23-22:37:07
[I]     Comparing Output: 'reduced' (dtype=float32, shape=(3, 2)) with 'reduced' (dtype=float32, shape=())
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[E]         Will not compare outputs of different shapes. Note: Output shapes are (3, 2) and ().
[E]         Note: Use --no-shape-check or set check_shapes=False to attempt to compare values anyway.
[E]         FAILED | Output: 'reduced' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['reduced']
[E] Accuracy Summary | onnxrt-runner-N0-11/18/23-22:37:07 vs. trt-runner-N0-11/18/23-22:37:07 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 5.772s | Command: /home/hll/anaconda3/envs/trt/bin/polygraphy run ReduceMax.onnx --onnxrt --trt --workspace 256M --save-engine test.plan --fp16 --verbose
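For reference, the result ONNX Runtime produces can be checked against plain NumPy. This is a minimal sketch assuming the model reduces over a single axis with `keepdims=0`, which is what the onnxrt-runner output shape `(3, 2)` implies; the concrete axis value `1` below is illustrative, not taken from the original model:

```python
import numpy as np

# Emulate ReduceMax(data, axes) with keepdims=0, as ONNX Runtime
# computes it for this model (assumption: a single reduced axis).
data = np.arange(12, dtype=np.float32).reshape(3, 2, 2)
axes = np.array([1], dtype=np.int64)  # illustrative axis value

reduced = np.max(data, axis=tuple(axes.tolist()))
print(reduced.shape)  # (3, 2) -- matches the onnxrt-runner output,
                      # while the TRT engine returned a scalar ()
```

Whatever axis is chosen, reducing one axis of a rank-3 tensor must yield a rank-2 result, so the scalar shape `()` from the TensorRT engine cannot be correct.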
@zerollzeng zerollzeng self-assigned this Nov 22, 2023
@zerollzeng zerollzeng added triaged Issue has been triaged by maintainers internal-bug-tracked Tracked internally, will be fixed in a future release. labels Nov 22, 2023
@zerollzeng
Collaborator

Filed internal bug 4389301 for this.

@zerollzeng
Collaborator

TensorRT currently doesn't support `axes` supplied as a runtime input to reduce ops. In TRT 10.0 we add a check that rejects such ONNX models at parse time. Closing this.
