
Inconsistent results between TensorRT and ONNX inference for ReduceMax operator #3467

Closed
hongliyu0716 opened this issue Nov 19, 2023 · 2 comments
Assignees: zerollzeng
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release), triaged (Issue has been triaged by maintainers)

Comments

@hongliyu0716

Description

I am encountering an issue when using TensorRT to load an ONNX model that contains a ReduceMax operator. The inference results obtained from TensorRT are inconsistent with the results obtained from running the same model using ONNX runtime.
The model structure is as below:
[screenshot: ReduceMax ONNX graph, with inputs `data` and `axes` and output `reduced`]

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: NVIDIA GeForce MX330

NVIDIA Driver Version: 470.182.03

CUDA Version: 11.4

CUDNN Version: 8.9.5

Operating System: Ubuntu 18.04

Python Version (if applicable): 3.8

Relevant Files

Model link: https://github.com/hongliyu0716/onnx_model/blob/main/ReduceMax.onnx

Steps To Reproduce

  1. Download the model
  2. Commands or scripts:
polygraphy run ReduceMax.onnx --onnxrt --trt --workspace 256M --save-engine test.plan --fp16 --verbose

The verbose polygraphy output is as follows:

[I] onnxrt-runner-N0-11/18/23-22:37:07  | Activating and starting inference
[V] Loaded Module: onnxruntime | Version: 1.16.1 | Path: ['/home/hll/anaconda3/envs/trt/lib/python3.8/site-packages/onnxruntime']
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[V] Loaded Module: numpy | Version: 1.21.6 | Path: ['/home/hll/anaconda3/envs/trt/lib/python3.8/site-packages/numpy']
[V] Loading inputs from data loader
[V] Generating data using numpy seed: 1
[V] Input tensor: data | Generating input data in range: [0.0, 1.0]
[V] Input tensor: axes | Generating input data in range: [0, 1]
[I] onnxrt-runner-N0-11/18/23-22:37:07 
    ---- Inference Input(s) ----
    {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int64, shape=(1,)]}
[V] onnxrt-runner-N0-11/18/23-22:37:07  | Input metadata is: {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int64, shape=(1,)]}
[I] onnxrt-runner-N0-11/18/23-22:37:07 
    ---- Inference Output(s) ----
    {reduced [dtype=float32, shape=(3, 2)]}
[I] onnxrt-runner-N0-11/18/23-22:37:07  | Completed 1 iteration(s) in 0.7496 ms | Average inference time: 0.7496 ms.
[I] trt-runner-N0-11/18/23-22:37:07     | Activating and starting inference
[V] [MemUsageChange] Init CUDA: CPU +134, GPU +0, now: CPU 150, GPU 591 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +226, GPU +38, now: CPU 451, GPU 629 (MiB)
[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[V] ----------------------------------------------------------------
[V] Input filename:   /home/hll/workplace/trt/onnx/reproduce/ReduceMax.onnx
[V] ONNX IR version:  0.0.7
[V] Opset version:    18
[V] Producer name:    
[V] Producer version: 
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[V]     Setting TensorRT Optimization Profiles
[V]     Input tensor: data (dtype=DataType.FLOAT, shape=(3, 2, 2)) | Setting input tensor shapes to: (min=[3, 2, 2], opt=[3, 2, 2], max=[3, 2, 2])
[V]     Input tensor: axes (dtype=DataType.INT32, shape=(1,)) | Setting input tensor shapes to: (min=[1], opt=[1], max=[1])
[I]     Configuring with profiles: [Profile().add('data', min=[3, 2, 2], opt=[3, 2, 2], max=[3, 2, 2]).add('axes', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 256.00 MiB, TACTIC_DRAM: 2002.62 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[W] Unused Input: axes
[W] [RemoveDeadLayers] Input Tensor axes is unused or used only at compile-time, but is not being removed.
[V] Graph optimization time: 0.000251676 seconds.
[V] Global timing cache in use. Profiling results in this builder pass will be stored.
[V] Detected 2 inputs and 1 output network tensors.
[V] Total Host Persistent Memory: 0
[V] Total Device Persistent Memory: 0
[V] Total Scratch Memory: 0
[V] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[V] Total Activation Memory: 0
[V] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[I] Finished engine building in 0.062 seconds
[V] Loaded engine size: 0 MiB
[V] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] Found candidate CUDA libraries: ['/home/hll/CUDA/cuda-11.4/lib64/libcudart.so', '/home/hll/CUDA/cuda-11.4/lib64/libcudart.so.11.4.148', '/home/hll/CUDA/cuda-11.4/lib64/libcudart.so.11.0']
[W] Input tensor: axes | Buffer dtype (int64) does not match expected input dtype (int32), attempting to cast. 
[I] trt-runner-N0-11/18/23-22:37:07    
    ---- Inference Input(s) ----
    {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int32, shape=(1,)]}
[V] trt-runner-N0-11/18/23-22:37:07     | Input metadata is: {data [dtype=float32, shape=(3, 2, 2)],
     axes [dtype=int32, shape=(1,)]}
[I] trt-runner-N0-11/18/23-22:37:07    
    ---- Inference Output(s) ----
    {reduced [dtype=float32, shape=()]}
[I] trt-runner-N0-11/18/23-22:37:07     | Completed 1 iteration(s) in 0.4239 ms | Average inference time: 0.4239 ms.
[V] Successfully ran: ['onnxrt-runner-N0-11/18/23-22:37:07', 'trt-runner-N0-11/18/23-22:37:07']
[I] Accuracy Comparison | onnxrt-runner-N0-11/18/23-22:37:07 vs. trt-runner-N0-11/18/23-22:37:07
[I]     Comparing Output: 'reduced' (dtype=float32, shape=(3, 2)) with 'reduced' (dtype=float32, shape=())
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[E]         Will not compare outputs of different shapes. Note: Output shapes are (3, 2) and ().
[E]         Note: Use --no-shape-check or set check_shapes=False to attempt to compare values anyway.
[E]         FAILED | Output: 'reduced' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['reduced']
[E] Accuracy Summary | onnxrt-runner-N0-11/18/23-22:37:07 vs. trt-runner-N0-11/18/23-22:37:07 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 5.772s | Command: /home/hll/anaconda3/envs/trt/bin/polygraphy run ReduceMax.onnx --onnxrt --trt --workspace 256M --save-engine test.plan --fp16 --verbose
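For reference, the result ONNX Runtime produces can be checked against plain NumPy. This is a minimal sketch assuming the model reduces over a single axis with `keepdims=0`, which is what the onnxrt-runner output shape `(3, 2)` implies; the concrete axis value `1` below is illustrative, not taken from the original model:

```python
import numpy as np

# Emulate ReduceMax(data, axes) with keepdims=0, as ONNX Runtime
# computes it for this model (assumption: a single reduced axis).
data = np.arange(12, dtype=np.float32).reshape(3, 2, 2)
axes = np.array([1], dtype=np.int64)  # illustrative axis value

reduced = np.max(data, axis=tuple(axes.tolist()))
print(reduced.shape)  # (3, 2) -- matches the onnxrt-runner output,
                      # while the TRT engine returned a scalar ()
```

Whatever axis is chosen, reducing one axis of a rank-3 tensor must yield a rank-2 result, so the scalar shape `()` from the TensorRT engine cannot be correct.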
@zerollzeng zerollzeng self-assigned this Nov 22, 2023
@zerollzeng zerollzeng added triaged Issue has been triaged by maintainers internal-bug-tracked Tracked internally, will be fixed in a future release. labels Nov 22, 2023
@zerollzeng
Collaborator

Filed internal bug 4389301 for this.

@zerollzeng
Collaborator

TensorRT currently doesn't support `axes` supplied as a runtime input to reduce ops. In TRT 10.0 we add a check that rejects such ONNX models at parse time. Closing this.
