🐛 [Bug] TRT Error when compiling ViT with Dynamic Shape #3016

Closed
Hukongtao opened this issue Jul 17, 2024 · 5 comments · Fixed by #3019
Labels: bug (Something isn't working)

@Hukongtao

Bug Description

To Reproduce

Minimal reproducible code:

import torch
import torch_tensorrt
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
model = model.eval().cuda()

inputs = [
    torch_tensorrt.Input(
        min_shape=[1, 3, 224, 224],
        opt_shape=[4, 3, 224, 224],
        max_shape=[16, 3, 224, 224],
        dtype=torch.float32
    )
]
# inputs = torch_tensorrt.Input(shape=[2, 3, 224, 224], dtype=torch.float32)
trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
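
(Not part of the original report; a hedged usage sketch.) If compilation succeeds, the same compiled module should accept any batch size within the declared [1, 16] range without rebuilding the engine, for example:

x_small = torch.randn(1, 3, 224, 224).cuda()
x_large = torch.randn(16, 3, 224, 224).cuda()
out_small = trt_gm(x_small)   # batch size 1
out_large = trt_gm(x_large)   # batch size 16, same engine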

Expected behavior

The model should compile with dynamic shapes. Instead, I get the following error:

WARNING:torch_tensorrt.dynamo._compiler:Node scaled_dot_product_attention of op type call_function does not have metadata. This could sometimes lead to undefined behavior.
WARNING:torch_tensorrt.dynamo._compiler:Some nodes do not have metadata (shape and dtype information). This could lead to problems sometimes if the graph has PyTorch and TensorRT segments.
INFO:torch_tensorrt.dynamo._compiler:Partitioning the graph via the fast partitioner
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +489, GPU +0, now: CPU 6268, GPU 2121 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1906, GPU +354, now: CPU 8327, GPU 2475 (MiB)
WARNING:torch_tensorrt [TensorRT Conversion Context]:CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.058611
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
ERROR:torch_tensorrt [TensorRT Conversion Context]:IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: [SLICE]-[aten_ops.expand.default]-[/vit_embeddings/expand]: ISliceLayer has out of bounds access on axis 0 Condition '<' violated: 3 >= 1.)
Traceback (most recent call last):
  File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 27, in <module>
    trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 250, in compile
    trt_graph_module = dynamo_compile(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
    trt_gm = compile_module(gm, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 431, in compile_module
    trt_module = convert_module(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 107, in convert_module
    interpreter_result = interpret_module_to_result(module, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 88, in interpret_module_to_result
    interpreter_result = interpreter.run()
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 350, in run
    assert serialized_engine
AssertionError
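
(Editorial note, not from the issue.) The failing node, /vit_embeddings/expand, corresponds to the broadcast of ViT's learned class token across the dynamic batch dimension. A minimal standalone module that exercises the same dynamic-batch expand pattern might look like this (hypothetical module and shapes, for illustration only):

import torch
import torch_tensorrt

class ClsTokenExpand(torch.nn.Module):
    """Broadcasts a learned [1, 1, C] token across a dynamic batch dimension."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.cls_token = torch.nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, N, C] patch embeddings; expand the token to [B, 1, C].
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        return torch.cat([cls, x], dim=1)

module = ClsTokenExpand().eval().cuda()
expand_inputs = [
    torch_tensorrt.Input(
        min_shape=[1, 196, 768],
        opt_shape=[4, 196, 768],
        max_shape=[16, 196, 768],
        dtype=torch.float32,
    )
]
# Compiling this small module exercises the same aten expand conversion path
# with a dynamic batch dimension that the full ViT model hits.
trt_expand = torch_tensorrt.compile(module, "dynamo", expand_inputs)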

Environment

[Environment details were provided as a screenshot.]

Additional context

For reference, the official documentation on dynamic shapes:
https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html

Hukongtao added the bug label on Jul 17, 2024
@peri044 (Collaborator) commented Jul 18, 2024

Thanks for the repro. I've fixed this bug in this PR: #3019

@Hukongtao (Author)

Thank you for your reply. I used the latest version and modified the code according to your PR, but I got another error:

WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.039768
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 1 inputs and 6 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 5552
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 48365568
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Algorithm ShiftNTopDown took 0.031924ms to assign 2 blocks to 4 nodes requiring 61210624 bytes.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 61210624
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 10853632
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.123574 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 100 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 9363 MiB
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.135388
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 11179132 bytes of Memory
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 1496 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 157388 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 16 timing cache entries
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
Traceback (most recent call last):
  File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 17, in <module>
    trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 249, in compile
    trt_graph_module = dynamo_compile(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
    trt_gm = compile_module(gm, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 383, in compile_module
    submodule_inputs = partitioning.construct_submodule_inputs(submodule)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/partitioning/common.py", line 124, in construct_submodule_inputs
    raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly


@Hukongtao (Author)

Looking forward to your reply.

@peri044 (Collaborator) commented Jul 20, 2024

@Hukongtao This error is because our lowering pass was not copying over the metadata of the attention op to its replaced variant. I've pushed a fix to the same PR: #3019. Can you give it a try?
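
(Editorial note, not from the thread and not the actual Torch-TensorRT pass.) When an FX lowering pass swaps one call_function node for another, the replacement node needs the original node's .meta (shape/dtype information) carried over; otherwise downstream partitioning cannot reconstruct submodule inputs, which is exactly the assertion seen above. A minimal sketch of that pattern, with a hypothetical helper name:

from torch.fx import GraphModule, Node

def replace_op_preserving_meta(gm: GraphModule, old_target, new_target) -> GraphModule:
    # Hypothetical helper, for illustration only.
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == old_target:
            with gm.graph.inserting_after(node):
                new_node: Node = gm.graph.call_function(
                    new_target, args=node.args, kwargs=node.kwargs
                )
            # Copy the shape/dtype metadata to the replacement node so later
            # passes (e.g. partitioning) still see tensor info on it.
            new_node.meta = dict(node.meta)
            node.replace_all_uses_with(new_node)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm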

@Hukongtao (Author)

LGTM

peri044 closed this as completed on Jul 25, 2024