🐛 [Bug] TRT Error when compiling ViT with Dynamic Shape #3016

Closed
Hukongtao opened this issue Jul 17, 2024 · 5 comments · Fixed by #3019
Labels: bug (Something isn't working)

@Hukongtao

Bug Description

To Reproduce

Minimal reproducible code:

import torch
import torch_tensorrt
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
model = model.eval().cuda()

inputs = [
    torch_tensorrt.Input(
        min_shape=[1, 3, 224, 224],
        opt_shape=[4, 3, 224, 224],
        max_shape=[16, 3, 224, 224],
        dtype=torch.float32
    )
]
# inputs = torch_tensorrt.Input(shape=[2, 3, 224, 224], dtype=torch.float32)
trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
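
(Not part of the original report; a hedged usage sketch.) If compilation succeeds, the same compiled module should accept any batch size within the declared [1, 16] range without rebuilding the engine, for example:

x_small = torch.randn(1, 3, 224, 224).cuda()
x_large = torch.randn(16, 3, 224, 224).cuda()
out_small = trt_gm(x_small)   # batch size 1
out_large = trt_gm(x_large)   # batch size 16, same engine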

Expected behavior

The model should compile with dynamic shapes. Instead, I get the following error:

WARNING:torch_tensorrt.dynamo._compiler:Node scaled_dot_product_attention of op type call_function does not have metadata. This could sometimes lead to undefined behavior.
WARNING:torch_tensorrt.dynamo._compiler:Some nodes do not have metadata (shape and dtype information). This could lead to problems sometimes if the graph has PyTorch and TensorRT segments.
INFO:torch_tensorrt.dynamo._compiler:Partitioning the graph via the fast partitioner
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +489, GPU +0, now: CPU 6268, GPU 2121 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1906, GPU +354, now: CPU 8327, GPU 2475 (MiB)
WARNING:torch_tensorrt [TensorRT Conversion Context]:CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.058611
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
ERROR:torch_tensorrt [TensorRT Conversion Context]:IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: [SLICE]-[aten_ops.expand.default]-[/vit_embeddings/expand]: ISliceLayer has out of bounds access on axis 0 Condition '<' violated: 3 >= 1.)
Traceback (most recent call last):
  File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 27, in <module>
    trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 250, in compile
    trt_graph_module = dynamo_compile(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
    trt_gm = compile_module(gm, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 431, in compile_module
    trt_module = convert_module(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 107, in convert_module
    interpreter_result = interpret_module_to_result(module, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 88, in interpret_module_to_result
    interpreter_result = interpreter.run()
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 350, in run
    assert serialized_engine
AssertionError
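
(Editorial note, not from the issue.) The failing node, /vit_embeddings/expand, corresponds to the broadcast of ViT's learned class token across the dynamic batch dimension. A minimal standalone module that exercises the same dynamic-batch expand pattern might look like this (hypothetical module and shapes, for illustration only):

import torch
import torch_tensorrt

class ClsTokenExpand(torch.nn.Module):
    """Broadcasts a learned [1, 1, C] token across a dynamic batch dimension."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.cls_token = torch.nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, N, C] patch embeddings; expand the token to [B, 1, C].
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        return torch.cat([cls, x], dim=1)

module = ClsTokenExpand().eval().cuda()
expand_inputs = [
    torch_tensorrt.Input(
        min_shape=[1, 196, 768],
        opt_shape=[4, 196, 768],
        max_shape=[16, 196, 768],
        dtype=torch.float32,
    )
]
# Compiling this small module exercises the same aten expand conversion path
# with a dynamic batch dimension that the full ViT model hits.
trt_expand = torch_tensorrt.compile(module, "dynamo", expand_inputs)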

Environment

[Environment details were provided as a screenshot.]

Additional context

For reference, the official documentation on dynamic shapes:
https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html

Hukongtao added the bug label on Jul 17, 2024
@peri044 (Collaborator) commented Jul 18, 2024

Thanks for the repro. I've fixed this bug in this PR: #3019

@Hukongtao (Author)

Thank you for your reply. I used the latest version and modified the code according to your PR, but I got another error:

WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.039768
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 1 inputs and 6 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 5552
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 48365568
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Algorithm ShiftNTopDown took 0.031924ms to assign 2 blocks to 4 nodes requiring 61210624 bytes.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 61210624
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 10853632
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.123574 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 100 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 9363 MiB
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.135388
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 11179132 bytes of Memory
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 1496 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 157388 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 16 timing cache entries
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
Traceback (most recent call last):
  File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 17, in <module>
    trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 249, in compile
    trt_graph_module = dynamo_compile(
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
    trt_gm = compile_module(gm, inputs, settings)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 383, in compile_module
    submodule_inputs = partitioning.construct_submodule_inputs(submodule)
  File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/partitioning/common.py", line 124, in construct_submodule_inputs
    raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly


@Hukongtao (Author)

Looking forward to your reply.

@peri044 (Collaborator) commented Jul 20, 2024

@Hukongtao This error is because our lowering pass was not copying over the metadata of the attention op to its replaced variant. I've pushed a fix to the same PR: #3019. Can you give it a try?
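
(Editorial note, not from the thread and not the actual Torch-TensorRT pass.) When an FX lowering pass swaps one call_function node for another, the replacement node needs the original node's .meta (shape/dtype information) carried over; otherwise downstream partitioning cannot reconstruct submodule inputs, which is exactly the assertion seen above. A minimal sketch of that pattern, with a hypothetical helper name:

from torch.fx import GraphModule, Node

def replace_op_preserving_meta(gm: GraphModule, old_target, new_target) -> GraphModule:
    # Hypothetical helper, for illustration only.
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == old_target:
            with gm.graph.inserting_after(node):
                new_node: Node = gm.graph.call_function(
                    new_target, args=node.args, kwargs=node.kwargs
                )
            # Copy the shape/dtype metadata to the replacement node so later
            # passes (e.g. partitioning) still see tensor info on it.
            new_node.meta = dict(node.meta)
            node.replace_all_uses_with(new_node)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm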

@Hukongtao (Author)

LGTM

peri044 closed this as completed on Jul 25, 2024