🐛 [Bug] TRT Error when compiling ViT with Dynamic Shape #3016
Comments
Thanks for the repro. I've fixed this bug in this PR: #3019
Thank you for your reply~ After trying the fix, I now get a different error:
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.039768
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 1 inputs and 6 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 5552
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 48365568
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Algorithm ShiftNTopDown took 0.031924ms to assign 2 blocks to 4 nodes requiring 61210624 bytes.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 61210624
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 10853632
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.123574 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 100 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 9363 MiB
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.135388
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 11179132 bytes of Memory
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 1496 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 157388 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 16 timing cache entries
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
Traceback (most recent call last):
File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 17, in <module>
trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 249, in compile
trt_graph_module = dynamo_compile(
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 383, in compile_module
submodule_inputs = partitioning.construct_submodule_inputs(submodule)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/partitioning/common.py", line 124, in construct_submodule_inputs
raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
Looking forward to your reply.
@Hukongtao This error is because our lowering pass was not copying over the metadata of the attention op to its replaced variant. I've pushed a fix now to the same PR: #3019. Can you give it a try?
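For context, the failure mode can be sketched as a torch.fx graph rewrite. The sketch below is illustrative only and is not the actual Torch-TensorRT lowering pass; the op target and function name are assumptions. The point it demonstrates: a replacement node created without inheriting the original node's `.meta` (shape/dtype info) is exactly what later trips the partitioning assertion shown above.

```python
import torch
from torch.fx import GraphModule

# Illustrative sketch only -- NOT the actual Torch-TensorRT pass.
# Partitioning reads each node's .meta to construct submodule inputs,
# so a replacement node created without copying .meta produces
# "Input ... does not contain metadata" downstream.
def replace_attention_op(gm: GraphModule) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.target == torch.ops.aten.scaled_dot_product_attention.default:
            with gm.graph.inserting_after(node):
                # The target here is a placeholder standing in for whatever
                # variant the real pass lowers attention to.
                new_node = gm.graph.call_function(
                    torch.ops.aten.scaled_dot_product_attention.default,
                    args=node.args,
                    kwargs=node.kwargs,
                )
            new_node.meta = node.meta.copy()  # the step the buggy pass skipped
            node.replace_all_uses_with(new_node)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm
```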
LGTM |
Bug Description
To Reproduce
Minimal reproducible code (the original snippet was not captured in this page):
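The following is a hedged reconstruction from the traceback, which calls `torch_tensorrt.compile(model, "dynamo", inputs)` in `speed_vit_test.py`. The specific ViT model and shape bounds are assumptions, not the reporter's actual values.

```python
# Hedged reconstruction of the repro -- the actual script was not captured.
# The model choice and shape bounds are assumptions; the compile call
# matches the one visible in the traceback.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.vit_b_16().eval().cuda()  # assumed ViT; original model unknown

# Dynamic batch dimension, per the dynamic-shapes user guide linked below.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(8, 3, 224, 224),
        max_shape=(16, 3, 224, 224),
        dtype=torch.float32,
    )
]

trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)  # line from the traceback
```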
Expected behavior
The model should compile with dynamic shapes.
But instead I got this error:
Environment
Additional context
See the official documentation on dynamic shapes:
https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html