Torch-TensorRT v2.3.0 #2899
narendasan started this conversation in Show and tell
Windows Support, Dynamic Shape and Quantization in Dynamo, PyTorch 2.3, CUDA 12.1, TensorRT 10.0
Torch-TensorRT 2.3.0 targets PyTorch 2.3, CUDA 12.1 (builds for CUDA 11.8 are available via the PyTorch package index - https://download.pytorch.org/whl/cu118) and TensorRT 10.0. This release adds official support for Windows as a platform. On Windows, only the Dynamo frontend is supported, and users are currently required to use the Python-only runtime (support for the C++ runtime will be added in a future version). This release also adds support for dynamic shapes without recompilation. Users can also now use quantized models with Torch-TensorRT via the Model Optimizer toolkit (https://github.com/NVIDIA/TensorRT-Model-Optimizer).
Windows
In this release we introduce Windows support for the Python runtime using the Dynamo paths. Users can now directly optimize PyTorch models with TensorRT on Windows, with minimal code changes. This integration enables Python-only optimization in the Torch-TensorRT Dynamo compilation paths (ir="dynamo" and ir="torch_compile").
Dynamic Shaped Model Compilation in Dynamo
Dynamic shape support has become more robust in v2.3.0. Torch-TensorRT now leverages symbolic information in the graph to calculate intermediate shape ranges, which allows more dynamic shape cases to be supported. For AOT workflows using torch.export, using these new features requires no changes. JIT workflows previously relied on torch.compile guards to automatically recompile the engines when the input size changed; users can now mark dynamic dimensions using torch APIs (https://pytorch.org/docs/stable/torch.compiler_dynamic_shapes.html). With these APIs, engines will not recompile as long as inputs do not violate the specified constraints.
AOT workflow
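A minimal sketch of the AOT workflow, assuming a model with a single image-like input whose batch dimension varies between 1 and 8. The helper names shape_range and compile_aot are hypothetical and only serve the illustration; torch_tensorrt.Input and torch_tensorrt.compile are the library calls. The heavy imports sit inside compile_aot so that the shape-range check can be exercised on a machine without a GPU build installed.

```python
from typing import Any, Dict, Tuple

Shape = Tuple[int, ...]


def shape_range(min_shape: Shape, opt_shape: Shape, max_shape: Shape) -> Dict[str, Shape]:
    """Validate a (min, opt, max) shape range and return it as keyword arguments.

    Hypothetical helper: it only checks that the three shapes have the same rank
    and that min <= opt <= max holds for every dimension.
    """
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min, opt and max shapes must have the same rank")
    for lo, mid, hi in zip(min_shape, opt_shape, max_shape):
        if not (lo <= mid <= hi):
            raise ValueError(f"expected min <= opt <= max, got {lo}, {mid}, {hi}")
    return {"min_shape": min_shape, "opt_shape": opt_shape, "max_shape": max_shape}


def compile_aot(model: Any, rng: Dict[str, Shape]) -> Any:
    """Compile ahead of time with a dynamic input described by a shape range.

    Assumes `model` is a torch.nn.Module already in eval mode on a CUDA device.
    """
    import torch
    import torch_tensorrt

    # The shape range is handed to the compiler through torch_tensorrt.Input.
    dynamic_input = torch_tensorrt.Input(dtype=torch.float32, **rng)
    return torch_tensorrt.compile(model, ir="dynamo", inputs=[dynamic_input])


# Batch dimension ranges over [1, 8]; the engine is tuned for batch size 4.
batch_range = shape_range((1, 3, 224, 224), (4, 3, 224, 224), (8, 3, 224, 224))
```

Calling compile_aot(model, batch_range) would then return a module that accepts any batch size inside the declared range.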
JIT workflow
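A minimal sketch of the JIT workflow, assuming a single input tensor whose batch dimension (index 0) should be treated as dynamic. The helpers dynamic_dim and compile_jit are hypothetical names; torch._dynamo.mark_dynamic and torch.compile with backend="tensorrt" are the calls being illustrated. The heavy imports sit inside compile_jit so the small validation helper can be used without a GPU.

```python
from typing import Any, Dict


def dynamic_dim(index: int, lo: int, hi: int) -> Dict[str, int]:
    """Describe one dynamic dimension and its allowed size range (hypothetical helper)."""
    if index < 0:
        raise ValueError("dimension index must be non-negative")
    if not (0 < lo <= hi):
        raise ValueError(f"expected 0 < min <= max, got {lo}, {hi}")
    return {"index": index, "min": lo, "max": hi}


def compile_jit(model: Any, example_input: Any, dim: Dict[str, int]) -> Any:
    """Mark a dimension as dynamic, then wrap the model with the TensorRT backend.

    Assumes `model` and `example_input` already live on a CUDA device.
    """
    import torch
    import torch._dynamo
    import torch_tensorrt  # noqa: F401  (importing registers the "tensorrt" backend)

    # Tell Dynamo that this dimension may vary within [min, max], so that
    # inputs inside the range should not trigger a recompilation.
    torch._dynamo.mark_dynamic(example_input, dim["index"], min=dim["min"], max=dim["max"])

    # torch.compile is lazy: the engine is built on the first call of the result.
    return torch.compile(model, backend="tensorrt")


# Batch dimension (index 0) may range from 1 to 8.
batch_dim = dynamic_dim(0, 1, 8)
```

The first call of the returned module with example_input builds the engine; later calls with other batch sizes inside the range are expected to reuse it.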
More information can be found here: https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html
Explicit Dynamic Shape support in Converters
Converters now explicitly declare their support for dynamic shapes, and we are progressively adding and verifying this support. Converter writers can declare support for dynamic shapes using the supports_dynamic_shape argument of the dynamo_tensorrt_converter decorator.
By default, if a converter has not been marked as supporting dynamic shapes, its operator will run in PyTorch when the user has specified the inputs as dynamic. This ensures that compilation succeeds with some valid compiled module. However, many operators already support dynamic shapes in an untested fashion. Users can therefore choose to enable the full converter library for dynamic shapes using the assume_dynamic_shape_support flag. This flag assumes all converters support dynamic shapes, leading to more operations being run in TensorRT, with the potential drawback that some ops may cause compilation or runtime failures. Future releases will progressively add dynamic shape coverage for all Core ATen Operators.
Quantization in Dynamo
We introduce support for model quantization in FP8. We support models quantized using the NVIDIA TensorRT-Model-Optimizer toolkit. This toolkit introduces quantization nodes in the graph, which TensorRT converts and uses to quantize the model into lower precision. Although the toolkit supports quantization in various data types, we only support FP8 in this release.
Please refer to our end-to-end example, Torch Compile VGG16 with FP8 and PTQ, for how to use this.
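A rough sketch of the FP8 post-training quantization flow described above, under several assumptions: the nvidia-modelopt package is installed, the GPU is Hopper or newer, and the calibration batches are held in a re-iterable container such as a list. The helper names make_calibration_loop and quantize_and_compile_fp8 are hypothetical, and the exact export call may differ between versions, so treat the referenced VGG16 example as the authoritative source.

```python
from typing import Any, Callable, Iterable


def make_calibration_loop(batches: Iterable[Any]) -> Callable[[Any], None]:
    """Return a forward_loop(model) callable that feeds every batch to the model.

    Model Optimizer runs this loop so the inserted quantizer nodes can observe
    activations and calibrate their scaling factors.
    """
    def forward_loop(model: Any) -> None:
        for batch in batches:
            model(batch)

    return forward_loop


def quantize_and_compile_fp8(model: Any, calib_batches: Iterable[Any], example_input: Any) -> Any:
    """Quantize `model` to FP8 with Model Optimizer, then compile it with Torch-TensorRT.

    Assumes `model`, the calibration batches and `example_input` are on a CUDA device.
    """
    import torch
    import torch_tensorrt
    import modelopt.torch.quantization as mtq
    from modelopt.torch.quantization.utils import export_torch_mode

    # Insert quantization nodes into the graph and calibrate them (PTQ).
    mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=make_calibration_loop(calib_batches))

    # Export the quantized model and let TensorRT consume the quantization nodes.
    with torch.no_grad(), export_torch_mode():
        exported = torch.export.export(model, (example_input,))
        return torch_tensorrt.dynamo.compile(
            exported,
            inputs=[example_input],
            enabled_precisions={torch.float8_e4m3fn},
            min_block_size=1,
        )
```

The calibration loop is kept separate so it can be checked on its own with any callable standing in for the model.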
Engine Version and Hardware Compatibility
We introduce new compilation arguments, hardware_compatible: bool and version_compatible: bool, which enable two key features in TensorRT.
hardware_compatible
Enabling hardware compatibility mode will generate TRT Engines which are compatible with Ampere and newer GPUs. As a result, engines built on one GPU can later be run on others, without requiring recompilation.
version_compatible
Enabling version compatibility mode will generate TRT Engines which are compatible with newer versions of TensorRT. As a result, engines built with one version of TensorRT will be forward compatible with other TRT versions, without needing recompilation.
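As an illustration of the two arguments, the sketch below passes both flags to the Dynamo compilation path. The names PORTABLE_ENGINE_SETTINGS and compile_portable are hypothetical; hardware_compatible and version_compatible are the arguments introduced above, and the example assumes a model and example inputs already placed on a CUDA device.

```python
from typing import Any, Dict, Sequence

# Both flags default to off; this (hypothetical) settings dict turns them on together.
PORTABLE_ENGINE_SETTINGS: Dict[str, bool] = {
    "hardware_compatible": True,  # engines usable across Ampere and newer GPUs
    "version_compatible": True,   # engines usable with newer TensorRT versions
}


def compile_portable(model: Any, example_inputs: Sequence[Any]) -> Any:
    """Compile `model` with both compatibility modes enabled."""
    import torch_tensorrt

    return torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=list(example_inputs),
        **PORTABLE_ENGINE_SETTINGS,
    )
```

Either flag can also be set on its own, for example when engines only need to move between GPUs and the TensorRT version stays fixed.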
New Data Type Support
Torch-TensorRT includes a number of new data types that leverage dedicated hardware on Ampere, Hopper and future architectures. bfloat16 has been added as a supported type alongside FP16 and FP32 and can be enabled for additional kernel tactic options. Models that contain BF16 weights can now be provided to Torch-TensorRT without modification. FP8 has been added, with support for Hopper and newer architectures, as a new quantization format similar to INT8 (see Quantization in Dynamo above). Finally, native support for INT64 inputs and computation has been added. In the past, the truncate_long_and_double feature flag had to be enabled in order to handle INT64 and FLOAT64 computation, inputs and weights. This flag would cause the compiler to truncate any INT64 or FLOAT64 objects to INT32 and FLOAT32 respectively. Now INT64 objects are no longer truncated and remain in INT64. As such, the truncate_long_and_double flag has been renamed truncate_double, since FLOAT64 truncation is still required; truncate_long_and_double is now deprecated.
What's Changed
feat: Decomposition for _unsafe_index by @gs-olive in #2386
docs: Add documentation of torch.compile backend usage by @gs-olive in #2363
feat: Add aten.unbind decomposition for VIT by @gs-olive in #2430
examples: Stable Diffusion torch.compile sample with output image by @gs-olive in #2417
fix: Error with aten.view across Tensor memory by @gs-olive in #2464
fix: Repair usage of torch_executed_ops by @gs-olive in #2562
feat: add convert_method_to_trt_engine() for dynamo by @zewenli98 in #2467
small fix: Remove extraneous argument in compile by @gs-olive in #2635
New Contributors
Full Changelog: v2.2.0...v2.3.0
This discussion was created from the release Torch-TensorRT v2.3.0.