
Are FP8 models supported in Triton? #7678

Open

jayakommuru opened this issue Oct 4, 2024 · 7 comments
Labels
question Further information is requested

Comments

@jayakommuru

We have an encoder-based model that is currently deployed in FP16 mode in production, and we want to reduce the latency further.

Does Triton support FP8? I don't see FP8 listed in the datatypes documentation here: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#datatypes

We are using the trtexec CLI to convert ONNX models to TRT engine files, and I see an `--fp8` option for generating FP8 serialized engines. Can anyone confirm whether we can deploy FP8 models in Triton?
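
For context, a minimal sketch of the conversion we are experimenting with (file names are placeholders for our actual model; exact flags depend on the TensorRT version):

```shell
# Build an FP8-quantized engine from an ONNX model (illustrative invocation)
trtexec --onnx=encoder.onnx \
        --fp8 \
        --saveEngine=encoder_fp8.plan
```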

@jayakommuru
Author

@oandreeva-nv can you help with this? ^^

@oandreeva-nv
Contributor

oandreeva-nv commented Oct 4, 2024

Hi @jayakommuru, let me verify it. I'll get back to you.

@oandreeva-nv oandreeva-nv added the question Further information is requested label Oct 4, 2024
@oandreeva-nv
Contributor

The TRT backend does not support FP8 I/O for the TRT engine. However, weights and internal tensors can be FP8.
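
So on the Triton side you would declare a supported datatype such as TYPE_FP16 for the inputs and outputs, even when the engine is FP8 internally. A minimal config.pbtxt sketch, with placeholder model name, tensor names, and shapes:

```
# Sketch only: FP16 I/O wrapping an FP8-quantized TensorRT engine
name: "encoder_fp8"        # placeholder model name
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"          # placeholder tensor name
    data_type: TYPE_FP16   # FP8 is not a valid Triton I/O datatype
    dims: [ 128 ]          # placeholder shape
  }
]
output [
  {
    name: "output"         # placeholder tensor name
    data_type: TYPE_FP16
    dims: [ 768 ]          # placeholder shape
  }
]
```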

@jayakommuru
Author

@oandreeva-nv OK. Can there be any throughput/performance benefit from running an FP8 TRT engine file with FP16 I/O? Which Triton datatype should be used with an FP8 TRT engine file in the TRT backend?

@jayakommuru
Author

@oandreeva-nv can you confirm whether using FP16 I/O Triton datatypes with an FP8 TRT engine gives any benefit? Thanks

@oandreeva-nv
Contributor

oandreeva-nv commented Oct 7, 2024

Hi @jayakommuru, we have a perf_analyzer tool that can help you analyze the performance of your model.
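
A minimal sketch of how you could compare the two engine variants (model names and concurrency range are placeholders for your deployment):

```shell
# Measure the FP8 engine with FP16 I/O under increasing client concurrency
perf_analyzer -m encoder_fp8 --concurrency-range 1:8
# Repeat with the FP16 baseline and compare throughput/latency
perf_analyzer -m encoder_fp16 --concurrency-range 1:8
```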

@jayakommuru
Author

@oandreeva-nv Sure, I will explore perf_analyzer. Any idea whether to use the FP32 or FP16 I/O Triton datatype for TensorRT FP8 models?
