FP16 outputs error of TensorRT 8.6.1.2 when running Roberta #3101
Comments
Have you seen any FP16 warnings in the polygraphy output, like LayerNorm or FP16 overflow warnings?
Only FP16 overflow warnings, no LayerNorm warnings for roberta_wwm_ext_opset17_fuse_ln.onnx.
@DayDayupupupup , there are only 5 exponent bits in FP16 compared to 8 exponent bits in FP32, so we might overflow depending on the input data flowing into the operation.
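(Illustrative sketch, not from the thread: FP16's maximum representable value is 65504, so intermediates that are routine in FP32, such as the squared deviations inside a LayerNorm variance, can overflow to inf.)

```python
import numpy as np

print(np.square(np.float32(300.0)))  # 90000.0, fine in FP32
print(np.square(np.float16(300.0)))  # inf: 90000 > 65504, the FP16 max

# A LayerNorm-style variance overflows the same way in FP16:
v = np.array([-300.0, 300.0] * 4, dtype=np.float16)
print(((v - v.mean()) ** 2).mean())  # inf once the squares overflow
```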
Thank you very much; I look forward to the next release of TRT. But I am still confused. I understand that the normalization in BERT is forced to FP32, and the model structure differs only slightly between RoBERTa and BERT. If the normalization in RoBERTa overflows, does that mean its normalization is not forced to FP32?
@DayDayupupupup whether the model overflows depends on the input data flowing into the normalization. As for the model structure question, we could use a visualization tool to check the ONNX; sometimes there are differences that are easy to miss. Hope this helps!
@ttyio As I mentioned above, I've fused the normalization ops into a single LayerNormalization node. Visualizing the TRT engine, there is only one Myelin layer node. After building the FP16 engine with the normalization forced to FP32, the whole model appears to run in FP32.
@DayDayupupupup , by enabling precision constraints and setting only this LayerNorm to FP32 precision, we do not impact other layers, even if they end up in the same Myelin node.
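A minimal sketch of how this can be done with the TensorRT Python builder API (not from the thread; it assumes a network already parsed from ONNX and that a hard precision constraint is acceptable for your build):

```python
import tensorrt as trt

def force_layernorm_fp32(network: "trt.INetworkDefinition",
                         config: "trt.IBuilderConfig") -> None:
    """Build in FP16 overall, but pin every INormalization layer to FP32."""
    config.set_flag(trt.BuilderFlag.FP16)
    # Make TensorRT honor the per-layer precisions set below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.NORMALIZATION:  # INormalization, TRT >= 8.6
            layer.precision = trt.float32
            layer.set_output_type(0, trt.float32)
```

With OBEY_PRECISION_CONSTRAINTS the build fails if a constraint cannot be met; PREFER_PRECISION_CONSTRAINTS is the softer alternative.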
@ttyio I have always set it like this.
Hi @DayDayupupupup , I got the same error on a BeiT model, which is based on BERT. In your case, many MatMul weights were flagged with this warning: "Detected subnormal FP16 values." Yet you only forced LN to FP32 and the FP16 model results were good, regardless of the MatMul weights? Unfortunately, I tried forcing LN/MatMul/Reduce/Softmax (the ops with potential overflow) to FP32 on BeiT, but I still get wrong results.
@FdyCN In my case, only the LN overflows, and only a small number of weights are subnormal FP16 values, which does not cause a large error in the final result.
Description
Since the INormalization layer was added in TRT 8.6, I ran some tests on FP16 accuracy; polygraphy reports:
PASSED | Output: 'pooler_output' | Difference is within tolerance (rel=1e-05, abs=0.01)
FAILED | Output: 'last_hidden_state'
Environment
TensorRT Version: 8.6.1.2
NVIDIA GPU: A30
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
Operating System: Ubuntu 20.04.2 LTS
Tensorflow Version (if applicable): 1.15.5
Container version: nvcr.io/nvidia/tensorrt:23.05-py3
Steps To Reproduce
Test1: roberta-base
When I use real data, the error is even greater. Run with the custom inputs (a sketch for producing custom_inputs.json follows the command):
polygraphy run roberta_base_opset17.onnx --trt --onnxrt --atol 0.01 --pool-limit workspace:10G --fp16 --load-inputs custom_inputs.json
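custom_inputs.json is the file fed via --load-inputs; a minimal sketch for producing it with Polygraphy's JSON helper (the input names, shapes, and vocabulary size here are assumptions for a roberta-base export, not taken from the thread):

```python
import numpy as np
from polygraphy.json import save_json

# One feed_dict per inference; keys must match the ONNX input names.
batch, seq_len = 1, 128
inputs = [{
    "input_ids": np.random.randint(0, 50265, (batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
}]
save_json(inputs, "custom_inputs.json", description="custom inputs")
```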
Test2: chinese-roberta-wwm-ext
Relevant Files: download the TensorFlow ckpt at the link below:
Model link: chinese-roberta-wwm-ext tensorflow ckpt
As mentioned in issue #2466, bert4keras is still used to process the model.
2.1 Create savedmodel
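The thread does not include the export script; a hedged sketch of this step with bert4keras on TF 1.15 (paths, and the use of simple_save, are assumptions):

```python
# Load the TF checkpoint as a Keras model via bert4keras, then export a
# SavedModel for tf2onnx. Paths below are placeholders.
import os
os.environ["TF_KERAS"] = "1"  # make bert4keras use tf.keras

import tensorflow as tf
from bert4keras.models import build_transformer_model

model = build_transformer_model(
    config_path="chinese_roberta_wwm_ext/bert_config.json",
    checkpoint_path="chinese_roberta_wwm_ext/bert_model.ckpt",
    model="roberta",
)

sess = tf.keras.backend.get_session()
tf.saved_model.simple_save(
    sess,
    "roberta_wwm_ext_saved_model",
    inputs={t.name: t for t in model.inputs},
    outputs={t.name: t for t in model.outputs},
)
```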
2.2 Create the ONNX model with tf2onnx (1.13.0)
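The exact command isn't in the thread; with the tf2onnx CLI it would look like this (file names assumed):

python -m tf2onnx.convert --saved-model roberta_wwm_ext_saved_model --opset 17 --output roberta_wwm_ext_opset17.onnx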
2.3 Fuse LayerNorm
Because tf2onnx splits LayerNorm into primitive ops, they need to be merged back manually (the FP16 result is wrong without fusing LayerNorm). A sketch of one way to do this follows.
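The actual fusion script is not in the thread. Below is a hedged, simplified sketch using onnx-graphsurgeon that matches the canonical decomposed pattern (ReduceMean -> Sub -> Pow -> ReduceMean -> Add(eps) -> Sqrt -> Div -> Mul(gamma) -> Add(beta)) and replaces it with a single opset-17 LayerNormalization node; input ordering and constant placement are assumptions:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("roberta_wwm_ext_opset17.onnx"))

def producer(tensor, op):
    """Return the node producing `tensor` if it has the given op type."""
    node = tensor.inputs[0] if tensor.inputs else None
    return node if node is not None and node.op == op else None

def const_input(node):
    """Return the first constant input of `node`, if any."""
    for t in node.inputs:
        if isinstance(t, gs.Constant):
            return t
    return None

fused = 0
for add_beta in [n for n in graph.nodes if n.op == "Add"]:
    # Simplified matcher: assumes the variable input comes first on each
    # node and gamma/beta/eps are initializers, as tf2onnx typically emits.
    mul_gamma = producer(add_beta.inputs[0], "Mul")
    if mul_gamma is None:
        continue
    div = producer(mul_gamma.inputs[0], "Div")
    if div is None:
        continue
    sub = producer(div.inputs[0], "Sub")
    sqrt = producer(div.inputs[1], "Sqrt")
    if sub is None or sqrt is None:
        continue
    add_eps = producer(sqrt.inputs[0], "Add")
    if add_eps is None or const_input(add_eps) is None:
        continue
    gamma, beta = const_input(mul_gamma), const_input(add_beta)
    if gamma is None or beta is None:
        continue

    x = sub.inputs[0]                     # original LayerNorm input
    eps = float(const_input(add_eps).values)
    out = add_beta.outputs[0]
    add_beta.outputs.clear()              # detach so cleanup() drops the old subgraph

    graph.nodes.append(gs.Node(
        op="LayerNormalization",
        name=f"fused_ln_{fused}",
        attrs={"axis": -1, "epsilon": eps},
        inputs=[x, gamma, beta],
        outputs=[out],
    ))
    fused += 1

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "roberta_wwm_ext_opset17_fuse_ln.onnx")
print(f"fused {fused} LayerNorm subgraphs")
```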
2.4 Run the FP16 accuracy comparison:
polygraphy run roberta_wwm_ext_opset17_fuse_ln.onnx --trt --onnxrt --atol 0.01 --pool-limit workspace:10G --fp16
Question
BERT-base is fine, so I'm not sure whether this error is caused by LayerNorm or by RoBERTa.
On TRT 8.5, if I set the LayerNorm plugin to FP32, the inference is correct.
However, on TRT 8.6, when I tried to set the INormalization layer to FP32, the entire model ran in FP32, because the visualized engine shows only one Myelin layer.
What can be done to ensure the accuracy of RoBERTa in FP16?