Implement a process to reduce accuracy degradation due to transposition errors in the Transformer's MatMul input values.
1. Issue
The process for automatic correction of transposition errors has already been implemented internally.
When the auto-correction feature is enabled, it tries to keep the ONNX inference results, computed in advance, as NumPy ndarrays for every OP in the model. NumPy ndarrays are very RAM intensive, so Out of Memory errors occur frequently when converting large models.
In addition, the ONNX and TensorFlow output values are compared for every OP to verify that the conversion is correct, which significantly slows down model conversion.
Accuracy degradation occurs frequently only when all dimensions other than the batch size are the same, as in [1,256,256] in the figure below.
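To illustrate why shapes such as [1,256,256] are the problematic case, here is a minimal NumPy sketch. It is not onnx2tf code; the tensors are random stand-ins and the "wrong" transposition is simulated by hand. When the non-batch dimensions are equal, a wrongly transposed input still yields a valid MatMul with the expected output shape, so only a comparison of values can reveal the mistake.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MatMul inputs whose non-batch dimensions are all equal.
a = rng.standard_normal((1, 256, 256)).astype(np.float32)
b = rng.standard_normal((1, 256, 256)).astype(np.float32)

# Simulate a conversion that wrongly swapped the last two axes of `a`.
a_wrong = a.transpose(0, 2, 1)

expected = np.matmul(a, b)
actual = np.matmul(a_wrong, b)

# The shapes are identical, so a shape check cannot flag the mistake ...
same_shape = a_wrong.shape == a.shape and actual.shape == expected.shape
# ... but the values differ substantially.
max_abs_error = float(np.max(np.abs(expected - actual)))

# With unequal dimensions the same mistake fails immediately.
c = rng.standard_normal((1, 256, 64)).astype(np.float32)
d = rng.standard_normal((1, 64, 128)).astype(np.float32)
try:
    np.matmul(c.transpose(0, 2, 1), d)
    shape_error_raised = False
except ValueError:
    shape_error_raised = True

print(same_shape, max_abs_error, shape_error_raised)
```

In the second case NumPy rejects the operation because the inner dimensions no longer match, which is why the silent degradation is specific to tensors with equal dimensions.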
2. Idea
Instead of keeping the output values of every ONNX OP, keep the inference results for MatMul only, which greatly reduces RAM consumption.
When generating a TensorFlow MatMul or BatchMatMul, always check its output against the output of the corresponding ONNX MatMul. If the difference is large, automatically transpose the input tensors in a brute-force fashion to find the arrangement with the smallest error.
When validating the ONNX model, keep not only the output tensor of each MatMul but also its input tensors, so that they can be reused as inputs when validating the TensorFlow OP.
Dummy inference is performed immediately after the tool starts, but instead of registering every OP in the model as an output, only the MatMul OPs are registered as outputs.
During dummy inference, only the input and output tensors of the MatMul OPs are kept internally.
Consider persisting the retained tensors to an external file, depending on the total number of MatMul OPs and the size of each retained tensor.
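The brute-force search described above could be sketched as follows. This is a minimal NumPy illustration, not the onnx2tf implementation: the function name `find_best_transposition`, the tolerance `atol`, and the use of maximum absolute error as the metric are all assumptions made for the example. It takes the retained ONNX MatMul output plus the two inputs as they arrive on the TensorFlow side, and tries axis permutations of both inputs until one reproduces the ONNX output.

```python
import itertools
import numpy as np


def find_best_transposition(x, y, onnx_output, atol=1e-4):
    """Return (error, perm_x, perm_y) with the smallest max abs error.

    The identity permutations are tried first, so inputs that already
    match the ONNX output within `atol` are accepted without a search.
    Returns None if no permutation produces the expected output shape.
    """
    best = None
    for perm_x in itertools.permutations(range(x.ndim)):
        for perm_y in itertools.permutations(range(y.ndim)):
            try:
                candidate = np.matmul(x.transpose(perm_x), y.transpose(perm_y))
            except ValueError:
                # Inner dimensions do not match for this arrangement.
                continue
            if candidate.shape != onnx_output.shape:
                continue
            error = float(np.max(np.abs(candidate - onnx_output)))
            if best is None or error < best[0]:
                best = (error, perm_x, perm_y)
            if error <= atol:
                # Good enough: stop instead of exhausting the search.
                return best
    return best


rng = np.random.default_rng(0)

# Retained ONNX-side MatMul inputs and output (random stand-ins).
onnx_x = rng.standard_normal((1, 8, 8)).astype(np.float32)
onnx_y = rng.standard_normal((1, 8, 8)).astype(np.float32)
onnx_out = np.matmul(onnx_x, onnx_y)

# Simulated TensorFlow-side input that was transposed incorrectly.
tf_x = onnx_x.transpose(0, 2, 1)

error, perm_x, perm_y = find_best_transposition(tf_x, onnx_y, onnx_out)
print(error, perm_x, perm_y)
```

The number of candidates grows factorially with tensor rank (36 pairs for two rank-3 inputs, 576 for rank 4), so restricting the search to MatMul and exiting early once the error is within tolerance keeps the cost small.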
This is a personal question that is off topic, but is there a reason to store the output of all ONNX OPs? In my opinion, it seems more efficient to compare the layer outputs of ONNX and TensorFlow sequentially and only store the current results rather than all previous ones. If this method is used, I think there won't be any memory shortage issues even with large models.
Thank you. I actually tried the idea you suggested about two months ago. At that time, all OPs other than MatMul also had to be included in the verification process.
The reason is that my tool has always been incomplete and still contains various bugs.
In other words, verifying a tensor locally only works under the assumption that every OP preceding the OP being verified is bug-free.
At the moment, I am troubled by the fact that this tool cannot handle all of the practically endless model conversion patterns, so local verification alone often does not work.
I do most of the bug fixing on my own, but there are too many patterns and not enough time.
Ahh, thank you for all your hard work. I have been following you since you created openvino2tensorflow, and I have always had great respect for you. If there is anything I can do to help, I am more than happy to lend a hand. Thank you again.
Issue Type
Others
onnx2tf version number
1.10.x
onnx version number
1.13.1
tensorflow version number
2.12.0
Download URL for ONNX
N/A
Parameter Replacement JSON
N/A
Description
Implement a process to reduce accuracy degradation due to transposition errors in the Transformer's MatMul input values.
3. Related issue