
[TODO] Implement a process to reduce accuracy degradation due to transposition errors in the Transformer's MatMul input values. #317

Closed
PINTO0309 opened this issue Apr 16, 2023 · 3 comments
Labels: OP:MatMul, TODO, Transformer

Comments

PINTO0309 (Owner) commented Apr 16, 2023

Issue Type: Others
onnx2tf version number: 1.10.x
onnx version number: 1.13.1
tensorflow version number: 2.12.0
Download URL for ONNX: N/A
Parameter Replacement JSON: N/A

Description

Implement a process to reduce accuracy degradation due to transposition errors in the Transformer's MatMul input values.

1. Issue

  1. A process for automatically correcting transposition errors has already been implemented internally.
  2. When the auto-correction feature is enabled, the tool retains the ONNX inference results for the entire model as numpy.ndarray objects. These arrays are very RAM intensive, so out-of-memory errors frequently occur when converting large models.
  3. In addition, the ONNX and TensorFlow output values are compared for every OP to verify the correctness of the conversion, which significantly slows down model conversion.
  4. Frequent accuracy degradation occurs only when the sizes of all dimensions except the batch size are the same, as in [1,256,256] in the figures below (a minimal demonstration follows the figures).
    (Figures: screenshots illustrating the [1, 256, 256] MatMul input case described above)
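A minimal numpy illustration of why this case is hard to catch (the shapes and values here are made up for demonstration): when all non-batch dimensions are equal, a wrongly transposed input has exactly the same shape as the correct one, so shape-based checks pass while the values diverge.

```python
import numpy as np

# When all non-batch dimensions are equal (e.g. [1, 256, 256]),
# transposing an input does not change its shape, so a shape check
# cannot detect the error -- only the values reveal it.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 256, 256)).astype(np.float32)
w = rng.standard_normal((1, 256, 256)).astype(np.float32)

correct = np.matmul(x, w)                   # intended MatMul
wrong = np.matmul(x.transpose(0, 2, 1), w)  # transposed first input

print(correct.shape == wrong.shape)         # True: shapes are identical
print(np.abs(correct - wrong).max())        # large: values diverge
```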

2. Idea

  1. Instead of keeping the output values of every ONNX OP, keep the inference results only for MatMul, greatly reducing RAM consumption.
  2. When generating a TensorFlow MatMul or BatchMatMul, always check consistency with the output value of the corresponding ONNX MatMul; if a large difference occurs, automatically transpose the input tensor in a brute-force fashion to find the arrangement with the smallest error (see the first sketch after this list).
  3. When validating the ONNX model, keep not only the output tensor of each MatMul but also its input tensors, so that they can be reused as input tensors when validating the TensorFlow OP.
  4. Dummy inference is performed immediately after the tool starts; instead of registering every OP in the model as a graph output, register only the MatMul OPs (see the second sketch after this list).
  5. During dummy inference, keep only the input and output tensors of the MatMul OPs internally.
  6. Consider persisting the retained tensors to an external file, depending on the total number of MatMul OPs and the size of each tensor.
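A minimal sketch of the brute-force search in idea 2 — the function name and signature are hypothetical, not onnx2tf's actual implementation: try each axis permutation of the suspect input, run the MatMul, and keep the permutation whose result is closest to the ONNX reference output.

```python
import itertools
import numpy as np

def find_best_transpose(lhs: np.ndarray,
                        rhs: np.ndarray,
                        onnx_output: np.ndarray):
    """Hypothetical helper: return the axis permutation of `lhs` whose
    MatMul result against `rhs` is closest to the ONNX reference."""
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(lhs.ndim)):
        candidate = lhs.transpose(perm)
        # Skip permutations whose inner dimensions no longer line up.
        if candidate.shape[-1] != rhs.shape[-2]:
            continue
        out = np.matmul(candidate, rhs)
        if out.shape != onnx_output.shape:
            continue
        err = float(np.abs(out - onnx_output).max())
        if err < best_err:
            best_perm, best_err = perm, err
    return best_perm, best_err
```

Because the number of permutations grows factorially with rank, this stays cheap for the 3-D and 4-D tensors typical of Transformer MatMuls.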
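And a sketch of ideas 4–6, assuming onnx/onnxruntime and float32 inputs: promote only the MatMul outputs to graph outputs, run one dummy inference, and spill the captured tensors to a compressed .npz file when they are too large to hold in RAM. The file names and the zero-filled dummy feed are placeholders.

```python
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("model.onnx")  # placeholder path

# Collect the output tensor names of every MatMul node and promote
# them to graph outputs so onnxruntime will return them.
matmul_outputs = [out for node in model.graph.node
                  if node.op_type == "MatMul" for out in node.output]
model.graph.output.extend(onnx.ValueInfoProto(name=n)
                          for n in matmul_outputs)

sess = ort.InferenceSession(model.SerializeToString())

# Dummy inference: zero-filled inputs, symbolic dims replaced by 1.
feed = {i.name: np.zeros([d if isinstance(d, int) else 1
                          for d in i.shape], dtype=np.float32)
        for i in sess.get_inputs()}
results = sess.run(matmul_outputs, feed)

# Idea 6: persist the retained tensors to an external file instead of
# keeping them all in RAM. The MatMul input tensors (idea 3) could be
# promoted and saved the same way.
np.savez_compressed("matmul_tensors.npz",
                    **dict(zip(matmul_outputs, results)))
```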

3. Related issue

PINTO0309 added the OP:MatMul, TODO, Transformer, and Need Help labels and removed the Need Help label on Apr 16, 2023
On-JungWoan (Contributor) commented:

This is an off-topic personal question, but is there a reason to store the outputs of all ONNX OPs? In my opinion, it would be more efficient to compare the layer outputs of ONNX and TensorFlow sequentially and store only the current result rather than all previous ones. With this method, I think there would be no memory-shortage issues even with large models.

PINTO0309 (Owner, Author) commented:

Thank you. Actually, I already tried the idea you suggested about two months ago. At that time, all OPs other than MatMul had to be included in the verification process.

The reason is that my tools have always been incomplete and unfinished, with various bugs inherent in them.

In other words, successful verification of a local tensor is predicated on the assumption that all OPs preceding the one being verified are bug-free.

At the moment, I am troubled by the fact that this tool cannot address the virtually infinite number of model transformation patterns, so local verification alone often does not work.

I do most of the bug fixing on my own, but there are too many patterns and not enough time.

On-JungWoan (Contributor) commented:

Ah, thank you for all your hard work. I have been following you since you created openvino2tensorflow, and I have always had great respect for you. If there is anything I can do to help, I am more than happy to lend a hand. Thank you again.
