[RFC] Add Legalization from ONNX-MLIR to MHLO #1514

Open
Connor-XY opened this issue Jun 28, 2022 · 9 comments

@Connor-XY
Contributor

Hello everyone, we are from the AML (Applied Machine Learning) team at ByteDance. Following the RFC proposing a TOSA legalization pass, we have observed similar interest in supporting the MHLO (Meta HLO) dialect, and we would like to add a conversion from ONNX to MHLO. The introduction and implementation of the MHLO dialect can be found here, and the corresponding operation definitions are here.

Objective

We want to add a legalization pass from ONNX-MLIR to the MHLO dialect. The conversion would be an independent pass that does not affect other parts of ONNX-MLIR. It provides an extra path by which ONNX-MLIR can be further lowered into the MHLO dialect, enriching the applications of ONNX-MLIR and taking advantage of the optimization work already invested in MHLO.

Motivation

HLO (High Level Optimizer) IR provides a set of mostly orthogonal operations that together offer fairly complete functionality. MHLO supports an HLO-like pipeline in MLIR and provides a uniform interface to compile and execute optimized HLO programs, with dynamic shape support. Both TensorFlow and PyTorch models can be legalized to the MHLO dialect, and the same conversion can be applied to ONNX models. ONNX is one of the most important and widely used portable formats for machine learning models, yet no path from it to the MHLO dialect currently exists, so a legalization pass fills a real gap. With such a pass we could convert ONNX models directly to the MHLO dialect; moreover, we could convert models from other frameworks that can be exported through the ONNX API.

User Benefit

This addition would consolidate the development efforts around ONNX-MLIR and enhance its ecosystem by enriching the applications of ONNX models. The extra path from ONNX-MLIR to MHLO bridges the two dialects. MHLO is a suitable IR because it remains close to the frontend and can represent TensorFlow and PyTorch models; if this conversion is supported, models from all three ecosystems can share a unified representation, so optimization efforts can focus on the MHLO dialect. It also adds flexibility: any model can be converted to the MHLO dialect as long as every one of its operations can be lowered.

Design Proposal

The pass will be implemented in a similar way to the conversion from ONNX to the TOSA dialect. We will implement it as an independent pass so it won't interfere with other parts of the project, re-expressing ONNX operations in the MHLO dialect.
For example, given the ONNX-MLIR representation of Softmax:

func @test_softmax(%arg0 : tensor<10x30xf32>) -> tensor<10x30xf32> {
    %0 = "onnx.Softmax"(%arg0) {axis = 1: si64, onnx_opset = 13 : si64} : (tensor<10x30xf32>) -> tensor<10x30xf32>
    "func.return"(%0) : (tensor<10x30xf32>) -> ()
}

We will convert it to the corresponding MHLO dialect:

func @test_softmax(%arg0: tensor<10x30xf32>) -> tensor<10x30xf32> {
    %0 = mhlo.constant dense<0.000000e+00> : tensor<f32>
    %1 = mhlo.constant dense<0xFF800000> : tensor<f32>
    %2 = mhlo.reduce(%arg0 init: %1) applies mhlo.maximum across dimensions = [1] : (tensor<10x30xf32>, tensor<f32>) -> tensor<10xf32>
    %3 = "mhlo.broadcast_in_dim"(%2) {broadcast_dimensions = dense<0> : tensor<1xi64>} : (tensor<10xf32>) -> tensor<10x30xf32>
    %4 = mhlo.subtract %arg0, %3 : tensor<10x30xf32>
    %5 = mhlo.exponential %4 : tensor<10x30xf32>
    %6 = mhlo.reduce(%5 init: %0) applies mhlo.add across dimensions = [1] : (tensor<10x30xf32>, tensor<f32>) -> tensor<10xf32>
    %7 = "mhlo.broadcast_in_dim"(%6) {broadcast_dimensions = dense<0> : tensor<1xi64>} : (tensor<10xf32>) -> tensor<10x30xf32>
    %8 = mhlo.divide %5, %7 : tensor<10x30xf32>
    return %8 : tensor<10x30xf32>
}

Performance Implications

  • The conversion is analogous to the existing ONNX-MLIR-to-TOSA conversion, so its compile-time overhead should be negligible.
  • There will be end-to-end tests. Besides LLVM-style lit tests, we will also add numerical tests to validate the conversion; a sketch of such a lit test follows this list.
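
As an illustration, a lit test for the pass could look like the following. The --convert-onnx-to-mhlo flag name is an assumption for this sketch, not a committed interface:

// RUN: onnx-mlir-opt --convert-onnx-to-mhlo %s | FileCheck %s

// CHECK-LABEL: func @test_exp
func @test_exp(%arg0 : tensor<10xf32>) -> tensor<10xf32> {
    // CHECK: mhlo.exponential
    %0 = "onnx.Exp"(%arg0) : (tensor<10xf32>) -> tensor<10xf32>
    return %0 : tensor<10xf32>
}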

Dependencies

  • Dependencies: this pass will depend on mlir-hlo, which contains the MHLO dialect.

Compatibility

  • ONNX operators are versioned. There are two ways to handle this: we can support multiple versions directly by adding a conversion for each, or we can upgrade ONNX operators to the latest opset so that only the latest version needs a conversion.
  • Even when an operator cannot be directly represented in the MHLO dialect, we can still support it through mhlo.custom_call, as sketched below.
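
For example, a coarse-grained operator with no direct MHLO equivalent could be preserved as an opaque call that the backend resolves by name. The choice of onnx.NonMaxSuppression and the types below are purely illustrative, not a committed mapping:

func @nms_fallback(%boxes: tensor<1x6x4xf32>, %scores: tensor<1x1x6xf32>) -> tensor<3x3xi64> {
    // Hypothetical fallback: keep the ONNX semantics behind a named target.
    %0 = "mhlo.custom_call"(%boxes, %scores) {call_target_name = "onnx.NonMaxSuppression", has_side_effect = false} : (tensor<1x6x4xf32>, tensor<1x1x6xf32>) -> tensor<3x3xi64>
    return %0 : tensor<3x3xi64>
}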

Status

We have already converted a simple MLP model from the ONNX dialect to the MHLO dialect and validated its numerical correctness. Our team will continue converting more operations to MHLO, and we welcome everyone interested in being part of this effort.

@caoimhinuibrian
Collaborator

@Connor-XY sounds like a great idea ... looking forward to a PR

@yaochengji
Member

Since the mhlo dialect is fairly fine-grained, while some backends want coarse-grained operations instead of fine-grained ones, maybe we need to add a pass that decomposes onnx operations before the lowering pass, where users could choose which operations to decompose. Inside the lowering pass, we would then leave the coarse-grained operations alone so they can be handled by the following passes.

@AlexandreEichenberger
Collaborator

@yaochengji which operations are you referring to? There are high-level ops in ONNX that can be expressed in terms of other ONNX ops, typically RNNs, but there are others as well. In general, you are correct that many codegen schemes prefer high-level ops, but maybe not MHLO. So are you talking about an ONNX-to-ONNX lowering, which may be advantageous for MHLO, or ONNX to something else?

@yaochengji
Member

yaochengji commented Jun 29, 2022

@AlexandreEichenberger I'm referring to all the operations that exist in ONNX but don't exist in MHLO, e.g. Softmax and Pooling.

And I'm talking about both ONNX-to-ONNX lowering and ONNX-to-MHLO custom calls; they could be separate passes, perhaps structured as below:

  1. Optionally decompose coarse-grained ONNX operations into fine-grained ONNX operations
  2. Convert fine-grained ONNX operations to MHLO operations
  3. Convert the remaining coarse-grained ONNX operations to mhlo custom calls

After these three passes, no ONNX dialect remains in the IR, and the coarse-grained operations the user wants to keep have been converted to mhlo custom_call ops.
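
To make steps 1 and 2 concrete, here is a sketch of what step 1 could produce for onnx.Softmax, before step 2 maps each fine-grained op onto MHLO. The exact attribute forms are illustrative and depend on the opset version:

func @softmax_decomposed(%arg0 : tensor<10x30xf32>) -> tensor<10x30xf32> {
    // max over axis 1, kept as 10x1 so ONNX broadcasting applies below
    %max = "onnx.ReduceMax"(%arg0) {axes = [1], keepdims = 1 : si64} : (tensor<10x30xf32>) -> tensor<10x1xf32>
    %sub = "onnx.Sub"(%arg0, %max) : (tensor<10x30xf32>, tensor<10x1xf32>) -> tensor<10x30xf32>
    %exp = "onnx.Exp"(%sub) : (tensor<10x30xf32>) -> tensor<10x30xf32>
    %sum = "onnx.ReduceSum"(%exp) {axes = [1], keepdims = 1 : si64} : (tensor<10x30xf32>) -> tensor<10x1xf32>
    %div = "onnx.Div"(%exp, %sum) : (tensor<10x30xf32>, tensor<10x1xf32>) -> tensor<10x30xf32>
    return %div : tensor<10x30xf32>
}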

@tungld
Collaborator

tungld commented Jun 29, 2022

@yaochengji

Optionally decompose coarse-grained ONNX operations to fine-grained ONNX-operations

FYI, we do the same thing in onnx-mlir: some ONNX operations are decomposed into other ONNX operations, see https://github.com/onnx/onnx-mlir/blob/main/src/Transform/ONNX/Decompose.td. Currently, this decomposition is done at the very beginning of the onnx-mlir pipeline. If it is not enough, you can add your own decomposition for the purposes of MHLO.
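
For instance, one rewrite of that kind (shown here as input and output IR rather than as the TableGen pattern; the attribute details are illustrative) turns onnx.ReduceL2 into sqrt(sum(x*x)):

// before: a coarse-grained reduction
%0 = "onnx.ReduceL2"(%arg0) {axes = [1], keepdims = 0 : si64} : (tensor<10x30xf32>) -> tensor<10xf32>

// after: the same computation with finer-grained ONNX ops
%0 = "onnx.ReduceSumSquare"(%arg0) {axes = [1], keepdims = 0 : si64} : (tensor<10x30xf32>) -> tensor<10xf32>
%1 = "onnx.Sqrt"(%0) : (tensor<10xf32>) -> tensor<10xf32>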

@AlexandreEichenberger
Collaborator

@yaochengji

Your approach appears sound. Using ONNX-to-ONNX lowering may reduce the amount of work you need to do, as you can then rely on the ONNX-to-MHLO conversion to do the rest. That ONNX-to-ONNX lowering might be useful to other folks as well. Mapping the remaining ONNX ops to MHLO custom calls is a tradeoff that you will have to evaluate as you see fit.

Note also that in our lowering of ONNX to accelerators, we use additional lowering dialects that help us perform optimizations. As long as these helper dialects remain contained within your lowering, that might also be something worth looking into if it saves you work or improves the final quality of results.

We have found that many lowerings can be done via TableGen, which helped us reduce the amount of code we had to write.

How is MHLO faring with dynamic shapes? Is that something you support, and to what extent? At this time, we support most dynamic shapes in ONNX but expect the rank to be known, as unranked tensors really complicate our codegen schemes for many ops.

@yaochengji
Member

yaochengji commented Jun 29, 2022

@AlexandreEichenberger

Note also that in our lowering of ONNX to accelerator, we use additional lowering dialects that helps us do optimizations.

Yes, we also lower MHLO further; the lowering reuses the MLIR core dialects as much as possible, and the path is similar to what this post shows.

How is MHLO fairing with dynamic shapes? Is that something you support, and to what extent?

MHLO can support dynamic shapes, together with the Shape and Tensor dialects. Dynamic shape optimization is an important feature in progress in the ByteDance AML team. I posted our original design on the MLIR forum several weeks ago, and I will share an updated version with the community later.
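
As a small example of what this looks like in IR, a broadcasted add over an unknown leading dimension can be written with the Shape dialect feeding mhlo.dynamic_broadcast_in_dim (a minimal sketch, not taken from our pipeline):

func @add_dynamic(%arg0: tensor<?x30xf32>, %arg1: tensor<30xf32>) -> tensor<?x30xf32> {
    // materialize the runtime shape of %arg0 and broadcast %arg1 to it
    %shape = shape.shape_of %arg0 : tensor<?x30xf32> -> tensor<2xindex>
    %bcast = "mhlo.dynamic_broadcast_in_dim"(%arg1, %shape) {broadcast_dimensions = dense<1> : tensor<1xi64>} : (tensor<30xf32>, tensor<2xindex>) -> tensor<?x30xf32>
    %sum = mhlo.add %arg0, %bcast : tensor<?x30xf32>
    return %sum : tensor<?x30xf32>
}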

@Connor-XY
Contributor Author

The PR is here: #1519. It currently supports lowering the MNIST model.

@yaochengji
Member

BTW, the AML team also proposed a similar RFC to the torch-mlir repo, and there's a meeting today.
