[BYOC] [TPAT] [TensorRT] Add the ability to automatically generate TensorRT plugins using TVM #15526
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Thanks, @Civitasv, for this great work! There are a few notable things:
The final goal is to support both Relay and Relax, but I agree that for now it should be sent to the main branch.
Okay, I will write an RFC.
I will try to separate it.
I've already proposed an RFC. See apache/tvm-rfcs#103.
@buptqq Thanks for your great work! It helped me a lot. If you are still working on this project, could you review the code? I've changed quite a lot.
I've improved the code; the workflow should be clear if you've read the RFC. 😄
OK, I will review this code. |
(branch force-pushed from c95d45f to 45eeb8c)
TPAT: TVM Plugin Autogen Tool
Disclaimer: This PR is based on Tencent's TPAT.
Purpose: Tencent's TPAT is meant to be used with their TVM fork, BlazerML-tvm, but that fork hasn't been synchronized with upstream for a long time, and some bugs remain unresolved. In light of these issues, I decided to try integrating it into TVM.
Objective: The primary goal is to offer a clear and user-friendly API.
Architecture
Currently, only TensorRT is supported.
In essence, this solution is built on a Python template engine (Jinja) to create plugin templates for vendor-specific acceleration libraries. It then uses TVM to optimize and generate code for the target platform; the generated code is rendered into the templates. Finally, platform-specific build commands are invoked to build the plugins, which serve as extensions for the corresponding vendor's acceleration library.
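The render-and-fill step above can be sketched as follows. This is a minimal illustration only: it uses Python's stdlib `string.Template` in place of Jinja to stay self-contained, and the template text, placeholder names, and symbol names are all hypothetical, not TPAT's actual templates.

```python
from string import Template

# Hypothetical skeleton of a TensorRT plugin source file.
# Real TPAT templates are Jinja files covering the full plugin API;
# the placeholders here are illustrative only.
PLUGIN_TEMPLATE = Template("""\
// Auto-generated TensorRT plugin: ${plugin_name}
extern "C" int ${kernel_symbol}(void* const* inputs, void* const* outputs);

int enqueue(void* const* inputs, void* const* outputs) {
    // Forward to the TVM-generated kernel.
    return ${kernel_symbol}(inputs, outputs);
}
""")

def render_plugin(plugin_name: str, kernel_symbol: str) -> str:
    """Fill the plugin template with names produced by TVM codegen."""
    return PLUGIN_TEMPLATE.substitute(
        plugin_name=plugin_name,
        kernel_symbol=kernel_symbol,
    )

# Example: render a plugin around a (hypothetical) TVM-generated kernel.
source = render_plugin("tpat_onehot", "tvmgen_default_fused_one_hot")
```

The rendered `source` would then be written to disk and compiled with the platform's build commands (e.g. nvcc for TensorRT plugins).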
Inputs & Outputs
The entry point of TPAT for TensorRT is as follows:
This entry point accepts an ONNX file, a list of nodes to be tuned, the log database location, and the output ONNX file path where the modified model will be stored.
After generating a plugin for each node, the function returns the path of the output ONNX file along with a list of paths to the saved plugins, to facilitate subsequent loading.
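A sketch of what such an entry point might look like, based on the inputs and outputs described above. The function name `pipeline`, its parameter names, and the plugin path scheme are assumptions for illustration, not the actual TPAT API; the tuning, template rendering, and graph rewriting are elided.

```python
from typing import List, Tuple

def pipeline(
    onnx_file: str,        # input ONNX model
    tuning_nodes: List[str],  # names of nodes to generate plugins for
    tuning_log: str,       # log database location for TVM tuning records
    output_onnx: str,      # path where the modified model will be stored
) -> Tuple[str, List[str]]:
    """Hypothetical sketch of the TPAT-for-TensorRT entry point.

    For each node in `tuning_nodes`, TVM tunes and generates a kernel
    (recording results in `tuning_log`), a TensorRT plugin is built
    around it, and the node in the graph is rewritten to reference the
    plugin. Returns the output model path and the plugin paths.
    """
    plugin_paths: List[str] = []
    for node in tuning_nodes:
        # ... tune with TVM, render the template, build the plugin ...
        plugin_paths.append(f"./plugins/tpat_{node}.so")  # illustrative path
    # ... rewrite the ONNX graph so the tuned nodes use the plugins,
    #     then save it to `output_onnx` ...
    return output_onnx, plugin_paths

out_model, plugins = pipeline(
    "model.onnx", ["OneHot_7"], "tune.log", "model_tpat.onnx"
)
```

The caller can then load `out_model` into TensorRT after registering each shared object in `plugins`.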
TODO
Reference