-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpFusion] Make the max number of fused ops configurable #6327
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Could you also share the compilation time impact of hummingbird by tuning this option? Thanks!
It's hard to say. I've only tested on my laptop, but without this change (unbounded fuse depth) I cannot compile for even a tiny dataset of size (20, 10) in 1 hour. Compiling for a real world dataset is infeasible. With configurable depth, the time it takes to run
Note that with this change, fusion is no longer a problem for compiling hummingbird models, no matter what the dataset size is. I can now compile for the dataset of size (500, 10), but (1000, 10) gives a segfault for a different reason not related to fusion. |
Thanks for the information @masahi! It is definitely helpful! |
Thanks @masahi @zhiics @junrushao1994 ! |
This adds support for configuring the number of ops in one fused function. The motivation is for a use case like
hummingbird where there can be unbounded number of operations fused into one function, depending on the dataset. This leads to huge compilation time.
More details explained in
https://discuss.tvm.ai/t/aggressive-operator-fusion-and-its-consequence-of-huge-inlined-tir-expression/7687
There is already a hard coded constant kMaxFusedOps and the condition to check the number of fused ops:
https://github.com/apache/incubator-tvm/blob/a8e44710c6472a2ee5cb66283c7f5e77f4e4ca0d/src/relay/transforms/fuse_ops.cc#L82
https://github.com/apache/incubator-tvm/blob/a8e44710c6472a2ee5cb66283c7f5e77f4e4ca0d/src/relay/transforms/fuse_ops.cc#L678
But this condition is simply not correct, since for a diamond structure there are multiple paths connecting a child node (
group_node
above) and its dom parent. Correctly calculating the number of fused nodes requires more care, and I believe my implementation is correct.Please review @tqchen @zhiics