[topi] add ARM v8.2 udot (uint8) support #3978
Left a few minor comments; otherwise LGTM.
Before merging, it would be good if we could try 2 more optimizations
@anijain2305 @zhiics please review again.
LGTM. Minor comments.
It would be good if we could share the performance speedup results.
@tqchen could you check the CI instance? It shows "no space left".
@anijain2305 The average speedup is ~2.1x compared to fp32.
CI issue fixed.
Thanks @anijain2305 @zhiics @tqchen
* [topi] add ARM v8.2 udot (uint8) support
* fix test case
* fix common conv2d schedule
* add back fp32_time in test
* fix lint
* fix doc, add support for int32_lanes=4, signed int
* fix lint
* add ic_bn % 4 checker in schedule
* master:
  Fix split's last factor issue (apache#4044)
  [COMMUNITY] ajtulloch -> committer (apache#4043)
  [TOPI]Add op argwhere (apache#3994)
  [topi] add ARM v8.2 udot (uint8) support (apache#3978)
  [COMMUNITY] anijain2305 -> reviewer (apache#4036)
  [QNN] Renaming dense operator. (apache#4033)
  [Relay][Compile_engine] Int64 shape handling for outputs. (apache#4031)
  Add dmlc-core to the list of installed header directories. (apache#4035)
  [ARITH] migrate indexdiv/mod to floordiv/mod (apache#4008)
  [Relay] Move prelude to text format (apache#3939)
  make tvm compilable by gcc 4.9.2 (apache#4032)
  [AUTOTVM][DOCS] Add a link to the defining network description of auto-tuning tutorial (apache#4023)
  [ARITH] cleanup the indexmod/div on python side (apache#4028)
  [Fix] Add more pad_mode support for onnx converter (apache#4029)
  Add parser support for ReLU tflite operator (apache#4022)
  Additional MXNet Convolution and Deconvolution tests (apache#4026)
  docs: minor spelling tweaks (apache#4027)
Add a uint8 dot-product intrinsic for ARM. Currently it uses
udot.v2i32.v8i8
which may have too few lanes. More variants will be added later.

@anijain2305 @zhiics @vinx13 @ZihengJiang
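For readers unfamiliar with the intrinsic named above, here is a minimal pure-Python sketch of what a `udot.v2i32.v8i8` operation computes, assuming the usual ARM v8.2 UDOT semantics: each of the 2 32-bit accumulator lanes receives the sum of 4 adjacent uint8 products. The function name `udot_v2i32_v8i8` is a hypothetical helper for illustration only; it is not part of TVM or of this PR.

```python
def udot_v2i32_v8i8(acc, a, b):
    """Reference model (illustrative, not TVM code) of a 2-lane udot.

    acc: 2 accumulator lanes (32-bit), a and b: 8 uint8 lanes each.
    Returns the 2 updated accumulator lanes as unsigned 32-bit values.
    """
    assert len(acc) == 2 and len(a) == 8 and len(b) == 8
    assert all(0 <= x <= 255 for x in list(a) + list(b))

    out = []
    for lane in range(2):
        # Each output lane reduces its own group of 4 adjacent uint8 pairs.
        dot = sum(a[i] * b[i] for i in range(4 * lane, 4 * lane + 4))
        # Accumulate and wrap to 32 bits, as a hardware register would.
        out.append((acc[lane] + dot) & 0xFFFFFFFF)
    return out


# Lane 0: 10 + (1 + 2 + 3 + 4), lane 1: 20 + (5 + 6 + 7 + 8)
print(udot_v2i32_v8i8([10, 20], [1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))  # [20, 46]
```

Because every output lane consumes 4 input elements along the reduction axis, a schedule built on this pattern needs the reduction (input-channel) block to be a multiple of 4, which is presumably what the `ic_bn % 4` check in the commit list guards.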