
Fix vmlal.s16 code generation for int8 x int8 -> int32 #2748

Merged: 1 commit, Mar 9, 2019

Conversation

@ajtulloch (Contributor) commented Mar 8, 2019

The IntrinInjecter::SwapBroadcastCast function was limited to cases where the result bitwidth was exactly 2x the input bitwidth. This fails for relevant cases such as a vectorized int8 x int8 -> int32 GEMM. This change improves code generation by emitting VMLAL.S16 instructions instead of a MOVL + VMLA.S32 sequence.
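The bitwidth condition described above can be sketched in plain Python. The function names and the exact relaxed predicate here are illustrative, not the PR's actual C++ code; the real check lives in IntrinInjecter::SwapBroadcastCast:

```python
def should_swap_old(input_bits: int, result_bits: int) -> bool:
    """Old rule: swap broadcast/cast only when the cast exactly doubles
    the bitwidth (e.g. int16 -> int32)."""
    return result_bits == input_bits * 2

def should_swap_relaxed(input_bits: int, result_bits: int) -> bool:
    """One plausible generalization (sketch): allow any widening cast,
    so int8 -> int32 (a 4x widening) also qualifies."""
    return result_bits > input_bits

# The int8 -> int32 widening used by an int8 GEMM:
print(should_swap_old(8, 32))      # the 2x-only rule rejects it
print(should_swap_relaxed(8, 32))  # the relaxed rule accepts it
```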

@@ -50,7 +50,23 @@ class IntrinInjecter : public IRMutator {
   // on ARM.
   if (const Broadcast* bcast = e.as<Broadcast>()) {
     if (const Cast* cast = bcast->value.as<Cast>()) {
-      if (cast->type.bits() == cast->value.type().bits() * 2) {
+      auto shouldSwap = [&]() {
Member commented:
nit: code style, the local variable should be should_swap?

@ajtulloch (Author) replied:
Done, my bad.

@tqchen (Member) left a comment:

lgtm modulo nit

A[k, n].astype("int32") * B[k, n].astype("int32"), axis=[k]), name='C')
s = tvm.create_schedule(C.op)
s[C].vectorize(s[C].op.axis[0])
print(tvm.lower(s, [A, B, C], simple_mode=True))
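For reference, VMLAL.S16 is NEON's widening multiply-accumulate: each pair of int16 lanes is multiplied into an int32 product and added to an int32 accumulator lane, which is exactly the shape of the int8 GEMM inner loop once the int8 inputs are sign-extended to int16. A minimal pure-Python model of the lane semantics (not TVM or production code; names are illustrative):

```python
def vmlal_s16(acc, a, b):
    """Model of NEON VMLAL.S16: acc[i] += a[i] * b[i], where a and b hold
    int16 lanes and acc holds int32 lanes. The product of two int16 values
    always fits in int32, so no intermediate overflow occurs."""
    assert len(acc) == len(a) == len(b)
    return [acc[i] + a[i] * b[i] for i in range(len(acc))]

acc = [0, 0, 0, 0]
a = [127, -128, 3, 4]   # int8 inputs sign-extended to int16 lanes
b = [127, -128, 5, 6]
print(vmlal_s16(acc, a, b))  # [16129, 16384, 15, 24]
```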
Member commented:

maybe we should remove this print

@ajtulloch (Author) replied:

Done, my bad.

@ajtulloch force-pushed the widening-int8-int32-arm-codegen branch from 98b54d4 to 17112b5 on March 8, 2019 21:37
@ajtulloch (Author) commented:

Thanks for the comments, I've updated the patch with the review comments.

@tqchen merged commit a7e35fc into apache:master on Mar 9, 2019
@tqchen (Member) commented Mar 9, 2019:

Thanks @ajtulloch @FrozenGene , this is now merged!

wweic pushed a commit to neo-ai/tvm that referenced this pull request Mar 9, 2019
@ajtulloch ajtulloch deleted the widening-int8-int32-arm-codegen branch March 11, 2019 19:55
@ajtulloch (Author) commented:

Thanks for merging @tqchen.

3 participants