[RFC] Improve quantized convolution performance for armv8 architectures #5754

giuseros · 2020-06-09T14:26:00Z

RFC

This PR is based on the following RFC: https://discuss.tvm.ai/t/rfc-improve-quantized-convolution-performance-for-armv8-architectures/6920

High level description of the submission

The main algorithm lives in:

topi/python/topi/arm_cpu/conv2d_gemm.py (schedule)
topi/python/topi/arm_cpu/tensor_intrin.py (assembly + intrinsic)
topi/python/topi/arm_cpu/conv2d_int8.py (driver)

The weight transform touches several files (since it is computed at compile time):

topi/python/topi/arm_cpu/conv2d_alter_op.py
python/tvm/relay/op/nn/_nn.py
python/tvm/relay/op/nn/nn.py
src/relay/op/nn/convolution.h (relay node definition)
src/relay/op/nn/convolution.cc (relay node definition)
include/tvm/relay/attrs/nn.h (relay node definition)

Strategies (mapping relay-node -> compute+schedules) are defined here:

python/tvm/relay/op/strategy/arm_cpu.py
python/tvm/relay/op/strategy/generic.py

giuseros · 2020-06-09T14:26:36Z

CC: @u99127 @anijain2305

FrozenGene · 2020-06-11T02:55:50Z

Thanks for the great work! I have some quick questions:

Have you tested various models on different Arm CPUs (A53, A72, A55, A75, and so on)? According to the FB QNNPACK blog (https://engineering.fb.com/ml-applications/qnnpack/), umull / uadalp does not always give the best performance compared with the smlal instruction (which we use now). So simply changing the legalization and giving up the smlal instruction on aarch64 doesn't make sense to me. One data point: our upcoming feature Ansor (the auto scheduler) doesn't support tensorize (at least for now), yet it gets good performance with the smlal instruction and beats TFLite by 1.2X on the quantized MobileNet v2 model (Cortex-A53) (https://discuss.tvm.ai/t/tflite-and-tvm-comparison-for-quantized-models/6577/4). I mean here:

```python
@qnn_conv2d_legalize.register('arm_cpu')
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    # ARM prefers the dtypes to be same.
    if is_aarch64_arm():
        return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)
```

This prevents us from using the SMLAL instruction.

I suggest we keep both schedules (tensorize and default spatial pack), rather than just checking for aarch64 and using only the tensorize template. I mean here:

```python
is_aarch64 = "aarch64" in str(isa.target)
if is_aarch64 and data.dtype in ["int8", "uint8"]:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.arm_cpu.compute_conv2d_NHWC_quantized),
        wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_NHWC_quantized),
        name="compute_conv2d_NHWC_quantized.arm_cpu")
else:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.arm_cpu.conv2d_nhwc_spatial_pack),
        wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nhwc_spatial_pack),
        name="conv2d_nhwc_spatial_pack.arm_cpu")
```

This is what the strategy design is for. I suspect there are some workloads where our spatial pack performs better. The situation is the same as with Winograd: we can register both the Winograd and the default template and choose the better one.
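To make the suggestion concrete, here is a minimal pure-Python sketch of "register both candidates and let measurement decide". It does not use the TVM API: `candidate_implementations` and `select_best` are hypothetical helpers, and the costs passed to `select_best` stand in for whatever the tuner would measure on the device.

```python
from typing import Dict, List

# Implementation names taken from the snippet above.
SPATIAL_PACK = "conv2d_nhwc_spatial_pack.arm_cpu"
GEMM_QUANTIZED = "compute_conv2d_NHWC_quantized.arm_cpu"


def candidate_implementations(is_aarch64: bool, dtype: str) -> List[str]:
    """Return every implementation that should compete for this workload."""
    # The spatial-pack schedule is always kept as a candidate.
    candidates = [SPATIAL_PACK]
    if is_aarch64 and dtype in ("int8", "uint8"):
        # The tensorized GEMM schedule is added as an extra candidate,
        # not as a replacement for the default one.
        candidates.append(GEMM_QUANTIZED)
    return candidates


def select_best(candidates: List[str], measured_cost: Dict[str, float]) -> str:
    """Pick the candidate with the lowest measured cost (lower is better)."""
    return min(candidates, key=lambda name: measured_cost[name])
```

With this shape, a workload where spatial pack happens to measure faster still ends up on spatial pack, even on aarch64.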

FrozenGene · 2020-06-11T02:59:06Z

cc @ajtulloch

giuseros · 2020-06-11T09:13:15Z

Hi @FrozenGene ,
Thanks a lot for your comments. I will address general replies here, and code comments in a separate reply.

I did read your Discuss post, but I thought that work was orthogonal to this one. My main goal here is to have a fast, general convolution algorithm for Armv8-A. Your post talks about MobileNet v2 and the Raspberry Pi 3.
MobileNet v2 has no deep convolutional layers, mostly depthwise and 1x1 convolutions. With shallow convolutions the problem becomes memory bound, and the differences among the algorithms become less evident. That is also why I picked inception_v3, which has 1x1, 3x3, 5x5, 1x7, and 7x1 convolutions.
The Raspberry Pi 3 comes with a 32-bit operating system, which means using Armv7-A. The problem with Armv7-A is that instead of 32 registers (as in Armv8-A) you have only 16, so the optimization space is reduced. Also, I think (but I am not 100% sure) that the TFLite developers do not optimize heavily for Armv7-A. Indeed, on Armv7-A @anijain2305 shows (in the same post you mention) a 0.80 ratio for tflite/tvm (while I see 0.60/0.30 ratios for the multi/single-thread scenarios, respectively).
The QNNPACK post you mention explicitly says that: "the microkernel that leverages the dual issue capability proves to be 15 percent to 20 percent faster for a sufficiently large channel count (K > 64)".
The way they do convolution (and GEMM) in QNNPACK for Armv8-A is by using a combination of smlal and smlal2 (plus a combination of usubl and usubl2), while conv2d_nhwc_spatial_pack only uses smlal. It is true that on Armv7-A they only use vmlal (and vsubl). So I wonder whether the auto scheduler (which I am not familiar with) is able to generate such combinations for Armv8.
I did not try CPUs other than the Cortex-A76. The point is that I am not using anything specific to that CPU, only things specific to the Armv8-A ISA.
I agree that for smaller convolutions (or depthwise convolutions) there are simpler algorithms that work as well (or are even faster). I also agree with stacking multiple strategies and letting TVM select the best.
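For readers less familiar with the instructions being compared, the following is a rough scalar model in plain Python of the two accumulation styles discussed in this thread. It is only a sketch of the arithmetic: the real instructions operate on vector lanes, and the function names are made up for illustration. The smlal style widens the operands to 16 bits first and accumulates 16x16-bit products into 32 bits; the umull / uadalp style multiplies 8-bit operands into 16-bit products and then adds adjacent pairs of products into a 32-bit accumulator.

```python
from typing import List


def dot_smlal_style(a: List[int], b: List[int]) -> int:
    """Operands already widened to int16; each product is accumulated in 32 bits."""
    acc = 0
    for x, y in zip(a, b):
        assert -2**15 <= x < 2**15 and -2**15 <= y < 2**15  # int16 operands
        acc += x * y                                          # multiply-accumulate long
    assert -2**31 <= acc < 2**31                              # fits the 32-bit accumulator
    return acc


def dot_umull_uadalp_style(a: List[int], b: List[int]) -> int:
    """uint8 operands: 8x8 -> 16-bit products, then pairwise add into 32 bits."""
    assert len(a) == len(b) and len(a) % 2 == 0
    products = []
    for x, y in zip(a, b):
        assert 0 <= x < 256 and 0 <= y < 256  # uint8 operands
        p = x * y                              # umull: widening multiply
        assert p < 2**16                       # 255 * 255 fits in 16 bits
        products.append(p)
    acc = 0
    for i in range(0, len(products), 2):
        acc += products[i] + products[i + 1]   # uadalp: add adjacent pair, accumulate
    assert acc < 2**32
    return acc
```

Both functions compute the same dot product for inputs that are valid in both models; the difference on hardware is in how many instructions are issued and how wide the intermediate data is, which is what the performance discussion is about.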

I will reply on the code in the following comment.

FrozenGene · 2020-06-11T09:52:17Z

@giuseros Glad to see we share the view that we should let AutoTVM select the best.

The auto scheduler relies on the legalization pass to generate the smlal instruction. (After the auto scheduler is released, let us make it better together.) One piece of information I left out before: the OS on my test Raspberry Pi 3B+ is 64-bit Ubuntu, not 32-bit, so the target is aarch64 too.

I mention the auto scheduler not to question your work (your work is great!); it is orthogonal, as you said. My point is just that with the smlal instruction on the A53 (with the aarch64 OS mentioned above) we get good performance too. So I want to know whether, on low-end Arm CPUs, smlal is better than this approach (as the FB QNNPACK blog says: "The default microkernel uses the fewest possible instructions and thus delivers the best performance on low-end cores, which can execute only one NEON instruction per cycle.").

So I wish we could test several Arm CPUs to prove that this work performs well on all aarch64 cores (low-end and high-end).

Secondly, I suggest we test MobileNet v2 too, to see whether this PR works well across various models.

Your work is great, but I would like us to gather more data and results to make it more convincing.

giuseros · 2020-06-11T09:57:02Z

Hi @FrozenGene ,
About the code changes.

It will be hard to do this. The point is that the legalization is done in Relay before picking the strategy (thus, it is unaware of the strategy picked). To keep both legalizations I need somehow to pass information from the strategy (e.g., the name of the algorithm, or something like that). Are you aware of any other ways I can do it?
Note that I am targeting NHWC layout. I wasn't able to even compile with conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it without auto-tuning on Armv8-A). I gathered from various discussions that NHWC support for arm targets is incomplete at the moment. So for now we might simply agree to leave this as default for NHWC and conv2d_nchw_spatial_pack as default for NCHW and mirror that in the legalization step which might look like:

```python
    if is_aarch64_arm() and attrs.data_layout == "NHWC":
        return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)
```

In a subsequent work, we can find a way to pick the correct legalization after we picked the strategy.

giuseros · 2020-06-11T10:44:51Z

Hi @FrozenGene
Just to clarify: I am enjoying the discussion, and since the optimization space is wild, I agree that it is worth evaluating different approaches.

About the Raspberry+mobilenet v2, good to know you are working on Armv8-A (sorry to have assumed otherwise). However, there is still the point that mobilenet uses shallow convolutions, while I am addressing deeper and more generic convolutions.
Are you saying that, as things stand now in TVM, the conv2d_nhwc_spatial_pack schedule might be faster than the gemm approach on smaller CPUs? Unfortunately, for now I don't think they can be added together because of what I said above about the legalization step. Do you know any work-around to that? Maybe I can legalize only for specific devices (e.g., only for Cortex-A55)?
Finally, as things stand now, we might get this PR in and later do a more detailed comparison across different networks and CPUs.

FrozenGene · 2020-06-11T11:15:35Z

> It will be hard to do this. The point is that the legalization is done in Relay before picking the strategy (thus, it is unaware of the strategy picked). To keep both legalizations I need somehow to pass information from the strategy (e.g., the name of the algorithm, or something like that). Are you aware of any other ways I can do it?

@giuseros I think adding the algorithm name could be one way to handle it. For example, we could add it to the attrs and query it in the legalization pass; after that we could safely drop it.

> Note that I am targeting NHWC layout. I wasn't able to even compile with conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it without auto-tuning on Armv8-A). I gathered from various discussions that NHWC support for arm targets is incomplete at the moment. So for now we might simply agree to leave this as default for NHWC and conv2d_nchw_spatial_pack as default for NCHW and mirror that in the legalization step which might look like:
>
>     if is_aarch64_arm() and attrs.data_layout == "NHWC":
>         return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
>     return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)

Yes, our NHWC schedule on Arm CPU is not complete. In our careful testing with Ansor (aka the auto scheduler), NHWC also performs better than NCHW on Arm CPU, which suggests we could improve our AutoTVM NHWC schedule on Arm CPU too. As the results in my post show, we use the auto scheduler to leverage the NHWC layout and the smlal instruction, so I would prefer to use the attr[algorithm_name] idea mentioned above to keep the smlal instruction. After the auto scheduler is released (we are working hard on it and hope to bring it in within about 2 weeks), we can see how to improve it (for example by generating smlal and smlal2, or your tensorize instruction). The two efforts are orthogonal, but they share the same legalize pass.

Some background on the auto scheduler: it only needs tvm.compute and then generates the schedule automatically, so we can try NHWC / NCHW easily. So in fact there is no spatial pack schedule template concept in the auto scheduler world.

> About the Raspberry+mobilenet v2, good to know you are working on Armv8-A (sorry to have assumed otherwise). However, there is still the point that mobilenet uses shallow convolutions, while I am addressing deeper and more generic convolutions.

So it is better to keep both algorithms, right?

> Are you saying that, as things stand now in TVM, the conv2d_nhwc_spatial_pack schedule might be faster than the gemm approach on smaller CPUs? Unfortunately, for now I don't think they can be added together because of what I said above about the legalization step. Do you know any work-around to that? Maybe I can legalize only for specific devices (e.g., only for Cortex-A55)?

I think adding the algorithm name, as mentioned before, could help solve it.

> Finally, as things stand now we might get this PR in, and later do a more detailed comparison across different networks + CPUs

OK, I buy it. Once the legalization pass issue we discussed is solved, I will be glad to review the code carefully and handle this PR.

giuseros · 2020-06-11T11:41:49Z

Hi @FrozenGene ,
The idea of adding the algorithm name to the attributes would work if the legalization step was run after we pick the strategy. It is instead run before, so it is unaware of the strategy picked.
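The ordering problem can be illustrated with a toy pipeline in plain Python. This is not TVM code: the pass functions and the `algorithm_name` key are hypothetical, and they only mimic the fact that legalization can read nothing that a later step writes.

```python
from typing import Callable, Dict, List, Optional

Attrs = Dict[str, Optional[str]]
Pass = Callable[[Attrs, List[str]], None]


def legalize(attrs: Attrs, log: List[str]) -> None:
    # Legalization can only look at what is already stored in attrs.
    log.append(f"legalize sees algorithm_name={attrs.get('algorithm_name')}")


def select_strategy(attrs: Attrs, log: List[str]) -> None:
    # The implementation is chosen here; the name is only known from now on.
    attrs["algorithm_name"] = "conv2d_nhwc_spatial_pack.arm_cpu"
    log.append("strategy selected")


def run_pipeline(passes: List[Pass]) -> List[str]:
    """Run the passes in order over a fresh attrs dict and return the log."""
    attrs: Attrs = {}
    log: List[str] = []
    for p in passes:
        p(attrs, log)
    return log
```

Running `run_pipeline([legalize, select_strategy])` mirrors the current order: the legalization step sees no algorithm name. Only the reversed order would let it branch on the chosen implementation.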

Maybe we could add a new pass that runs based on the strategy? Or we can hack in _alter_conv2d_layout?

FrozenGene · 2020-06-11T12:07:56Z

> Hi @FrozenGene ,
>
> The idea of adding the algorithm name to the attributes would work if the legalization step was run after we pick the strategy. It is instead run before, so it is unaware of the strategy picked.
>
> Maybe we could add a new pass that runs based on the strategy? Or we can hack in _alter_conv2d_layout?

@giuseros what do you mean by "runs based on the strategy"?

In alter_op_layout, we could extract workload[0] to get the strategy. However, could you help me double-check whether AutoTVM tuning runs the alter_op_layout pass (i.e., at O3)? I have forgotten a little. If so, maybe we could change the dtype there according to the strategy. cc @anijain2305, any better ideas?

giuseros · 2020-06-11T12:19:28Z

I mean adding a convert_data_type pass that is similar to alter_op_layout but converts the data type (so we could do something like: if topi_impl == 'spatial_nhwc', convert to int16).
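As a rough illustration of the per-implementation decision such a pass would make, here is a small plain-Python sketch. The function is hypothetical and only models the dtype choice; it does not rewrite any Relay expression, and the implementation name `'spatial_nhwc'` is just the placeholder used above.

```python
def convert_data_type(topi_impl: str, dtype: str) -> str:
    """Return the dtype the convolution inputs should be cast to."""
    if dtype not in ("int8", "uint8"):
        # Non-quantized inputs are left untouched.
        return dtype
    if topi_impl == "spatial_nhwc":
        # The spatial-pack schedule relies on 16-bit operands (smlal).
        return "int16"
    # Any other implementation keeps the 8-bit inputs as they are.
    return dtype
```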

This doesn't seem possible directly in alter_op_layout because only the shapes are passed to that function, but I will play with it a bit more.

To reply to your question: yes, the alter_op_layout pass is executed when the autotuner runs.

FrozenGene · 2020-06-11T13:31:20Z

> So I mean to add a convert_data_type pass that is similar to alter_op_layout but converts datatype (and we can do something like if topi_impl == 'spatial_nhwc' converts to int16.

I think this is an interesting pass. Just as we have _alter_op_layout with different logic for different strategies, we would have an _alter_op_dtype pass with different logic for different strategies.

However, this pass seems to do mostly the same thing as legalization (changing the dtype). So our legalization pass should do this work according to the strategy.

FrozenGene · 2020-06-11T13:57:33Z

@giuseros It just occurred to me that the auto scheduler will have an environment value. So the change to legalization won't affect the auto scheduler: we could check this environment value and use smlal for the auto scheduler. However, I think we should still resolve this problem, so that different strategies can have different logic.

giuseros · 2020-06-11T14:12:27Z

Hi @FrozenGene ,
I agree that different strategies should be available to the auto-tuner. See if the solution proposed is good enough for you (at least as a temporary work-around). For Armv7-A or NCHW, nothing changes, we follow exactly the previous path.

For Armv8-A and NHWC we don't convert during the legalization step, but during the _alter_conv2d_layout pass. The only difference now is that the offset contribution will be added after the convolution instead of before.
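Applying the offset before or after is numerically equivalent, because the zero points can be factored out of the dot product. A small plain-Python check, with hypothetical names `za` and `zw` for the data and weight zero points and `K` for the reduction length:

```python
from typing import List


def dot_offset_before(a: List[int], w: List[int], za: int, zw: int) -> int:
    """Subtract the zero points first, then multiply-accumulate."""
    return sum((x - za) * (y - zw) for x, y in zip(a, w))


def dot_offset_after(a: List[int], w: List[int], za: int, zw: int) -> int:
    """Multiply-accumulate the raw values, then add the offset contribution."""
    k = len(a)
    raw = sum(x * y for x, y in zip(a, w))
    # sum((a - za) * (w - zw)) = sum(a*w) - zw*sum(a) - za*sum(w) + K*za*zw
    return raw - zw * sum(a) - za * sum(w) + k * za * zw
```

The second form is what allows the inner loop to run directly on the 8-bit data, with the correction terms applied once on the accumulated result.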

I agree that a solution where the legalization changes depending on the strategy would be better. However, I don't think the legalization step has enough information to know the strategy (for now).

What do you think?

FrozenGene · 2020-06-11T14:21:45Z

> Hi @FrozenGene ,
> I agree that different strategies should be available to the auto-tuner. See if the solution proposed is good enough for you (at least as a temporary work-around). For Armv7-A or NCHW, nothing changes, we follow exactly the previous path.
>
> For Armv8-A and NHWC we don't convert during the legalization step, but during the _alter_conv2d_layout pass. The only difference now is that the offset contribution will be added after the convolution instead of before.
>
> I agree that a solution where the legalization changes depending on the strategy would be better. However, I don't think the legalization step has enough information to know the strategy (for now).
>
> What do you think?

I think it is OK.

giuseros · 2020-06-11T16:45:07Z

Hi @FrozenGene ,
I gave it another go, but switching the legalization based on the strategy seems very hard (since we would need the auto-tuner to pick the best data type for us).

So for now, we have to be content with the _alter_conv2d_layout workaround and think a bit more about how we can infer the strategy during legalization.

FrozenGene · 2020-06-12T14:00:56Z

> Hi @FrozenGene ,
> I gave it another go, but switching legalization on the strategy seems very hard (since we would need the auto-tuner to pick the best data-type for us).
>
> So for now, we have to content with the _alter_conv2d_layout workaround and try to think a bit more on how we can infer the strategy during legalization

I think I can accept this approach.


anijain2305 · 2020-06-12T18:26:44Z

@FrozenGene @giuseros If QNN Legalization is causing issues, we can remove QNN legalization for ARM CPUs altogether and move the logic to Alter Op layout. Alter op layout might become more complicated (like we might have to handle uint8 x int8 input and kernel dtype in alter op layout now). Just an idea if consolidating things at one place makes life easier.

giuseros · 2020-06-15T09:48:16Z

@anijain2305 , thanks for the review! About getting rid of the legalization, I would not do that for now. It is in my backlog to go back to this issue and try to retrieve the strategy from the legalization pass. This should give us more optimization options. If that turns out to be not possible, then yes, I would remove the pass and do everything in the alter_layout pass.

anijain2305 · 2020-06-16T17:24:57Z

@FrozenGene Can you please review when you get time?

FrozenGene · 2020-06-17T14:24:07Z

> @FrozenGene Can you please review when you get time?

Yep, I can review it tomorrow.



giuseros · 2020-06-19T12:38:09Z

Hi @FrozenGene ,
Thanks for the review!
I applied your changes, but I get a (seemingly) unrelated test failure.

Could you double check please, and let me know if this has got anything to do with my changes?

Thanks

giuseros · 2020-06-19T15:00:31Z

It actually seems related to: #5827

giuseros · 2020-06-22T15:15:12Z

Hi @FrozenGene , @anijain2305 ,
Any update on this review?
Also, is there a way to retrigger the tests? Or should I contact someone in particular?

Thanks


FrozenGene · 2020-06-22T16:00:36Z

> Hi @FrozenGene , @anijain2305 ,
> Any update on this review?
> Also, is there a way to retrigger the tests? Or should I contact someone in particular?
>
> Thanks

For the CI, maybe you could force a retrigger, or you could comment here (and contact @jroesch) and explain the reason?

anijain2305 · 2020-06-22T16:13:56Z

I push an empty commit to retrigger the CI - https://coderwall.com/p/vkdekq/git-commit-allow-empty


FrozenGene · 2020-06-23T03:45:11Z

@anijain2305 could you take another look?

anijain2305

LGTM

FrozenGene · 2020-06-23T05:20:24Z

Thanks @giuseros @anijain2305. Merged now.


tqchen assigned FrozenGene and vinx13 and unassigned FrozenGene Jun 9, 2020

anijain2305 reviewed Jun 12, 2020

python/tvm/relay/op/nn/_nn.py (outdated)

python/tvm/relay/op/nn/nn.py

python/tvm/relay/op/nn/nn.py (outdated)

topi/python/topi/arm_cpu/conv2d_alter_op.py (outdated)

giuseros force-pushed the conv2d_quantized_improvements branch from b6dc7c5 to c151f90 Compare June 16, 2020 09:26

FrozenGene requested changes Jun 19, 2020


Giuseppe Rossini added 2 commits June 19, 2020 11:50

Improve quantized conv2d performance for armv8

06740db

Signed-off-by: Giuseppe Rossini <[email protected]> Change-Id: I3a3d29f5332dd9b3354e8e0dfb24677a521f9c8f

Add ASF header to conv2d_gemm.py

ddc136c

Change-Id: I33853279e39c849ae1b555a9c91d7557985a0a35

Giuseppe Rossini added 10 commits June 19, 2020 11:50

Run clang-format-10 on c++ files

ead3a40

Change-Id: Ieee22f032e595dabfc1616ab33466fcbf8d94365

Fix pylint errors/warnings

1a7a50c

Change-Id: I435d4d7bca7500db99547f4401fdc0d0995a1ff4

Fix pylint errors/warnings in topi

6576359

Change-Id: I2fc1ad8453e9020072ab967c849df5390c2967b5

Fix legalizations tests for aarch64

637516a

Change-Id: I0a67a49a7849f52ef7d57b9292ce9125bbb7cb2c

Reintroduce conv2d_nhwc_spatial_pack.arm_cpu and int16 cast

1d1073f

Change-Id: I91b67fabd475e90a9b75f2dd5ecfee851265e0bb

Switch type of legalization depending on the strategy used

fc0efc5

Change-Id: I9a03040a8c40a6cd2658ed14c3751e05a8e19f2b

Revert last commit

6e926db

Change-Id: Ice34101e358e3ce8ebfb12c58f73e910ba5de8e8

Fix the auto-tuner by registering the correct schedules

38d59fa

Change-Id: Id9273688b2620e1ea849ab01b4c46af8fbf37fd0

Address review comments

e5618a7

Change-Id: Ia1755a0af7b6d159072d9f0c93c932c481101e48

Improve usability and readability of conv2d_gemm_weight_transform

9057c8b

Change-Id: I3333186bbc2fe4054b58ce15d910e3be7b315482

giuseros force-pushed the conv2d_quantized_improvements branch from c151f90 to 9057c8b Compare June 19, 2020 10:51

Giuseppe Rossini added 2 commits June 19, 2020 12:11

Change variable name to weight in Conv2DGemmWeightTransformRel

f430302

Change-Id: Ifb5f1f33af7512fe67c6b049b20a42a0bb2d26c9

Fix clang-10 linting errors

5a21a29

Change-Id: I25ccc844d9cee23766096e1daddb6180abc413a6

FrozenGene reviewed Jun 22, 2020

python/tvm/relay/qnn/op/legalizations.py

Trigger tests

ddc61d2

Change-Id: Id37706fb7cf77a87a3cc817ecf8046297d9ca95a

FrozenGene approved these changes Jun 23, 2020


anijain2305 approved these changes Jun 23, 2020


FrozenGene merged commit b94e8b7 into apache:master Jun 23, 2020

giuseros deleted the conv2d_quantized_improvements branch June 23, 2020 09:45

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
