Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Torch, QNN] Add support for quantized models via QNN #4977

Merged
merged 14 commits into from
Mar 4, 2020

Conversation

masahi
Copy link
Member

@masahi masahi commented Mar 2, 2020

This adds support for converting quantized models from torch to Relay via QNN. I believe this makes a good case for introducing Torch frontend to TVM, since the quantized Torch models -> ONNX path is not supported at the moment (or any time soon). Also torch cannot run quantized models on GPU yet, but we can run quantized models on CUDA very fast.

So far I've tested on quantized models from torchvision available at https://github.com/pytorch/vision/tree/master/torchvision/models/quantization (shufflenet is removed since it has issues with tracing), and a custom mobilenet v3 model.

Per channel quantization is fully supported, and it is a big win on mobilenet v3 (more than 10 point accuracy on imagenet). I think even TFLite or MXNet + QNN doesn't support per channel quantization end to end.

The PR contains

  • The converter to QNN
  • Tests for individual quantized modules
  • Tests on pretrained imagenet models with dummy calibration

Verifying the result is tricky due to difference in numerics between torch and tvm.

Quantized modules tests are run but there is no assertion on how close the raw floating point output is to torch. I dumped some statistics that we can eye ball and verify that the result is reasonable.

For imagenet tests, I use the real image and verify that two top3 label sets are identical. Here, I'm ignoring the order among top3, but for the 4 models that I added in the test, the order is actually correct. The order is ignored to make CI less fragile.

Below is the result on 1k subset of imagenet val data with full calibration + per channel quantization. I've prepared a repo to do the same evaluation here. You can also use the full imagenet dataset, if you have an access to it.

model name Torch-Top1 Torch-Top5 TVM-Top1 TVM-Top5
resnet18 69.7 90.2 70.0 90.1
resnet50 78.1 94.1 77.7 93.7
inception_v3 69.8 89.5 68.9 89.2
googlenet 71.8 90.5 71.6 90.6
mobilenet_v2 69.9 89.4 71.0 89.4
mobilenet_v3 63.5 83.6 62.7 83.2

cc @anijain2305 @vinx13 @FrozenGene @jwfromm @alexwong @ajtulloch @hlu1 @yinghai

@masahi
Copy link
Member Author

masahi commented Mar 2, 2020

Test passed!!

mean_abs_diff = np.mean(np.abs(tvm_result - pt_result))
num_identical = np.sum(tvm_result == pt_result)

print("\nModel name: %s" % model_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having some sort of assert here instead of prints would be nice. Is there some reasonable upper bound on the diffs that we could check?

Copy link
Member Author

@masahi masahi Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is my pain point :) I added a sample output in a6239e1

The problem is how close the raw floating point output is to torch differs widely among different network and which of per tensor/per channel quantization is used. I could add something like max_abs_diff < 2.5, but I don't think that is too helpful. I think argmax is quite reasonable, though.

I believe there is a room for improvement in terms of making the result closer to torch. Also quantization in general is WIP on torch as well. There are some quantized ops that piggy backs to fp32 by dequantize -> fp32 op -> quantize, which defeats the purpose of doing quantization (going faster than fp32). We may need to talk to torch folks on this.

I want to say this is an initial support for torch quantization by someone who is new to the quantization space, and that by making it public I am hoping that people can improve on it :)

cc @anijain2305

Copy link
Contributor

@anijain2305 anijain2305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM with minor comments. I think it might be helpful to add little more comments. Just a soft suggestion, feel free to ignore.

python/tvm/relay/frontend/pytorch.py Outdated Show resolved Hide resolved
python/tvm/relay/frontend/qnn_torch.py Outdated Show resolved Hide resolved
@masahi
Copy link
Member Author

masahi commented Mar 2, 2020

@anijain2305 Where do you want to see more comments? Happy to add them.

I think the most non-obvious aspect of the code is the need to do some surgery on an input graph, to make all input and output quant params visible on the op inputs. Is that part clear?

@masahi
Copy link
Member Author

masahi commented Mar 2, 2020

@anijain2305 Added more comments, I think it is enough for now.

Copy link
Contributor

@anijain2305 anijain2305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anijain2305
Copy link
Contributor

Will wait for a day or two for others to review.

@masahi
Copy link
Member Author

masahi commented Mar 4, 2020

@anijain2305 I think it is a good time to land it.

@anijain2305 anijain2305 merged commit fc7f078 into apache:master Mar 4, 2020
@anijain2305
Copy link
Contributor

Thanks @masahi @jwfromm This is merged!

@masahi
Copy link
Member Author

masahi commented Mar 4, 2020

Hmm for some reason, this commit became @anijain2305' commit.
https://github.com/apache/incubator-tvm/commits?author=anijain2305

Also see the commit log. It seems the way github thinks about who is the commit author has changed since today. Have we changed some settings or is this github bug? @tqchen

@greysteil
Copy link

FYI, this issue (the change in commit author) got escalated to me at GitHub. We have a bug in our squash and merge logic right now (introduced yesterday) which causes the original PR author to be removed from the list of commit co-authors in some cases. We're working on a fix now.

anijain2305 added a commit to anijain2305/tvm that referenced this pull request Mar 9, 2020
@tqchen tqchen mentioned this pull request Mar 9, 2020
5 tasks
@tqchen
Copy link
Member

tqchen commented Mar 9, 2020

This PR is among one of the PRs affected by the github squash commit bug. We take every contribution serious in the TVM community. The community has decided to use revert/redo approach to amend the contributions as per #5015

anijain2305 added a commit that referenced this pull request Mar 9, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
* qnn support initial import

* fix upsampling num input

* imagenet tests added

* add qunatized module tests

* quantized module tests working

* imagenet test working

* fix lint

* remove top level torch import to fix ci error

* disable lint warning on outside toplevel import

* revert parse -> convert change

* add comments to qnn translation

* address comments, add sample outputs

* add more comments

* refactor bias add and requantize step
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
* qnn support initial import

* fix upsampling num input

* imagenet tests added

* add qunatized module tests

* quantized module tests working

* imagenet test working

* fix lint

* remove top level torch import to fix ci error

* disable lint warning on outside toplevel import

* revert parse -> convert change

* add comments to qnn translation

* address comments, add sample outputs

* add more comments

* refactor bias add and requantize step
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants