-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Torch, QNN] Add support for quantized models via QNN #4977
Conversation
Test passed!! |
mean_abs_diff = np.mean(np.abs(tvm_result - pt_result)) | ||
num_identical = np.sum(tvm_result == pt_result) | ||
|
||
print("\nModel name: %s" % model_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having some sort of assert here instead of prints would be nice. Is there some reasonable upper bound on the diffs that we could check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is my pain point :) I added a sample output in a6239e1
The problem is how close the raw floating point output is to torch differs widely among different network and which of per tensor/per channel quantization is used. I could add something like max_abs_diff < 2.5
, but I don't think that is too helpful. I think argmax is quite reasonable, though.
I believe there is a room for improvement in terms of making the result closer to torch. Also quantization in general is WIP on torch as well. There are some quantized ops that piggy backs to fp32 by dequantize -> fp32 op -> quantize, which defeats the purpose of doing quantization (going faster than fp32). We may need to talk to torch folks on this.
I want to say this is an initial support for torch quantization by someone who is new to the quantization space, and that by making it public I am hoping that people can improve on it :)
cc @anijain2305
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with minor comments. I think it might be helpful to add little more comments. Just a soft suggestion, feel free to ignore.
@anijain2305 Where do you want to see more comments? Happy to add them. I think the most non-obvious aspect of the code is the need to do some surgery on an input graph, to make all input and output quant params visible on the op inputs. Is that part clear? |
@anijain2305 Added more comments, I think it is enough for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Will wait for a day or two for others to review. |
@anijain2305 I think it is a good time to land it. |
Hmm for some reason, this commit became @anijain2305' commit. Also see the commit log. It seems the way github thinks about who is the commit author has changed since today. Have we changed some settings or is this github bug? @tqchen |
FYI, this issue (the change in commit author) got escalated to me at GitHub. We have a bug in our squash and merge logic right now (introduced yesterday) which causes the original PR author to be removed from the list of commit co-authors in some cases. We're working on a fix now. |
This PR is among one of the PRs affected by the github squash commit bug. We take every contribution serious in the TVM community. The community has decided to use revert/redo approach to amend the contributions as per #5015 |
* qnn support initial import * fix upsampling num input * imagenet tests added * add qunatized module tests * quantized module tests working * imagenet test working * fix lint * remove top level torch import to fix ci error * disable lint warning on outside toplevel import * revert parse -> convert change * add comments to qnn translation * address comments, add sample outputs * add more comments * refactor bias add and requantize step
…#4977)" (apache#5013) This reverts commit fc7f078.
* qnn support initial import * fix upsampling num input * imagenet tests added * add qunatized module tests * quantized module tests working * imagenet test working * fix lint * remove top level torch import to fix ci error * disable lint warning on outside toplevel import * revert parse -> convert change * add comments to qnn translation * address comments, add sample outputs * add more comments * refactor bias add and requantize step
…#4977)" (apache#5013) This reverts commit fc7f078.
This adds support for converting quantized models from torch to Relay via QNN. I believe this makes a good case for introducing Torch frontend to TVM, since the quantized Torch models -> ONNX path is not supported at the moment (or any time soon). Also torch cannot run quantized models on GPU yet, but we can run quantized models on CUDA very fast.
So far I've tested on quantized models from torchvision available at https://github.com/pytorch/vision/tree/master/torchvision/models/quantization (shufflenet is removed since it has issues with tracing), and a custom mobilenet v3 model.
Per channel quantization is fully supported, and it is a big win on mobilenet v3 (more than 10 point accuracy on imagenet). I think even TFLite or MXNet + QNN doesn't support per channel quantization end to end.
The PR contains
Verifying the result is tricky due to difference in numerics between torch and tvm.
Quantized modules tests are run but there is no assertion on how close the raw floating point output is to torch. I dumped some statistics that we can eye ball and verify that the result is reasonable.
For imagenet tests, I use the real image and verify that two top3 label sets are identical. Here, I'm ignoring the order among top3, but for the 4 models that I added in the test, the order is actually correct. The order is ignored to make CI less fragile.
Below is the result on 1k subset of imagenet val data with full calibration + per channel quantization. I've prepared a repo to do the same evaluation here. You can also use the full imagenet dataset, if you have an access to it.
cc @anijain2305 @vinx13 @FrozenGene @jwfromm @alexwong @ajtulloch @hlu1 @yinghai