
[Quantization] Make calibration faster and more memory usage friendly #4589

Merged: 11 commits merged into apache:master on Jan 3, 2020

Conversation

@masahi (Member) commented on Dec 27, 2019

This PR improves the performance (not the accuracy) of KL-divergence based calibration in two ways:

  • The current implementation stores the profiling samples for all layers in one go. This quickly becomes intractable in terms of memory usage for image segmentation models, which often have more than a hundred intermediate outputs and are commonly run on high-resolution inputs. I added a "calibrate_chunk_by" parameter to qconfig, which enables chunk-by-chunk, interleaved profile generation and scale calculation (a usage sketch appears at the end of this comment). This adds some redundant computation, but it is a worthwhile trade-off; in practice, using the cuda target for profile generation, I don't see a noticeable slowdown. By default the chunk size equals the number of intermediate outputs, so there is only one chunk and performance matches the existing implementation.

  • Port KL-divergence minimization to C++. MXNet now has a C++ implementation, so I replaced the Python one we have with a port of it. This turned out to be a big win: scale calculation is now 10-20x faster. Below is a log from running the test_calibrate_chunk() test I added, showing the elapsed seconds and the scales found by the Python and C++ implementations, respectively (a sketch of the underlying threshold search follows the log).

elapsed py: 4.980973720550537
elapsed cpp: 0.26279664039611816
scale py: 0.6268190741539001
scale cpp: 0.6268190741539001
elapsed py: 5.02645468711853
elapsed cpp: 0.27138352394104004
scale py: 0.5978888273239136
scale cpp: 0.5933241844177246
elapsed py: 5.484269857406616
elapsed cpp: 0.27078914642333984
scale py: 0.3622346520423889
scale cpp: 0.3667539358139038
elapsed py: 4.992680788040161
elapsed cpp: 0.27702999114990234
scale py: 0.6268190741539001
scale cpp: 0.6268190741539001
elapsed py: 5.395789861679077
elapsed cpp: 0.26751089096069336
scale py: 0.5125797390937805
scale cpp: 0.5124440789222717
elapsed py: 5.406205177307129
elapsed cpp: 0.26792263984680176
scale py: 0.7058634161949158
scale cpp: 0.7058634161949158
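
To make the log above easier to interpret, here is an illustrative NumPy/SciPy sketch of the KL-divergence threshold search that both implementations perform (the TensorRT-style algorithm that MXNet implements). This is a simplified reference, not the C++ code ported in this PR; the function name find_scale_by_kl and the bin counts are illustrative.

```python
import numpy as np
from scipy import stats

def find_scale_by_kl(samples, num_bins=2048, num_quantized_bins=255):
    """Pick a clipping threshold for `samples` that minimizes the KL divergence
    between the clipped float histogram and its int8-quantized approximation."""
    max_val = float(np.abs(samples).max())
    hist, edges = np.histogram(np.abs(samples), bins=num_bins, range=(0.0, max_val))
    bin_width = edges[1] - edges[0]

    best_divergence = np.inf
    best_threshold = max_val
    for i in range(num_quantized_bins, num_bins + 1):
        # Reference distribution P: the first i bins, with all outliers
        # folded into the last kept bin.
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()

        # Quantize P down to num_quantized_bins bins, then expand Q back to
        # i bins by spreading each quantized bin's mass uniformly over the
        # nonzero source bins it covers.
        q = np.zeros(i, dtype=np.float64)
        for idx in np.array_split(np.arange(i), num_quantized_bins):
            mass = p[idx].sum()
            nonzero = p[idx] != 0
            if nonzero.any():
                q[idx[nonzero]] = mass / nonzero.sum()

        # A small smoothing term avoids infinite divergence when q has zeros.
        divergence = stats.entropy(p + 1e-10, q + 1e-10)
        if divergence < best_divergence:
            best_divergence = divergence
            best_threshold = (i + 0.5) * bin_width

    return best_threshold
```

This deliberately unoptimized brute-force search over roughly 2000 candidate thresholds, each requiring a histogram requantization, is exactly the kind of loop that benefits from moving to C++, which is consistent with the 10-20x speedup shown in the log.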

Please review @vinx13 @ZihengJiang @tmoreau89
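
Returning to the first bullet, below is a minimal usage sketch for the new chunked calibration. It assumes the Relay quantization API of that era (relay.quantize.qconfig and relay.quantize.quantize); the chunk size of 16 is arbitrary, and parameter names other than calibrate_chunk_by, which this PR adds, may differ between TVM versions.

```python
from tvm import relay

# `mod`, `params`, and `calib_dataset` are assumed to be a Relay module, its
# parameters, and an iterable of calibration inputs prepared elsewhere.
def quantize_with_chunked_calibration(mod, params, calib_dataset):
    # Profile intermediate outputs 16 at a time: each chunk is profiled and
    # its scales are computed before moving on, keeping peak memory bounded
    # at the cost of some recomputation.
    with relay.quantize.qconfig(calibrate_mode="kl_divergence",
                                calibrate_chunk_by=16):
        return relay.quantize.quantize(mod, params, dataset=calib_dataset)
```

Leaving calibrate_chunk_by unset falls back to the default described above: the chunk size equals the number of intermediate outputs, so the behavior matches the non-chunked path.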

@vinx13 vinx13 merged commit 2440c9c into apache:master Jan 3, 2020
@vinx13 (Member) commented on Jan 3, 2020

Thanks @masahi, this is merged.

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
…apache#4589)

* Use memory efficient calibrate

* Fixed indexing

* add cpp kl stub

* ported KL cpp from mxnet

* Fixed std::distance arguments order

* remove python implementation

* fix lint and indent

* fix indent

* refactoring

* fix lint

* fix for i386
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020