[Quantization] Make calibration faster and more memory usage friendly #4589

masahi · 2019-12-27T03:12:12Z

This PR improves the performance (not accuracy) of KL based calibration in two ways:

Chunked calibration to reduce memory usage. The current implementation stores all samples for every layer at once. This quickly becomes intractable in terms of memory usage for image segmentation tasks, where there are often more than a hundred intermediate outputs and high-resolution inputs are common. I added a "calibrate_chunk_by" parameter to qconfig, which enables chunk-by-chunk, interleaved profile generation and scale calculation. This adds some redundant computation, but it is a worthwhile trade-off: in practice, using the cuda target for profile generation, I don't see a noticeable slowdown. The default behavior is to use the number of intermediate outputs as the chunk size, so there is only one chunk and performance is the same as the existing implementation.
Port KL divergence minimization to C++. MXNet now has a C++ implementation, so I replaced our Python implementation with it. This turned out to be a big win: scale calculation is now 10-20x faster. Below is a log from running test_calibrate_chunk(), which I added, showing the elapsed seconds and the scales found by the Python and C++ implementations respectively.

elapsed py: 4.980973720550537
elapsed cpp: 0.26279664039611816
scale py: 0.6268190741539001
scale cpp: 0.6268190741539001
elapsed py: 5.02645468711853
elapsed cpp: 0.27138352394104004
scale py: 0.5978888273239136
scale cpp: 0.5933241844177246
elapsed py: 5.484269857406616
elapsed cpp: 0.27078914642333984
scale py: 0.3622346520423889
scale cpp: 0.3667539358139038
elapsed py: 4.992680788040161
elapsed cpp: 0.27702999114990234
scale py: 0.6268190741539001
scale cpp: 0.6268190741539001
elapsed py: 5.395789861679077
elapsed cpp: 0.26751089096069336
scale py: 0.5125797390937805
scale cpp: 0.5124440789222717
elapsed py: 5.406205177307129
elapsed cpp: 0.26792263984680176
scale py: 0.7058634161949158
scale cpp: 0.7058634161949158
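
For readers who want a picture of the chunk-by-chunk idea described above, here is a minimal sketch in plain Python. It is not the code from this PR: `chunked_scales`, `toy_collect`, and `max_abs` are hypothetical names, and the max-abs scale is only a stand-in for the KL-based scale search. The point is the loop shape: each chunk triggers another profiling pass (the redundant computation), but only `chunk_by` outputs' samples are held in memory at a time.

```python
from typing import Callable, List, Sequence, Tuple


def chunked_scales(
    num_outputs: int,
    chunk_by: int,
    collect: Callable[[int, int], List[List[float]]],
    find_scale: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return one scale per intermediate output, processing chunk_by outputs at a time."""
    scales: List[float] = []
    for start in range(0, num_outputs, chunk_by):
        end = min(start + chunk_by, num_outputs)
        # One profiling pass per chunk: keep samples only for outputs [start, end).
        samples = collect(start, end)
        # Compute scales right away so this chunk's samples can be freed
        # before the next chunk is profiled.
        scales.extend(find_scale(s) for s in samples)
    return scales


# Hypothetical stand-ins so the sketch runs without TVM.
calls: List[Tuple[int, int]] = []


def toy_collect(start: int, end: int) -> List[List[float]]:
    calls.append((start, end))  # record each profiling pass
    return [[float(i), -2.0 * i] for i in range(start, end)]


def max_abs(samples: Sequence[float]) -> float:
    # Placeholder for the KL-based scale search.
    return max(abs(x) for x in samples)


one_chunk = chunked_scales(5, 5, toy_collect, max_abs)      # default-like: a single chunk
two_per_chunk = chunked_scales(5, 2, toy_collect, max_abs)  # three smaller chunks
```

Both calls produce the same scales; the second simply makes three profiling passes instead of one, which is the memory-for-compute trade described in the PR text.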

Please review @vinx13 @ZihengJiang @tmoreau89

vinx13 · 2020-01-03T12:23:23Z

Thanks @masahi, this is merged.

…apache#4589)
* Use memory efficient calibrate
* Fixed indexing
* add cpp kl stub
* ported KL cpp from mxnet
* Fixed std::distance arguments order
* remove python implementation
* fix lint and indent
* fix indent
* refactoring
* fix lint
* fix for i386

masahi added the status: need review label Dec 27, 2019

tqchen assigned vinx13 Dec 31, 2019

masahi added 9 commits January 2, 2020 22:20

Use memory efficient calibrate

83d6f96

Fixed indexing

a806894

add cpp kl stub

3cd78b6

ported KL cpp from mxnet

719064b

Fixed std::distance arguments order

8cb0cac

remove python implementation

cbe61a0

fix lint and indent

16219ce

fix indent

ef75257

refactoring

5908267

masahi force-pushed the calib-kl-cpp branch from 4d6c6ce to 5908267 Compare January 2, 2020 13:25

masahi added 2 commits January 2, 2020 22:37

fix lint

615378d

fix for i386

4751383

vinx13 approved these changes Jan 3, 2020

vinx13 merged commit 2440c9c into apache:master Jan 3, 2020

vinx13 added status: accepted and removed status: need review labels Jan 3, 2020

ZihengJiang mentioned this pull request Sep 17, 2020

TVM v0.7 Release Note Candidate #6486

Closed
