[Tutorial] Deploy Quantized Model on CUDA #4667
Conversation
LGTM with some minor comments.
I was expecting to see some accuracy and performance characteristics, but I also realize that might be out of scope of this user tutorial.
Later, it might also be useful to add a developer tutorial to help TVM developers add new operators for quantization.
================================

**Author**: `Wuwei Lin <https://github.com/vinx13>`_

This article is an introductory tutorial of automatic quantization with TVM.
Maybe add a link to this discuss forum - https://discuss.tvm.ai/t/quantization-story/3920 - to give a high-level idea of what automatic quantization is.
###############################################################################
# The calibration dataset should be a iterable object. We define the
# calibration dataset as a generator object in Python. In this tutorials, we
tutorials -> tutorial
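The quoted snippet describes the calibration dataset as an iterable generator. A minimal sketch of what such a generator might look like (hypothetical names; random data stands in for the pre-processed ImageNet batches the tutorial uses):

```python
import numpy as np

batch_size = 8

def calibrate_dataset(num_batches=10):
    # Yield dicts mapping input names to batches; the quantization pass
    # iterates over these to collect activation statistics.
    # Random tensors here are a placeholder for real calibration data.
    for _ in range(num_batches):
        data = np.random.uniform(size=(batch_size, 3, 224, 224)).astype("float32")
        yield {"data": data}
```

Because it is a generator, the dataset is produced lazily, one batch at a time, rather than held in memory all at once.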
# When the scales are not power of two, fixed point multiplications will
# be used.
#
# For outputs, we can find the scales with data-aware quantization.
outputs --> intermediate feature maps
import tvm
from tvm import relay
from tvm.relay import quantize as qtz
Not used
###############################################################################
# Import the model
# ----------------
# We use the Relay MxNet frontent to import a model from the Gluon model zoo.
frontend
Thanks @vinx13 ; this is a great tutorial! I've added some nits on language.
**Author**: `Wuwei Lin <https://github.com/vinx13>`_

This article is an introductory tutorial of automatic quantization with TVM.
Automatic quantization is one of the quantization mode in TVM. More details of the quantization story in TVM can be found `here <https://discuss.tvm.ai/t/quantization-story/3920>`_.
mode -> modes
details of -> details on
also long line
# Prepare the Dataset
# -------------------
# We will demonstrate how to prepare the calibration dataset for quantization.
# We first download the validate set of ImageNet and pre-process the dataset.
validation set
###############################################################################
# The calibration dataset should be a iterable object. We define the
"should be an"
# intermediate feature maps are power of two, we can leverage bit shifting for
# multiplications. This make it computationally more efficient. In `max` mode,
# the maximum is used as the scale. Without rounding, `max` mode might have
# better accuracy in some cases. When the scales are not power of two, fixed
powers
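The two scale choices the snippet contrasts can be sketched in plain NumPy (illustrative only; TVM's actual calibration logic differs): `max` mode takes the maximum absolute value as the scale, while a power-of-two mode rounds that value up to the nearest power of two so the later multiplications can become bit shifts.

```python
import numpy as np

def scale_max(x):
    # `max` mode: scale is the largest absolute value observed.
    return float(np.max(np.abs(x)))

def scale_power2(x):
    # power-of-two mode: round the max scale up to the nearest power of two.
    m = scale_max(x)
    return float(2.0 ** np.ceil(np.log2(m)))

x = np.array([0.1, -0.6, 0.3])
# scale_max(x) gives 0.6; scale_power2(x) gives 1.0
```

Rounding the scale up to a power of two loses a little resolution (here the int8 range now covers [-1, 1] instead of [-0.6, 0.6]), which is the accuracy trade-off the snippet mentions.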
(requesting changes on those minor typos)
Thanks, LGTM
Thanks @vinx13 @anijain2305 @tmoreau89
* [Tutorial] Deploy Quantized Model on CUDA * update * update * address comments
This tutorial demonstrates how to import a model using the Relay frontend, run the quantization and calibration passes, and perform quantized inference.
ref #4435
cc @tqchen @masahi @anijain2305 @ZihengJiang @tmoreau89
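As a rough illustration of what "quantized inference" means in the PR description above (plain NumPy, not the TVM API): float tensors are mapped to int8 with a scale, arithmetic runs on integers, and results are rescaled back to float.

```python
import numpy as np

def quantize(x, scale):
    # Map float values to int8 codes; clip to the symmetric int8 range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 codes back to (approximate) float values.
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 0.25], dtype=np.float32)
scale = 1.0 / 127
# The round trip recovers x to within one quantization step (the scale).
x_hat = dequantize(quantize(x, scale), scale)
```

TVM's automatic quantization additionally rewrites the graph so that whole operators (e.g. conv2d) run on the integer representation, which is where the CUDA speedups come from.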