[Tutorial] Deploy Quantized Model on CUDA #4667

Merged (4 commits), Jan 11, 2020


@vinx13 (Member) commented Jan 9, 2020

This tutorial demonstrates how to import a model using the Relay frontend, run quantization and calibration passes, and perform quantized inference.
ref #4435

cc @tqchen @masahi @anijain2305 @ZihengJiang @tmoreau89
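To make "quantized inference" concrete, here is a minimal pure-Python sketch of symmetric int8 quantization applied to a dot product: inputs are mapped to int8 codes, products are accumulated in integers, and a single float rescale happens at the end. This is not TVM's implementation; the helper names (`quantize_int8`, `int8_dot`) and the choice of symmetric per-tensor scaling over 127 levels are assumptions made for illustration.

```python
def quantize_int8(values, scale):
    """Map floats to int8 codes (hypothetical helper).

    Assumes symmetric scaling where `scale` is the largest magnitude to represent.
    """
    codes = []
    for v in values:
        q = round(v / scale * 127)
        codes.append(max(-127, min(127, q)))  # clip to the symmetric int8 range
    return codes


def int8_dot(a_codes, b_codes, a_scale, b_scale):
    """Accumulate integer products (int32 on real hardware), then rescale once."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))
    return acc * (a_scale / 127) * (b_scale / 127)


a = [0.5, -1.0, 0.25]
b = [2.0, 1.0, -4.0]
a_scale = max(abs(v) for v in a)  # per-tensor scale taken from the data
b_scale = max(abs(v) for v in b)
approx = int8_dot(quantize_int8(a, a_scale), quantize_int8(b, b_scale), a_scale, b_scale)
exact = sum(x * y for x, y in zip(a, b))
print(exact, approx)
```

The integer result differs from the float result only by rounding error, which is the trade-off a calibration pass tries to keep small by choosing good scales.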

@anijain2305 (Contributor) left a comment

LGTM with some minor comments.
I was expecting to see some accuracy and performance characteristics, but I also realize that might be out of scope of this user tutorial.

Later, it might also be useful to add a developer tutorial to help TVM developers add new operators for quantization.

================================
**Author**: `Wuwei Lin <https://github.com/vinx13>`_

This article is an introductory tutorial of automatic quantization with TVM.

Contributor comment:

Maybe add a link to this discuss forum post, https://discuss.tvm.ai/t/quantization-story/3920, to give a high-level idea of what automatic quantization is.


###############################################################################
# The calibration dataset should be a iterable object. We define the
# calibration dataset as a generator object in Python. In this tutorials, we

Contributor comment:

tutorials -> tutorial
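The reviewed text asks for the calibration dataset to be an iterable object, defined as a Python generator. A minimal sketch of that shape, assuming each batch is a dict keyed by the model's input name (`"data"` here) and using random values in place of preprocessed ImageNet images; `calibration_dataset` and its parameters are hypothetical, not the tutorial's code.

```python
import random


def calibration_dataset(num_batches, batch_size, input_shape=(3, 8, 8), seed=0):
    """Yield calibration batches as {'data': batch} dicts (hypothetical stand-in).

    Random floats replace preprocessed images; 'data' is an assumed input name.
    """
    rng = random.Random(seed)
    numel = 1
    for dim in input_shape:
        numel *= dim
    for _ in range(num_batches):
        # One flat list of floats per image; a real pipeline would yield an array.
        batch = [[rng.uniform(-1.0, 1.0) for _ in range(numel)] for _ in range(batch_size)]
        yield {"data": batch}


# A calibration loop can consume the generator lazily, one batch at a time,
# here tracking the largest absolute input value seen across all batches.
running_max = 0.0
for batch in calibration_dataset(num_batches=4, batch_size=2):
    for image in batch["data"]:
        running_max = max(running_max, max(abs(x) for x in image))
print(running_max)
```

Because a generator object can only be iterated once, a fresh one has to be created for each calibration run.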

# When the scales are not power of two, fixed point multiplications will
# be used.
#
# For outputs, we can find the scales with data-aware quantization.

Contributor comment:

outputs --> intermediate feature maps
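The quoted text notes that fixed point multiplications are used when the scales are not powers of two. A minimal sketch of that general technique: the real-valued ratio is encoded as an integer with a fixed number of fractional bits, and rescaling becomes one integer multiply plus a rounding shift. The 16 fractional bits, the round-half-up behaviour, and the helper names are assumptions; this is not TVM's exact kernel.

```python
FRAC_BITS = 16  # assumed number of fractional bits in the fixed-point multiplier


def to_fixed_point(multiplier, frac_bits=FRAC_BITS):
    """Encode a real multiplier as an integer scaled by 2**frac_bits."""
    return round(multiplier * (1 << frac_bits))


def fixed_point_multiply(x, fixed_multiplier, frac_bits=FRAC_BITS):
    """Approximate round(x * multiplier) with an integer multiply and a shift."""
    product = x * fixed_multiplier
    half = 1 << (frac_bits - 1)
    # Python's >> floors, so adding one half first rounds half toward +infinity.
    return (product + half) >> frac_bits


# Requantizing with a ratio that is not a power of two, e.g. 0.3:
m = to_fixed_point(0.3)
print(fixed_point_multiply(100, m), fixed_point_multiply(-100, m))
```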


import tvm
from tvm import relay
from tvm.relay import quantize as qtz

Contributor comment:

Not used

###############################################################################
# Import the model
# ----------------
# We use the Relay MxNet frontent to import a model from the Gluon model zoo.

Member comment:

frontend

@tmoreau89 (Contributor) left a comment

Thanks @vinx13; this is a great tutorial! I've added some nits on language.

**Author**: `Wuwei Lin <https://github.com/vinx13>`_

This article is an introductory tutorial of automatic quantization with TVM.
Automatic quantization is one of the quantization mode in TVM. More details of the quantization story in TVM can be found `here <https://discuss.tvm.ai/t/quantization-story/3920>`_.

Contributor comment:

mode -> modes
details of -> details on

Contributor comment:

also long line

# Prepare the Dataset
# -------------------
# We will demonstrate how to prepare the calibration dataset for quantization.
# We first download the validate set of ImageNet and pre-process the dataset.

Contributor comment:

validation set



###############################################################################
# The calibration dataset should be a iterable object. We define the

Contributor comment:

"should be an"

# intermediate feature maps are power of two, we can leverage bit shifting for
# multiplications. This make it computationally more efficient. In `max` mode,
# the maximum is used as the scale. Without rounding, `max` mode might have
# better accuracy in some cases. When the scales are not power of two, fixed

Contributor comment:

powers
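The quoted paragraph contrasts power-of-two scales, which let rescaling be done with bit shifts, against `max` mode, which uses the maximum directly. A small pure-Python sketch of both choices; the function names are hypothetical, rounding the maximum *up* to the next power of two is an assumption about how a power-of-two scale would be chosen, and none of this is TVM's actual implementation.

```python
import math


def max_scale(values):
    """`max`-style scale: the largest absolute value, used without rounding."""
    return max(abs(v) for v in values)


def power2_scale(values):
    """Power-of-two scale: the maximum rounded up to the next power of two."""
    m = max_scale(values)
    if m == 0:
        return 1.0  # arbitrary fallback for an all-zero tensor (assumption)
    return 2.0 ** math.ceil(math.log2(m))


def shift_between(scale_in, scale_out):
    """Bit shift that rescales integer codes from scale_in to scale_out.

    With power-of-two scales, q_out = q_in * (scale_in / scale_out) = q_in * 2**k,
    i.e. a left shift by k when k > 0 and a right shift by -k when k < 0.
    """
    k = math.log2(scale_in / scale_out)
    if not k.is_integer():
        raise ValueError("scales are not related by a power of two")
    return int(k)


feature_map = [0.3, -5.1, 2.0]
print(max_scale(feature_map), power2_scale(feature_map), shift_between(8.0, 2.0))
```

Here the power-of-two scale (8.0) covers a wider range than the observed maximum (5.1), so fewer int8 levels fall on the actual data; that lost resolution is one plausible reason `max` mode can be more accurate in some cases.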

@tmoreau89 (Contributor) left a comment

(requesting changes on those minor typos)

@tmoreau89 (Contributor) left a comment

Thanks, LGTM

@masahi merged commit a2fe7a3 into apache:master on Jan 11, 2020

@masahi (Member) commented Jan 11, 2020

Thanks @vinx13 @anijain2305 @tmoreau89

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
* [Tutorial] Deploy Quantized Model on CUDA

* update

* update

* address comments
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
* [Tutorial] Deploy Quantized Model on CUDA

* update

* update

* address comments
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
* [Tutorial] Deploy Quantized Model on CUDA

* update

* update

* address comments

6 participants