add int8 quantization support #3058
Conversation
Consider adding modelopt as an optional dependency with the correct version.
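One way the suggestion above could be expressed is an extras entry in setup.py; the extra's name and the version bound here are illustrative assumptions, not the pin the PR actually uses.

```python
# Hypothetical extras_require entry making modelopt an optional dependency.
# The extra name "quantization" and the version bound are assumptions for
# illustration only.
extras_require = {
    "quantization": ["nvidia-modelopt>=0.15.0"],
}

# Users would then opt in with: pip install torch-tensorrt[quantization]
print(extras_require["quantization"])
```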
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/examples/dynamo/simple_int8_ptq.py 2024-08-21 23:27:44.130840+00:00
+++ /home/runner/work/TensorRT/TensorRT/examples/dynamo/simple_int8_ptq.py 2024-08-21 23:28:02.118758+00:00
@@ -14,13 +14,15 @@
x = self.linear1(x)
x = torch.nn.ReLU()(x)
x = self.linear2(x)
return x
+
def calibrate_loop(model):
"""Simple calibration function for testing."""
model(input_tensor)
+
input_tensor = torch.randn(1, 6).cuda()
model = SimpleNetwork().eval().cuda()
print(f"model before quantize: {model}")
@dheerajperi @narendasan ready for review; however, in the test case I had to change from
from torch.export._trace import _export

exp_program = _export(model, (input_tensor,))
if args.quantize_type == "int8":
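For context, a hedged sketch of how the exported program in the snippet above could fit into the full INT8 PTQ flow. The mtq.INT8_DEFAULT_CFG config and the dynamo.compile arguments are assumptions based on the nvidia-modelopt and torch_tensorrt public APIs, not necessarily the PR's final code; running it requires a CUDA GPU and nvidia-modelopt installed.

```python
# Sketch only: assumes nvidia-modelopt's mtq.quantize API and
# torch_tensorrt.dynamo.compile; details may differ from the PR.
import torch
import torch_tensorrt as torchtrt
import modelopt.torch.quantization as mtq
from torch.export._trace import _export


class SimpleNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Layer sizes are illustrative, matching the (1, 6) input below
        self.linear1 = torch.nn.Linear(6, 16)
        self.linear2 = torch.nn.Linear(16, 1)

    def forward(self, x):
        x = self.linear1(x)
        x = torch.nn.ReLU()(x)
        return self.linear2(x)


input_tensor = torch.randn(1, 6).cuda()
model = SimpleNetwork().eval().cuda()


def calibrate_loop(m):
    # Run representative data through the model so modelopt can
    # collect activation ranges for calibration
    m(input_tensor)


# Insert quantize/dequantize (Q/DQ) nodes and calibrate (post-training quantization)
quant_model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop=calibrate_loop)

# _export (rather than torch.export.export) is used here, as in the diff above,
# so the Q/DQ ops survive tracing
exp_program = _export(quant_model, (input_tensor,))
trt_model = torchtrt.dynamo.compile(
    exp_program,
    inputs=[input_tensor],
    enabled_precisions={torch.int8},
)
```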
Shouldn't these be joined with a default set, like enabled_precisions = {torch.float, torch.half}?
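A minimal illustration of that suggestion: union the requested precision into the default set rather than replacing the defaults outright.

```python
import torch

# Default precisions suggested in the review comment above
default_precisions = {torch.float, torch.half}

# Merge int8 in rather than overwriting the defaults
enabled_precisions = default_precisions | {torch.int8}

print(enabled_precisions)  # contains float32, float16, and int8
```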
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me, one small optional idea for the docs
Description
Add int8 quantization support
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: