Introduce INC 3.0 quantization API and port torch RTN into 3.0 #1380

Merged
merged 29 commits into master from ly/inc3_config on Nov 14, 2023

Conversation

@yiliu30 (Contributor) commented Nov 9, 2023

Description

This PR includes the minimal set of INC 3.0-related configs and algorithms to demonstrate its architecture and E2E pipeline. More tests, algorithms, and frameworks will be added in follow-up PRs.

  • INC 3.0 quantization API
  • Port torch RTN to INC 3.0
  • Several E2E UTs to demonstrate usage
  • Add more args to the quantize API (e.g., calib_func, calib_func_args)
  • Update the CI @chensuyue
  • Docstrings

Other TODOs for follow-up PRs:

  • Support config combination (RTNWeightOnlyConfig + GPTQWeightOnlyConfig)
  • Validate the user config
  • Port an LLM using RTN
  • Documentation
  • Other TODOs marked in the code

3.0 API Usage

Users can pass quantization configs either as a dict or as instances of XxxAlgoConfig.
Below are demos for beginner, intermediate, and expert users.

For beginners

from neural_compressor.torch import get_default_rtn_config, quantize

# build_simple_torch_model() is a test helper; the E2E example at the end of
# this description defines an equivalent model inline.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config=get_default_rtn_config())
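
For orientation, get_default_rtn_config() presumably returns a default-initialized RTN config; a rough equivalent using the class form shown in the sections below (an assumption, not quoted from this PR) is:

from neural_compressor.torch import RTNWeightQuantConfig, quantize

# Assumed rough equivalent of get_default_rtn_config(); the exact defaults
# live in the torch-specific config module, so verify against the source.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config=RTNWeightQuantConfig())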

For intermediate

  • Pass a dict as config
from neural_compressor.torch import quantize
quant_config = {
    "rtn_weight_only_quant": {
        "weight_dtype": "nf4",
        "weight_bits": 4,
        "weight_group_size": 32,
    },
}
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)
  • Pass instances of XxxAlgoConfig.
from neural_compressor.torch import RTNWeightQuantConfig, quantize
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4", weight_group_size=32)
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)

For experts

  • Pass a dict as config
from neural_compressor.torch import quantize
fp32_model = build_simple_torch_model()
quant_config = {
    "rtn_weight_only_quant": {
        "global": {
            "weight_dtype": "nf4",
            "weight_bits": 4,
            "weight_group_size": 32,
        },
        "operator_name": {
            "fc1": {
                "weight_dtype": "int8",
                "weight_bits": 4,
            }
        },
    }
}
qmodel = quantize(fp32_model, quant_config)
  • Pass instances of XxxAlgoConfig.
from neural_compressor.torch import RTNWeightQuantConfig, quantize
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4")
# Set a per-operator config: fc1 gets int8 weights instead of the global nf4.
fc1_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="int8")
quant_config.set_operator_name("model.fc1", fc1_config)
# Get the model and quantize it.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)
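
The instance form mirrors the dict form above: the top-level RTNWeightQuantConfig plays the role of the "global" entry, and each set_operator_name call adds an "operator_name" override. A sketch with two overrides, assuming operator-level settings take precedence over the global ones (inferred from the dict layout, not stated explicitly in this PR):

from neural_compressor.torch import RTNWeightQuantConfig, quantize

# Global default: 4-bit nf4 weights for every matched operator.
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4", weight_group_size=32)
# Per-operator overrides, assumed to win over the global entry.
quant_config.set_operator_name("model.fc1", RTNWeightQuantConfig(weight_bits=4, weight_dtype="int8"))
quant_config.set_operator_name("model.fc2", RTNWeightQuantConfig(weight_bits=8, weight_dtype="int8"))

fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)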

The structure of INC 3.0

neural_compressor
  - common                 # common methods/configs shared across frameworks
    - base_config.py       # the base config for all algorithms
    - tunner (strategy)    # strategy-related code
  - torch
    - quantization
      - config.py          # the torch-specific config for its algorithms
    - algorithms           # torch algo-related code
  - onnxrt                 # onnxrt-related code
  - tensorflow             # tensorflow-related code
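
For a quick orientation, the imports used in the demos above map onto this layout roughly as follows (the base_config path and class name are inferred from the tree, so treat them as assumptions):

# Framework-agnostic config machinery (path assumed from the tree above):
from neural_compressor.common.base_config import BaseConfig

# Torch-specific entry points used in the demos, re-exported at package level:
from neural_compressor.torch import quantize, RTNWeightQuantConfig, get_default_rtn_config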

E2E example

# copy the below code to try it :)
import torch
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(30, 50)
        self.fc2 = torch.nn.Linear(50, 30)
        self.fc3 = torch.nn.Linear(30, 5)

    def forward(self, x):
        out = self.fc1(x)
        out = self.fc2(out)
        out = self.fc3(out)
        return out

fp32_model = Model()

from neural_compressor.torch import quantize
quant_config = {
    "rtn_weight_only_quant": {
        "weight_dtype": "nf4",
        "weight_bits": 4,
        "weight_group_size": 32,
    },
}
qmodel = quantize(fp32_model, quant_config)
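
A quick way to sanity-check the result (not part of the original snippet): run a forward pass on the quantized model and confirm the output shape matches the FP32 model.

import torch

# fc1 takes 30 input features, so feed a (batch, 30) tensor; fc3 emits 5 outputs.
x = torch.randn(2, 30)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # expected: torch.Size([2, 5])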

How has this PR been tested?

Pre-CI

Dependency Change?

  • torch (add torch as a mandatory dependency for neural-compressor[pt])

@ftian1 (Contributor) left a comment:

looks good to me

@yiliu30 yiliu30 added the WIP label Nov 9, 2023
chensuyue and others added 7 commits November 13, 2023 16:03
@chensuyue chensuyue added this to the v2.4 milestone Nov 14, 2023
@yiliu30 yiliu30 removed the WIP label Nov 14, 2023
@chensuyue chensuyue merged commit dc9328c into master Nov 14, 2023
55 of 57 checks passed
@chensuyue chensuyue deleted the ly/inc3_config branch November 14, 2023 09:47
This was referenced Nov 17, 2023
@yiliu30 yiliu30 added the INC3.X label Nov 23, 2023