Introduce INC 3.0 quantization API and port torch RTN into 3.0 #1380

Merged
merged 29 commits into master from ly/inc3_config on Nov 14, 2023

Conversation

@yiliu30 (Contributor) commented Nov 9, 2023

Description

This PR includes the minimal set of INC 3.0-related configs and algorithms to demonstrate its architecture and E2E pipeline. More tests, algorithms, and frameworks will be added in follow-up PRs.

  • INC 3.0 quantization API
  • Port torch RTN to INC 3.0
  • Several E2E UTs to demonstrate usage
  • Add more args to the quantize API (e.g., calib_func, calib_func_args)
  • Update the CI @chensuyue
  • Docstrings

Other TODOs for follow-up PRs:

  • Support config combination (RTNWeightOnlyConfig + GPTQWeightOnlyConfig)
  • Validate the user config
  • Port an LLM using RTN
  • Documentation
  • Other TODOs marked in the code

3.0 API Usage

Users can pass quantization configs either as a dict or as instances of XxxAlgoConfig.
Below are demos for beginner, intermediate, and expert users.

For beginners

from neural_compressor.torch import get_default_rtn_config, quantize

# build_simple_torch_model() is a test helper; the E2E example at the end of
# this description defines an equivalent model inline.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config=get_default_rtn_config())
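
For orientation, get_default_rtn_config() presumably returns a default-initialized RTN config; a rough equivalent using the class form shown in the sections below (an assumption, not quoted from this PR) is:

from neural_compressor.torch import RTNWeightQuantConfig, quantize

# Assumed rough equivalent of get_default_rtn_config(); the exact defaults
# live in the torch-specific config module, so verify against the source.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config=RTNWeightQuantConfig())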

For intermediate

  • Pass a dict as config
from neural_compressor.torch import quantize
quant_config = {
    "rtn_weight_only_quant": {
        "weight_dtype": "nf4",
        "weight_bits": 4,
        "weight_group_size": 32,
    },
}
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)
  • Pass instances of XxxAlgoConfig.
from neural_compressor.torch import RTNWeightQuantConfig, quantize
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4", weight_group_size=32)
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)

For experts

  • Pass a dict as config
from neural_compressor.torch import quantize
fp32_model = build_simple_torch_model()
quant_config = {
    "rtn_weight_only_quant": {
        "global": {
            "weight_dtype": "nf4",
            "weight_bits": 4,
            "weight_group_size": 32,
        },
        "operator_name": {
            "fc1": {
                "weight_dtype": "int8",
                "weight_bits": 4,
            }
        },
    }
}
qmodel = quantize(fp32_model, quant_config)
  • Pass instances of XxxAlgoConfig.
from neural_compressor.torch import RTNWeightQuantConfig, quantize
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4")
# Set a per-operator config: fc1 gets int8 weights instead of the global nf4.
fc1_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="int8")
quant_config.set_operator_name("model.fc1", fc1_config)
# Get the model and quantize it.
fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)
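
The instance form mirrors the dict form above: the top-level RTNWeightQuantConfig plays the role of the "global" entry, and each set_operator_name call adds an "operator_name" override. A sketch with two overrides, assuming operator-level settings take precedence over the global ones (inferred from the dict layout, not stated explicitly in this PR):

from neural_compressor.torch import RTNWeightQuantConfig, quantize

# Global default: 4-bit nf4 weights for every matched operator.
quant_config = RTNWeightQuantConfig(weight_bits=4, weight_dtype="nf4", weight_group_size=32)
# Per-operator overrides, assumed to win over the global entry.
quant_config.set_operator_name("model.fc1", RTNWeightQuantConfig(weight_bits=4, weight_dtype="int8"))
quant_config.set_operator_name("model.fc2", RTNWeightQuantConfig(weight_bits=8, weight_dtype="int8"))

fp32_model = build_simple_torch_model()
qmodel = quantize(fp32_model, quant_config)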

The structure of INC 3.0

neural_compressor
  - common                 # common methods/configs shared across frameworks
    - base_config.py       # the base config for all algorithms
    - tunner (strategy)    # strategy-related code
  - torch
    - quantization
      - config.py          # the torch-specific config for its algorithms
    - algorithms           # torch algo-related code
  - onnxrt                 # onnxrt-related code
  - tensorflow             # tensorflow-related code
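
For a quick orientation, the imports used in the demos above map onto this layout roughly as follows (the base_config path and class name are inferred from the tree, so treat them as assumptions):

# Framework-agnostic config machinery (path assumed from the tree above):
from neural_compressor.common.base_config import BaseConfig

# Torch-specific entry points used in the demos, re-exported at package level:
from neural_compressor.torch import quantize, RTNWeightQuantConfig, get_default_rtn_config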

E2E example

# copy the below code to try it :)
import torch
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(30, 50)
        self.fc2 = torch.nn.Linear(50, 30)
        self.fc3 = torch.nn.Linear(30, 5)

    def forward(self, x):
        out = self.fc1(x)
        out = self.fc2(out)
        out = self.fc3(out)
        return out

fp32_model = Model()

from neural_compressor.torch import quantize
quant_config = {
    "rtn_weight_only_quant": {
        "weight_dtype": "nf4",
        "weight_bits": 4,
        "weight_group_size": 32,
    },
}
qmodel = quantize(fp32_model, quant_config)
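
A quick way to sanity-check the result (not part of the original snippet): run a forward pass on the quantized model and confirm the output shape matches the FP32 model.

import torch

# fc1 takes 30 input features, so feed a (batch, 30) tensor; fc3 emits 5 outputs.
x = torch.randn(2, 30)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # expected: torch.Size([2, 5])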

How has this PR been tested?

Pre-CI

Dependency Change?

  • torch (add torch as a mandatory dependency for neural-compressor[pt])

@ftian1 (Contributor) left a comment:

looks good to me

@yiliu30 yiliu30 added the WIP label Nov 9, 2023
chensuyue and others added 7 commits November 13, 2023 16:03
@chensuyue chensuyue added this to the v2.4 milestone Nov 14, 2023
@yiliu30 yiliu30 removed the WIP label Nov 14, 2023
@chensuyue chensuyue merged commit dc9328c into master Nov 14, 2023
55 of 57 checks passed
@chensuyue chensuyue deleted the ly/inc3_config branch November 14, 2023 09:47
This was referenced Nov 17, 2023
@yiliu30 yiliu30 added the INC3.X label Nov 23, 2023