Add quantized PyTorch models in model builder #600

kunal-vaishnavi · 2024-06-11T23:37:37Z

Description

This PR adds support in the model builder for producing optimized and quantized ONNX models directly from already-quantized PyTorch models.

Motivation and Context

The supported quantization methods for already-quantized PyTorch models are GPTQ and AWQ. Currently, only INT4 precision is supported.
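For readers unfamiliar with the INT4 formats these checkpoints use, the sketch below illustrates the general idea behind group-wise 4-bit dequantization: eight 4-bit values packed into each 32-bit word, with one scale and zero point shared by a group of weights. It is a minimal pure-Python illustration under stated assumptions, not the model builder's code. The helper names (`pack_int4`, `unpack_int4`, `dequantize_group`) are hypothetical, and the low-to-high nibble order is an assumption; real GPTQ and AWQ checkpoints differ in packing order and tensor layout.

```python
from typing import List


def unpack_int4(word: int) -> List[int]:
    """Unpack eight unsigned 4-bit values from one 32-bit word.

    Assumption: nibble i occupies bits [4*i, 4*i + 4), i.e. low-to-high order.
    AWQ checkpoints use an interleaved order, which this sketch ignores.
    """
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def pack_int4(values: List[int]) -> int:
    """Inverse of unpack_int4: pack eight values in [0, 15] into one word."""
    assert len(values) == 8 and all(0 <= v <= 15 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def dequantize_group(q: List[int], scale: float, zero: int) -> List[float]:
    """Asymmetric dequantization for one group: w = (q - zero) * scale."""
    return [(v - zero) * scale for v in q]


# One group of eight weights sharing a scale of 0.5 and a zero point of 8.
word = pack_int4([0, 1, 7, 8, 9, 15, 8, 8])
weights = dequantize_group(unpack_int4(word), scale=0.5, zero=8)
print(weights)  # [-4.0, -3.5, -0.5, 0.0, 0.5, 3.5, 0.0, 0.0]
```

Storing one scale and zero point per group (rather than per tensor) is what keeps 4-bit precision usable: each group only has to cover its own local range of weight values.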


Description (from the follow-up PR #648)

This PR adds an end-to-end example for quantizing a PyTorch model with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), creating the corresponding optimized and quantized ONNX model, and running the ONNX model with ONNX Runtime GenAI.

Motivation and Context

This PR shows an end-to-end example for [the quantized PyTorch support in the model builder](#600).
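A condensed sketch of the first two steps described above (quantize with AutoAWQ, then build the ONNX model), assuming AutoAWQ's documented `AutoAWQForCausalLM.from_pretrained` / `quantize` / `save_quantized` API and the model builder's `-i`, `-o`, `-p`, `-e` flags as described in the onnxruntime-genai model builder README. The helper names `quantize_with_autoawq` and `builder_command`, the paths, and the `QUANT_CONFIG` values are illustrative assumptions, not the contents of #648. Running the resulting model with ONNX Runtime GenAI is not shown.

```python
import sys
from typing import List

# Assumed AutoAWQ settings: 4-bit weights, group size 128, asymmetric (zero point).
QUANT_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}


def quantize_with_autoawq(model_path: str, quant_path: str) -> None:
    """Quantize a Hugging Face model with AutoAWQ and save the INT4 checkpoint."""
    # Imported lazily so the rest of this sketch works without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model.quantize(tokenizer, quant_config=QUANT_CONFIG)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)


def builder_command(quant_path: str, output_dir: str, ep: str = "cuda") -> List[str]:
    """Build the (assumed) model builder invocation for a local quantized checkpoint."""
    return [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-i", quant_path,   # local folder with the already-quantized PyTorch model
        "-o", output_dir,   # folder where the ONNX model files are written
        "-p", "int4",       # only INT4 is supported for quantized inputs
        "-e", ep,           # execution provider, e.g. "cpu" or "cuda"
    ]
```

Usage would be `quantize_with_autoawq("path/to/hf_model", "path/to/awq_model")` followed by `subprocess.run(builder_command("path/to/awq_model", "path/to/onnx_model"), check=True)`. Keeping the command as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.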

kunal-vaishnavi added 7 commits April 18, 2024 19:17

Add initial version of quantized PyTorch to quantized ONNX models (311b779)

Merge branch 'main' into kvaishnavi/quant-models (4db5f60)

Add revised version of quantized PyTorch to quantized ONNX models (3d7d21b)

Add support for multiple safetensors and activation order (001fadd)

Merge branch 'main' into kvaishnavi/quant-models (20b3677)

Add example usage in README (967a6a8)

Update wording in README (ddd5f54)

github-advanced-security bot found potential problems Jun 11, 2024


src/python/py/models/quantized_model.py (dismissed)

src/python/py/models/quantized_model.py (fixed)

kunal-vaishnavi added 2 commits June 11, 2024 23:58

Remove unused import (47a69a8)

Fix packed MatMul error (5f3413c)

yufenglee reviewed Jun 12, 2024


src/python/py/models/README.md (resolved)

yufenglee reviewed Jun 12, 2024


src/python/py/models/quantized_model.py (outdated, resolved)

kunal-vaishnavi added 2 commits June 13, 2024 23:37

Add packed MatMulNBits from quantized QKV proj (76160b4)

Remove commented out code (5f13822)
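Commit 76160b4 packs the quantized Q, K, and V projections into one MatMulNBits node. The underlying idea is that three projections reading the same input can be fused into a single matrix multiplication by concatenating their weights along the output dimension and splitting the result afterwards. The sketch below demonstrates that equivalence on small float matrices with NumPy; the sizes are illustrative assumptions, and it does not reproduce the quantized tensor layouts or the model builder's code.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, q_out, kv_out = 8, 8, 4  # illustrative sizes (K/V narrower, as in grouped-query attention)
x = rng.standard_normal((2, hidden))        # [tokens, hidden]
w_q = rng.standard_normal((hidden, q_out))  # [in_features, out_features] layout
w_k = rng.standard_normal((hidden, kv_out))
w_v = rng.standard_normal((hidden, kv_out))

# Separate projections: three MatMuls over the same input.
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Packed projection: concatenate along the output dimension, one MatMul, then split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)  # [hidden, q_out + 2 * kv_out]
qkv = x @ w_qkv
q2, k2, v2 = np.split(qkv, [q_out, q_out + kv_out], axis=1)

assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
print(w_qkv.shape)  # (8, 16)
```

Because INT4 scales and zero points are stored per output channel (per group of input channels), the packed weight, scale, and zero-point tensors can in principle be concatenated along the output dimension in the same way, which is what makes a single packed quantized MatMul possible.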

yufenglee mentioned this pull request Jun 18, 2024

[Feature Request] 4bit and 2bit and 1bit quantization support microsoft/onnxruntime#14997 (Open)

Remove use_g_idx from AWQ (f3e7ec7)

yufenglee approved these changes Jun 18, 2024


kunal-vaishnavi merged commit c622cc1 into main Jun 18, 2024
12 checks passed

kunal-vaishnavi deleted the kvaishnavi/quant-models branch June 18, 2024 22:36

kunal-vaishnavi mentioned this pull request Jun 27, 2024

Add quantized PyTorch to ONNX example #648 (Merged)

