update (#761)
* update

* fixes
mikekgfb authored May 13, 2024
1 parent acf8773 commit 262d5de
Showing 1 changed file with 10 additions and 10 deletions.
20 changes: 10 additions & 10 deletions docs/quantization.md
@@ -13,12 +13,12 @@ While quantization can potentially degrade the model's performance, the methods

## Supported Quantization Schemes
### Weight Quantization
-| compression | FP Precision | bitwidth | group size | dynamic activation quantization | Eager | AOTI | ExecuTorch |
+| compression | bitwidth | group size | dynamic activation quantization | Eager | AOTI | ExecuTorch |
 |--|--|--|--|--|--|--|--|
-| linear (asymmetric) | fp32, fp16, bf16 | [8, 4]* | [32, 64, 128, 256]** | ||| 🚧 |
-| linear with GPTQ*** (asymmetric) | | |[32, 64, 128, 256]** | ||||
-| linear with HQQ*** (asymmetric) | | |[32, 64, 128, 256]** | ||||
-| linear with dynamic activations (symmetric) | fp32^ | | [32, 64, 128, 256]* | a8w4dq | 🚧 |🚧 ||
+| linear (asymmetric) | [8, 4]* | [32, 64, 128, 256]** | ||| 🚧 |
+| linear with GPTQ*** (asymmetric) | |[32, 64, 128, 256]** | ||||
+| linear with HQQ*** (asymmetric) | |[32, 64, 128, 256]** | ||||
+| linear with dynamic activations (symmetric) | | [32, 64, 128, 256]* | a8w4dq | 🚧 |🚧 ||
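The linear schemes in the table quantize weights group-wise: each group of `group size` consecutive weights gets its own scale and, for the asymmetric schemes, its own zero point. A minimal NumPy sketch of asymmetric group-wise quantization (illustrative only; this is not torchchat's actual kernel code):

```python
import numpy as np

def quantize_groupwise_asymmetric(w, bitwidth=4, group_size=32):
    """Asymmetric group-wise quantization: per-group scale and zero point."""
    qmin, qmax = 0, (1 << bitwidth) - 1
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (qmax - qmin)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((2, 64)).astype(np.float32)
q, scale, zp = quantize_groupwise_asymmetric(w.reshape(-1), bitwidth=4, group_size=32)
w_hat = dequantize(q, scale, zp).reshape(w.shape)
```

Smaller group sizes give each group a tighter range (lower quantization error) at the cost of storing more scales and zero points.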

### Embedding Quantization

@@ -28,11 +28,9 @@ on-device use cases.

| compression | weight quantization (bitwidth)| weight quantization (group size) | dynamic activation quantization | Eager | AOTI | ExecuTorch |
|--|--|--|--|--|--|--|--|
-| embedding (symmetric) | [8, 4]* | [32, 64, 128, 256]** | ||||
+| embedding (symmetric) | [8, 4]* | [32, 64, 128, 256]+ | ||||


-^ a8w4dq quantization scheme requires model to be converted to fp32,
-due to lack of support for fp16 and bf16 in the kernels provided with
-ExecuTorch.

* These are the only valid bitwidth options.

@@ -55,6 +53,8 @@ on-device use cases.
(As a result, there's presently no upstream path for changes and/or
improvements to HQQ.)

++ Should support non-power-of-2 group sizes as well.

## Quantization Profiles

Torchchat quantization supports profiles with multiple settings such
@@ -135,7 +135,7 @@ python3 generate.py llama3 --dso-path llama3.so --prompt "Hello my name is"
```
### ExecuTorch
```
-python3 torchchat.py export llama3 --dtype fp32 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
+python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
python3 generate.py llama3 --pte-path llama3.pte --prompt "Hello my name is"
```
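The `--quantize` argument takes a JSON object keyed by scheme name, with per-scheme settings. A small Python sketch that assembles and sanity-checks such a config string before shelling out (the `make_quantize_arg` helper is hypothetical, not part of torchchat; the valid bitwidth and group-size sets are taken from the tables above):

```python
import json

# Option sets from the quantization tables (illustrative).
VALID_BITWIDTHS = {8, 4}
VALID_GROUP_SIZES = {32, 64, 128, 256}

def make_quantize_arg(embedding_bitwidth=4, embedding_groupsize=32,
                      a8w4dq_groupsize=256):
    """Build the JSON string passed to --quantize (hypothetical helper)."""
    assert embedding_bitwidth in VALID_BITWIDTHS
    assert embedding_groupsize in VALID_GROUP_SIZES
    assert a8w4dq_groupsize in VALID_GROUP_SIZES
    cfg = {
        "embedding": {"bitwidth": embedding_bitwidth,
                      "groupsize": embedding_groupsize},
        "linear:a8w4dq": {"groupsize": a8w4dq_groupsize},
    }
    return json.dumps(cfg)

arg = make_quantize_arg()
```

The resulting string is what appears single-quoted after `--quantize` in the export command above.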
