update readme for fp8 (#1979)
Signed-off-by: xinhe3 <[email protected]>
xin3he authored Aug 15, 2024
1 parent 842b715 commit 46d9192
Showing 3 changed files with 12 additions and 7 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -71,7 +71,7 @@ pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

### [FP8 Quantization](./examples/3.x_api/pytorch/cv/fp8_quant/)
### [FP8 Quantization](./docs/source/3x/PT_FP8Quant.md)
The following example code demonstrates FP8 quantization, which is supported by the Intel Gaudi2 AI Accelerator.

To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; please refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
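For context on what FP8 quantization does numerically: the FP8 format used for inference on Gaudi2 is typically OCP E4M3 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits; largest normal value 448, no infinities). Independent of any accelerator, the rounding step can be sketched in plain Python. This is an illustrative sketch, not the accelerator's implementation; the function name and the choice to saturate on overflow (rather than produce NaN) are assumptions:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (illustrative sketch).

    E4M3: 4 exponent bits (bias 7), 3 mantissa bits, max normal 448.
    Values beyond the max normal are clamped (saturation is a choice
    made here for illustration; E4M3 itself has no infinities).
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    mag = abs(x)
    max_normal = 448.0        # (1 + 7/8) * 2**8
    min_normal = 2.0 ** -6    # smallest normal: exponent field 1, bias 7
    if mag >= max_normal:
        return sign * max_normal
    if mag < min_normal:
        step = 2.0 ** -9      # subnormal spacing: 2**(-6) / 2**3
    else:
        e = math.floor(math.log2(mag))
        step = 2.0 ** (e - 3)  # 3 mantissa bits -> 8 steps per binade
    # Python's round() ties to even, matching IEEE round-to-nearest-even.
    return sign * round(mag / step) * step
```

For example, `quantize_e4m3(0.3)` lands on `0.3125`, the nearest representable E4M3 value, which is the kind of coarse spacing that makes calibration and scaling important in FP8 workflows.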
@@ -147,7 +147,7 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu form
</tr>
<tr>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
2 changes: 1 addition & 1 deletion docs/3x/PT_FP8Quant.md → docs/source/3x/PT_FP8Quant.md
@@ -108,6 +108,6 @@ model = convert(model)
| Task | Example |
|----------------------|---------|
| Computer Vision (CV) | [Link](../../examples/3.x_api/pytorch/cv/fp8_quant/) |
| Large Language Model (LLM) | [Link](https://github.com/HabanaAI/optimum-habana-fork/tree/habana-main/examples/text-generation#running-with-fp8) |
| Large Language Model (LLM) | [Link](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#running-with-fp8) |

> Note: For LLMs, Optimum Habana provides higher performance based on modified modeling files, so the LLM link above points to Optimum Habana, which uses Intel Neural Compressor internally for FP8 quantization.
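The `model = convert(model)` line in the hunk header above is the last step of Intel Neural Compressor's 3.x prepare/calibrate/convert flow for FP8. A minimal usage sketch is below; the `FP8Config` argument shown is an assumption for illustration, and actually running this requires a Gaudi device with the Habana software stack installed:

```python
from neural_compressor.torch.quantization import FP8Config, prepare, convert

# fp8_config="E4M3" is an illustrative assumption; see PT_FP8Quant.md
# for the options supported by your installed version.
quant_config = FP8Config(fp8_config="E4M3")

model = prepare(model, quant_config)  # instrument the model for calibration
# ... feed a few representative batches through `model` here ...
model = convert(model)                # swap in FP8 execution
```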
13 changes: 9 additions & 4 deletions docs/source/3x/PyTorch.md
@@ -176,16 +176,21 @@ def load(output_dir="./saved_results", model=None):
<td class="tg-9wq8"><a href="PT_SmoothQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="2">Static Quantization</td>
<td class="tg-9wq8" rowspan="2"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-traning Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch</td>
<td class="tg-9wq8" rowspan="3">Static Quantization</td>
<td class="tg-9wq8" rowspan="3"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-training Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch (INT8)</td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo</a></td>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo (INT8)</a></td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8"><a href=https://docs.habana.ai/en/latest/index.html>Intel Gaudi AI accelerator (FP8)</a></td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_FP8Quant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8">Dynamic Quantization</td>
