Feature request
This task is to experiment with running quantized HuggingFace models with ExecuTorch out of the box.
The heavy-lifting quantization work will be done through the `quantize_` API from torchao, for example `quantize_(model, int4_weight_only())`.
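As a rough sketch of that quantization step (the checkpoint name below is only a placeholder; any decoder model covered by the export path would work):

```python
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int4_weight_only

# Load a decoder model in eager mode; the checkpoint is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
)
model.eval()

# torchao rewrites the linear layers in place with int4 weight-only quantization.
quantize_(model, int4_weight_only())
```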
The quantization API can be integrated at the existing ExecuTorch integration points in `transformers.integrations.executorch`, expanding the export workflow with a new option of "exporting with quantization". In eager mode, users can verify the numerical accuracy of the quantized exported artifact, e.g. with the llama eval script (here). In ExecuTorch, users can simply load the quantized `.pte` files into the ExecuTorch runner for inference.
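A minimal sketch of the rest of the flow, assuming the existing `convert_and_export_with_cache` entry point in transformers and the ExecuTorch Python bindings for a quick smoke test; the exact lowering and runtime calls, the cache configuration, and the file name are illustrative and may differ across versions, and on device the `.pte` would be loaded by the ExecuTorch (llama) runner instead:

```python
import torch
from transformers import AutoModelForCausalLM, GenerationConfig
from transformers.integrations.executorch import convert_and_export_with_cache
from torchao.quantization import quantize_, int4_weight_only
from executorch.exir import to_edge

# Same placeholder checkpoint as above, configured with a static KV cache,
# which the ExecuTorch integration in transformers expects.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation="static",
        cache_config={"batch_size": 1, "max_cache_len": 128},
    ),
).eval()

# Quantize in eager mode with torchao.
quantize_(model, int4_weight_only())

# Export the quantized model through the existing integration point.
exported_program = convert_and_export_with_cache(model)

# Lower to an ExecuTorch program and serialize it as a .pte artifact.
et_program = to_edge(exported_program).to_executorch()
with open("model_int4.pte", "wb") as f:
    f.write(et_program.buffer)

# Quick smoke test with the ExecuTorch Python bindings.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

et_module = _load_for_executorch("model_int4.pte")
tokens = torch.tensor([[1]], dtype=torch.long)
cache_position = torch.tensor([0], dtype=torch.long)
logits = et_module.forward((tokens, cache_position))[0]
```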
Motivation
Experiment with the quantization workflow: `transformers` + `torchao` + `executorch`.
Your contribution
Direct contribution, or providing guidance to anyone who is interested in this work.