Export to ExecuTorch with Quantization #34787

Open
guangy10 opened this issue Nov 18, 2024 · 0 comments
Labels
ExecuTorch Feature request Request for a new feature

@guangy10 (Contributor)

Feature request

This task is to experiment with running quantized Hugging Face models on ExecuTorch out of the box.

The heavy lifting of quantization will be done through the quantize_ API from torchao, for example quantize_(model, int4_weight_only()).

The quantization API can be wired into the existing ExecuTorch integration points in transformers.integrations.executorch, expanding the export workflow with a new "export with quantization" option. In eager mode, users can verify the numerical accuracy of the quantized exported artifact, e.g. with the llama eval script (here). In ExecuTorch, users can simply load the quantized .pte file into the ExecuTorch runner for inference.
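For background on what int4 weight-only quantization does to each weight, the sketch below shows the core arithmetic in plain Python: weights are symmetrically scaled into the signed 4-bit range [-8, 7] and dequantized back at inference time. This is an illustrative simplification (per-tensor rather than per-group scaling), not the torchao implementation.

```python
# Illustrative sketch of the arithmetic behind int4 weight-only
# quantization, simplified to per-tensor symmetric scaling.
# NOT the torchao implementation -- just the core idea.

def quantize_int4(weights):
    """Symmetric int4 quantization: returns (int values, scale)."""
    scale = max(abs(w) for w in weights) / 7  # largest weight maps to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the int4 values."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 1.0, 2.0]
q, scale = quantize_int4(weights)
recovered = dequantize_int4(q, scale)
# Every quantized value fits in 4 signed bits; the reconstruction
# error per weight is bounded by half the scale.
```

With per-group scaling (as in the real int4 schemes), the same arithmetic is applied independently to small blocks of the weight matrix, which tightens the error bound at the cost of storing one scale per group.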

Motivation

Experiment with the quantization workflow combining transformers + torchao + executorch

Your contribution

Direct contribution, or guidance for anyone who is interested in this work

@guangy10 guangy10 added the Feature request Request for a new feature label Nov 18, 2024
@guangy10 guangy10 mentioned this issue Nov 18, 2024