Implement 60% faster context processing for AWQ #2551

Closed · casper-hansen opened this issue Jan 22, 2024 · 1 comment
Comments

@casper-hansen
Contributor

After some experimentation, I found that dequantizing the weights and running an FP16 matmul is faster than the quantized GEMM kernel whenever batch_size * n_tokens >= 1024. This should help with throughput.

casper-hansen/AutoAWQ#316
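
For context, here is a minimal sketch of the dispatch this implies, assuming a simplified unpacked int8 weight layout with per-output-channel scales. `dequantize`, `fused_quantized_gemm`, and `awq_linear` are illustrative stand-ins, not actual vLLM or AutoAWQ APIs; real AWQ weights are bit-packed INT4 handled by CUDA kernels:

```python
import torch

# Threshold from the observation above: once batch_size * n_tokens >= 1024,
# one-off dequantization + FP16 GEMM beats the fused quantized kernel.
FP16_MATMUL_HEURISTIC_THRESHOLD = 1024

def dequantize(qweight: torch.Tensor,
               scales: torch.Tensor,
               zeros: torch.Tensor) -> torch.Tensor:
    # Simplified scheme for illustration: one int8 value per weight, with a
    # per-output-channel scale and zero point.
    return (qweight.to(scales.dtype) - zeros) * scales

def fused_quantized_gemm(x: torch.Tensor,
                         qweight: torch.Tensor,
                         scales: torch.Tensor,
                         zeros: torch.Tensor) -> torch.Tensor:
    # Stand-in for the fused AWQ GEMM kernel, which dequantizes weight tiles
    # on the fly inside the matmul; modeled naively here.
    return x @ dequantize(qweight, scales, zeros)

def awq_linear(x: torch.Tensor,
               qweight: torch.Tensor,
               scales: torch.Tensor,
               zeros: torch.Tensor) -> torch.Tensor:
    # Rows of the flattened input = batch_size * n_tokens.
    rows = x.reshape(-1, x.shape[-1]).shape[0]
    if rows >= FP16_MATMUL_HEURISTIC_THRESHOLD:
        # Prefill / large batches: dequantize the full weight matrix once,
        # then let cuBLAS run a plain FP16 GEMM.
        return x @ dequantize(qweight, scales, zeros)
    # Decode / small batches: the fused quantized kernel stays faster.
    return fused_quantized_gemm(x, qweight, scales, zeros)

# Example: batch_size=2, n_tokens=512 -> 1024 rows, so the FP16 path is taken.
# (FP16 matmul on CPU needs a recent PyTorch; use a GPU if available.)
device = "cuda" if torch.cuda.is_available() else "cpu"
qweight = torch.randint(0, 16, (4096, 4096), dtype=torch.int8, device=device)
scales = torch.rand(4096, dtype=torch.float16, device=device) * 0.01
zeros = torch.full((4096,), 8.0, dtype=torch.float16, device=device)
x = torch.randn(2, 512, 4096, dtype=torch.float16, device=device)
y = awq_linear(x, qweight, scales, zeros)  # shape: (2, 512, 4096)
```

The intuition: above the threshold, the one-off dequantization cost is amortized over many input rows and a dense FP16 GEMM reaches higher throughput than a dequantize-on-the-fly kernel, while at small batch sizes (decode) the fused kernel wins because it reads the 4x smaller packed weights.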

simon-mo added the quantization label Jan 23, 2024
@WoosukKwon
Collaborator

Closed by #2566
