Implement 60% faster context processing for AWQ #2551

Closed · casper-hansen opened this issue Jan 22, 2024 · 1 comment
Comments

@casper-hansen
Contributor

After some experimentation, I found that dequantizing the weights and running an FP16 matmul is faster than the quantized GEMM kernel whenever batch_size * n_tokens >= 1024. This should help with throughput.

casper-hansen/AutoAWQ#316
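
For context, here is a minimal sketch of the dispatch this implies, assuming a simplified unpacked int8 weight layout with per-output-channel scales. `dequantize`, `fused_quantized_gemm`, and `awq_linear` are illustrative stand-ins, not actual vLLM or AutoAWQ APIs; real AWQ weights are bit-packed INT4 handled by CUDA kernels:

```python
import torch

# Threshold from the observation above: once batch_size * n_tokens >= 1024,
# one-off dequantization + FP16 GEMM beats the fused quantized kernel.
FP16_MATMUL_HEURISTIC_THRESHOLD = 1024

def dequantize(qweight: torch.Tensor,
               scales: torch.Tensor,
               zeros: torch.Tensor) -> torch.Tensor:
    # Simplified scheme for illustration: one int8 value per weight, with a
    # per-output-channel scale and zero point.
    return (qweight.to(scales.dtype) - zeros) * scales

def fused_quantized_gemm(x: torch.Tensor,
                         qweight: torch.Tensor,
                         scales: torch.Tensor,
                         zeros: torch.Tensor) -> torch.Tensor:
    # Stand-in for the fused AWQ GEMM kernel, which dequantizes weight tiles
    # on the fly inside the matmul; modeled naively here.
    return x @ dequantize(qweight, scales, zeros)

def awq_linear(x: torch.Tensor,
               qweight: torch.Tensor,
               scales: torch.Tensor,
               zeros: torch.Tensor) -> torch.Tensor:
    # Rows of the flattened input = batch_size * n_tokens.
    rows = x.reshape(-1, x.shape[-1]).shape[0]
    if rows >= FP16_MATMUL_HEURISTIC_THRESHOLD:
        # Prefill / large batches: dequantize the full weight matrix once,
        # then let cuBLAS run a plain FP16 GEMM.
        return x @ dequantize(qweight, scales, zeros)
    # Decode / small batches: the fused quantized kernel stays faster.
    return fused_quantized_gemm(x, qweight, scales, zeros)

# Example: batch_size=2, n_tokens=512 -> 1024 rows, so the FP16 path is taken.
# (FP16 matmul on CPU needs a recent PyTorch; use a GPU if available.)
device = "cuda" if torch.cuda.is_available() else "cpu"
qweight = torch.randint(0, 16, (4096, 4096), dtype=torch.int8, device=device)
scales = torch.rand(4096, dtype=torch.float16, device=device) * 0.01
zeros = torch.full((4096,), 8.0, dtype=torch.float16, device=device)
x = torch.randn(2, 512, 4096, dtype=torch.float16, device=device)
y = awq_linear(x, qweight, scales, zeros)  # shape: (2, 512, 4096)
```

The intuition: above the threshold, the one-off dequantization cost is amortized over many input rows and a dense FP16 GEMM reaches higher throughput than a dequantize-on-the-fly kernel, while at small batch sizes (decode) the fused kernel wins because it reads the 4x smaller packed weights.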

simon-mo added the quantization label Jan 23, 2024
@WoosukKwon
Collaborator

Closed by #2566
