The survey discusses the sensitivity of activation quantization and the tolerance of KV cache quantization in the context of post-training quantization (PTQ) for large language models (LLMs). It makes the distinction that while activation quantization is quite sensitive (meaning it can significantly affect performance if not handled carefully), KV cache quantization is more tolerant (implying it can be quantized with less impact on performance).
My question is:
Should the KV cache be considered part of the activations?
To be exact, "activations" refers to the "temporary activations" in our paper, which serve as the inputs of the linear operators, while the KV cache comes from the outputs of k_proj and v_proj.
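A minimal sketch of where the two kinds of tensors appear in a decoder attention block may help make the distinction concrete. The module below is illustrative only (single head, no masking, hypothetical names), not code from the paper:

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    """Minimal single-head attention block, only to locate the two tensor kinds."""

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, hidden_states: torch.Tensor, kv_cache=None):
        # "Temporary activations": inputs of the linear operators
        # (here, `hidden_states` feeding the q/k/v projections).
        q = self.q_proj(hidden_states)

        # KV cache: outputs of k_proj and v_proj, kept across decoding steps.
        k = self.k_proj(hidden_states)
        v = self.v_proj(hidden_states)
        if kv_cache is not None:
            past_k, past_v = kv_cache
            k = torch.cat([past_k, k], dim=1)
            v = torch.cat([past_v, v], dim=1)
        new_cache = (k, v)

        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        out = self.o_proj(attn @ v)  # the input `attn @ v` is again a temporary activation
        return out, new_cache
```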
While you may regard both the temporary activations and the KV cache as "feature maps" within the model, we empirically found that some of their characteristics differ considerably (including their sensitivity to quantization). So I think it's not a good idea to treat the KV cache and the temporary activations as the same kind of tensor.
By the way, some recent works have also reported differences between activations and the KV cache, aligning with our observations. For example, WKVQuant also indicates the excessive sensitivity of temporary activations compared with the KV cache. Furthermore, KIVI's study on the data distribution of the KV cache demonstrates that the outlier patterns of the KV cache and the temporary activations are quite different, so it's not surprising that the effect of quantization varies between these two kinds of tensors.
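If you want to check this yourself, a rough diagnostic is to compare how strongly the per-channel ranges of each tensor deviate from the typical channel, since dominant channels inflate the quantization range. The snippet below is only an illustrative sketch with toy data and a hypothetical metric, not the methodology of any of the cited papers:

```python
import torch

def channel_outlier_ratio(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-channel outlier score: max |value| per channel divided by the median
    of those maxima. Channels with large ratios dominate the quantization range.
    x: (num_tokens, num_channels)
    """
    per_channel_max = x.abs().amax(dim=0)
    return per_channel_max / (per_channel_max.median() + eps)

# Hypothetical usage: `act` holds captured inputs of a linear layer, and
# `k_cache` holds the corresponding cached keys, each flattened to
# (num_tokens, num_channels). Toy random data stands in for real captures.
act = torch.randn(512, 4096) * (torch.rand(4096) * 4)
k_cache = torch.randn(512, 4096)
print("activation outlier score:", channel_outlier_ratio(act).max().item())
print("key-cache outlier score: ", channel_outlier_ratio(k_cache).max().item())
```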