
Questions about the paper #14

Open
hiroshinoji opened this issue Mar 13, 2024 · 2 comments

@hiroshinoji

First, great work! I read the paper and had a few questions.

  • On p. 5, the paper says that the minimal sequence length is s = 6c, but where does this 6 come from? Is it related to the 6bch memory for the blocks?
  • About the memory requirement: if I understand correctly, the total memory for the 6 blocks might be 12bch (instead of 6bch), because each element is stored in bfloat16 (2 bytes)?
  • Possibly the interconnect bandwidth figure for TPUs is wrong? According to https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer?hl=en (the table), ICI bandwidth per chip is 2,400 Gbps. My understanding is that this is the total across 6 links (which form the 3D torus), so each link is 400 Gbps, or 50 GB/s. Let me know if this interpretation is wrong.
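The bandwidth interpretation in the last bullet can be sketched as a quick back-of-the-envelope calculation (the 6-link 3D-torus assumption is from the question, not confirmed by the paper):

```python
# Sketch of the ICI bandwidth arithmetic from the question above.
total_ici_gbps = 2400  # per-chip ICI bandwidth from the TPU v5p announcement table
num_links = 6          # assumption: 6 links per chip to form the 3D torus

per_link_gbps = total_ici_gbps / num_links  # 400.0 Gbps per link
per_link_GBps = per_link_gbps / 8           # 50.0 GB/s per link (8 bits per byte)

print(per_link_gbps, per_link_GBps)
```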
@ZhaiFeiyue

Maybe the 6 = HBM BW / Interconnect BW?

@haoliuhl
Owner

haoliuhl commented Apr 10, 2024

The 6 comes from storing the key-value from the previous host, the key-value for the current computation, and the current query and output, so in total 2×2 + 1 + 1 = 6.
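The block count above, together with the bfloat16 doubling raised in the question, can be sketched as follows (the b, c, h values are hypothetical example sizes, not from the paper):

```python
# Sketch of the block-count and memory arithmetic discussed in this thread.
kv_prev = 2  # key + value blocks received from the previous host
kv_curr = 2  # key + value blocks for the current computation
query = 1    # current query block
output = 1   # current output block

blocks = kv_prev + kv_curr + query + output  # 2*2 + 1 + 1 = 6

# Each block has b*c*h elements; with bfloat16 (2 bytes/element) the total
# is 2 * 6 * b * c * h bytes, i.e. the 12bch figure raised in the question.
b, c, h = 1, 512, 1024  # hypothetical example sizes
total_bytes = 2 * blocks * b * c * h

print(blocks, total_bytes)
```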
