
Questions about the paper #14

Open
hiroshinoji opened this issue Mar 13, 2024 · 2 comments

@hiroshinoji

First, great work! I read the paper and had a few questions.

  • On p. 5, the paper says that the minimal sequence length is s = 6c, but where does this 6 come from? Is it related to the 6bch memory for the blocks?
  • About the memory requirement: if I understand correctly, the total memory for the 6 blocks might be 12bch (instead of 6bch), because each element is stored in bfloat16 (2 bytes)?
  • Possibly the interconnect bandwidth figure for TPUs is wrong? According to https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer?hl=en (the table), ICI bandwidth per chip is 2,400 Gbps. My understanding is that this is the total across 6 links (which form the 3D torus), so each link is 400 Gbps, or 50 GB/s. Let me know if this interpretation is wrong.
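The bandwidth interpretation in the last bullet can be sketched as a quick back-of-the-envelope calculation (the 6-link 3D-torus assumption is from the question, not confirmed by the paper):

```python
# Sketch of the ICI bandwidth arithmetic from the question above.
total_ici_gbps = 2400  # per-chip ICI bandwidth from the TPU v5p announcement table
num_links = 6          # assumption: 6 links per chip to form the 3D torus

per_link_gbps = total_ici_gbps / num_links  # 400.0 Gbps per link
per_link_GBps = per_link_gbps / 8           # 50.0 GB/s per link (8 bits per byte)

print(per_link_gbps, per_link_GBps)
```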
@ZhaiFeiyue

Maybe the 6 = HBM BW / Interconnect BW?

@haoliuhl
Owner

haoliuhl commented Apr 10, 2024

The 6 comes from storing the key-value from the previous host, the key-value for the current computation, and the current query and output, so in total 2×2 + 1 + 1 = 6.
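The block count above, together with the bfloat16 doubling raised in the question, can be sketched as follows (the b, c, h values are hypothetical example sizes, not from the paper):

```python
# Sketch of the block-count and memory arithmetic discussed in this thread.
kv_prev = 2  # key + value blocks received from the previous host
kv_curr = 2  # key + value blocks for the current computation
query = 1    # current query block
output = 1   # current output block

blocks = kv_prev + kv_curr + query + output  # 2*2 + 1 + 1 = 6

# Each block has b*c*h elements; with bfloat16 (2 bytes/element) the total
# is 2 * 6 * b * c * h bytes, i.e. the 12bch figure raised in the question.
b, c, h = 1, 512, 1024  # hypothetical example sizes
total_bytes = 2 * blocks * b * c * h

print(blocks, total_bytes)
```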
