How do we comprehend the factor between algBw and busBw? #235
Comments
You can find the explanation in https://github.com/NVIDIA/nccl-tests/blob/master/doc/PERFORMANCE.md. In particular, regarding the difference between Broadcast and Scatter: Broadcast always needs to send out a complete buffer, whereas Scatter doesn't need to send the part destined for the root process (since that data is already there). I.e., for …
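As a rough illustration of that point (the symbols S, n, and V below are editorial, not from the thread): for a buffer of S bytes on n ranks, the data that has to leave the root is

```latex
% Bytes that must leave the root, for an S-byte buffer on n ranks.
% Scatter skips the root's own S/n chunk, which is already in place.
V_{\mathrm{Broadcast}} = S,
\qquad
V_{\mathrm{Scatter}} = \frac{n-1}{n}\,S
```

which lines up with busBw factors of 1 for Broadcast and (n-1)/n for Scatter.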
@kiskra-nvidia Thanks for the link, it helps. But I am still confused about the difference between Broadcast and Scatter. For Broadcast, do you mean the root process (which has the complete buffer) still needs to send the complete buffer to itself, whereas Scatter doesn't? Since the root process already has the complete data, should the amount of communication be …
Perhaps we misunderstood each other. I was answering your question about the communication amount, which I understood to be a question about the volume of data. Broadcast needs to send a complete buffer …
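Plugging in concrete numbers may help (an illustrative case, not one used in the thread). On the explanation above, the root does not send anything "to itself" in either collective; what differs is how much data must leave it:

```latex
% Illustrative numbers: n = 4 ranks, total buffer S = 4 MB.
V_{\mathrm{Broadcast}} = S = 4~\mathrm{MB},
\qquad
V_{\mathrm{Scatter}} = \frac{n-1}{n}\,S = \frac{3}{4}\cdot 4~\mathrm{MB} = 3~\mathrm{MB}
```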
@kiskra-nvidia … instead of … Do you know what else I am missing?
I have the same question.
AllGather, Alltoall, Gather, ReduceScatter, Scatter: busBw = algBw * (n-1)/n
AllReduce: busBw = algBw * 2(n-1)/n
Broadcast, Reduce, Send/Recv: busBw = algBw * 1
How do we comprehend the factor between algBw and busBw?
Particularly, I think the communication amount of Broadcast is just the same as Scatter's, so why are their factors different?
And it seems Alltoall communicates a lot more than AllGather, so why are their factors the same?
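Putting the three groups side by side, here is a small standalone C sketch of how busBw is derived from algBw. The bus_bw helper, the rank count, and the 100 GB/s figure are made up for illustration; only the factors themselves come from doc/PERFORMANCE.md and the per-collective test sources.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: busBw = algBw * factor, using the factors that
 * nccl-tests documents in doc/PERFORMANCE.md for n ranks. */
static double bus_bw(double alg_bw, int n, const char *coll) {
    double factor;
    if (strcmp(coll, "allreduce") == 0) {
        factor = 2.0 * (n - 1) / n;        /* AllReduce */
    } else if (strcmp(coll, "broadcast") == 0 ||
               strcmp(coll, "reduce") == 0 ||
               strcmp(coll, "sendrecv") == 0) {
        factor = 1.0;                      /* Broadcast, Reduce, Send/Recv */
    } else {
        factor = (double)(n - 1) / n;      /* AllGather, Alltoall, Gather,
                                              ReduceScatter, Scatter */
    }
    return alg_bw * factor;
}

int main(void) {
    int n = 8;             /* hypothetical number of ranks */
    double alg_bw = 100.0; /* hypothetical algBw in GB/s */

    /* Broadcast keeps factor 1: the full buffer must leave the root,
     * while Scatter skips the root's own chunk, hence (n-1)/n. */
    printf("Broadcast busBw = %.1f GB/s\n", bus_bw(alg_bw, n, "broadcast"));
    printf("Scatter   busBw = %.1f GB/s\n", bus_bw(alg_bw, n, "scatter"));

    /* AllGather and Alltoall share (n-1)/n: per rank, both push the same
     * fraction of the reported buffer size over the links, even though
     * Alltoall's chunks are all distinct. */
    printf("AllGather busBw = %.1f GB/s\n", bus_bw(alg_bw, n, "allgather"));
    printf("Alltoall  busBw = %.1f GB/s\n", bus_bw(alg_bw, n, "alltoall"));
    return 0;
}
```

One way to read the factors is that they describe per-rank traffic over the links relative to the reported buffer size, not the total amount of distinct data, which is why AllGather and Alltoall end up in the same group.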