AllReduce Bus Bandwidth decreases with larger network latency #241

chenzhu99 · 2024-07-29T01:58:07Z

We are conducting a nccl_test experiment with varied network latency.

The topology is as follows: Servers 1, 2, 3 and 4 are connected to local network 1, and Servers 5, 6, 7 and 8 are connected to local network 2. The latency within each local network is less than 5 us. Local networks 1 and 2 are connected by a cross network link whose latency we can control.

We are doing a ring Allreduce test with the ring topology of: Server 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8, i.e. a 64 GPU ring AR.

We can get maximum bus bandwidth at low cross network latency (say 20us between local network 1 and 2);
however when we increase the cross network latency to several hundreds of us, the bus bandwidth drops by almost 50%.
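
For reference, a minimal sketch of how the bus bandwidth figure relates to the measured time, assuming the convention documented by nccl-tests for AllReduce (busBW = algBW * 2 * (n - 1) / n). The function name and the sample numbers are hypothetical and are not measurements from our setup.

```python
def allreduce_bus_bandwidth(size_bytes, time_s, n_ranks):
    """Return (alg_bw, bus_bw) in GB/s for one AllReduce.

    Assumes the nccl-tests convention:
      alg_bw = bytes / time
      bus_bw = alg_bw * 2 * (n - 1) / n
    """
    alg_bw = size_bytes / time_s / 1e9
    bus_bw = alg_bw * 2 * (n_ranks - 1) / n_ranks
    return alg_bw, bus_bw


# Hypothetical numbers for illustration only: 1 GB reduced in 10 ms on 64 ranks.
alg_bw, bus_bw = allreduce_bus_bandwidth(1e9, 0.01, 64)
print(f"algBW = {alg_bw:.2f} GB/s, busBW = {bus_bw:.2f} GB/s")
```

Because the 2 * (n - 1) / n factor is fixed for a given rank count, a 50% drop in busBW at a fixed message size means the measured AllReduce time has roughly doubled.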

Another data point: when we halve the number of Allreduce ranks, i.e. a 32 GPU ring Allreduce test with the ring topology of: Server 1 - 2 - 5 - 6, we always get maximum bus bandwidth regardless of the cross network latency.

Could anyone explain why the busBW drops for the 64 GPU ring when the cross network latency is large?
