I am wondering, for multi-node FSDP, do `local_rank` and `rank` have any obvious difference here? I think I understand that `local_rank` is the rank within a node. I see that in a few places `local_rank` is specifically used. For example:
https://github.com/pytorch/examples/blob/main/distributed/FSDP/T5_training.py#L111

```python
torch.cuda.set_device(local_rank)
```

and

https://github.com/pytorch/examples/blob/main/distributed/FSDP/utils/train_utils.py#L48

```python
batch[key] = batch[key].to(local_rank)
```
Is there any problem with using `rank` instead?