I am wondering, for multi-node FSDP, do `local_rank` and `rank` have any obvious difference here? I think I understand that `local_rank` is the rank within a node. I see that in a few places `local_rank` is specifically used. For example:
https://github.com/pytorch/examples/blob/main/distributed/FSDP/T5_training.py#L111

```python
torch.cuda.set_device(local_rank)
```

and

https://github.com/pytorch/examples/blob/main/distributed/FSDP/utils/train_utils.py#L48

```python
batch[key] = batch[key].to(local_rank)
```
Is there any problem with using `rank` instead?