Cluster job that spawns its own processes for use with DDP #2408
Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on), won't fix (This will not be worked on)
🚀 Feature
Not sure if the title is appropriate. This feature would support the use case where `MASTER_ADDR` and `MASTER_PORT` are set by the user, along with `LOCAL_RANK`, `GLOBAL_RANK`, and `WORLD_SIZE`. With `N_g` GPUs, `N_j` jobs are spawned (in my case, MPI on SLURM) for each GPU, i.e., `world_size = N_j * N_g`, `local_rank = global_rank % N_g`, and `torch.cuda.set_device(local_rank)` is called in each process.
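The arithmetic above can be sketched with illustrative numbers (`N_g = 2` and `N_j = 4` are placeholders, not values from my cluster):

```python
# Placeholder values: N_g GPUs visible per node, N_j jobs spawned by the launcher.
N_g = 2
N_j = 4

world_size = N_j * N_g            # total number of distributed processes
local_ranks = [global_rank % N_g for global_rank in range(world_size)]

# Each process would then pin its GPU with torch.cuda.set_device(local_rank).
print(world_size, local_ranks)    # 8 [0, 1, 0, 1, 0, 1, 0, 1]
```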
Motivation
I'm able to write a class that overrides `pl.Trainer` to support this, but thought 1) this might be a use case for others and 2) I'd prefer to override your code as little as possible. Here is the `sbatch` file header. Each job sees 2 GPUs (and the device ids are not integers, another issue). To set up my run I set the following environment variables:
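A sketch of that setup (the `setup_ddp_env` helper and the address/port defaults here are placeholders for illustration, not my actual cluster configuration):

```python
import os

N_g = 2  # GPUs visible to each job


def setup_ddp_env(world_size: int, global_rank: int,
                  master_addr: str = "127.0.0.1",
                  master_port: str = "29500") -> int:
    """Export the variables needed for DDP; return the computed local rank.

    Sketch only: the default address/port are placeholders, and on the real
    cluster world_size and global_rank come from the launcher's environment.
    """
    local_rank = global_rank % N_g
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    os.environ["WORLD_SIZE"] = str(world_size)
    os.environ["GLOBAL_RANK"] = str(global_rank)
    os.environ["LOCAL_RANK"] = str(local_rank)
    return local_rank
```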
where `get_dist_env` knows how to get `world_size` and `global_rank` from the environment; for `mpirun` these come from the variables it exports.
With those variables (which I think are standard in your code) I should be able to run in `DDP` mode. Yet the issue is that, because each node sees both GPUs, I cannot define a setting in `Trainer`
that will allow this to execute correctly. Either I set `num_gpus=1` and the `local_rank` is not calculated correctly, or I set `num_gpus=2` and then your code will try to spawn an additional job.

Pitch
I'm not sure what the best API approach is, but if the user sets `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `GLOBAL_RANK`, and `LOCAL_RANK`, then that should be everything you need to execute a distributed job.

Additional context
I'm clearly not an expert in distributed processing, so I'm not sure if I'm asking for something that only works on my cluster with my settings and cannot be generalized. In that case, I am able to override `Trainer` to support my use case without you needing to change anything. Thanks for a great package!