We have assumed that `LSB_MCPU_HOSTS` contains a list of `hostname cores` pairs. The number of cores on each host is then computed, with a trick to allow duplicate host names, in the following code:

ray-integration/ray_launch_cluster.sh
Lines 64 to 75 in c63630a

The problem is that `LSB_MCPU_HOSTS` is actually a list of `hostname slots` pairs, as described in "Running parallel jobs on specific hosts". A slot may contain multiple cores, so the calculation above may produce wrong numbers.

Here is an example. I requested a job consisting of 4 slots, each with 7 cores and 1 GPU. As a result, 1 slot is allocated on host1 and 3 slots are allocated on host2, as described by the `LSB_MCPU_HOSTS` variable.
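
To make the mismatch concrete, here is a minimal sketch. The string in `example_mcpu_hosts` is an illustrative value built from the allocation described above, not output copied from a real job, and `split_mcpu_hosts` is a hypothetical helper name, not something in ray_launch_cluster.sh.

```bash
#!/usr/bin/env bash
# Illustrative value only, built from the allocation described above
# (1 slot on host1, 3 slots on host2); not copied from a real job.
example_mcpu_hosts="host1 1 host2 3"
cores_per_slot=7   # taken from the job request described above

# Split a "host1 n1 host2 n2 ..." string into one "host n" pair per line.
split_mcpu_hosts() {
  # Intentionally unquoted so the string is split on whitespace.
  set -- $1
  while [ "$#" -ge 2 ]; do
    printf '%s %s\n' "$1" "$2"
    shift 2
  done
}

# Compare the number read as a core count with the cores actually allocated.
split_mcpu_hosts "$example_mcpu_hosts" | while read -r host slots; do
  echo "$host: treated as $slots cores, actually $((slots * cores_per_slot)) cores"
done
```

Under these assumptions, the second field gives 1 and 3, while the hosts actually hold 7 and 21 cores.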

The file specified by `LSB_AFFINITY_HOSTFILE` contains a list of slots and the core allocation for each slot. Each line of the file has the form `hostname core-list`, where `core-list` is a comma-separated list of core IDs.

So a possible solution is to count the core IDs for each host in the `$LSB_AFFINITY_HOSTFILE` file.
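
A rough sketch of that counting step follows. It assumes each line has the two-field `hostname core-list` form described above, with plain comma-separated IDs and no ranges, and it ignores any further fields on a line. `count_cores_per_host` is a hypothetical helper name, and the demo file contents are made up to match the example allocation.

```bash
#!/usr/bin/env bash
# Print "hostname total-cores" for each host, in order of first appearance.
# $1: path to a file whose lines look like "hostname id,id,id,..."
count_cores_per_host() {
  awk 'NF >= 2 {
         n = split($2, ids, ",")              # number of core IDs on this line
         if (!($1 in cores)) order[++k] = $1  # remember first-seen host order
         cores[$1] += n                       # repeated host names accumulate
       }
       END { for (i = 1; i <= k; i++) print order[i], cores[order[i]] }' "$1"
}

# Demo with made-up contents: 1 slot on host1, 3 slots on host2, 7 cores each.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
host1 0,1,2,3,4,5,6
host2 0,1,2,3,4,5,6
host2 7,8,9,10,11,12,13
host2 14,15,16,17,18,19,20
EOF
count_cores_per_host "$demo_file"
rm -f "$demo_file"
```

Inside a job, the call would be `count_cores_per_host "$LSB_AFFINITY_HOSTFILE"`. Because counts accumulate per host name, duplicate host lines need no separate handling.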