Polaris app_run changes #357
Conversation
```python
cpu_ids = self._node_spec.cpu_ids[0]
cpus_per_node = len(cpu_ids)
if not cpu_ids:
    compute_node = ComputeNode(self._node_spec.node_ids[0], self._node_spec.hostnames[0])
```
One thing to watch out for here is that ComputeNode will always have an empty list of cpu_ids, since it's the generic base class. Only at runtime do the launchers load the specific compute node class from the site configuration.

I think it should be safe for the NodeManager to include the full set of CPU IDs in the NodeSpec of a multinode job. This can be done by updating the method NodeManager._assign_multi_node. That would keep the abstraction intact and avoid the need to pass an extra piece of information (which subclass of ComputeNode) from the launcher to the AppRun.

https://github.com/argonne-lcf/balsam/blob/main/balsam/site/launcher/node_manager.py#L68
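For illustration, here is a minimal sketch of that suggestion, with stand-in definitions of ComputeNode and NodeSpec; the real classes in balsam/site/launcher/node_manager.py differ, so the field names and signature below are assumptions, not Balsam's API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeNode:
    # Stand-in for Balsam's ComputeNode: the generic base class carries empty
    # cpu_ids/gpu_ids; at runtime the launcher loads a site-specific subclass
    # whose lists are populated.
    node_id: int
    hostname: str
    cpu_ids: List[int] = field(default_factory=list)
    gpu_ids: List[int] = field(default_factory=list)
    occupancy: float = 0.0

@dataclass
class NodeSpec:
    # Stand-in for Balsam's NodeSpec (field names inferred from this thread).
    node_ids: List[int]
    hostnames: List[str]
    cpu_ids: List[List[int]]
    gpu_ids: List[List[int]]

def assign_multi_node(nodes: List[ComputeNode], num_nodes: int) -> NodeSpec:
    # Sketch of the proposed _assign_multi_node update: copy each assigned
    # node's full CPU/GPU ID lists into the NodeSpec instead of leaving them
    # empty, so AppRun never needs to know which ComputeNode subclass is in use.
    assigned = nodes[:num_nodes]
    for node in assigned:
        node.occupancy = 1.0  # multi-node jobs occupy whole nodes
    return NodeSpec(
        node_ids=[n.node_id for n in assigned],
        hostnames=[n.hostname for n in assigned],
        cpu_ids=[list(n.cpu_ids) for n in assigned],
        gpu_ids=[list(n.gpu_ids) for n in assigned],
    )
```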
I thought about doing that (modifying _assign_multi_node), but the way the NodeSpec object is structured, cpu_ids and gpu_ids will then be lists of a bunch of identical lists stored in memory. That seemed a bit silly. I did wonder why cpu_ids and gpu_ids have to be lists of lists? The only time they contain non-empty lists is in the single-node case. Having a list for every node does not seem to be used in any functional way.
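To make the "lists of lists" shape concrete (all values here are invented for illustration, not taken from Balsam):

```python
# Single-node job: cpu_ids/gpu_ids hold one non-empty per-node list.
single = {"node_ids": [7], "cpu_ids": [[0, 1, 2, 3]], "gpu_ids": [[0]]}

# Multi-node job today: one empty list per node.
multi_now = {"node_ids": [7, 8], "cpu_ids": [[], []], "gpu_ids": [[], []]}

# Multi-node job under the suggested change: identical full lists repeated
# once per node, which is the redundancy being questioned here.
multi_new = {"node_ids": [7, 8], "cpu_ids": [[0, 1, 2, 3], [0, 1, 2, 3]], "gpu_ids": [[0], [0]]}
```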
You're totally right that none of the implementations so far need cpu_ids or gpu_ids to be a list of lists! This was just an attempt to provide a generic interface, in case some future job launch mechanism provided more fine-grained control over resources (like using 2 GPUs on node X and 1 GPU on node Y).

The current decision that you see in NodeManager for multi-node jobs was "let's assume that if a job needs more than one node, it uses a *whole number of nodes*." That's why _assign_multi_node requests full node occupancy (setting each node's occupancy to 1.0) and doesn't bother with setting CPUs.

Honestly, I don't know if there's ever a situation where it makes sense to have a multi-node job that only uses partial resources of each node.

The list of lists does seem silly in light of the fact that it's not used; it's a great call-out! Still, I would say the impact of sending that list of lists in each node spec would be negligible as far as memory usage or code maintainability go. Alternatively, including the ComputeNode class as another attribute in the NodeSpec seems like a good option too.
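A rough sketch of that alternative (the compute_node_cls attribute is hypothetical, not something Balsam has today):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Type

class ComputeNode:
    # Stand-in generic base class; site configurations supply subclasses
    # with real cpu_ids/gpu_ids.
    cpu_ids: List[int] = []
    gpu_ids: List[int] = []

@dataclass
class NodeSpec:
    node_ids: List[int]
    hostnames: List[str]
    cpu_ids: List[List[int]] = field(default_factory=list)
    gpu_ids: List[List[int]] = field(default_factory=list)
    # Hypothetical extra attribute: the site's ComputeNode subclass, so that
    # AppRun could recover a node's full CPU list when cpu_ids is empty.
    compute_node_cls: Optional[Type[ComputeNode]] = None
```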
For reference, the diff under discussion in balsam/platform/app_run/app_run.py (#357 (comment)):
```diff
@@ -67,10 +68,23 @@ def get_num_ranks(self) -> int:
         return self._ranks_per_node * len(self._node_spec.node_ids)

     def get_cpus_per_rank(self) -> int:
-        cpu_per_rank = len(self._node_spec.cpu_ids[0]) // self._ranks_per_node
-        if not cpu_per_rank:
-            cpu_per_rank = max(1, int(self._threads_per_rank // self._threads_per_core))
-        return cpu_per_rank
+        # Get the list of cpus assigned to the job. If it is a single node job, that is stored in
+        # the NodeSpec object. If it is a multinode job, the cpu_ids assigned to NodeSpec is empty,
+        # so we will assume all cpus on a compute node are available to the job. The list of cpus is
+        # just the list of cpus on the node in that case.
+        cpu_ids = self._node_spec.cpu_ids[0]
+        cpus_per_node = len(cpu_ids)
+        if not cpu_ids:
+            compute_node = ComputeNode(self._node_spec.node_ids[0], self._node_spec.hostnames[0])
```
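The quoted diff cuts off mid-hunk. A hedged reconstruction of how the rest of the new logic likely behaves, based only on this discussion (the standalone function form and fallback details are assumptions, not the PR's actual code):

```python
from typing import List

def cpus_per_rank(
    cpu_ids: List[int],          # node_spec.cpu_ids[0]
    node_cpu_ids: List[int],     # full CPU list from the site's ComputeNode
    ranks_per_node: int,
    threads_per_rank: int,
    threads_per_core: int,
) -> int:
    # Prefer the CPUs assigned in the NodeSpec; for multi-node jobs (empty
    # cpu_ids) fall back to the whole node's CPU list, as the PR describes.
    cpus_per_node = len(cpu_ids) if cpu_ids else len(node_cpu_ids)
    result = cpus_per_node // ranks_per_node
    if not result:
        # Mirror the old fallback when no whole CPU fits per rank.
        result = max(1, threads_per_rank // threads_per_core)
    return result
```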
Codecov Report
```
@@            Coverage Diff             @@
##             main     #357      +/-   ##
==========================================
- Coverage   60.91%   60.69%   -0.23%
==========================================
  Files         157      157
  Lines        9627     9677      +50
  Branches     1259     1271      +12
==========================================
+ Hits         5864     5873       +9
- Misses       3502     3544      +42
+ Partials      261      260       -1
```

☔ View full report in Codecov by Sentry.
This PR modifies the way cpu_bind is set to make it consistent with the cpus assigned to the job by the node manager. A few items/questions that should be noted:

- -d depends on the value of cpu-bind. If cpu-bind=cores, -d is the number of physical cores per rank. If it is depth or numa, it is the number of hardware threads. We have a check for this now in _build_cmdline (see the sketch after this list).
- get_cpus_per_rank(): it now uses the ComputeNode cpu_ids in the multinode case, and it has been reorganized to be more readable.
- Should we set OMP_PLACES and OMP_PROC_BIND for the user?
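For the -d item above, a hedged sketch of the kind of check being described (function and parameter names are invented here; the actual _build_cmdline logic may differ):

```python
def mpiexec_depth(cpu_bind: str, cores_per_rank: int, threads_per_core: int) -> int:
    # Sketch of the -d rule described above for mpiexec on Polaris:
    # with cpu-bind=cores, -d counts physical cores per rank; with
    # cpu-bind=depth or cpu-bind=numa, it counts hardware threads.
    if cpu_bind.startswith("core"):
        return cores_per_rank
    if cpu_bind in ("depth", "numa"):
        return cores_per_rank * threads_per_core
    return cores_per_rank  # conservative default for other bind modes

# Example: 8 physical cores per rank, 2 hardware threads per core.
assert mpiexec_depth("cores", 8, 2) == 8
assert mpiexec_depth("depth", 8, 2) == 16
```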