shell: affinity: use cached hwloc XML
Problem: Loading hwloc topology can be very slow, especially on a
system with many cores when many processes call
hwloc_topology_load(3) simultaneously. This can occur when many
short-running jobs are being launched by Flux, since the job shell
loads topology by default in the affinity plugin.

Since the job shell now caches the hwloc XML in the shell info object,
fetch this XML and use it to load topology, avoiding redundant scans
of the system. This may greatly improve job throughput on many-core
systems.

Fixes flux-framework#4365
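
The caching idea described above can be modeled without hwloc. The sketch below is a simplified illustration, not Flux code: `struct info_cache`, `expensive_scan()`, and `info_cache_get_xml()` are hypothetical names, and the "scan" only copies a placeholder string. It shows the property the commit relies on: repeated lookups reuse the stored XML, so the costly scan runs once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shell info object (not the Flux API). */
struct info_cache {
    char *xml;       /* cached topology XML, NULL until first lookup */
    int scan_count;  /* how many times the expensive scan has run */
};

/* Stand-in for a full hardware scan; the real cost would come from
 * hwloc_topology_load(). Returns a malloc'd placeholder document.
 */
static char *expensive_scan (struct info_cache *c)
{
    const char *fake = "<topology/>";
    char *copy = malloc (strlen (fake) + 1);
    if (copy)
        strcpy (copy, fake);
    c->scan_count++;
    return copy;
}

/* Return the cached XML, scanning only if nothing is cached yet. */
static const char *info_cache_get_xml (struct info_cache *c)
{
    assert (c != NULL);
    if (!c->xml)
        c->xml = expensive_scan (c);
    return c->xml;
}
```

Every caller after the first gets the same pointer back, which is the behavior the affinity plugin depends on when it asks the shell for the "hwloc" key instead of scanning the machine itself.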
grondo committed Jun 15, 2022
1 parent 197fc67 commit d7e5d49
Showing 1 changed file with 22 additions and 2 deletions.
24 changes: 22 additions & 2 deletions src/shell/affinity.c
@@ -158,10 +158,30 @@ static void shell_affinity_destroy (void *arg)

 /* Initialize topology object for affinity processing.
  */
-static int shell_affinity_topology_init (struct shell_affinity *sa)
+static int shell_affinity_topology_init (flux_shell_t *shell,
+                                         struct shell_affinity *sa)
 {
+    const char *xml;
+
+    /* Fetch hwloc XML cached in job shell to avoid heavyweight
+     * hwloc topology load (Issue #4365)
+     */
+    if (flux_shell_info_unpack (shell, "{s:s}", "hwloc", &xml) < 0)
+        return shell_log_errno ("failed to unpack hwloc object");
+
     if (hwloc_topology_init (&sa->topo) < 0)
         return shell_log_errno ("hwloc_topology_init");
+
+    if (hwloc_topology_set_xmlbuffer (sa->topo, xml, strlen (xml)) < 0)
+        return shell_log_errno ("hwloc_topology_set_xmlbuffer");
+
+    /* Tell hwloc that our XML loaded topology is from this system,
+     * O/w hwloc CPU binding will not work.
+     */
+    if (hwloc_topology_set_flags (sa->topo,
+                                  HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM) < 0)
+        return shell_log_errno ("hwloc_topology_set_flags");
+
     if (hwloc_topology_load (sa->topo) < 0)
         return shell_log_errno ("hwloc_topology_load");
     if (topology_restrict_current (sa->topo) < 0)
@@ -178,7 +198,7 @@ static struct shell_affinity * shell_affinity_create (flux_shell_t *shell)
     struct shell_affinity *sa = calloc (1, sizeof (*sa));
     if (!sa)
         return NULL;
-    if (shell_affinity_topology_init (shell, sa) < 0)
+    if (shell_affinity_topology_init (shell, sa) < 0)
         goto err;
     if (flux_shell_rank_info_unpack (shell,
                                      -1,
