Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: use enclosing instance tbon.interface-hint unless overridden #6283

Merged
merged 9 commits into from
Sep 24, 2024

Conversation

garlick
Copy link
Member

@garlick garlick commented Sep 13, 2024

Problem: we don't have a way to configure subinstances to use a particular network, other than setting the global environment variable FLUX_IPADDR_INTERFACE.

Change the way the default tbon.interface-hint is set. Instead of hard coding "default-route", use the value that is set in the enclosing instance, if any.

This would allow a system instance to configure tbon.interface-hint and batch/alloc jobs that it starts would use that value unless overridden. If the value is not configured, then the attribute will have the default value which will be inherited by the child.

@garlick
Copy link
Member Author

garlick commented Sep 14, 2024

The thing that's bugging me about this solution is it adds a synchronous RPC (as well as a flux_open() / flux_close() to the parent broker on every rank. They will go in parallel so it will scale but it adds a fixed startup latency to every job.

One tactic to reduce the latency would be to pre-populate a PMI key like flux.tbon-interface-hint. It still will add the PMI RTT cost but at least the connection is already open. In retrospect that would seem to be preferable. Then perhaps later if we decide we want to pass a default configuration object to subinstances, we could eliminate it entirely.

I'll mark this WIP pending some feedback on the approach.

@garlick garlick changed the title broker: use enclosing instance tbon.interface-hint unless overridden WIP: broker: use enclosing instance tbon.interface-hint unless overridden Sep 14, 2024
@garlick
Copy link
Member Author

garlick commented Sep 20, 2024

Reworked to transfer the enclosing instance's tbon.interface-hint via PMI.

@garlick garlick changed the title WIP: broker: use enclosing instance tbon.interface-hint unless overridden broker: use enclosing instance tbon.interface-hint unless overridden Sep 20, 2024
Problem: the enclosing instance tbon.interface-hint attribute
is needed when flux launches flux.

Set it in the PMI kvs under the flux.tbon-interface-hint key.
Problem: adding support for reading flux.tbon-interface-hint from
the PMI KVS is awkward due to the structure of an internal function.

Refactor set_instance_level_attr() so that it returns a boolean
'under_flux' flag indicating whether this instance was launched by Flux.
Use that that flag to determine whether to set the jobid attr.
The flag will also be useful for gating whether to check for the
interface hint in the PMI KVS.
Problem: we don't have a way to configure subinstances to
use a particular network, other than setting the global
environment variable FLUX_IPADDR_INTERFACE.

Change the way the default tbon.interface-hint is set.  Instead
of hard coding "default-route", use the value that is set in the
enclosing instance's PMI KVS, if any.

This would allow a system instance to configure tbon.interface-hint
and batch/alloc jobs that it starts would use that value unless
overridden.  If the value is not configured in the system instance,
then the attribute will have the default value which will be inherited
by the child.

Fixes flux-framework#6272
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, nice work! Just a couple trivial comments inline.

@@ -447,19 +447,19 @@ test_expect_success 'FLUX_IPADDR_INTERFACE=lo works' '
flux getattr tbon.endpoint >endpoint3.out &&
grep "127.0.0.1" endpoint3.out
'
test_expect_success 'tcp.interface-hint=lo works' '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor commit message nit: 'use the "tcp.interface-hint" in the test titles' drop the the or add "string" or "label"?

flux start -o,-Stbon.interface-hint=hostname \
flux alloc -N1 flux getattr tbon.interface-hint >childhint.out &&
grep hostname childhint.out
'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should a test be added that ensures setting the attribute in the subinstance uses the set value and not the inherited one?

Problem: tests recently added to t0001-basic.t for tbon.interface-hint
use the wrong attribute name in the test titles.

Fix test titles.
Problem: t0001-basic.t takes too long to run.

Move the tbon config related tests to a new script called
t0019-tbon-config.t.
Problem: There is no test coverage for tbon.interface-hint
inheritance from enclosing instance.

Add a test.
Problem: the description of tbon.interface-hint does not mention
that the default value is taken from the enclosing instance, if any.

Rework the description of this attribute.
Problem: the description of tbon.interface hint does not mention
the other ways it can be set.

Note that the broker attribute overrides and add a pointer to
flux-broker-attributes(7) where the other ways of setting it are
mentioned.
Problem: a test expects to be terminated but sometimes it just
fails with a non-zero exit code.

Use test_must_fail_or_be_terminated.
@garlick
Copy link
Member Author

garlick commented Sep 24, 2024

Thanks! I addressed those comments and forced a push. Will set MWP.

@mergify mergify bot merged commit 7efc1a0 into flux-framework:master Sep 24, 2024
32 of 33 checks passed
@garlick garlick deleted the issue#6272 branch September 24, 2024 18:56
Copy link

codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 80.64516% with 12 lines in your changes missing coverage. Please review.

Project coverage is 83.32%. Comparing base (fa81b09) to head (cc2b3a2).
Report is 550 commits behind head on master.

Files with missing lines Patch % Lines
src/broker/boot_pmi.c 79.31% 6 Missing ⚠️
src/broker/overlay.c 86.95% 3 Missing ⚠️
src/broker/boot_config.c 33.33% 2 Missing ⚠️
src/shell/pmi/pmi.c 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6283      +/-   ##
==========================================
- Coverage   83.33%   83.32%   -0.02%     
==========================================
  Files         522      522              
  Lines       86057    86093      +36     
==========================================
+ Hits        71717    71738      +21     
- Misses      14340    14355      +15     
Files with missing lines Coverage Δ
src/shell/pmi/pmi.c 81.51% <85.71%> (+0.09%) ⬆️
src/broker/boot_config.c 81.30% <33.33%> (-0.46%) ⬇️
src/broker/overlay.c 83.78% <86.95%> (-0.32%) ⬇️
src/broker/boot_pmi.c 65.79% <79.31%> (+0.40%) ⬆️

... and 5 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants