add FluxURIResolver Python class and flux-uri command for job URI discovery #3999

Merged 9 commits on Dec 13, 2021

Conversation

grondo
Contributor

@grondo grondo commented Dec 10, 2021

This PR introduces a FluxURIResolver Python class which can resolve target URIs to Flux job URIs. Support for resolving URIs in different contexts is provided by resolver "plugins" which are loaded based on the target URI scheme, though if no scheme is provided, then a scheme of jobid is assumed.
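The scheme-based dispatch described above can be sketched as follows. This is only an illustration under stated assumptions: the registry, the decorator, and the function names are hypothetical and are not the FluxURIResolver API, and the jobid plugin returns a placeholder string instead of querying Flux.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: scheme name -> resolver function (illustration only).
RESOLVERS: Dict[str, Callable[[str], str]] = {}


def register(scheme: str):
    """Decorator that registers a resolver function under a scheme name."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        RESOLVERS[scheme] = func
        return func
    return wrap


@register("jobid")
def resolve_jobid(arg: str) -> str:
    # Placeholder result; a real plugin would look up the job's URI in Flux.
    return f"ssh://example-host/tmp/flux-{arg}/local-0"


def split_target(target: str) -> Tuple[str, str]:
    """Split TARGET into (scheme, argument); a missing scheme means 'jobid'."""
    scheme, sep, rest = target.partition(":")
    if not sep:
        return "jobid", target
    return scheme, rest


def resolve(target: str) -> str:
    """Dispatch TARGET to the resolver registered for its scheme."""
    scheme, arg = split_target(target)
    try:
        plugin = RESOLVERS[scheme]
    except KeyError:
        raise ValueError(f"no resolver for scheme {scheme!r}") from None
    return plugin(arg)
```

With this sketch, `resolve("f1234")` and `resolve("jobid:f1234")` take the same code path, which is the defaulting behavior the description mentions.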

A new flux-uri(1) utility is provided to access this Python class from the commandline.

The PR includes three resolver plugins:

  • jobid: resolve job URIs for a Flux jobid in the current instance. This resolver works recursively if a "path" of jobids is provided, e.g. jobid:f1234/f3456 will attempt to resolve the URI for jobid f3456 running as a child job of f1234. Since jobid is the default URI resolver scheme, f1234/f3456 would also work. For example:
    $ flux uri ƒR3S77XQb
    ssh://quartz18/var/tmp/grondo/flux-jJ8CaZ/local-0
  • pid: resolve a URI for a local process id. This plugin attempts to read the FLUX_URI value from /proc/PID/environ. As a convenience, if the process appears to be a flux-broker, then the FLUX_URI of one of its children is used. This allows determination of the URI for a running flux-broker by providing its PID (useful in implementation of other resolvers)
    $ flux uri pid:$(pidof flux-broker)
    local:///tmp/flux-TBU7RM/local-0
  • slurm: an experimental resolver for Flux instances run as Slurm jobs. The slurm plugin works by using srun to run scontrol listpids on the first node of a Slurm job. For each PID that is also a direct child of slurmstepd (i.e. localid == 0) the pid resolver is used to resolve the FLUX_URI locally.
    $ squeue -u grondo
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            7843494    pdebug interact   grondo  R      55:36      2 quartz[18-19]
    $ flux uri slurm:7843494
    ssh://quartz18/var/tmp/grondo/flux-MpnytT/local-0

URI resolver plugins are loaded from the flux.uri.resolvers namespace, so it is trivial to add third party or override existing resolvers on systems where needed. E.g. I assume someone could provide an lsf plugin at some point.
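One way to discover plugins under a package namespace is to enumerate its submodules with `pkgutil`. The sketch below is an assumption about how such loading could work, not the code in this PR; the function name `discover_plugins` is hypothetical, and skipping names that start with an underscore is a choice made here for the example.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Dict


def discover_plugins(package_name: str) -> Dict[str, ModuleType]:
    """Import each public module directly under package_name.

    Returns a mapping of module name -> imported module.
    """
    package = importlib.import_module(package_name)
    plugins: Dict[str, ModuleType] = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # skip private helpers such as __main__
        plugins[info.name] = importlib.import_module(
            f"{package_name}.{info.name}"
        )
    return plugins


# Demonstrated on a standard-library package so the sketch runs anywhere.
print(sorted(discover_plugins("email.mime")))
```

On a system with the Flux Python bindings installed, pointing the same function at `flux.uri.resolvers` would presumably list the jobid, pid, and slurm modules; that has not been tested here.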

The next step is to extend flux proxy to call out to flux uri if its argument is not already a local:// or ssh:// URI. I actually have this working, but figured that would be better for a separate PR, release notes wise.

Here's a preview:

$ src/cmd/flux proxy slurm:7843731 flux resource list
     STATE NNODES   NCORES    NGPUS NODELIST
      free      4       72        0 quartz[1,1-2,2]
 allocated      0        0        0 
      down      0        0        0 

Contributor Author

grondo commented Dec 10, 2021

BTW, flux-uri lists available plugins in --help output:

$ flux uri  --help
usage: flux-uri [-h] [--remote] [--local] TARGET

Resolve TARGET to a Flux URI

positional arguments:
  TARGET      A Flux jobid or URI in scheme:argument form (e.g. jobid:f1234)

optional arguments:
  -h, --help  show this help message and exit
  --remote    convert a local URI to remote
  --local     convert a remote URI to local

Supported resolver schemes:
  jobid		Get URI for a given Flux JOBID
  pid		Get FLUX_URI for a given local PID
  slurm		Get URI for a Flux instance launched under Slurm

It can also be used as a simple way to convert local URIs to remote and vice versa:

$ flux uri --remote $FLUX_URI
ssh://asp/tmp/flux-TBU7RM/local-0
$ flux uri --local ssh://asp/tmp/flux-TBU7RM/local-0
local:///tmp/flux-TBU7RM/local-0
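The two conversions shown above amount to swapping the scheme and authority while keeping the socket path. A minimal sketch, assuming the remote form is always `ssh://HOST/PATH` and that the local hostname is an acceptable default for HOST; the helper names `to_remote` and `to_local` are hypothetical and are not part of flux-uri.

```python
import socket
from urllib.parse import urlsplit


def to_remote(uri: str, hostname: str = "") -> str:
    """Rewrite local:///path as ssh://hostname/path; keep ssh:// URIs as-is."""
    parts = urlsplit(uri)
    if parts.scheme == "ssh":
        return uri
    if parts.scheme != "local":
        raise ValueError(f"cannot convert scheme {parts.scheme!r}")
    host = hostname or socket.gethostname()  # assumption: default to this host
    return f"ssh://{host}{parts.path}"


def to_local(uri: str) -> str:
    """Rewrite ssh://host/path as local:///path; keep local:// URIs as-is."""
    parts = urlsplit(uri)
    if parts.scheme == "local":
        return uri
    if parts.scheme != "ssh":
        raise ValueError(f"cannot convert scheme {parts.scheme!r}")
    return f"local://{parts.path}"
```

Called as `to_remote("local:///tmp/flux-TBU7RM/local-0", "asp")`, the sketch reproduces the `ssh://asp/...` form from the transcript above, and `to_local` reverses it.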

lgtm-com bot commented Dec 10, 2021

This pull request introduces 6 alerts when merging c67c60d into 87c800b - view on LGTM.com

new alerts:

  • 5 for Unused import
  • 1 for Except block handles 'BaseException'

lgtm-com bot commented Dec 10, 2021

This pull request introduces 1 alert when merging 0ac1b64 into 87c800b - view on LGTM.com

new alerts:

  • 1 for Unused import

Member

garlick commented Dec 10, 2021

This is a nice addition to our tool set! Couple of quick comments:

It might be good to add examples of input accepted by the built-in methods to the man page.

I would expect this to be commonly used, so could we add the magic flux-help-command comment to the man rst source so this pops up in flux help output?

Should this parse?

$ flux uri pid://1234
flux-uri: ERROR: [Errno 2] No such file or directory: '/proc////status'

(I expected the // to be optional when the authority is missing, but I could be mistaken about that.)

In another PR we could drop the jobid resolving code from flux top and have it popen flux-uri instead. Then flux top slurm:JOBID just works. Nice! On that note, it would be handy if the --local and --remote options could be specified in the URI as query or fragment, e.g. jobid:1234?local.

Contributor Author

grondo commented Dec 10, 2021

Should this parse?

No, pid://1234 is a URI with authority 1234; the authority is only "empty" when the // is followed by another slash, as in the local scheme: local:///tmp/foo/bar
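The distinction can be checked with Python's standard `urllib.parse`, which follows the generic URI syntax for the authority component. The helper name `authority_and_path` is made up for this illustration.

```python
from typing import Tuple
from urllib.parse import urlsplit


def authority_and_path(uri: str) -> Tuple[str, str]:
    """Return the (authority, path) components of a URI."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


for uri in ("pid://1234", "local:///tmp/foo/bar"):
    authority, path = authority_and_path(uri)
    print(f"{uri}: authority={authority!r} path={path!r}")
```

For `pid://1234` the authority is `'1234'` and the path is empty, which is why the pid resolver ends up with nothing to look up. For `local:///tmp/foo/bar` the authority is empty and the path is `'/tmp/foo/bar'`.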

Contributor Author

grondo commented Dec 10, 2021

I would expect this to be commonly used, so could we add the magic flux-help-command comment to the man rst source so this pops up in flux help output?

Sure, though I was thinking that those need a holistic redo since we still reference lesser used commands like keygen and logger

Contributor Author

grondo commented Dec 10, 2021

In another PR we could drop the jobid resolving code from flux top and have it popen flux-uri instead.

Would it be better to encourage flux proxy URI flux top instead?

In any event, in the next PR I'll add a helper function to call out to flux uri and we can add it to commands that make sense.

Contributor Author

grondo commented Dec 10, 2021

On that note, it would be handy if the --local and --remote options could be specified in the URI as query or fragment, e.g. jobid:1234?local.

Ooh, great idea!

Member

garlick commented Dec 10, 2021

Would it be better to encourage flux proxy URI flux top instead?

Eh, I'd prefer to call your helper :-)

Sure, though I was thinking that those need a holistic redo since we still reference lesser used commands

True. I'll open an issue on that.

Member

garlick commented Dec 10, 2021

Would it be better to encourage flux proxy URI flux top instead?

The problem with encouraging flux-proxy for running a single command is it adds an unnecessary layer of message routing, which feels slightly wrong to me when a simple environment modification and exec would suffice.

What about adding a --remote=URI option to flux(1)?

Contributor Author

grondo commented Dec 10, 2021

The problem with encouraging flux-proxy for running a single command is it adds an unnecessary layer of message routing, which feels slightly wrong to me when a simple environment modification and exec would suffice.

Ah, that should have been obvious, sorry. I was thinking of the case of FLUX_URI set in a shell session with many commands.

What about adding a --remote=URI option to flux(1)?

I'm not sure how exactly that would work, but might be more user friendly to selectively extend commands that can take a URI to allow resolver URIs instead of just Flux job URIs. (I could just be missing the use case for flux --remote=URI -- how is it different from FLUX_URI=URI flux...?)

However, what if flux proxy worked more like tmux or screen or ssh control sockets? Instead of opening a random socket, flux-proxy could open a socket that uniquely identifies the remote instance. Then if another process has FLUX_URI set to the same remote, the proxy connection could be reused?

Member

garlick commented Dec 10, 2021

I was just thinking flux --remote URI ... would be shorthand for FLUX_URI=$(flux uri URI) flux ... (a bit less typing), but was just brainstorming. Maybe after we have flux uri merged, certain commands (like flux jobs?) may emerge as frequently used with a FLUX_URI= prefix, and then we can decide whether it makes sense to add a convenience option to them.
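The "environment modification and exec" approach, i.e. the `FLUX_URI=$(flux uri URI) flux ...` shorthand, could look like this from Python. It is a sketch only: `flux_uri` shells out to the `flux uri` command introduced in this PR, while `env_for` and its `resolve` and `base` parameters are hypothetical names added so the environment-building step can be exercised without Flux installed.

```python
import os
import subprocess
from typing import Callable, Dict, Mapping, Optional


def flux_uri(target: str) -> str:
    """Resolve TARGET by running `flux uri TARGET` and reading its stdout."""
    result = subprocess.run(
        ["flux", "uri", target], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def env_for(
    target: str,
    resolve: Callable[[str], str] = flux_uri,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with FLUX_URI set for TARGET."""
    env = dict(os.environ if base is None else base)
    env["FLUX_URI"] = resolve(target)
    return env
```

A caller could then run, for example, `subprocess.run(["flux", "jobs"], env=env_for("slurm:7843494"))`, which connects directly to the resolved instance without the extra message routing of a proxy.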

The tmux idea is great! Now I think I've forgotten how flux terminus works. I don't want to get us too far off topic in this PR and I have to hit the road soon, so I'll just take that thought "to go". :-)

Contributor Author

grondo commented Dec 10, 2021

I've pushed an update here that adds support for the common ?local and ?remote query parameters on the resolver URIs passed to flux uri and FluxURIResolver, as suggested.

If the final id in a jobid: lookup contains ?local, then all intermediate jobid URIs will be resolved to local: URIs, so this can be used to resolve a nested job without ssh (useful in testing)
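Separating the query component from a resolver URI can be sketched with `urllib.parse.parse_qs`. The function names and the decision to reject a URI carrying both `local` and `remote` are assumptions made for this example, not behavior confirmed by the PR.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs


def split_query(target: str) -> Tuple[str, Dict[str, List[str]]]:
    """Return (target_without_query, params) for a resolver URI."""
    base, _, query = target.partition("?")
    # keep_blank_values=True so a bare "?local" is kept as {"local": [""]}
    params = parse_qs(query, keep_blank_values=True)
    return base, params


def requested_form(params: Dict[str, List[str]]) -> Optional[str]:
    """Return 'local', 'remote', or None when neither was requested."""
    if "local" in params and "remote" in params:
        raise ValueError("local and remote are mutually exclusive")
    if "local" in params:
        return "local"
    if "remote" in params:
        return "remote"
    return None
```

For `jobid:f1234/f3456?local`, `split_query` yields the base `jobid:f1234/f3456` plus a `local` parameter with an empty value, and `requested_form` reports `"local"`.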

The manpage was also expanded with a description of all 3 included resolver plugins, and examples of each.

Finally, a few more tests were added to cover the new code for handling query parameters.

Contributor Author

grondo commented Dec 11, 2021

The tmux idea is great! Now I think I've forgotten how flux terminus works. I don't want to get us too far off topic in this PR and I have to hit the road soon, so I'll just take that thought "to go". :-)

I actually shouldn't have mentioned tmux, as I was thinking more along the lines of how ssh control sockets work. When you ssh to a host, if control sockets are enabled ssh first looks for a socket in a well known location based on the target host and options (like username, port, etc). Could the ssh connector do something similar to re-use an existing proxy connector before launching another ssh? If one doesn't exist, flux-proxy could be started running daemonized.

Contributor Author

grondo commented Dec 11, 2021

I'd actually like to clean up the terminology a bit here and could use some help.

I've been calling the URI schemes into which resolver URIs are resolved (i.e. ssh and local for now) job URIs, but that probably isn't correct, since a flux start --test-size instance isn't a job, but it has a URI. Should these be called "connector URIs" or something else more specific?

It gets a little confusing talking about URIs that resolve to other URIs. I wish we had a better name for the URI argument to flux uri. If there are any suggestions, I could clean up the docs in this PR. Additionally, maybe an RFC describing "URI resolver URIs" (or something) would help here?

Member

garlick commented Dec 11, 2021

I guess the distinction is that the former can be passed to flux_open(3) while the latter has to be resolved first (kind of like domain names).

Probably not great, but could we call the new one a URN and the original a URL (both URIs)?

Another thought, is there really any reason why we couldn't allow the new URIs to be passed to flux_open(3)? It's just more schemes, so could we fall through to the proposed C helper if flux_open(3) can't find a scheme to dlopen directly...

Contributor Author

grondo commented Dec 11, 2021

Another thought, is there really any reason why we couldn't allow the new URIs to be passed to flux_open(3)? It's just more schemes, so could we fall through to the proposed C helper if flux_open(3) can't find a scheme to dlopen directly...

Well, there's a popen2() call which might be surprising and could possibly cause problems in some cases, but maybe it could be opt-in via a new flux_open(3) flag (and then programs like flux-proxy and flux-top could use that flag)? It would also be nice to encourage Python users of flux.Flux() to resolve the URI first to avoid calling in to C to popen() another Python process. I guess then it isn't quite as useful, because a user would still have to know when a connector URI is required vs one of these resolver URIs.

I'm feeling a bit ambivalent about all this so I'd be willing to try whatever you suggest.

Member

garlick commented Dec 11, 2021

I think this PR could be independent of adding name resolution to flux_open(3), which seems like it requires more thought, if it should be done at all.

One quick thought: do you think renaming the command to flux-resolve-uri would clarify its purpose?

On terminology, maybe "high level" or "unresolved" Flux URIs get resolved to "native" Flux URIs?

Contributor Author

grondo commented Dec 11, 2021

I think this PR could be independent of adding name resolution to flux_open(3), which seems like it requires more thought, if it should be done at all.

👍

One quick thought: do you think renaming the command to flux-resolve-uri would clarify its purpose?

Yeah, good thought. I had originally called it flux-uri-resolve but thought the shorter command would be appreciated when used in scripts and such. Now that you mention it, since the command will be more of a helper or plumbing command it doesn't make sense to drop the verb, so I'll add it back.

On terminology, maybe "high level" or "unresolved" Flux URIs get resolved to "native" Flux URIs?

Yeah, I like that and it will do for now.

Contributor Author

grondo commented Dec 11, 2021

I would expect this to be commonly used, so could we add the magic flux-help-command comment to the man rst source so this pops up in flux help output?

After discussion it seems like this would be more of a plumbing script, so I'll leave it out of the automated flux help generation for now.

Contributor Author

grondo commented Dec 11, 2021

Ok, I've pushed some fixup commits that rename flux uri to flux uri-resolve, though I have to say after doing that, I miss the shorter command (it is like a literal "flux URI for JOBID" or "flux uri for PID") but no big deal either way.

Member

garlick commented Dec 12, 2021

I am fine both ways too. If you feel that shorter is better here feel free to revert that fixup.

grondo force-pushed the flux-uri branch 2 times, most recently from 7be7cba to f55d562 on December 13, 2021 02:51
Contributor Author

grondo commented Dec 13, 2021

I actually ended up keeping the shorter name for now (easy to change if we decide against it). Even though flux uri itself might not be commonly used by normal users, I added some introductory text about "URIs" to the flux-uri(1) manpage, and it seemed like it would be nice to reference this from other locations as "see flux-uri(1)".

Another idea would be to add a flux uri resolve subcommand to flux uri if the shorter name flux uri is too confusing. Then the intro to Flux URIs could still be in flux-uri(1).

The intro was written a little hastily and still needs improvement, but at least it is somewhere.

Contributor Author

grondo commented Dec 13, 2021

Oh, I also got annoyed at how awkward it is to reference a manpage from within reStructuredText. I found a random "extension" to easily add new domain refs to sphinx, and added special manN refs using a new domainrefs setting in conf.py. Now you can reference a manpage with, e.g. :man3:`flux_open`

Member

@garlick garlick left a comment

Here are some notes on the man page. I'll make a pass through the code and then finish my review but wanted to get you these first.

DESCRIPTION
===========

Connections to Flux are established via a Uniform Resource Indicator
Member

s/Indicator/Identifier/

Processes running within a Flux instance will have the ``FLUX_URI``
environment variable set to a native URI which :man3:`flux_open` will
use by default. Therefore, there is usually no need to specify a URI when
connecting to the enclosing instance. However, connecting to a _different_
Member

"different" is using markdown emphasis not rst emphasis.

Might want to add to the first sentence in this paragraph something like

, with fallback to a compiled-in native URI for the system instance of Flux.


As a convenience, if *TARGET* is specified with no scheme, then the scheme
is assumed to be ``jobid``. This allows ``flux uri`` to be used to look
up the URI for a Flux instance running as a job with:
Member

If this doesn't make the sentence too awkward, maybe use:

This allows flux uri to be used to look up the URI for a Flux instance running as a job in the current enclosing instance with:

Comment on lines 60 to 64
**-remote**
Return the _remote_ (``ssh://``) equivalent of the resolved URI.

**--local**
Return the _local_ (``local://``) equivalent of the resulved URI.
Member

Use rst emphasis not markdown for "local" and "remote"

Comment on lines 83 to 86
This scheme attempts to read the ``FLUX_URI`` value from the process id
*PID* using ``/proc/PID/environ``. If *PID* refers to a ``flux-broker``,
then a child of the broker is used instead in order to obtain the
URI for that broker (instead of its parent)
Member

Maybe instead of "then a child of the broker is used instead" say "then the scheme reads FLUX_URI from the broker's initial program or another child process since FLUX_URI in the broker's environment would refer to its parent."

I'm not sure if that makes it any clearer. Feel free to ignore.

Missing period at end of sentence.

Contributor Author

Also added "(or may not be set at all in the case of an instance started with flux start --test-size=N)"
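Reading `FLUX_URI` from a process environment, as the pid scheme is described as doing, comes down to parsing the NUL-separated `KEY=VALUE` entries of `/proc/PID/environ` on Linux. The sketch below shows only that step; the function names are hypothetical, and the broker-specific fallback to a child process discussed above is deliberately left out.

```python
from pathlib import Path
from typing import Optional


def flux_uri_from_environ(data: bytes) -> Optional[str]:
    """Extract FLUX_URI from the raw contents of an environ file.

    Entries are NUL-separated KEY=VALUE pairs; returns None when
    FLUX_URI is not set.
    """
    for entry in data.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep and key == b"FLUX_URI":
            return value.decode()
    return None


def flux_uri_for_pid(pid: int) -> Optional[str]:
    """Read FLUX_URI for a local process (Linux only).

    Raises OSError if the process has exited or its environment
    cannot be read by the caller.
    """
    return flux_uri_from_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Returning `None` for a missing variable covers the case mentioned above where `FLUX_URI` may not be set at all in the broker's own environment.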

Comment on lines 120 to 124
.. note::
With the ``jobid`` resolver, ``?local`` only needs to be placed on
the last component of the jobid "path" or hierarchy. This will resolve
each URI in turn as a local URI, so it is only useful if all jobs
are actually running on the local system.
Member

If the last job is running on the local system, then all enclosing instances have to be running on the local system, so maybe the "so..." explanation could just be dropped?

Contributor Author

Yeah, that sounds better, though more of what I was going for is that in resolving each job along the way the jobid resolver plugin has to connect to the parent job to resolve its child, and when ?local is placed on the last jobid all connections will use the local connector. This may not work if for example the first jobid is on another host (the first flux_open() will fail).

Also, jobid URIs are resolved to the URI of rank 0. If not all enclosing instances have rank 0 running locally, then your statement, while true, doesn't apply to resolution of URIs.

However, perhaps that is too much detail for the manpage.
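The step-by-step resolution described in this reply can be illustrated with a toy walk over a jobid path. Everything here is hypothetical: `lookup` stands in for "connect to the parent instance and ask for the child's URI", and forcing each intermediate result to a `local://` URI is this sketch's reading of what `?local` does.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

# lookup(parent_uri, jobid) -> URI of the child instance.
# parent_uri is None for the enclosing instance.
Lookup = Callable[[Optional[str], str], str]


def as_local(uri: str) -> str:
    """Rewrite ssh://host/path as local:///path; leave other URIs alone."""
    parts = urlsplit(uri)
    if parts.scheme == "ssh":
        return f"local://{parts.path}"
    return uri


def resolve_path(path: str, lookup: Lookup, force_local: bool = False) -> str:
    """Resolve a jobid path such as 'f1234/f3456' one component at a time."""
    uri: Optional[str] = None  # None stands for the enclosing instance
    for jobid in path.split("/"):
        uri = lookup(uri, jobid)
        if force_local:
            # The next lookup connects through this URI, so every hop uses
            # the local connector and fails if that instance is elsewhere.
            uri = as_local(uri)
    return uri
```

Because each lookup connects through the URI produced by the previous one, forcing local URIs only works when every instance along the path is reachable on the local host, which matches the caveat raised in the reply.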

Member

@garlick garlick Dec 13, 2021

Up to you, but if you leave it in you might want to include the rank 0 requirement.

If not all enclosing instances have rank 0 running locally

The top enclosing instance rank 0 isn't required to be local, correct?

Contributor Author

No, I just left it out.

The top enclosing instance rank 0 isn't required to be local, correct?

no I meant the first jobid in the path component, it will be forced to a local:// uri if ?local is used. Probably the only use case for ?local is developers working with test instances, so it is probably better to just sweep this under the rug for now.


::

$ flux uri pid:$(pidof flux-broker | cut -d' ' -f1)
Member

maybe use pidof -s instead of the cut?

Contributor Author

Cool, didn't know about pidof -s

Member

@garlick garlick left a comment

Just a few superficial comments - I didn't spot anything in the code to comment on.

Great work to improve our usability here.

@@ -9,6 +9,7 @@ nobase_fluxpy_PYTHON = \
future.py \
Member

commit message: s/validatory.py/validator.py/

@@ -40,6 +40,8 @@ nobase_fluxpy_PYTHON = \
hostlist.py \
Member

commit message s/hiearchical/hierarchical/

if a URI is resolved with no scheme, it is assumed to be a Flux jobid

Did you mean "if a URI is parsed with no scheme..."? And is the default scheme an optimization or a convenience?

Contributor Author

And is the default scheme an optimization or a convenience?

Yeah, a convenience. I'll just drop the "As an optimization," -- no need for filler phrases in the commit message.

@@ -46,9 +46,29 @@

 extensions = [
 'sphinx.ext.intersphinx',
-'sphinx.ext.napoleon'
+'sphinx.ext.napoleon',
+'domainrefs'
Member

This is awesome!

@@ -44,6 +44,7 @@ nobase_fluxpy_PYTHON = \
uri/__init__.py \
Member

Commit message: s/direclty/directly/

Problem: Utility functions to aid in loading plugins from Python
namespace packages are currently in flux/validator.py and therefore
no other modules can use them.

Move these functions to a new flux/importer.py module and use them
from the job validator code.

Problem: Flux does not have a common place to collect the methods
for discovery of URIs for Flux jobs. This lack causes an excess of
ad-hoc solutions across the Flux user-base and reduces the overall
user experience when working with hierarchical instances of Flux.

Introduce a new Python FluxURIResolver class which is meant to act as
an extensible repository of methods for discovery of URIs from other
sources. On initialization, the class discovers a set of URI resolver
"plugins" which can translate a simple URI given in the plugin's
"scheme" to a job URI. If a URI does not include a scheme, it is
assumed to be a Flux jobid (e.g.  "f1234" is resolved as "jobid:f1234")

As a convenience, a resolver URI may have an optional query component
of "remote" or "local" which will force the result into a local or
remote job URI. For instance:

 jobid:f1234?local

would rewrite the discovered URI for job f1234 as a local:// URI.

Query parameters are _also_ passed into resolver plugins, and the use
of `local` or `remote` (or other plugin-supported query parameters)
may influence the plugin's URI resolution as well.

Problem: There is no way to easily resolve a Flux instance running as
a job to its URI.

Add a "jobid" resolver plugin for the FluxURIResolver class.

This plugin resolves a Flux jobid or hierarchy of Flux jobids to a
URI by querying the user.uri job memo.

A hierarchy of jobs is specified by use of a forward-slash, e.g.

 "jobid:f1234/f3456"

would resolve the URI for job f3456 running in the job f1234.

Problem: It would be useful to fetch the FLUX_URI from a local process,
but currently the schemes to do this are ad-hoc.

Add a "pid" resolver plugin to the FluxURIResolver class.

This plugin attempts to read the target process id's FLUX_URI from
/proc/pid/environ, allowing the user to connect to the same Flux
instance in which a process is running.

As a convenience, if the target PID is a running flux-broker, get
FLUX_URI from one of its children, allowing the "pid" resolver scheme
to fetch the FLUX_URI for a broker running on the current system.

This will be useful when discovering URIs when Flux is running under
a foreign resource manager.

Problem: A command line interface for resolving Flux jobids and
uri-resolver URIs to job URIs is not currently available.

Add flux-uri.py for this purpose. The command can take a jobid
or URI in any supported resolver scheme and will return a
FLUX_URI for the target instance.

For convenience, --local and --remote options are provided to
attempt to convert a resolved job URI to its local or remote form.

Problem: It is cumbersome to cross reference manpages with sphinx,
when it should be simple.

Add the `domainrefs.py` sphinx plugin from

 https://github.com/mitogen-hq/mitogen/blob/master/docs/domainrefs.py

which allows simple addition of multiple domain cross-references to
the sphinx conf.py.

Now referencing another manpage is as simple as :man1:`flux` for example.

Extend PYTHONPATH in sphinx commands so that sphinx can find extension
in srcdir.

Problem: No documentation exists for the `flux uri` command.

Add a short manpage for flux-uri.

Problem: When running Flux under Slurm, there is no convenient
method to get the URI for instances running as Slurm jobs.

Add an experimental "slurm" resolver plugin for the FluxURIResolver class.

The slurm plugin works using the following method:

 * run `scontrol listpids` on the first node of the Slurm job
   via `srun --overlap --jobid=JOBID`

 * Try resolving each job PID using the "pid" resolver and return
   the first URI on success
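The PID-selection step could be sketched as a small parser over `scontrol listpids` output. This is an assumption-heavy illustration: it presumes the output starts with a header row containing `PID` and `LOCALID` columns, and it applies the `localid == 0` filter mentioned in the PR description. The function name `candidate_pids` is hypothetical, and running `srun` itself is left out.

```python
from typing import List


def candidate_pids(listpids_output: str) -> List[int]:
    """Return PIDs whose LOCALID column is 0.

    Assumes a whitespace-separated table whose first line is a header
    naming at least the PID and LOCALID columns.
    """
    lines = listpids_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    localid_col = header.index("LOCALID")
    pids: List[int] = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(pid_col, localid_col):
            continue  # skip short or malformed rows
        if fields[localid_col] == "0":
            pids.append(int(fields[pid_col]))
    return pids
```

Each returned PID would then be handed to the pid resolver in turn, stopping at the first one that yields a URI, as the commit message describes.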

This plugin is best effort and can probably be easily fooled, for example
if `flux start` or `flux broker` isn't run directly as a Slurm job.

Problem: No tests exist for the Python FluxURIResolver class and
its front-end tool flux-uri.

Add a small set of tests for the base URI and FluxJobURI classes
to the python tests as t/python/t0025-uri.py, and a larger set of
functionality tests in t2802-uri-cmd.t which use the flux-uri(1)
command.

codecov bot commented Dec 13, 2021

Codecov Report

Merging #3999 (0b332fd) into master (87c800b) will increase coverage by 0.05%.
The diff coverage is 94.56%.

❗ Current head 0b332fd differs from pull request most recent head 6c02e1a. Consider uploading reports for the commit 6c02e1a to get more accurate results

@@            Coverage Diff             @@
##           master    #3999      +/-   ##
==========================================
+ Coverage   83.42%   83.47%   +0.05%     
==========================================
  Files         364      371       +7     
  Lines       53418    53636     +218     
==========================================
+ Hits        44565    44774     +209     
- Misses       8853     8862       +9     
Impacted Files                                         Coverage Δ
src/bindings/python/flux/importer.py                   87.50% <87.50%> (ø)
src/cmd/flux-uri.py                                    88.00% <88.00%> (ø)
src/bindings/python/flux/uri/resolvers/pid.py          90.32% <90.32%> (ø)
src/bindings/python/flux/uri/resolvers/slurm.py        95.55% <95.55%> (ø)
src/bindings/python/flux/uri/uri.py                    97.43% <97.43%> (ø)
...rc/bindings/python/flux/job/validator/validator.py  93.54% <100.00%> (-0.26%) ⬇️
src/bindings/python/flux/uri/__init__.py               100.00% <100.00%> (ø)
src/bindings/python/flux/uri/resolvers/jobid.py        100.00% <100.00%> (ø)
src/broker/overlay.c                                   88.03% <0.00%> (-0.14%) ⬇️
src/common/libflux/msg_handler.c                       90.03% <0.00%> (+0.31%) ⬆️
... and 3 more

Contributor Author

grondo commented Dec 13, 2021

Seeing some unexpected errors in the centos8 builder, only in t3000-mpi-basic.t:

./t/t3000-mpi-basic.log
  2021-12-13T17:28:40.977607Z broker.err[0]: rc1.0: flux-module: /usr/src/src/modules/job-manager/plugins/.libs/perilog.so: undefined symbol: flux_jobtap_epilog_start
  2021-12-13T17:28:40.977801Z broker.err[0]: rc1.0: flux-module: /usr/src/src/modules/job-manager/plugins/.libs/alloc-bypass.so: undefined symbol: flux_jobtap_job_set_flag

Which concerns me that we're executing the wrong version of Flux with flux-module? But flux shouldn't be installed in the image so I'm a bit confused. Couldn't reproduce this locally either, so maybe I'll just try restarting the build.

(suspiciously, centos7 seems to have timed out, but got a similar error from the t3000-mpi-basic.t test)

Contributor Author

grondo commented Dec 13, 2021

Ok, rerunning the CI fixed the errors noted above, still a little strange, but I'll set MWP now.
