Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: support high-level URIs and JOBID arguments in flux-top and flux-proxy #4004

Merged
merged 11 commits into from
Dec 14, 2021

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Dec 13, 2021

This PR adds a C function flux_uri_resolve() to libutil which calls out to flux-uri(1) to resolve a high-level URI "target" to a native Flux URI.

Then, support for high-level URIs is added to both flux top and flux proxy by calling flux_uri_resolve() to resolve the command argument, if provided.

This allows these commands to connect to jobs using any resolvable URI argument supported by flux-uri(1), including plain jobids, nested jobids, e.g.:

ƒ(s=1,d=0,builddir) grondo@asp:~/git/flux-core.git$ flux jobs
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
   ƒQ6Nz6NkP grondo   flux        R      4      1   43.22s asp
ƒ(s=1,d=0,builddir) grondo@asp:~/git/flux-core.git$ flux proxy ƒQ6Nz6NkP?local flux jobs
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
     ƒs3pknC grondo   flux        R      1      1   50.43s asp
     ƒs3pknB grondo   flux        R      1      1   50.43s asp
     ƒs2LmVq grondo   flux        R      1      1   50.43s asp
ƒ(s=1,d=0,builddir) grondo@asp:~/git/flux-core.git$ flux proxy ƒQ6Nz6NkP/ƒs3pknC?local flux jobs -c 4
       JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
     ƒxpsxUD grondo   sleep      PD      1      -        - -
     ƒxpsxUE grondo   sleep      PD      1      -        - -
     ƒxrMwkT grondo   sleep      PD      1      -        - -
     ƒyG5kFy grondo   sleep      PD      1      -        - -

(Here I used ?local because I don't have passwordless ssh set up)

@garlick
Copy link
Member

garlick commented Dec 13, 2021

Public API function or not? The header won't be installed but the flux_ prefix will let it leak out.

@grondo
Copy link
Contributor Author

grondo commented Dec 13, 2021

No sorry, I'll fix that

Problem: A typo identified during review of the flux-uri(1) manpage
was missed before the initial version was merged, URI should stand
for Uniform Resource Identifier not "Indicator".

Correct the mistake.
@grondo
Copy link
Contributor Author

grondo commented Dec 13, 2021

I've renamed flux_uri_resolve() to just uri_resolve() (sorry wasn't thinking when I named it) and force-pushed the changes.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement! Looks good, just a minor error path issue in the uri_resolve() implementation.

Comment on lines 24 to 32
char **argv = malloc (4 * sizeof (char *));
if (!(argv[0] = strdup ("flux"))
|| !(argv[1] = strdup ("uri"))
|| !(argv[2] = strdup (uri))) {
free (argv);
return NULL;
}
argv[3] = NULL;
return argv;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check malloc return, and free all allocated memory on error path.

}
}
free (cpy);
if (!(argv = get_uri_cmd_create (uri))
Copy link
Member

@garlick garlick Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a little simpler to just allocate argv on the stack since it's fixed length? e.g.

char *argv[] = { "flux", "uri", uri, NULL };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know what I was thinking there. 😕

@grondo
Copy link
Contributor Author

grondo commented Dec 14, 2021

Ok, I pushed an update to simplify uri_resolve() since the flux uri arguments don't change (thanks @garlick, sorry for being a numbskull). I also fixed an off-by-one error in the code to remove newlines.
Force-pushed the result.

static void nullify_newline (char *str)
{
int n;
if (str && (n = strlen (str) > 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing set of parens around n = strlen (str)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, typo that my local compiler didn't catch. Fixed now.

@grondo grondo force-pushed the uri_resolve branch 2 times, most recently from 61580a7 to 4238015 Compare December 14, 2021 14:58
Problem: The "jobid" URI resolver returns a URI for Flux instances that
have already completed, in most cases resulting in a less useful later
error when the caller attempts to connecto to the completed instance.

Update the jobid URI resolver to raise an error if the target jobid
is not running.
Problem: The description of the jobid scheme in flux-uri(1) does not
note that an error will be raised for jobs that are not running.

Update flux-uri(1) to indicate the error.
@grondo
Copy link
Contributor Author

grondo commented Dec 14, 2021

I actually realized there was a loss in usability here for flux top because you don't get a specific error when the target jobid is not running, instead you now get 'unable to connect to Flux' after top tries to open a Flux handle to the completed job.

I've got a fix that updates the jobid URI resolver plugin to raise an error by default when the target jobid is not running, so that you get jobid {jobid} not running as the error instead. I went ahead and pushed the fixes here.

@grondo
Copy link
Contributor Author

grondo commented Dec 14, 2021

Oops, a new test in t2802-uri-cmd.t caused a deadlock when the test is run as a single core job (a submitted job blocks waiting forever for resources). Reversed the order of tests and re-pushed.

Problem: The t2802-uri-cmd.t sharness test does not test the check
for running jobs in the jobid uri resolver.

Add a check that a completed job fails URI resolution with a descriptive
error message.
Problem: Multiple flux-core utilities could benefit from calling
out to flux-uri to resolve high-level URIs to native URIs, but this
requires a lot of boilerplate code to launch the external process
and read its output, etc.

Add a uri_resolve() convenience function to libutil which takes a
possibly unresolved URI and calls `flux uri` on it, returning the
resolved URI on success.
Problem: The flux-proxy tests ensure failure with a non-supported
scheme, but we want to allow the command to take resolver schemes
such as 'jobid' and 'pid'.

Remove the test for now, to be replaced with a different test after
support for calling out to flux-uri is added.
Problem: The usability of flux-proxy is suboptimal because it requires
a fully-resolved, native Flux URI, whereas users typically know only
the jobid of the instance they want to target.

Use the new `uri_resolve()` libutil function to process the first
argument to `flux-proxy`. This allows flux-proxy to accept any URI
which can be processed by `flux uri`, including:

 1. A Flux jobid, e.g. `flux proxy ƒAaDN391`
 2. A nested Flux jobid e.g. `flux proxy jobid:ƒAaDN391/ƒQABnKR`
 3. A Slurm job `flux proxy slurm:1234`
 4. A local PID `flux proxy pid:1234`

Since `flux uri` supports plugins for addition of other resolver
schemes, site-specific or user custom schemes may be supported as
well.

Fixes flux-framework#3921
Problem: The flux-proxy manpage is out of date now that flux-proxy
can take URI or jobid argument, where URI can be resolved to a
job URI using `flux get-uri`.

Update the manpage describing the new usage of flux-proxy.
Problem: t1105-proxy.t doesn't test flux-proxy behavior when it calls
out to flux-uri via flux_uri_resolve()

Add a few tests to ensure flux-proxy works and fails in an expected
way when calling flux_uri_resolve().
Problem: The argument to flux-top(1) (if provided) can only be a
valid jobid in the enclosing instance, but for enhanced usability
any high-level URI accepted by `flux uri` should be allowed.

If an argument is provided to `flux top`, attempt to resolve it with
the new `uri_resolve()` function, which calls out to `flux uri`.
Thus, any high-level URI accepted by `flux uri` can also be used
directly with `flux top`.
Problem: Many flux-top expected failure tests now fail since flux-top
resolves its target argument with flux_uri_resolve().

Updated expected failure output and other tests in t2801-top-cmd.t.
@codecov
Copy link

codecov bot commented Dec 14, 2021

Codecov Report

Merging #4004 (9a7b7e5) into master (0b332fd) will increase coverage by 0.00%.
The diff coverage is 95.00%.

@@           Coverage Diff           @@
##           master    #4004   +/-   ##
=======================================
  Coverage   83.47%   83.48%           
=======================================
  Files         371      372    +1     
  Lines       53636    53647   +11     
=======================================
+ Hits        44774    44787   +13     
+ Misses       8862     8860    -2     
Impacted Files Coverage Δ
src/bindings/python/flux/importer.py 87.50% <87.50%> (ø)
src/cmd/flux-uri.py 88.00% <88.00%> (ø)
src/bindings/python/flux/uri/resolvers/pid.py 90.32% <90.32%> (ø)
src/bindings/python/flux/uri/resolvers/slurm.py 95.55% <95.55%> (ø)
src/common/libutil/uri.c 95.83% <95.83%> (ø)
src/bindings/python/flux/uri/uri.py 97.43% <97.43%> (ø)
...rc/bindings/python/flux/job/validator/validator.py 93.54% <100.00%> (ø)
src/bindings/python/flux/uri/__init__.py 100.00% <100.00%> (ø)
src/bindings/python/flux/uri/resolvers/jobid.py 100.00% <100.00%> (ø)
src/cmd/builtin/proxy.c 75.75% <100.00%> (+0.56%) ⬆️
... and 9 more

@grondo
Copy link
Contributor Author

grondo commented Dec 14, 2021

Thanks! Setting MWP

@mergify mergify bot merged commit c5566b6 into flux-framework:master Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants