-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for RFC19 F58 encoded JOBIDs #3045
Conversation
This pull request introduces 1 alert when merging ccb02cc into 4ebcc86 - view on LGTM.com new alerts:
|
If it wasn't clear from above, most
work and are all equivalent. |
I think this is really neat. I was noticing that some of the output from
Should we open an issue to export the FLUID interface? I think it's not part of the public API because early on I wasn't sure the 64 bit job IDs were going to work out, but I think those days are behind us. |
I tried that at first. However, there are assumptions of integer JOBIDs all over the place in the testsuite. That would need to be cleaned up first. I was going to ask the question if I should attempt that. I also didn't update
I wasn't sure. Currently |
BTW, I did verify that RH backported Python Issue 28180 to the RHEL8 version of Python 3.6. i.e. in CentOS 8, this works: $ LANG=C python3 -c 'print("ƒQsm1jrTzf")'
ƒQsm1jrTzf Whereas in ubuntu 18.04 and RHEL7 it fails with an exception: $ LANG=C python3 -c 'print("ƒQsm1jrTzf")'
Unable to decode the command from the command line:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 7-8: surrogates not allowed |
84bb84a
to
3f7fc8f
Compare
Rebased. Also made a small fix that was preventing I take lack of objections as assent to move forward. So I'll go ahead an make some tests for new |
class JobID(int): | ||
"""Class used to represent a Flux JOBID | ||
|
||
JobID is a subclass of `int`, so may be used in place of integer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, but I'm not sure I follow the subclassing of int
. What scenarios do you have in mind where you would want to use a JobID in place of an integer? Does this help with the auto-generated bindings which are expecting an int for the id_parse
and id_encode
functions?
My main concern is that the __add__
, __sub__
, __mul__
, and __div__
methods that are inherited from int
don't really make sense in the context of a FLUID. I suspect the only int-like methods we care about are ordering related, and we can get that with very little code using functools total_ordering
decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS - I was originally thinking that int
was the class we were advised not to subclass in the python course (for reasons that escape me right now), but it was actually list
that we were warned about with UserList
being the preferred thing to subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern was that JOBIDs were treated explicitly as int
already in the code, and I just wanted to extend that to allow the other representations, instead of adding a swath of functions to convert to/from int
and the other forms. We were already treating JOBIDs as integers and the int
methods didn't seem to get in the way, so I actually thought this was the least change for the least effort.
However, I'm not a good Python programmer (clearly), what would be proper to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this also follows the C API, where we have typedef uint64_t fluid_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more thing. I remembered what I was worried about.
flux-jobs
currently stores JOBIDs as int
, and this value is passed to the {id}
format key as int
and therefore allows all integer format specifiers, presentations, etc. Therefore the JobID
class needs to present itself as an int
, but allow the other representations as properties so we can use {id.f58}
{id.words}
etc. Is there a better way to accomplish this than subclassing int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern was that JOBIDs were treated explicitly as int already in the code, and I just wanted to extend that to allow the other representations, instead of adding a swath of functions to convert to/from int and the other forms.
Gotcha! That makes total sense to me. My concern about the arithmetic methods is relatively minor, and I see how adding calls to int()
all over the place would be a lot, especially if we miss something that is behind a conditional (darn dynamically-typed languages!).
flux-jobs currently stores JOBIDs as int, and this value is passed to the {id} format key as int and therefore allows all integer format specifiers, presentations, etc.
That makes a ton of sense too. I don't know of an easy way to support all of the int
format specifiers other than subclassing. The hard way would be to use/re-implement the int
's __format__
method or implement a .int()
method and then do {id.int:...}
in all the different formats that we do.
However, I'm not a good Python programmer
This is the only thing you said that I disagree with 😄
c47858b
to
1757eab
Compare
Once all these unicode obstacles are resolved, we can really have fun with this! Was just thinking a 🌱 for jobs with a self-imposed power cap would be cool! |
Or for @SteVwonder's sake a I think this one's ready for a full review. For those that missed my slack conversation with myself, I had to make one change to the flux Python diff --git a/src/bindings/python/flux/wrapper.py b/src/bindings/python/flux/wrapper.py
index 13632f914..fa444164d 100644
--- a/src/bindings/python/flux/wrapper.py
+++ b/src/bindings/python/flux/wrapper.py
@@ -190,7 +190,7 @@ class FunctionWrapper(object):
# Unpack wrapper objects
args[i] = args[i].handle
elif isinstance(args[i], six.text_type):
- args[i] = args[i].encode("utf-8")
+ args[i] = args[i].encode("utf-8", errors="surrogateescape")
try:
result = self.fun(*args) I also held off on manpage updates to avoid conflicts with #3033 |
Oh, I also should mention I threw on a change for
This gives a bit more room to add another field to default output in near future (e.g. sched.queue or sched.reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only got minor suggestions, so approving. I love the end result. I don't know if everyone will agree but let's see :-)
Prob would be good to get some python review also, as I mainly focused on the C bits.
src/common/libutil/fluid.c
Outdated
int index = 0; | ||
memset (buf, 0, bufsz); | ||
if (id == 0) { | ||
buf[0] = b58digits[0]; return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line for the return 1;
statement?
src/common/libutil/fluid.c
Outdated
return -1; | ||
} | ||
if (bufsz <= strlen (f58_prefix) + 1) { | ||
errno = ENOSPC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used ENOSPC (No space left on device) in this case, but then somebody came up with EOVERFLOW (Value too large for defined data type) in some of our code and I thought that was more appropriate? Either is fine but I thought I'd throw that out there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I like EOVERFLOW as well. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to jump in without having to review the entire PR.
I have been using ERANGE
as well and I think ERANGE
or EDOM
could be another choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, now that I think about it, in this particular case, EOVERFLOW
would be best since this is a buffer being passed, not a number being passed. Sorry for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, thanks for the feedback!
} | ||
|
||
if ((count = b58revenc (b58reverse, sizeof (b58reverse), id)) < 0) | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set errno
for (int i = len - 1; i >= 0; i--) { | ||
int8_t c = b58digits_map[(int8_t)str[i]]; | ||
if (c == -1) /* invalid base58 digit */ | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set errno
case FLUID_STRING_F58: | ||
if (fluid_f58_decode (&fluid, s) < 0) | ||
return -1; | ||
break; | ||
default: | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not from this PR, but since you're here, this needs errno set, and also the mn_decode() != 8
case (my bad)
if (type == NULL || strcasecmp (type, "dec") == 0) { | ||
int len = snprintf (buf, bufsz, "%ju", (uintmax_t) id); | ||
if (len >= bufsz) { | ||
errno = ENOSPC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: EOVERFLOW?
@@ -244,12 +243,8 @@ static struct optparse_option status_opts[] = { | |||
}; | |||
|
|||
static struct optparse_option id_opts[] = { | |||
{ .name = "from", .key = 'f', .has_arg = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
else if (!strcmp (to, "hex")) { | ||
if (fluid_encode (dst, dstsz, id, FLUID_STRING_DOTHEX) < 0) | ||
log_msg_exit ("error encoding id"); | ||
if (flux_job_id_encode (id, to, dst, dstsz) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great cleanup.
Problem: A couple corner cases in fluid_decode(3) do not set an errno on error return. Add errnos of EINVAL in these cases.
Support RFC18 defined F58 (prefixed base58) encoding for FLUIDs. This representation encodes 64bit FLUIDs int Base58 using the Bitcoin digits: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz In order to disambiguate an F58 encoded FLUID from decimal or hex encoding, a special character, the Unicode script f (U+0192) is prepended to the base58 encoding. Add a FLUID_STRING_F58 type for manual conversion to/from the F58 representation with fluid_encode(3) and fluid_decode(3).
Add unit tests for fluid_encode/decode with F58 strings.
Problem: fluid_decode(3) requires a priori knowledge of the encoding of the source string, and doesn't support integer representations of a FLUID as decimal or hex. This makes usage of the function multi-step for utilities that want to allow FLUID input from any representation. Add a function to auto-detect the encoding type of a FLUID string representation, along with a new `fluid_parse()` function that decodes from any supported representation.
Add unit tests for the generic fluid_parse() function.
Problem: FLUID string encoding is not exported via a public API, but this functionality may be useful and/or necessary for callers, especially for the case of converting JOBIDs between different encodings. Add flux_job_id_parse() and flux_jobid_encode() to libjob to allow Flux API users to easily parse a Flux JOBID from any supported string representation, or encode a `flux_jobid_t` to any supported representation. The supported representations currently include all standard FLUID encodings: - decimal integer - hexidecimal integer, prefixed with "0x" - dotted hex format, "XXXX.XXXX.XXXX.XXXX" - KVS path, "job.XXXX.XXXX.XXXX.XXXX" - Words, "X-X-X-X--X-X-X-X" - RFC 19 F58 encoding, prefixed with ƒ or f Addition of these functions to the libflux API will also make it easy to add support to bindings for encoding and decoding JOBIDs. The functions are named `flux_job_id_*` instead of `flux_jobid_*` to keep consistent naming with the other libjob functions, as well as to make it easy to call the functions from Python cffi.
The flux-job id --from option will be deprecated, and the "hex" encoding will change names. Remove these options from tests to avoid breakage during these changes.
Update the `flux-job id` utility to use the new flux_job_id_parse() and flux_job_id_encode() functions for parsing JOBIDs from the command line, and for conversion to the requested representation. The --from=TYPE option is no longer required, since flux_job_id_parse can unambiguously decode from all supported FLUID/JOBID encodings. This option is dropped. This change results in a renaming of the "hex" encoding to "dothex". A "hex" representation is still supported, but is encoded as hexidecimal with a "0x" prefix.
Problem: Flux JOBIDs support multiple representations as described in RFC 19, but many flux-job subcommands take input in only decimal integer format. Use flux_job_id_parse() instead of parse_arg_unsigned() specifically for jobids to support JOBID input in any supported format.
Update tests for `flux-job id` in t2201-job-cmd.t to reflect recent updates in this command, including F58 JOBID encoding, and rename of "hex" to "dothex" with introduction of new "hex" representation.
Problem: When passing commandline arguments (i.e. sys.argv) from Python scripts to C API functions via the wrapper.py class, encoding errors are possible if a utf-8 string is supplied in argv if the current environment does not have a UTF-8 LANG or LC_ALL environemnt set. This seems to be because Python encodes sys.argv to `str` with the current encoding, and (at least as of Python 3.6) there is no way to get the actual bytes given in argv in this environment without throwing a UnicodeEncode error. Fortunately this has been a big enough problem that PEP383[1] was devised, introducing the 'surrogatescape' error handler for encode(), which allows the so-called "smuggling" bytes in character strings. Since the alternative is to possibly throw UnicodeEncode errors, add `errors='surrogateescape'` to the wrapper.py __call__() method, when it encodes string arguments to pass into C. There is likely no downside here, since there is presumably no existing usage depending on the current behavior. [1] https://www.python.org/dev/peps/pep-0383/
Add a JobID class for working with multiple representations of Flux JobIDs. The JobID class inherints from Python int class, but its constructor takes a JobID in any valid representation using `flux_job_id_parse()` under the covers. The JobID class also provides an `encode()` method to encode a JobID to any supported representation, along with convenience named properties, e.g. jobid = JobID(arg) print (jobid.f58) will emit a F58 encoded jobid.
Add tests for the JobID class and its supported encodings.
Store JOBIDs as JobID objects in flux-jobs. Support all JobID format "properties", e.g. id.f58, id.hex, etc.
Change the default flux-jobs output format to use id.f58 instead of just id. This results in JOBIDs taking up less horizontal space in the default output - 12 columns instead of 20.
Add tests for JOBID encodings in output and input for flux-jobs.
Problem: The status_abbrev field only takes up two characters of width, but the field has to be six chars wide due to the default heading of STATUS. Change the heading for {status_abbrev} to "ST" and shorten the default width to 2.
Check for the correct default heading for status and status_abbrev.
Ok, I think addressed all your comments @garlick, squashed and pushed the result. |
Great! I only skimmed through the python bits, but I see you did go back and forth with @SteVwonder so maybe that's good to go? Up to you - I'm fine with MWP. |
Let's do it! Edit: It will be easy to roll back the |
Codecov Report
@@ Coverage Diff @@
## master #3045 +/- ##
==========================================
+ Coverage 81.13% 81.19% +0.06%
==========================================
Files 285 285
Lines 44063 44229 +166
==========================================
+ Hits 35750 35912 +162
- Misses 8313 8317 +4
|
Thanks! |
Yeah, the Python bits looked good to me when I reviewed them a week or two ago. Just learned about the surrogateencoding functionality. Thanks for the nice commit message on that @grondo. |
This PR adds support for the proposed RFC19 F58 encoded JOBIDs.
It is a WIP because the final set of testing is not in place yet, in case the result is not acceptable for merging.
At its core, this PR adds a new FLUID string type
FLUID_STRING_F58
, which represents the Base58 encoding of FLUIDS (and therefore Flux JOBIDs).Since all supported FLUID string representations are somewhat unambiguous, a
fluid_parse()
function is introduced which can automatically parse a FLUID from a string in any supported encoding.The
fluid_parse()
function is then used to implement aflux_job_id_parse()
to the public API (FLUIDs do not seem to be directly exposed in libflux API).flux_job_id_parse()
is then used throughoutflux-job.c
to allow input of a JOBID in any supported encoding, instead of just decimal integers.This also allowed dropping the need for the
flux job id --from
option, since the encoding of input is now auto-detected, e.g.The JOBID conversion functions are then added to python bindings, and JobID class introduced, which allows easy conversion to/from the different JOBID representations.
Finally, the JobID class is introduced into
flux-jobs
, which allows various output fields likeid.hex
id.dothex
id.words
.id.f58
is then made the default: