Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc19: propose a standard (alternate) encoding for FLUIDs #255

Merged
merged 1 commit into from
Jul 15, 2020

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jul 8, 2020

As discussed in flux-framework/flux-core#3038, this PR proposes a standard alternate encoding for FLUIDs, most useful in decreasing the space required for JOBID output in tools such as flux jobs. I figured we'd get the idea down in the RFC so we could refer to it during development (assuming there is concensus).

@grondo grondo force-pushed the fse branch 3 times, most recently from 65de9e1 to 906cd63 Compare July 8, 2020 23:27
@grondo
Copy link
Contributor Author

grondo commented Jul 9, 2020

Pushed a commit to update the example FLUID (to one with more human interest.)

Also thinking that FSE is too close to FSD, and maybe a better name for this "standard" FLUID encoding is simply "f58".

@garlick
Copy link
Member

garlick commented Jul 9, 2020

Fuzzbunny - Love it.!

@grondo
Copy link
Contributor Author

grondo commented Jul 9, 2020

Yeah, who is against fuzzy bunnies? I dare you to speak up!

spec_19.rst Outdated Show resolved Hide resolved
Copy link
Member

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :) Thanks @grondo!

@grondo
Copy link
Contributor Author

grondo commented Jul 10, 2020

Looks like the Wikipedia reference was literally just deleted. I have never seen that before (is un serendipitous a word?)

Maybe drop the reference since the concept of baseN should be clear? There's also a rfc draft, but it is expired.

@grondo
Copy link
Contributor Author

grondo commented Jul 10, 2020

Pushed a fixup commit to replace the Wikipedia reference with a reference to the Bitcoin wiki description of Base58. Honestly we could also probably drop the reference since we're just representing 64-bit integers in base58, not describing a generic binary encoding.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Please set MWP when you're ready.

@grondo
Copy link
Contributor Author

grondo commented Jul 12, 2020

Unfortunately, once I had an implementation mostly complete, I did run into a problem with Unicode support as @SteVwonder feared. If the current locale is set to a non-UTF-8 locale, and attempt to print a jobid encoded in F58 fails with:

flux-jobs: ERROR: 'ascii' codec can't encode character '\u0192' in position 2: ordinal not in range(128)

sharness explicitly set the locale to C, so this problem hits the sharness tests.

It seems like this is a well known issue such that PEP 538 (coercion to UTF-8 locale) was proposed, but it seems like that PEP only made it into 3.7.

Not sure if we can do a similar thing and coerce locales to a valid UTF-8. This kludge does seem to work:

    # Re-open stdout with utf8 encoding
    sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8')

Only needed in flux-*.py commands that output JOBIDs encoded as F58 strings.

Also seems that PYTHONIOENCODING=utf-8:surrogateescape works.

I guess if these minor changes work around the issue in sharness, we shouldn't have troubles in various locales that might be set on production systems, but also maybe it would be safer to use ascii f as the F85 default prefix.

(Sorry for doubting you earlier @SteVwonder, this unicode thing in Python is indeed a mess!)

@garlick
Copy link
Member

garlick commented Jul 13, 2020

I'm fine with the coersion as it seems like that's where python is heading. The unicode output looks cool and we do have a fallback plan if problems arise.

@grondo
Copy link
Contributor Author

grondo commented Jul 14, 2020

In the implementation proposed in flux-framework/flux-core#3045, a new fluid_parse() function was introduced that allows decoding a FLUID/JOBID string from any supported encoding using the following rules:

  • if the string contains a ., decode as dothex
  • if the string contains a -, decode as words (mnemonic)
  • if the string starts with ƒ or f, decode as F58
  • if the string starts with 0x decode as hexidecimal
  • O/w, decode as decimal

E.g.:

$ flux job id ƒLUX  
65512
$ flux job id 0000.0000.0000.ffe8
65512
$ flux job id modular-archive-academy--academy-academy-academy
65512

Should these rules be made explicit in this RFC? Or is that overkill? Just throwing it out there.

@garlick
Copy link
Member

garlick commented Jul 14, 2020

Should these rules be made explicit in this RFC? Or is that overkill? Just throwing it out there.

Yeah, that IMHO would be useful to document.

@grondo
Copy link
Contributor Author

grondo commented Jul 15, 2020

I've pushed an update here that reworks the "Representation" section to include descriptions of all supported FLUID encodings, as well as a set of "rules" for decoding FLUID strings in supported representations.

If this looks ok (might need some wordsmithing), then I can squash all the commits together.

@garlick
Copy link
Member

garlick commented Jul 15, 2020

Nice! That works for me. Please squash and set MWP when you're ready.

Expand the "Representation" section to detail the FLUID encodings
which should be supported by a Flux instance, including the existing
representations already supported: decimal, hex, dothex and "words"
(mnemonic encoding).

Additionally, introduce a preferred alternate encoding for FLUIDs
(F58) based on Base58 (using the Bitcoin dictionary) which has
been optimized for human readability by dropping similar letters
and numbers.  (See Bitcoin Wiki reference in the text).

Add a set of "rules" that should be followed to unambiguously decode a
FLUID encoded in one of the standard representations to a 64b integer.
@mergify mergify bot merged commit 437c8bf into flux-framework:master Jul 15, 2020
@grondo grondo deleted the fse branch July 15, 2020 23:12
@grondo
Copy link
Contributor Author

grondo commented Jul 15, 2020

Cast in stone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants