Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON and 64-bit integer precision in JS #153

Open
AndrewJDR opened this issue Jun 19, 2020 · 8 comments
Open

JSON and 64-bit integer precision in JS #153

AndrewJDR opened this issue Jun 19, 2020 · 8 comments

Comments

@AndrewJDR
Copy link
Contributor

AndrewJDR commented Jun 19, 2020

Hello,

Given the well-known issue of many (all?) popular javascript implementations not supporting 64-bit values in their JSON.parse function, has an optional alternate encoding for 64-bit ints in flatcc's JSON printer/parser been considered?

A couple of options would be either a base10 string or a base64 string -- base10 is probably nicer just for readability. I imagine the printer flag would be something like flatcc_json_printer_f_quoted_longs and the parser would just automatically know to treat a string in a long field as either base10 or base64 (depending upon what is settled upon).

I will consider a patch for this but I wanted to see if it's something you've explored already and whether I'm missing something obviously problematic about it (apart from maybe being a non-standard feature vs. the flatc project).

This is something that can and imo should ultimately be better handled on the javascript side, but it's not a very pleasant option at times. Also, the impetus for this is that when doing round-tripping (flatcc print -> Javascript's JSON.parse -> flatcc parse) there is loss of information.

Reference:
https://github.com/sidorares/json-bigint

@AndrewJDR AndrewJDR changed the title JSON and 64-bit integer precision JSON and 64-bit integer precision in JS Jun 19, 2020
@mikkelfj
Copy link
Contributor

Hmm, I'm not too keen on this. Mainly because

  1. it seems like a hack, and
  2. it would increase code size of the runtime (but maybe not much)
  3. another check to slow down normal operation for an edge case - given the proposed runtime flag.

From an architectural perspective this can be done fairly easily and there is already precedence: enum values can be either numeric or named strings.

Since you also mention roundtrip, this implies also parsing strings as numbers. Ignoring the 64-bit thing, a case could be made for supporting this in general, via a runtime option. The reason being that some JSON generators are sloppy in this respect. But then gain I have never heard of this being an issue before in praxis and it would add overhead in various places. The parser can do this without slowing down because it will try an alternative only when the common case is about to report an error. But this special handling can increase the already large amount of generated parser code.

Aside from this my thinking is that integers in 64-bit still works perfectly fine in Javascript as long as they are not too large, so it is not the 64-bit value in itself that is the problem. For big values, yes, a generic parser would accept this, but it would be the wrong type and just defer the problem. Of course you if you are aware of this, you can test for strings on all numeric values and do something, but this is where it seems like a hack more than anything.

As you say, this is much better dealt with in Javascript. Also, assuming the messages are not too large, and time is not of essence, if you really need to fix this, you could probably write a function to scan the input for long intergers and convert them into strings, at least until you get around to fixing your JSON parser.

Likewise, the flatcc JSON parser will fail if it for example receives a large 32 bit value in field of type int16, or a negative number for an unsigned type. The philosophy is to fail if processing is not safe, with the exception of optionally ignoring unknown fields.

What is holding you back from addressing this javascript side instead?

@mikkelfj
Copy link
Contributor

Actually I have another concern as well: I have an unpublished configuration language based on the same parsing engine which is a strict superset of valid FlatBuffers JSON. There is a lot of resolvable ambiguity in the grammar, but if JSON itself also starts to become more ambiguous this could limit the expressiveness of the language, or require odd restrictions on flags that are hard to explain.

@AndrewJDR
Copy link
Contributor Author

AndrewJDR commented Jun 21, 2020

What is holding you back from addressing this javascript side instead?

If end users of an API are working with the json produced by flatcc's printer, and they're using javascript, they'll have to install non-standard modules such as the json-bigint module mentioned in order to safely work with the json (otherwise when that JSON round-trips back to a flatbuffer, information will be lost). Also, in this case, the API user may have no knowledge at all that flatbuffers are even being employed, as the JSON<->flatbuffer conversions may take place on a server. So it's just too cumbersome of an implementation detail to burden an API user with when they're accustomed to JSON things "just working" out of the box in javascript-land.

No worries though if it's not something you want mainlined, I'll probably just hack it into my copy of flatcc.

Also, with all of this said, all of these 64-bit fields I'm thinking of in our schema are used to store data hashes, so I'm considering just changing our schema to use a struct of two 32 bit ints instead in these cases. And I'll just try my best to avoid using 64 bit values entirely moving forward, since it turns out to be a dicey situation if "API users" are handling the JSON part of a flatbuffer round-tripping workflow. I'm now leaning toward this, it seems the path of least resistance.

@mikkelfj
Copy link
Contributor

Yeah, if you are concerned with arbitrary unknown API users, it seems like 32 bit fields would be better since I don't know how a string of 64 bit values would help them beyond not getting an immediate error message - except perhaps transparent round tripping.

It might be worth discussing this on Flatbuffers Discord forum or google group, if it is considered a broad problem.

@AndrewJDR
Copy link
Contributor Author

AndrewJDR commented Jun 21, 2020

I don't know how a string of 64 bit values would help them beyond not getting an immediate error message - except perhaps transparent round tripping

Yeah, as much as I dislike it, using strings for large ints has apparently emerged as an informal convention to move json around within javascript-land because of the 53 bits of precision issue that exists there (i.e. they use double floats to represent all numbers).

Thanks for the discord/google group, if I'm ever feeling motivated enough to propose it I may do that. For now though, the struct workaround seems quite tractable.

@mikkelfj
Copy link
Contributor

I posted a reference to here on Discord earlier today.

I am a bit concerned if this is a broader issue when using flatcc as a JSON gateway, but I still do not feel at ease about this, and it would also affect Googles flatc tool that also produce JSON.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jun 21, 2020

If you feel like it, you can make a PR with the feature behind a feature flag. The parser already has several of these. This will make no change to normal releases but can be flagged in at compile time. There might be both a static flag to enable the feature and a runtime to activate it for a specific instance iff the static feature is flagged in.

For example:

/*
* Always print multipe enum flags like `color: "Red Green"`
* even when unquote is selected as an option for single
* value like `color: Green`. Otherwise multiple values
* are printed as `color: Red Green`, but this could break
* some flatbuffer json parser.
*/
#ifndef FLATCC_JSON_PRINT_ALWAYS_QUOTE_MULTIPLE_FLAGS
#define FLATCC_JSON_PRINT_ALWAYS_QUOTE_MULTIPLE_FLAGS 1
#endif

@mikkelfj
Copy link
Contributor

Maybe that was a bad example, I think exactly this feature is being forced always in the future exactly because of conflicts with mentioned config language. But the point is, there is a mechanism to flag such digressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants