Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mongodb: nan, inf #1599

Closed
AlexanderZvyagin opened this issue May 13, 2019 · 14 comments
Closed

mongodb: nan, inf #1599

AlexanderZvyagin opened this issue May 13, 2019 · 14 comments
Labels
kind: question state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated

Comments

@AlexanderZvyagin
Copy link

I have a document from mongodb which contains an array of floats: "[123, nan, inf, -inf]". It is not standard JSON, so the library correctly throws json.exception.parse_error.101

But the array has a meaning, it contains a finite float 123, not-a-number value, and +/- infinity. So the array can be converted into std::vector.

My question is: how can I achieve that in a most elegant way?! Probably the best would be to add user-defined literals (like true,false,null) which are automatically converted into floats (macros NAN and INFINITY). But I don't see that the library allows that...

Thanks a lot in advance!

@stale
Copy link

stale bot commented Jun 12, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 12, 2019
@nlohmann
Copy link
Owner

The library does not support extensions like this. As NAN or INFINITY are not part of the JSON specification, they will yield parse errors. There is no elegant way in changing this other than touching the lexer and parser I'm afraid.

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 15, 2019
@nlohmann nlohmann added state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated and removed state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated labels Jun 15, 2019
@nickaein
Copy link
Contributor

One dirty hack is to replace NaN, inf, -inf in the original string with a valid placeholder while keeping the track of replacements and then parse the string, but this could be very fragile and only can work is very limited circumstances.

@nlohmann What is your thoughts on providing an option so to allow and serialize/deserialize these values? I understand that these are not supported in JSON standard and the default behavior should be to reject them, but given that they are common is some use-cases, supporting them (with an opt-in option) could be very beneficial to the users since there is no clean workaround for them.

@nlohmann
Copy link
Owner

I honestly do not like extending JSON in any way (even with an option).

@AlexanderZvyagin
Copy link
Author

"replace NaN, inf, -inf in the original string" - this is what I have done. It works in my project because I don't have any strings which contains "NaN, inf". But it is just so ugly and error prone :(

But I was asking myself: if I have a 'double' value in my C++ code, and the value is allowed to be a finite, +/- inf real or NAN, how would I store it in JSON?! To build a proper JSON object I probably need to introduce a new structure to hold inf/NAN values. And an array of real numbers can easily become:

  • [0, 1.0, 'nan', '-inf']
    OR
  • [0, 1.0, {'type':'real','value':'nan'}]

Anyway, the core of the problem is incomplete JSON specs on how to represent NAN and Inf. The library (nlohmann::json) may do something to help, but if it is too difficult... Ok, let it be...

@nickaein
Copy link
Contributor

@WarShoe

the core of the problem is incomplete JSON specs on how to represent NAN and Inf

That's exactly the issue! While NaN, +Inf, -Inf are well-defined floating-point values in IEEE standard, JSON spec doesn't support them. One can argue this is an oversight in JSON spec, but nevertheless any software that generates such values for floating-point number is not following the standard and its output cannot be considered as valid JSON. Therefore, as this library follows the JSON spec, these values cannot (and should not) be serialized/deserialized.

if I have a 'double' value in my C++ code, and the value is allowed to be a finite, +/- inf real or NAN, how would I store it in JSON?!

Your proposal about encoding the NaN values as objects seems a good approach since the resulting string remains as a valid JSON string.

@nickaein
Copy link
Contributor

@nlohmann
Besides the cost-benefit argument, what are the other reasons that you think such feature is not desirable to have?

I imagine there are three approaches to support such customization (in order of discreetness):

  1. Modify current lexer/parser to recognize these special values given an option during run-time.
  2. Enable such option during compile time using template programming.
  3. Making the lexer pluggable: We already have SAX interface for the parser. I imagine if the same can be done for the lexer.

Approach 1 is not desirable since it impacts the performance of lexer/parser by introducing more branching.

Approach 2 is less intrusive and doesn't impact the performance for the users who don't enable the option. It adds more complexity to the current lexer/parser code though.

For Approach 3, the user has to do some work but it is a general solution and gives more flexibility since the whole lexer can be substituted. This can be helpful in other circumstances too.

While Approach 1 is off the table, I wonder whether Approach 2 or 3 can be implement without any negative impact either on the performance or design of the library.

Nevertheless, I understand that regardless of which approach we choose, there still could be a considerable amount of work that should be done in other places too (e.g. handling CBOR/MessagePack for these values). But as some libraries support these cases (e.g. Newtonsoft.Json for .Net or allow_nan option of builtin json package in python), I think it worth considering it, either as a narrow option (e.g. on/off option for parsing NaN/Inf) or as a boarder solution (e.g. Approach 3) that gives more flexibility.

@nlohmann
Copy link
Owner

I do not like any extension of this library that makes it possible to process non-conforming JSON text. The reasons are similar to those regarding comments, see https://github.com/nlohmann/json#comments-in-json.

If anything is possible with little effort, I would however reconsider it.

@AlexanderZvyagin
Copy link
Author

I don't think that support for comments (major change) and adding 'nan', 'inf' keywords (minor, the library already has the relevant code) are comparable. I also think it should be quite simple (for the library maintainer) to do. When I looked at the code, my starting point was here:

            // literals
            case 't':
                return scan_literal("true", 4, token_type::literal_true);
            case 'f':
                return scan_literal("false", 5, token_type::literal_false);
            case 'n':
                return scan_literal("null", 4, token_type::literal_null);

But then I decided to simplify my life, and instead of changing the library code, I have started this thread :)

@nlohmann
Copy link
Owner

Yes, you're right, and I forgot how Python's json package is doing it.

If allow_nan is false (default: True), then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance of the JSON specification. If allow_nan is true, their JavaScript equivalents (NaN, Infinity, -Infinity) will be used.

I'll check.

@nlohmann
Copy link
Owner

While it is easy to extend the internal parser/lexer to cope with NaN and the like, adding another bool to the parse functions makes the interface even uglier than it is. We already have the following parameters:

  • detail::input_adapter or (IteratorType first, IteratorType last) to describe where to read from
  • parser_callback_t = nullptr for an optional callback function
  • bool allow_exceptions = true to control whether we want to throw exceptions on parse errors
  • SAX* a SAX consumer
  • bool strict = true whether the input has to be consumed completely (currently only used with `operator>> where it is always false or with the binary formats)

So maybe a parse_option_t struct would be helpful to simplify the interfaces. What do you think? I would like to address this issue before thinking about another additional parameter.

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jun 30, 2019
@nickaein
Copy link
Contributor

So maybe a parse_option_t struct would be helpful to simplify the interfaces. What do you think?

That would be great! This also makes the addition of parameters easier in the future. The code on the call site will be also more readable.

I believe the same thing can be done for dump() method too if there were ever the need to extend its parameters

@stale
Copy link

stale bot commented Aug 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Aug 11, 2019
@stale stale bot closed this as completed Aug 18, 2019
@amcgregor
Copy link

Stale, old, but a rationale for needing to store +Inf and -Inf: scoring weights. In a recommendation engine I'm working on, individual recommender plugins may suppress (-Inf weight, 1 score; no positive contribution to the score will be effective) or force (+Inf weight, 1 score; no negative contribution will be effective) for specific recommendations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: question state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated
Projects
None yet
Development

No branches or pull requests

4 participants