mongodb: nan, inf #1599

AlexanderZvyagin · 2019-05-13T05:51:01Z

I have a document from mongodb which contains an array of floats: "[123, nan, inf, -inf]". It is not standard JSON, so the library correctly throws json.exception.parse_error.101

But the array has a meaning, it contains a finite float 123, not-a-number value, and +/- infinity. So the array can be converted into std::vector.

My question is: how can I achieve that in a most elegant way?! Probably the best would be to add user-defined literals (like true,false,null) which are automatically converted into floats (macros NAN and INFINITY). But I don't see that the library allows that...

Thanks a lot in advance!

stale · 2019-06-12T06:40:29Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nlohmann · 2019-06-15T06:31:51Z

The library does not support extensions like this. As NAN or INFINITY are not part of the JSON specification, they will yield parse errors. There is no elegant way in changing this other than touching the lexer and parser I'm afraid.

nickaein · 2019-06-15T09:40:26Z

One dirty hack is to replace NaN, inf, -inf in the original string with a valid placeholder while keeping the track of replacements and then parse the string, but this could be very fragile and only can work is very limited circumstances.

@nlohmann What is your thoughts on providing an option so to allow and serialize/deserialize these values? I understand that these are not supported in JSON standard and the default behavior should be to reject them, but given that they are common is some use-cases, supporting them (with an opt-in option) could be very beneficial to the users since there is no clean workaround for them.

nlohmann · 2019-06-15T13:18:58Z

I honestly do not like extending JSON in any way (even with an option).

AlexanderZvyagin · 2019-06-15T13:39:12Z

"replace NaN, inf, -inf in the original string" - this is what I have done. It works in my project because I don't have any strings which contains "NaN, inf". But it is just so ugly and error prone :(

But I was asking myself: if I have a 'double' value in my C++ code, and the value is allowed to be a finite, +/- inf real or NAN, how would I store it in JSON?! To build a proper JSON object I probably need to introduce a new structure to hold inf/NAN values. And an array of real numbers can easily become:

[0, 1.0, 'nan', '-inf']
OR
[0, 1.0, {'type':'real','value':'nan'}]

Anyway, the core of the problem is incomplete JSON specs on how to represent NAN and Inf. The library (nlohmann::json) may do something to help, but if it is too difficult... Ok, let it be...

nickaein · 2019-06-15T14:30:51Z

@WarShoe

the core of the problem is incomplete JSON specs on how to represent NAN and Inf

That's exactly the issue! While NaN, +Inf, -Inf are well-defined floating-point values in IEEE standard, JSON spec doesn't support them. One can argue this is an oversight in JSON spec, but nevertheless any software that generates such values for floating-point number is not following the standard and its output cannot be considered as valid JSON. Therefore, as this library follows the JSON spec, these values cannot (and should not) be serialized/deserialized.

if I have a 'double' value in my C++ code, and the value is allowed to be a finite, +/- inf real or NAN, how would I store it in JSON?!

Your proposal about encoding the NaN values as objects seems a good approach since the resulting string remains as a valid JSON string.

nickaein · 2019-06-15T15:03:06Z

@nlohmann
Besides the cost-benefit argument, what are the other reasons that you think such feature is not desirable to have?

I imagine there are three approaches to support such customization (in order of discreetness):

Modify current lexer/parser to recognize these special values given an option during run-time.
Enable such option during compile time using template programming.
Making the lexer pluggable: We already have SAX interface for the parser. I imagine if the same can be done for the lexer.

Approach 1 is not desirable since it impacts the performance of lexer/parser by introducing more branching.

Approach 2 is less intrusive and doesn't impact the performance for the users who don't enable the option. It adds more complexity to the current lexer/parser code though.

For Approach 3, the user has to do some work but it is a general solution and gives more flexibility since the whole lexer can be substituted. This can be helpful in other circumstances too.

While Approach 1 is off the table, I wonder whether Approach 2 or 3 can be implement without any negative impact either on the performance or design of the library.

Nevertheless, I understand that regardless of which approach we choose, there still could be a considerable amount of work that should be done in other places too (e.g. handling CBOR/MessagePack for these values). But as some libraries support these cases (e.g. Newtonsoft.Json for .Net or allow_nan option of builtin json package in python), I think it worth considering it, either as a narrow option (e.g. on/off option for parsing NaN/Inf) or as a boarder solution (e.g. Approach 3) that gives more flexibility.

nlohmann · 2019-06-22T08:06:20Z

I do not like any extension of this library that makes it possible to process non-conforming JSON text. The reasons are similar to those regarding comments, see https://github.com/nlohmann/json#comments-in-json.

If anything is possible with little effort, I would however reconsider it.

AlexanderZvyagin · 2019-06-22T09:14:15Z

I don't think that support for comments (major change) and adding 'nan', 'inf' keywords (minor, the library already has the relevant code) are comparable. I also think it should be quite simple (for the library maintainer) to do. When I looked at the code, my starting point was here:

            // literals
            case 't':
                return scan_literal("true", 4, token_type::literal_true);
            case 'f':
                return scan_literal("false", 5, token_type::literal_false);
            case 'n':
                return scan_literal("null", 4, token_type::literal_null);

But then I decided to simplify my life, and instead of changing the library code, I have started this thread :)

nlohmann · 2019-06-22T16:15:21Z

Yes, you're right, and I forgot how Python's json package is doing it.

If allow_nan is false (default: True), then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance of the JSON specification. If allow_nan is true, their JavaScript equivalents (NaN, Infinity, -Infinity) will be used.

I'll check.

nlohmann · 2019-06-30T10:33:19Z

While it is easy to extend the internal parser/lexer to cope with NaN and the like, adding another bool to the parse functions makes the interface even uglier than it is. We already have the following parameters:

detail::input_adapter or (IteratorType first, IteratorType last) to describe where to read from
parser_callback_t = nullptr for an optional callback function
bool allow_exceptions = true to control whether we want to throw exceptions on parse errors
SAX* a SAX consumer
bool strict = true whether the input has to be consumed completely (currently only used with `operator>> where it is always false or with the binary formats)

So maybe a parse_option_t struct would be helpful to simplify the interfaces. What do you think? I would like to address this issue before thinking about another additional parameter.

nickaein · 2019-07-12T18:49:34Z

So maybe a parse_option_t struct would be helpful to simplify the interfaces. What do you think?

That would be great! This also makes the addition of parameters easier in the future. The code on the call site will be also more readable.

I believe the same thing can be done for dump() method too if there were ever the need to extend its parameters

stale · 2019-08-11T19:25:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

amcgregor · 2024-02-05T15:46:53Z

Stale, old, but a rationale for needing to store +Inf and -Inf: scoring weights. In a recommendation engine I'm working on, individual recommender plugins may suppress (-Inf weight, 1 score; no positive contribution to the score will be effective) or force (+Inf weight, 1 score; no negative contribution will be effective) for specific recommendations.

AlexanderZvyagin added the kind: question label May 13, 2019

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 12, 2019

stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 15, 2019

nlohmann added state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated and removed state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated labels Jun 15, 2019

nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jun 30, 2019

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Aug 11, 2019

stale bot closed this as completed Aug 18, 2019

nlohmann mentioned this issue Oct 23, 2019

Add support for parse stack limiting #1788

Closed

vinllen mentioned this issue Oct 14, 2020

mongo-shake-v2.4.7同步mongo集群数据到kafka刷屏报错 alibaba/MongoShake#430

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mongodb: nan, inf #1599

mongodb: nan, inf #1599

AlexanderZvyagin commented May 13, 2019

stale bot commented Jun 12, 2019

nlohmann commented Jun 15, 2019

nickaein commented Jun 15, 2019

nlohmann commented Jun 15, 2019

AlexanderZvyagin commented Jun 15, 2019

nickaein commented Jun 15, 2019

nickaein commented Jun 15, 2019

nlohmann commented Jun 22, 2019

AlexanderZvyagin commented Jun 22, 2019

nlohmann commented Jun 22, 2019

nlohmann commented Jun 30, 2019

nickaein commented Jul 12, 2019

stale bot commented Aug 11, 2019

amcgregor commented Feb 5, 2024

mongodb: nan, inf #1599

mongodb: nan, inf #1599

Comments

AlexanderZvyagin commented May 13, 2019

stale bot commented Jun 12, 2019

nlohmann commented Jun 15, 2019

nickaein commented Jun 15, 2019

nlohmann commented Jun 15, 2019

AlexanderZvyagin commented Jun 15, 2019

nickaein commented Jun 15, 2019

nickaein commented Jun 15, 2019

nlohmann commented Jun 22, 2019

AlexanderZvyagin commented Jun 22, 2019

nlohmann commented Jun 22, 2019

nlohmann commented Jun 30, 2019

nickaein commented Jul 12, 2019

stale bot commented Aug 11, 2019

amcgregor commented Feb 5, 2024