Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve error handling #1152

Closed
maddanio opened this issue Jul 3, 2018 · 47 comments
Closed

improve error handling #1152

maddanio opened this issue Jul 3, 2018 · 47 comments
Labels
state: needs more info the author of the issue needs to provide more details state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated

Comments

@maddanio
Copy link

maddanio commented Jul 3, 2018

we commonly get quite horrible errors from the parser or when converting to user types. also error handling is not even mentioned in the docs.

I would propose making error handling a first class citizen, report lines/columns, give more detailed info in conversion to user types, etc.

I am willing to help, as this library is currently quite integral to our codebase

@theodelrieu
Copy link
Contributor

give more detailed info in conversion to user types

Are you talking about runtime errors here too? Or compile-time?

report lines/columns

There are new exceptions since 3.0, however they're in the detail namespace, we sould expose them publicly. parse_error shows at which byte the parsing failed. I guess it can be enhanced to contain more information.

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

i mean run-time, yeah, like
"invalid token on line x col y"
"cannot convert int at line x col y to T"
the latter would require json elements carrying source info, obviously optional, probably as an uint64 with 0 meaning no info or something. but i think it would be worth it. or one can make that compile time optional

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

so do these exceptions propagate out? thing is the user really wants line/col info, so it would be nice if the exceptio would report that also. the parser would simply have to keep track of newlines as it is chugging along. again i would say well worth it.

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

but yeah, a first remedy would be exposing them, so we can catch them and retro-provide line information. still think it would be worthwhile collecting line inf oduring parsing. its a simple matter of counting line breaks, and the number of bytes since the last line break...or do you think it would be very complicated?

@theodelrieu
Copy link
Contributor

Unfortunately, I do know very little about the parsing/lexing aspect of this library. This is a question for the maintainer :)

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

too a quick look at the parser. so adding line counting would be quite simply done in the lexer. then current_position would return row/col instead of byte count, or both. that could then be put directly into parse_error (which actually seems to already be exposed via basic_json::parse_error). adding source info to the json value is also simply done when constructing it in the parser. the sax interface would simply take source location as an argument for the events. i will think about how the source info can be made optional, probaby using a second template argument to basic_json.

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

ok, was possible, did it on out branch. i will send a pull request sometime where source locations per json value will be off by default (with no penalty) and easy to opt in to

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

also tomorrow will push to a public branch and link that here
nightnight

@maddanio
Copy link
Author

maddanio commented Jul 3, 2018

so, why wait:
branch

@nlohmann
Copy link
Owner

nlohmann commented Jul 4, 2018

This change would (strictly speaking) break compatibility with the 3.x.x versions and should not be added before version 4.0.0. This is not necessarily a bad thing - I just think that if we want to improve the diagnosis information, we may want to think bigger. For instance, we may think about using json itself as payload for the exceptions. Then we are free to choose, for each exception, what data to include. For parse errors, we thereby would not only be able to add line and column information, but also the read and expected tokens, etc. Also other issues like #932 could be address this way.

What do you think?

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jul 4, 2018
@maddanio
Copy link
Author

maddanio commented Jul 4, 2018

well, it should be doable that programs usually just compile, but abi may break, and the change would be detectable, as some types change.
Hmm, for the parse error you can add all that information, but thats the only one i think. For deserialization i think source location is enough per json entity. what is more interesting is to think of not only the one entity that has an unexpected type, but the compund entity. but to make that better the whole deserialization has to be changed i think, so that deserializing a complex type is more than a series of deserializations of fields and such. or one just wraps all of the conversions from json to user type in try/catch to enrich the exception on the way up?

@maddanio
Copy link
Author

maddanio commented Jul 4, 2018

also for the parser error you can just directly generate all that information at the throw site, i.e. in the parser, as you should have it all right there,...

@nlohmann
Copy link
Owner

nlohmann commented Jul 4, 2018

So what is your proposal?

@nlohmann
Copy link
Owner

Any ideas on this?

@maddanio
Copy link
Author

maddanio commented Jul 24, 2018 via email

@nlohmann
Copy link
Owner

My suggestion would still be just a source info type which is a template parameter. And the possibility to omit it at no cost (probably using empty base class optimization).

I don't understand. Can you make an example?

@maddanio
Copy link
Author

I am traveling, but maybe I can code it up rough next week sometime on my clone

@nlohmann
Copy link
Owner

nlohmann commented Aug 4, 2018

Any news on this?

@maddanio
Copy link
Author

maddanio commented Aug 4, 2018

i have a working solution here:
https://github.com/maddanio/json
what is missing is real support for reduced source_location types, like an empty one to opt out and one only with byte pos, which suffices for binary formats and is needed to properly pass the tesgts relying on byte-pos only.
it pases most of check-fast, except for:

        SECTION("StandardLayoutType")
        {
            CHECK(std::is_standard_layout<json>::value);
        }

why is this needed? i think this is because of the private base class, but honestly i got dizzy reading the concept...

@maddanio
Copy link
Author

maddanio commented Aug 4, 2018

btw, the base class is needed to properly support opting out, because only this way wil an empty source location storage truly be empty, due to the empty base class optimization (usually in c++ classes cann not be smaller than 1 byte)

@nlohmann
Copy link
Owner

I am currently not sure how to proceed here. I would need a concrete example with concrete expected error messages.

@nlohmann nlohmann added the state: needs more info the author of the issue needs to provide more details label Aug 15, 2018
@maddanio
Copy link
Author

example:
[json.exception.type_error.305@byte: 6608 line: 180 column: 39] cannot use operator[] with null

@maddanio
Copy link
Author

i.e. usage errors on a json object now tell me the source location of the object. helps enormously, since json does not have a schema.

@maddanio
Copy link
Author

very simple in principle. just preserve source location along with the json objects.

@maddanio maddanio reopened this Aug 20, 2018
@maddanio
Copy link
Author

oops, wrong button, sorry

@maddanio
Copy link
Author

basically this servers the use case where json is used as configuration language, and the config file is broken

@nlohmann
Copy link
Owner

This would mean an overhead of a string and at least one integer per JSON value. Right?

@maddanio
Copy link
Author

3 integers: byte pos, row, column. by the string i suppose you mean the filename. no, that is not saved. the data maybe from memory also, i.e. a rest resource loaded from http or something, but location info is still useful since the resource can also be viewed in a browser.
my idea to mitigate this was to make the representation of the source location a template parameter of the json type. by exploiting the empty base class optimization an empty representation would then imply "opting out"

@maddanio
Copy link
Author

other alternatives are shorter integer types, or ommitting any of the values (i.e. row alone could often be enough)

@nlohmann
Copy link
Owner

This would only work if the value is unchanged between parsing and the error message. What would you do if a value is changed after the fact?

@maddanio
Copy link
Author

change would mean re-assignment i guess? or do you allow in-place changes beyond container modifications? for re-assignments the source location would be overwritten too. for container modifications i would just leave the old info there. this might not be academically correct for some definition of correct, but i would consider this a case of perfect vs good.

@nlohmann
Copy link
Owner

I mean

json j = parse(...);

j["foo"] = false;

std::string s = j["foo"];

The last line would trigger an exception. For this, we would not have source locations, right?

@maddanio
Copy link
Author

yes, but i think that actually makes sense, because there is none, at least not at runtime

@maddanio
Copy link
Author

if insteade on the last line you where to index with an unknown key you would in turn get one.

@efp
Copy link

efp commented Aug 21, 2018

Howdy. I was poking around just for this, wondering if there was a way to customize the error messages, especially for source line info. I was thinking of hacking this into the parse function myself... once the parse is done, I don't see how source locations are valid.

It might be nice if the exception class had a few more fields than 'id' so we could construct our own error messages. For instance, name (ename) and desc (the what_arg) fields. And for parse errors only, perhaps byte, line and column.

I've spent all of 10 minutes looking at this, but I'd say have lexer.get_position() return a struct with byte, line, and column?

@efp
Copy link

efp commented Aug 22, 2018

P.S. If one forgets a comma between objects, e.g.

{ "foo" : 5 "bar" : 6 }

I get 'unexpected string literal; expected '}' ... should be: expected ',' or '}'

@maddanio
Copy link
Author

Customizing the message would bevrelatively easy

@maddanio
Copy link
Author

Also FYI for me it was about errors also in usage, i.e. after parsing

@efp
Copy link

efp commented Aug 22, 2018

In the code where I'm using this lib I report errors in usage by the path in the json tree, e.g.

{ "foo" : {"bar" : 1, "baz" : 2}}

might yield something like "error [foo.baz] should be a string"

But this is done with a custom front end class. I don't think that functionality belongs in this library, keeping track of that goes beyond the json spec.

But, I do agree that reporting parse errors by line and column (or at least having this info available to the client) would be quite useful. I've got this working via a 'position' struct.

@nlohmann
Copy link
Owner

@efp Thanks for reporting the issue in
#1152 (comment). Adding a column to the parse_error should be easy. I'll follow up on this.

@maddanio
Copy link
Author

@efp the last case you talked about is exactly why i added source location to the json data. while i agree that validation may not belong in the json code (though since there are things like bson json schema may also be of benefit?). but without the sourvce info a frontend cannot tell the user where the error happened in the config file.

@efp
Copy link

efp commented Aug 23, 2018

Here's what I did:

https://github.com/efp/json/tree/parse_error_lines_cols

Let me know if you'd like it as a pull request.

@nlohmann
Copy link
Owner

@efp This looks good! I would be happy to have a PR!

@stale
Copy link

stale bot commented Sep 22, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Sep 22, 2018
@stale stale bot closed this as completed Sep 29, 2018
@maddanio
Copy link
Author

maddanio commented Jan 9, 2025

@efp you still planning on making that PR? looks very useful :)

@nlohmann
Copy link
Owner

nlohmann commented Jan 9, 2025

As this is a while ago, I am not sure what is needed exactly:

The library already has line and column information for parse errors like this:

[json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing value - invalid string: control character U+0009 (HT) must be escaped to \u0009 or \\; last read: '"<U+0009>'"

The library can be compiled with extended diagnostic messages which extends exceptions with a JSON pointer:

[json.exception.type_error.302] (/address/housenumber) type must be number, but is string

Furthermore, there is a merge request #4585 to use the recently added diagnostic positions in exceptions which allows byte positions from the input JSON text like this:

[json.exception.type_error.302] (bytes 92-95) type must be number, but is string

the latter two can be combined which will after merging #4885 create an error like this:

[json.exception.type_error.302] (/address/housenumber) (bytes 92-95) type must be number, but is string

@maddanio Do you know if anything from this issue is still open?

@maddanio
Copy link
Author

maddanio commented Jan 9, 2025

ah, no, sorry, wasnt up to date right there :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: needs more info the author of the issue needs to provide more details state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated
Projects
None yet
Development

No branches or pull requests

4 participants