Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stricter behavior for parse_json, add try_parse_json, remove to_json #12920

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Aug 18, 2022

Description

This PR adjusts the behavior of PARSE_JSON to be strict and a bit more intuitive, adds a new TRY_PARSE_JSON which will produce NULL values instead of errors for bad or non-string inputs, and removes TO_JSON which no longer had a use since its behavior has been made intrinsic to all JSON expressions.

Previously PARSE_JSON was far far too forgiving, allowing stuff like PARSE_JSON('hello world') to pass through into a string root literal even though it isn't valid JSON. Meanwhile, something like PARSE_JSON('{') would be an error. Basically, the behavior was TO_JSON if the input was not actually JSON, which is far too magical and not particularly useful (apologies this made it in like this, it was a relic of an earlier iteration).

Now, PARSE_JSON is strict and will fail with an error if provided non-string input (with the exception of null, which will output a null) or if the string contains invalid JSON. TRY_PARSE_JSON has also been added, which will return NULL for anything that is not a string containing valid JSON in the cases where PARSE_JSON would have resulted in an error.

I made some adjustments to StructuredDataProcessor to look for and 'unwrap' any StructuredData objects it happens to encounter, which would allow for the JSON expressions which produce COMPLEX<json> outputs to return them as StructuredData, but have not actually made this change to the expressions at this time. This change should be harmless, but isn't currently necessary so I can revert it if any reviewers wish.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@vogievetsky
Copy link
Contributor

I have not had the pleasure yet of checking out this branch and playing with it but from the description provided it sounds great. Very excited for TRY_PARSE_JSON!

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM one minor comment regarding docs.
+1 non binding

{
public static final String NAME = "to_json_string";
public static final String NAME = "parse_json";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a doc update required for this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, docs in #12922 were written as if this PR were already merged, (to_json_string still exists, but to_json is removed and try_parse_json is added, so this is a result of funny diff stuff i think)

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The new behavior is more internally consistent, and is therefore an improvement.

@gianm gianm merged commit 289e432 into apache:master Aug 23, 2022
@clintropolis clintropolis deleted the fix-parse-json branch August 23, 2022 03:19
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants