[BUG] JsonToStructs and ScanJson do not normalize numeric output when read as a string #10458

Open
revans2 opened this issue Feb 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

revans2 commented Feb 21, 2024

Describe the bug
This is almost identical to #10218, but applies to from_json and to reading JSON Lines formatted files.

Numbers like 1.00000 and -0 are not normalized to match what Apache Spark would do.
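
To make "normalized" concrete, here is a minimal Python sketch of the general idea: parse the numeric text, then write it back out so that equal values share one spelling. The helper name `normalize_number_text` is hypothetical, and Python's `json` round trip is only a stand-in for illustration. It is not Spark's or the plugin's code path, and the exact strings Spark produces may differ.

```python
import json


def normalize_number_text(raw: str) -> str:
    """Re-serialize a JSON number so equal values share one spelling.

    Hypothetical helper for illustration only: it relies on Python's
    json module, not on Spark's or the plugin's implementation.
    """
    value = json.loads(raw)
    # Accept only plain JSON numbers (bool is a subclass of int in Python).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"not a JSON number: {raw!r}")
    return json.dumps(value)


if __name__ == "__main__":
    # Compare the verbatim input text with its round-tripped form.
    for raw in ["1.00000", "-0", "100"]:
        print(f"{raw!r} -> {normalize_number_text(raw)!r}")
```

A reader that returns the input text verbatim would instead hand back `1.00000` and `-0` unchanged, which is the kind of mismatch described here.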

revans2 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Feb 21, 2024

revans2 commented Feb 22, 2024

Another odd example of this is +INF and -INF. Even when allowNonNumericNumbers is disabled, +INF and -INF are valid floats and are normalized to "Infinity" and "-Infinity" respectively, and the quotes come out in the string itself. The same is true for unquoted Infinity, -Infinity, and NaN.

revans2 commented Jun 25, 2024

Technically, this was reverted in Spark 4.0 (at least for scan, by default).

https://issues.apache.org/jira/browse/SPARK-48148

apache/spark#46408

This functionality was put behind the config spark.sql.json.enableExactStringParsing, which is on by default.

It appears to work for scan, but not for get_json_object. It also no longer removes whitespace or normalizes single quotes, which will make this a lot more interesting to get working.
