[BUG] JsonToStructs and ScanJson do not normalize numeric output when read as a string #10458

revans2 · 2024-02-21T21:57:58Z

Describe the bug
This is almost identical to #10218, but applies to from_json and to reading JSON Lines formatted files.

Numbers like 1.00000 and -0 are not normalized to match what Apache Spark would do.
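To make the kind of normalization concrete, here is a minimal sketch in plain Python. It is not the Spark or spark-rapids implementation; the helper name `normalize_number_token` is hypothetical, and the idea (parse the raw token as a number, then print it back) is only an assumption about the general approach. Python's float formatting also differs from Java's for very large or very small magnitudes, so the results are illustrative rather than a statement of exact Spark output.

```python
import re

# Integer-looking JSON number tokens: optional minus, no fraction, no exponent.
_INT_TOKEN = re.compile(r"-?(0|[1-9]\d*)\Z")


def normalize_number_token(token: str) -> str:
    """Re-render a raw JSON number token in a canonical form (illustrative only)."""
    if _INT_TOKEN.match(token):
        # Round-tripping through int drops the sign on negative zero.
        return str(int(token))
    # Round-tripping through float drops redundant trailing zeros.
    return repr(float(token))


if __name__ == "__main__":
    for raw in ["1.00000", "-0", "12"]:
        print(raw, "->", normalize_number_token(raw))
```

Reading the same column as a string without this step would return the raw text of each token unchanged, which is the mismatch this issue describes.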

revans2 · 2024-02-22T20:50:30Z

Another odd example of this is +INF and -INF. Even if allowNonNumericNumbers is disabled, +INF and -INF are treated as valid floats and are normalized to "Infinity" and "-Infinity" respectively, and the quotes come out in the string itself. This is also true for unquoted Infinity, -Infinity, and NaN.
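The behavior described above can be sketched roughly as follows. This is plain Python for illustration, not code from Spark or spark-rapids; the function name `normalize_non_numeric` is hypothetical, and rendering NaN as a quoted `"NaN"` is an assumption based on the comment saying the same applies to NaN.

```python
import math


def normalize_non_numeric(token: str) -> str:
    """Map an unquoted non-numeric float token to a quoted canonical name.

    Illustrative only: the returned text includes the double quotes,
    mirroring the observation that the quotes end up in the string itself.
    """
    value = float(token)  # Python accepts +INF, -INF, Infinity, -Infinity, NaN
    if math.isnan(value):
        return '"NaN"'  # assumed rendering for NaN
    if math.isinf(value):
        return '"Infinity"' if value > 0 else '"-Infinity"'
    raise ValueError(f"not a non-numeric float token: {token!r}")


if __name__ == "__main__":
    for raw in ["+INF", "-INF", "Infinity", "-Infinity", "NaN"]:
        print(raw, "->", normalize_non_numeric(raw))
```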

revans2 · 2024-06-25T16:34:57Z

Technically, in Spark 4.0 this was reverted (at least for scan, by default).

https://issues.apache.org/jira/browse/SPARK-48148

apache/spark#46408

This functionality was put under the config spark.sql.json.enableExactStringParsing, which is on by default.

It appears to work for scan, but not for get_json_object. It also no longer removes whitespace or normalizes single quotes, which will make things a lot more interesting to try and make this work.

revans2 added the bug and ? - Needs Triage labels Feb 21, 2024

revans2 mentioned this issue Feb 26, 2024

[FEA] JSON input support #9

Open

62 tasks

mattahrens removed the ? - Needs Triage label Feb 27, 2024

revans2 mentioned this issue Mar 15, 2024

[FEA] JSON number normalization when returned as a string rapidsai/cudf#15318

Open
