[VL][1.2] Result mismatch of get_json_object when json string has newline #7777

Open
wForget opened this issue Nov 1, 2024 · 9 comments
Labels
bug Something isn't working triage

Comments

@wForget
Member

wForget commented Nov 1, 2024

Backend

VL (Velox)

Bug description

sql:

select get_json_object('{"c1":"test\ntest"}', '$.c1')

result of gluten 1.2.0 with velox:

+--------------------------------------------+--+
| get_json_object({"c1":"test
test"}, $.c1)  |
+--------------------------------------------+--+
| NULL                                       |
+--------------------------------------------+--+

result of vanilla spark:

+--------------------------------------------+--+
| get_json_object({"c1":"test
test"}, $.c1)  |
+--------------------------------------------+--+
| test
test                                  |
+--------------------------------------------+--+

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@wForget wForget added bug Something isn't working triage labels Nov 1, 2024
@rui-mo
Contributor

rui-mo commented Nov 4, 2024

cc: @PHILO-HE

@PHILO-HE
Contributor

PHILO-HE commented Nov 4, 2024

@wForget, that's strange. I applied the patch below to test your case on the Velox side (1.2.0 velox branch), and the test passed.

diff --git a/velox/functions/sparksql/tests/JsonFunctionsTest.cpp b/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
index c0c8ecc90..f9448733a 100644
--- a/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
+++ b/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
@@ -119,5 +119,9 @@ TEST_F(GetJsonObjectTest, nullResult) {
       std::nullopt);
 }
 
+TEST_F(GetJsonObjectTest, escaped) {
+  EXPECT_EQ(getJsonObject(R"({"c1":"test\ntest"})", "$.c1"), "test\ntest");
+}
+
 } // namespace
 } // namespace facebook::velox::functions::sparksql::test

@wForget
Member Author

wForget commented Nov 4, 2024

R"({"c1":"test\ntest"})"

Does this mean that \n is not escaped?

@PHILO-HE
Contributor

PHILO-HE commented Nov 4, 2024

@wForget, no, it's escaped. Just verified by printing getJsonObject(R"({"c1":"test\ntest"})", "$.c1")
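
To make the raw versus regular literal distinction concrete, here is a minimal C++ sketch. The helper names are hypothetical and exist only for illustration; they are not part of Velox or its tests. In a raw string literal the backslash and `n` stay as two characters, so the JSON text carries the escape sequence `\n`. In a regular literal the compiler converts `\n` into a single line-feed byte, so the JSON string value contains an unescaped control character.

```cpp
#include <string>

// Raw string literal: the backslash and 'n' remain two separate characters,
// so the JSON text contains the two-character escape sequence \n.
inline std::string rawLiteralJson() {
  return R"({"c1":"test\ntest"})";
}

// Regular string literal: the compiler replaces \n with one line-feed byte
// (0x0A), so the JSON string value contains an unescaped control character.
inline std::string regularLiteralJson() {
  return "{\"c1\":\"test\ntest\"}";
}

// Returns true if the text contains an actual line-feed byte.
inline bool containsLineFeed(const std::string& text) {
  return text.find('\n') != std::string::npos;
}
```

The two inputs differ by exactly one byte, which is why a test written with a raw literal may not exercise the same input as the SQL query in the bug description.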

@wForget
Member Author

wForget commented Nov 4, 2024

@wForget, no, it's escaped. Just verified by printing getJsonObject(R"({"c1":"test\ntest"})", "$.c1")

Could you try:

const std::string json = R"(
  {
    "c1":"test
test"
  }
  )";
getJsonObject(json, "$.c1");

@wForget
Member Author

wForget commented Nov 4, 2024

It seems that single quotes are also not supported.

select get_json_object('{\'c1\':\'test test\'}', '$.c1');

gluten disabled:

+--------------------------------------------+--+
| get_json_object({'c1':'test test'}, $.c1)  |
+--------------------------------------------+--+
| test test                                  |
+--------------------------------------------+--+

gluten enabled:

+--------------------------------------------+--+
| get_json_object({'c1':'test test'}, $.c1)  |
+--------------------------------------------+--+
| NULL                                       |
+--------------------------------------------+--+

@PHILO-HE
Contributor

PHILO-HE commented Nov 4, 2024

@wForget, this is a known incompatibility with single quotes. See doc link.

As far as I know, the JSON standard does not allow single quotes to enclose JSON content. I'm not sure why Spark accepts them in place of double quotes. We have no plan to support this.

@PHILO-HE
Contributor

PHILO-HE commented Nov 5, 2024

Using a regular string instead of a raw string in the test reproduces this issue, and it also occurs on the main branch. I found that Presto, like Spark, also allows control characters. We may have to change simdjson's code to fix this, but I'm not sure whether that is acceptable. See Velox PR: facebookincubator/velox#11433
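
For illustration only, the sketch below shows one way to see what "allowing control characters" would involve: a pre-pass that rewrites raw control characters inside JSON string values as `\u00XX` escapes, so that a strict parser can accept the text. This is an assumption-laden sketch, not the approach taken in the linked Velox PR and not simdjson code. The function name `escapeControlCharsInJsonStrings` is hypothetical, and the pass does not validate the JSON in any other way.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Velox or simdjson): escapes raw control
// characters (bytes below 0x20) that appear inside JSON string values.
// Whitespace outside string values is left untouched.
std::string escapeControlCharsInJsonStrings(const std::string& json) {
  std::string out;
  out.reserve(json.size());
  bool inString = false; // currently between an opening and closing quote
  bool escaped = false;  // previous character was a backslash inside a string
  for (unsigned char c : json) {
    if (!inString) {
      if (c == '"') {
        inString = true;
      }
      out.push_back(static_cast<char>(c));
      continue;
    }
    if (escaped) {
      // Character following a backslash: copy as-is and end the escape.
      escaped = false;
      out.push_back(static_cast<char>(c));
    } else if (c == '\\') {
      escaped = true;
      out.push_back(static_cast<char>(c));
    } else if (c == '"') {
      inString = false;
      out.push_back(static_cast<char>(c));
    } else if (c < 0x20) {
      // Raw control character inside a string: emit a \u00XX escape instead.
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
      out += buf;
    } else {
      out.push_back(static_cast<char>(c));
    }
  }
  return out;
}
```

A pre-pass like this copies every input, so it trades performance for leniency; changing the parser itself, as discussed above, avoids the copy but touches third-party code.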
