[GLUTEN-5341][VL][TEST] Fix SPARK-42782: Hive compatibility check for get_json_object #5467

Merged: 3 commits into apache:main on Apr 25, 2024

ayushi-agarwal
Contributor

What changes were proposed in this pull request?

Fixes the get_json_object test introduced in Spark 3.5 (SPARK-42782) that checks compatibility with Hive. The * wildcard is not supported by Velox in JSON paths.

(Fixes: #5341)

How was this patch tested?

Ran locally.

#5341

Run Gluten Clickhouse CI

@ayushi-agarwal
Contributor Author

@gaoyangxiaozhu @yma11

@ayushi-agarwal
Contributor Author

@zhli1142015 Could you please review? Thanks

@@ -126,7 +126,7 @@ class VeloxTestSettings extends BackendTestSettings {
enableSuite[GlutenHashExpressionsSuite]
enableSuite[GlutenIntervalExpressionsSuite]
enableSuite[GlutenJsonFunctionsSuite]
// Disable for Spark3.5.
// * in get_json_object expression not supported in velox
Contributor

Is there an issue tracking this? If not, please open one. Thanks.

Contributor Author

Sure, I will open one. This is the change where get_json_object support was added to Velox: facebookincubator/velox@20e4678

@zhouyuan
Contributor

CC @PHILO-HE

runTest(json, "$.store.bicycle", "{\"price\":19.95,\"color\":\"red\"}")
runTest(json, "$.store.book", book)
runTest(json, "$.store.book[0]", book0)
runTest(json, "$.store.book[*]", null) // not supported in velox
Contributor

Since null is not the result a Spark user would expect, I think we should comment out or simply remove such test cases. It would also be nice to document this limitation here: https://github.com/apache/incubator-gluten/blob/main/docs/velox-backend-limitations.md#json-functions
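The reviewer's suggestion (leave out unsupported cases instead of asserting a null that Spark would never return) can be sketched as below. This is a minimal, hypothetical illustration, not the code merged in this PR: the names GetJsonObjectCases, unsupportedInVelox, and splitCases are made up, and treating any path that contains * as unsupported simply follows the PR description's statement that the * wildcard is not supported by Velox.

```scala
object GetJsonObjectCases {
  // Hypothetical check, not part of Gluten or Velox: per this PR, Velox does
  // not support the * wildcard in get_json_object paths, so any path that
  // contains "*" is treated as unsupported. This is a plain substring test;
  // it would also flag a quoted key that happens to contain "*".
  def unsupportedInVelox(path: String): Boolean = path.contains("*")

  // Split paths into those whose Spark results can still be asserted on the
  // Velox backend and those to leave out (and document) instead of
  // asserting a null result. partition keeps the original order.
  def splitCases(paths: Seq[String]): (Seq[String], Seq[String]) =
    paths.partition(p => !unsupportedInVelox(p))

  def main(args: Array[String]): Unit = {
    val paths = Seq("$.store.bicycle", "$.store.book", "$.store.book[0]", "$.store.book[*]")
    val (asserted, skipped) = splitCases(paths)
    println(s"assert: ${asserted.mkString(", ")}")
    println(s"skip (documented limitation): ${skipped.mkString(", ")}")
  }
}
```

Keeping the skipped paths in their own list, rather than asserting null, makes the gap easy to cross-reference with the limitation documented in docs/velox-backend-limitations.md.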

Contributor Author

Added, thanks for the suggestion @PHILO-HE

Run Gluten Clickhouse CI

PHILO-HE (Contributor) left a comment

Only one nit.

docs/velox-backend-limitations.md (outdated, resolved)
Run Gluten Clickhouse CI

@ayushi-agarwal
Contributor Author

Only one nit.

I have addressed it, thanks.

PHILO-HE (Contributor) left a comment

Thanks!

PHILO-HE changed the title from "[GLUTEN-5341] Fixes SPARK-42782: Hive compatibility check for get_json_object" to "[GLUTEN-5341][VL] Fix SPARK-42782: Hive compatibility check for get_json_object" on Apr 25, 2024
PHILO-HE merged commit f1b0054 into apache:main on Apr 25, 2024
43 of 44 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_5467_time.csv | log/native_master_04_24_2024_89015c915_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 35.29 | 36.07 | 0.785 | 102.23% |
| q2 | 26.27 | 27.37 | 1.095 | 104.17% |
| q3 | 36.92 | 37.10 | 0.184 | 100.50% |
| q4 | 41.61 | 41.34 | -0.265 | 99.36% |
| q5 | 69.01 | 69.85 | 0.842 | 101.22% |
| q6 | 5.91 | 7.48 | 1.566 | 126.48% |
| q7 | 86.53 | 87.34 | 0.813 | 100.94% |
| q8 | 86.59 | 85.05 | -1.541 | 98.22% |
| q9 | 125.51 | 124.31 | -1.197 | 99.05% |
| q10 | 44.55 | 45.36 | 0.810 | 101.82% |
| q11 | 20.46 | 20.43 | -0.031 | 99.85% |
| q12 | 27.62 | 28.57 | 0.951 | 103.44% |
| q13 | 55.70 | 54.36 | -1.344 | 97.59% |
| q14 | 19.08 | 18.41 | -0.663 | 96.53% |
| q15 | 31.26 | 30.50 | -0.757 | 97.58% |
| q16 | 13.49 | 13.97 | 0.477 | 103.54% |
| q17 | 103.04 | 102.31 | -0.733 | 99.29% |
| q18 | 147.67 | 147.87 | 0.207 | 100.14% |
| q19 | 13.64 | 13.55 | -0.088 | 99.36% |
| q20 | 27.55 | 28.75 | 1.202 | 104.36% |
| q21 | 284.97 | 288.81 | 3.846 | 101.35% |
| q22 | 14.65 | 15.13 | 0.483 | 103.30% |
| total | 1317.31 | 1323.95 | 6.640 | 100.50% |

Preetesh2110 pushed a commit to Preetesh2110/incubator-gluten that referenced this pull request Apr 25, 2024
ulysses-you changed the title from "[GLUTEN-5341][VL] Fix SPARK-42782: Hive compatibility check for get_json_object" to "[GLUTEN-5341][VL][TEST] Fix SPARK-42782: Hive compatibility check for get_json_object" on Apr 26, 2024
Linked issue: UT failures for Spark3.5
5 participants