UT failures for Spark3.5 #5341

Closed
35 tasks done
yma11 opened this issue Apr 9, 2024 · 9 comments · Fixed by #5379, #5411, #5424, #5426 or #5463
Labels
bug Something isn't working

Comments

@yma11
Contributor

yma11 commented Apr 9, 2024

Backend

VL (Velox)

Bug description

Note that the following list doesn't include possible failures in the UTs newly added in Spark 3.5. I will update the list once they are collected.

  • generate hash join plan - v2

  • test write parquet with compression codec

  • regr_r2

  • regr_slope

  • regr_intercept

  • regr_sxy

  • input row

  • test hive static partition write table

  • test hive write table

  • test hive write dir

  • column mapping mode = id

  • column mapping mode = name

  • delta: time travel

  • delta: partition filters

  • basic test with stats.skipping disabled

  • column mapping with complex type

  • iceberg bucketed join

  • iceberg bucketed join with partition

  • iceberg bucketed join with partition filter

  • GlutenExpressionMappingSuite

  • SPARK-26708 Cache data and cached plan should stay consistent

  • SPARK-46590 adaptive query execution works correctly with broadcast join and union

  • SPARK-46590 adaptive query execution works correctly with cartesian join and union

  • SPARK-40618: Regression test for merging subquery bug with nested subqueries

  • SPARK-40615: Check unsupported data type when decorrelating subqueries

  • SPARK-40618: Regression test for merging subquery bug with nested subqueries

  • Gluten - test for FAILFAST parsing mode

  • SPARK-36612: Support left outer join build left or right outer join build right in shuffled hash join

  • A cached table preserves the partitioning and ordering of its cached SparkPlan

  • SPARK-42782: Hive compatibility check for get_json_object

  • SPARK-43876: Enable fast hashmap for distinct queries

  • SPARK-41896: Filter on row_index and a stored column at the same time

  • SPARK-43450: Filter on full _metadata column struct

  • SPARK-43450: Filter on aliased _metadata.row_index

  • SPARK-45171: Handle evaluated nondeterministic expression (VeloxRuntimeError: the key in unnest Operator only support field)

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@yma11 yma11 added bug Something isn't working triage labels Apr 9, 2024
@FelixYBW
Contributor

FelixYBW commented Apr 9, 2024

Does it cover all the UT failures of 3.5? If so, let's pin it to the top.

@yma11
Contributor Author

yma11 commented Apr 10, 2024

Does it cover all the UT failures of 3.5? If so, let's pin it to the top.

Not quite. There should be some Spark 3.5-specific UTs (newly added in Spark 3.5) that we haven't ported yet.

@yma11
Contributor Author

yma11 commented Apr 15, 2024

@zhouyuan This issue tracks all failures in the Spark 3.5 UTs, and it seems not all of them have been addressed yet. Should we keep it open?

@gaoyangxiaozhu
Contributor

@yma11 once PR #5411 is merged, you can mark the following as done:

  • SPARK-26708 Cache data and cached plan should stay consistent

  • SPARK-46590 adaptive query execution works correctly with broadcast join and union

  • SPARK-46590 adaptive query execution works correctly with cartesian join and union

  • SPARK-40618: Regression test for merging subquery bug with nested subqueries

  • SPARK-40615: Check unsupported data type when decorrelating subqueries

  • SPARK-40618: Regression test for merging subquery bug with nested subqueries

  • A cached table preserves the partitioning and ordering of its cached SparkPlan

  • SPARK-41896: Filter on row_index and a stored column at the same time

  • SPARK-43450: Filter on full _metadata column struct

  • SPARK-43450: Filter on aliased _metadata.row_index

@liujiayi771
Contributor

liujiayi771 commented Apr 17, 2024

@yma11 @zhouyuan I will help to check the regr related UTs.

@gaoyangxiaozhu
Contributor

gaoyangxiaozhu commented Apr 19, 2024

For the test case SPARK-42782: Hive compatibility check for get_json_object:

It fails due to this PR: facebookincubator/velox@20e4678

The SIMD-based get_json_object does not cover the case where the path contains *, for example .store.book[*].
This needs a fix on the Velox side.
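
To make the wildcard case concrete, here is a minimal Python sketch of what a path containing * is expected to do: fan out over every element of an array instead of selecting one. This is neither Spark's nor Velox's implementation. The helper name eval_json_path, the tiny path grammar it accepts, and the assumption that paths are rooted at $ are all hypothetical and exist only for illustration.

```python
import json
import re

# Hypothetical, simplified path grammar: ".field", "[index]" and "[*]".
# Illustration only; not how Spark or Velox parse JSON paths.
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\*|\d+)\]")


def eval_json_path(doc, path):
    """Evaluate a tiny JSON path subset against a JSON string.

    Returns None for invalid JSON, unsupported syntax, or no match.
    Returns a list when the path contains a wildcard, else a single value.
    """
    if not path.startswith("$"):  # assumption: paths are rooted at "$"
        return None
    rest, pos, tokens = path[1:], 0, []
    while pos < len(rest):
        m = _TOKEN.match(rest, pos)
        if m is None:
            return None  # syntax outside the supported subset
        tokens.append((m.group(1), m.group(2)))
        pos = m.end()
    try:
        nodes = [json.loads(doc)]
    except json.JSONDecodeError:
        return None

    saw_wildcard = False
    for field, index in tokens:
        next_nodes = []
        for node in nodes:
            if field is not None:
                if isinstance(node, dict) and field in node:
                    next_nodes.append(node[field])
            elif index == "*":
                # The wildcard step: keep every element of the array.
                if isinstance(node, list):
                    next_nodes.extend(node)
            elif isinstance(node, list) and int(index) < len(node):
                next_nodes.append(node[int(index)])
        saw_wildcard = saw_wildcard or index == "*"
        nodes = next_nodes

    if not nodes:
        return None
    return nodes if saw_wildcard else nodes[0]
```

A path with a fixed index walks a single branch, while a path with * has to carry a set of intermediate nodes forward, which is the case the comment above says the SIMD-based code path does not handle.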

@zhouyuan / @PHILO-HE

@gaoyangxiaozhu
Contributor

gaoyangxiaozhu commented Apr 19, 2024

@yma11 @zhouyuan I will help to check the regr related UTs.

Oh, @liujiayi771, I missed this comment. I have already sent a PR to fix it: https://github.com/apache/incubator-gluten/pull/5466/files

@ayushi-agarwal
Contributor

ayushi-agarwal commented Apr 19, 2024

For the test case SPARK-42782: Hive compatibility check for get_json_object:

It fails due to this PR: facebookincubator/velox@20e4678

The SIMD-based get_json_object does not cover the case where the path contains *, for example .store.book[*]. This needs a fix on the Velox side.

@zhouyuan / @PHILO-HE

@gaoyangxiaozhu I have created a PR where I have set the output to null and added a comment in front of each line which is not supported.

@gaoyangxiaozhu
Contributor

gaoyangxiaozhu commented Apr 23, 2024

The task list is missing the following 3 UTs:

  1. test udf
  2. test udaf
  3. GlutenDeleteFromTableSuite

@zhouyuan / @yma11 FYI, this PR enables the remaining 3 UTs. After it is merged, all UTs marked with the "disabled for spark3.5" keyword should be covered and enabled, except Hive compatibility check for get_json_object, which is addressed by PR #5467.
