[GLUTEN-5341] Fix some Spark 3.5 UTs #5445

Merged
merged 1 commit into from
Apr 19, 2024

Contributor

@yma11 yma11 commented Apr 18, 2024

What changes were proposed in this pull request?

Fix some Spark 3.5 UTs.

How was this patch tested?

CI

#5341

Run Gluten Clickhouse CI

@@ -88,6 +87,8 @@ class VeloxHashJoinSuite extends VeloxWholeStageTransformerSuite {
    val wholeStages = plan.collect { case wst: WholeStageTransformer => wst }
    if (SparkShimLoader.getSparkVersion.startsWith("3.2.")) {
      assert(wholeStages.length == 1)
    } else if (SparkShimLoader.getSparkVersion.startsWith("3.5.")) {
Contributor

Why did it increase to 5 in 3.5? I was also debugging this and saw two more exchanges appearing in 3.5. Shall we debug why there are more exchanges?

Contributor Author

The physical plan in Spark 3.5 seems to have changed:

*(9) Project [l_partkey#200L]
+- *(9) SortMergeJoin [l_suppkey#201L], [ps_suppkey#156L], Inner
   :- *(6) Sort [l_suppkey#201L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(l_suppkey#201L, 5), ENSURE_REQUIREMENTS, [plan_id=300]
   :     +- *(5) Project [l_partkey#200L, l_suppkey#201L]
   :        +- *(5) SortMergeJoin [l_partkey#200L], [p_partkey#123L], Inner
   :           :- *(2) Sort [l_partkey#200L ASC NULLS FIRST], false, 0
   :           :  +- Exchange hashpartitioning(l_partkey#200L, 5), ENSURE_REQUIREMENTS, [plan_id=283]
   :           :     +- *(1) Filter (isnotnull(l_partkey#200L) AND isnotnull(l_suppkey#201L))
   :           :        +- *(1) ColumnarToRow
   :           :           +- BatchScan parquet file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet-velox/lineitem[l_partkey#200L, l_suppkey#201L] ParquetScan DataFilters: [isnotnull(l_partkey#200L), isnotnull(l_suppkey#201L)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-cl..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(l_partkey), IsNotNull(l_suppkey)], PushedGroupBy: [], ReadSchema: struct<l_partkey:bigint,l_suppkey:bigint> RuntimeFilters: []
   :           +- *(4) Sort [p_partkey#123L ASC NULLS FIRST], false, 0
   :              +- Exchange hashpartitioning(p_partkey#123L, 5), ENSURE_REQUIREMENTS, [plan_id=292]
   :                 +- *(3) Filter isnotnull(p_partkey#123L)
   :                    +- *(3) ColumnarToRow
   :                       +- BatchScan parquet file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet-velox/part[p_partkey#123L] ParquetScan DataFilters: [isnotnull(p_partkey#123L)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-cl..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(p_partkey)], PushedGroupBy: [], ReadSchema: struct<p_partkey:bigint> RuntimeFilters: []
   +- *(8) Sort [ps_suppkey#156L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(ps_suppkey#156L, 5), ENSURE_REQUIREMENTS, [plan_id=309]
         +- *(7) Filter isnotnull(ps_suppkey#156L)
            +- *(7) ColumnarToRow
               +- BatchScan parquet file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet-velox/partsupp[ps_suppkey#156L] ParquetScan DataFilters: [isnotnull(ps_suppkey#156L)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root/workspace/apache_1/backends-velox/target/scala-2.12/test-cl..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(ps_suppkey)], PushedGroupBy: [], ReadSchema: struct<ps_suppkey:bigint> RuntimeFilters: []

My understanding is that it has 4 exchanges, and since each exchange is a stage boundary, that gives 5 stages.
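The "4 exchanges, so 5 stages" count can be checked mechanically against the plan text. The following is a minimal sketch, not code from this PR: `ExchangeCount`, `isExchange`, and `countExchanges` are hypothetical names, and the plan string is an abbreviated copy of the plan quoted above (operator names kept, attribute IDs and scan details trimmed). It assumes a tree-shaped plan with no reused exchanges, where each `Exchange` cuts off exactly one additional stage.

```scala
object ExchangeCount {
  // Abbreviated copy of the Spark 3.5 plan quoted above (details trimmed for readability).
  val plan: String =
    """*(9) Project
      |+- *(9) SortMergeJoin [l_suppkey], [ps_suppkey], Inner
      |   :- *(6) Sort
      |   :  +- Exchange hashpartitioning(l_suppkey, 5)
      |   :     +- *(5) Project
      |   :        +- *(5) SortMergeJoin [l_partkey], [p_partkey], Inner
      |   :           :- *(2) Sort
      |   :           :  +- Exchange hashpartitioning(l_partkey, 5)
      |   :           :     +- *(1) Filter
      |   :           :        +- *(1) ColumnarToRow
      |   :           :           +- BatchScan lineitem
      |   :           +- *(4) Sort
      |   :              +- Exchange hashpartitioning(p_partkey, 5)
      |   :                 +- *(3) Filter
      |   :                    +- *(3) ColumnarToRow
      |   :                       +- BatchScan part
      |   +- *(8) Sort
      |      +- Exchange hashpartitioning(ps_suppkey, 5)
      |         +- *(7) Filter
      |            +- *(7) ColumnarToRow
      |               +- BatchScan partsupp""".stripMargin

  // Drop the tree-drawing prefix (spaces, ':', '+', '-') and check the operator name.
  def isExchange(line: String): Boolean =
    line.dropWhile(c => c == ' ' || c == ':' || c == '+' || c == '-').startsWith("Exchange")

  // Number of Exchange operators in an explain-style plan string.
  def countExchanges(planText: String): Int =
    planText.split("\n").count(isExchange)

  def main(args: Array[String]): Unit = {
    val exchanges = countExchanges(plan)
    // Assumption: tree-shaped plan, so stages = exchanges + 1.
    println(s"exchanges = $exchanges, stages = ${exchanges + 1}")
  }
}
```

Run against the abbreviated plan, this finds the four `Exchange` operators and reports five stages, which matches the value asserted for 3.5 in the test.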

Contributor

@ayushi-agarwal ayushi-agarwal Apr 18, 2024

Yes, in the 3.4 plan I saw only 2 exchanges; there was no exchange after the part and lineitem table scans for their join. It seems some regression in 3.5 results in 2 more exchanges.

Contributor Author

@yma11 yma11 Apr 19, 2024

@ayushi-agarwal Do you want to dig further into the plan change? If so, you can open a dedicated ticket for it. It may be related to the data set or some configurations. Let's merge this PR first.

Contributor

Sure, I will create a ticket to investigate further. Thanks @yma11


@zhouyuan zhouyuan merged commit eb5b27a into apache:main Apr 19, 2024
39 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

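| query | log/native_5445_time.csv | log/native_master_04_17_2024_9b3f59a1c_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1    | 36.33   | 36.61   | 0.273  | 100.75% |
| q2    | 23.73   | 24.12   | 0.386  | 101.63% |
| q3    | 37.34   | 37.01   | -0.326 | 99.13%  |
| q4    | 40.70   | 38.03   | -2.669 | 93.44%  |
| q5    | 69.09   | 70.92   | 1.831  | 102.65% |
| q6    | 5.72    | 5.81    | 0.097  | 101.70% |
| q7    | 85.18   | 86.12   | 0.938  | 101.10% |
| q8    | 85.61   | 82.86   | -2.756 | 96.78%  |
| q9    | 125.12  | 123.45  | -1.665 | 98.67%  |
| q10   | 42.55   | 45.70   | 3.147  | 107.40% |
| q11   | 20.35   | 20.35   | -0.005 | 99.98%  |
| q12   | 28.72   | 28.15   | -0.573 | 98.00%  |
| q13   | 54.60   | 54.46   | -0.136 | 99.75%  |
| q14   | 17.48   | 17.88   | 0.400  | 102.29% |
| q15   | 30.81   | 30.69   | -0.116 | 99.62%  |
| q16   | 14.25   | 14.00   | -0.246 | 98.27%  |
| q17   | 101.97  | 101.27  | -0.698 | 99.32%  |
| q18   | 142.95  | 143.83  | 0.879  | 100.62% |
| q19   | 13.73   | 13.61   | -0.115 | 99.16%  |
| q20   | 28.62   | 27.86   | -0.762 | 97.34%  |
| q21   | 287.59  | 285.86  | -1.725 | 99.40%  |
| q22   | 14.74   | 14.42   | -0.321 | 97.82%  |
| total | 1307.17 | 1303.01 | -4.164 | 99.68%  |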
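
The report does not say how its last two columns are derived. The figures are consistent with difference = master time minus PR time, and percentage = master time divided by PR time. The sketch below recomputes both for a few rows under that assumption; `PerfReport`, `difference`, and `percentage` are hypothetical names, not part of the bot. Because it starts from the two-decimal times printed in the report, the recomputed difference can disagree with the report in the third decimal place.

```scala
object PerfReport {
  // Assumed formula: difference = baseline (master) time minus PR time.
  def difference(prTime: Double, baselineTime: Double): Double =
    baselineTime - prTime

  // Assumed formula: percentage = baseline time as a percentage of PR time.
  // Under this reading, values above 100 mean the PR run was faster than master.
  def percentage(prTime: Double, baselineTime: Double): Double =
    baselineTime / prTime * 100.0

  def main(args: Array[String]): Unit = {
    // (query, PR time, master time), copied from the report above.
    val rows = Seq(
      ("q4", 40.70, 38.03),
      ("q10", 42.55, 45.70),
      ("total", 1307.17, 1303.01)
    )
    rows.foreach { case (query, pr, base) =>
      println(s"$query difference=${difference(pr, base)} percentage=${percentage(pr, base)}")
    }
  }
}
```

Read this way, q10 (107.40%) is the largest apparent gain and q4 (93.44%) the largest apparent loss, while the total sits at 99.68%; the report itself labels these numbers as for reference only.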

@yma11 yma11 deleted the ut-418 branch April 23, 2024 12:40
Preetesh2110 pushed a commit to Preetesh2110/incubator-gluten that referenced this pull request Apr 25, 2024
enable VeloxCacheSuite, VeloxHashJoinSuite