
[VL] Columnar shuffle write byte increase as the number of partitions increases #7696

Open
j7nhai opened this issue Oct 28, 2024 · 3 comments
Labels
bug Something isn't working triage

Comments


j7nhai commented Oct 28, 2024

Backend

VL (Velox)

Bug description

When running ssb-q4.2 at scale factor 100T with columnar shuffle write enabled, we found that the shuffle write bytes summed across all stages increase as the number of partitions increases. With Gluten disabled, the growth for vanilla Spark is much less pronounced.

spark.sql.adaptive.coalescePartitions.initialPartitionNum=k

The following table shows the shuffle write bytes sum by all stages.

| | k=1000 | k=2000 | k=4000 | k=8000 | k=16000 |
| --- | --- | --- | --- | --- | --- |
| Gluten enabled (columnar shuffle) | 25,569,016,307 | 27,297,986,268 | 29,890,895,841 | 37,058,299,593 | 40,684,916,742 |
| Gluten disabled | 30,355,717,816 | 30,463,886,189 | 30,595,400,121 | 31,204,187,443 | 32,796,457,022 |
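To quantify the trend in the table: going from k=1000 to k=16000, the Gluten columnar (hash) shuffle grows its total write bytes by roughly 59%, while vanilla Spark grows by only about 8%. A quick check using the figures above:

```python
# Shuffle write bytes (summed over all stages) taken from the table above.
gluten = {1000: 25_569_016_307, 16000: 40_684_916_742}
vanilla = {1000: 30_355_717_816, 16000: 32_796_457_022}

def growth(series: dict) -> float:
    """Relative growth in write bytes from k=1000 to k=16000."""
    return series[16000] / series[1000] - 1

print(f"Gluten growth:  {growth(gluten):.1%}")   # ~59%
print(f"Vanilla growth: {growth(vanilla):.1%}")  # ~8%
```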

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@j7nhai j7nhai added bug Something isn't working triage labels Oct 28, 2024
@FelixYBW FelixYBW changed the title Columnar shuffle write byte increase as the number of partitions increases [VL] Columnar shuffle write byte increase as the number of partitions increases Oct 28, 2024
@FelixYBW (Contributor) commented:
Did you enable sort based shuffle? Hash shuffle has this issue.


j7nhai commented Oct 29, 2024

> Did you enable sort based shuffle? Hash shuffle has this issue.

Is this a bug that will be fixed in the future? May I ask why the disk usage rises with the number of partitions?

I didn't see any config to force-enable sort-based shuffle and avoid hash shuffle. Could I just decrease the value of `spark.gluten.sql.columnar.shuffle.sort.columns.threshold`?

@FelixYBW (Contributor) commented:

It's by design of hash shuffle, and one reason we implemented the sort shuffle.

You may decrease these two thresholds; the first is the threshold on the number of reducers, the second on the number of columns:

`spark.gluten.sql.columnar.shuffle.sort.partitions.threshold`
`spark.gluten.sql.columnar.shuffle.sort.columns.threshold`
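As a hedged sketch of how the two thresholds above might be lowered in a PySpark session (the value `0` is illustrative only; the exact defaults and semantics may differ by Gluten version, so consult the Gluten configuration docs):

```python
# Illustrative config fragment, not a recommendation: lowering both
# thresholds should make the columnar sort-based shuffle kick in for
# smaller reducer/column counts, avoiding the hash-shuffle growth
# described in this issue. Verify against your Gluten version's docs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.gluten.sql.columnar.shuffle.sort.partitions.threshold", "0")
    .config("spark.gluten.sql.columnar.shuffle.sort.columns.threshold", "0")
    .getOrCreate()
)
```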
