
[VL] Columnar shuffle write byte increase as the number of partitions increases #7696

Open
j7nhai opened this issue Oct 28, 2024 · 3 comments
Labels
bug Something isn't working triage

Comments


j7nhai commented Oct 28, 2024

Backend

VL (Velox)

Bug description

When running ssb-q4.2 at scale factor 100T with columnar shuffle write enabled, we found that the shuffle write bytes summed across all stages increase as the number of partitions increases. With Gluten disabled, the growth for vanilla Spark is much less pronounced.

spark.sql.adaptive.coalescePartitions.initialPartitionNum=k

The following table shows the shuffle write bytes sum by all stages.

| | k=1000 | k=2000 | k=4000 | k=8000 | k=16000 |
| --- | --- | --- | --- | --- | --- |
| Gluten enabled (columnar shuffle) | 25,569,016,307 | 27,297,986,268 | 29,890,895,841 | 37,058,299,593 | 40,684,916,742 |
| Gluten disabled | 30,355,717,816 | 30,463,886,189 | 30,595,400,121 | 31,204,187,443 | 32,796,457,022 |
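To quantify the trend in the table: going from k=1000 to k=16000, the Gluten columnar (hash) shuffle grows its total write bytes by roughly 59%, while vanilla Spark grows by only about 8%. A quick check using the figures above:

```python
# Shuffle write bytes (summed over all stages) taken from the table above.
gluten = {1000: 25_569_016_307, 16000: 40_684_916_742}
vanilla = {1000: 30_355_717_816, 16000: 32_796_457_022}

def growth(series: dict) -> float:
    """Relative growth in write bytes from k=1000 to k=16000."""
    return series[16000] / series[1000] - 1

print(f"Gluten growth:  {growth(gluten):.1%}")   # ~59%
print(f"Vanilla growth: {growth(vanilla):.1%}")  # ~8%
```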

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@j7nhai j7nhai added bug Something isn't working triage labels Oct 28, 2024
@FelixYBW FelixYBW changed the title Columnar shuffle write byte increase as the number of partitions increases [VL] Columnar shuffle write byte increase as the number of partitions increases Oct 28, 2024
@FelixYBW (Contributor) commented:
Did you enable sort based shuffle? Hash shuffle has this issue.


j7nhai commented Oct 29, 2024

> Did you enable sort based shuffle? Hash shuffle has this issue.

Is this a bug that will be fixed in the future? May I ask why the disk usage rises with the number of partitions?

I didn't see any config to force-enable sort-based shuffle and avoid hash shuffle. Could I just decrease the value of `spark.gluten.sql.columnar.shuffle.sort.columns.threshold`?

@FelixYBW (Contributor) commented:

It's by design of hash shuffle, and one reason we implemented the sort shuffle.

You may decrease these two thresholds; the first is the threshold on the number of reducers, the second on the number of columns:

`spark.gluten.sql.columnar.shuffle.sort.partitions.threshold`
`spark.gluten.sql.columnar.shuffle.sort.columns.threshold`
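As a hedged sketch of how the two thresholds above might be lowered in a PySpark session (the value `0` is illustrative only; the exact defaults and semantics may differ by Gluten version, so consult the Gluten configuration docs):

```python
# Illustrative config fragment, not a recommendation: lowering both
# thresholds should make the columnar sort-based shuffle kick in for
# smaller reducer/column counts, avoiding the hash-shuffle growth
# described in this issue. Verify against your Gluten version's docs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.gluten.sql.columnar.shuffle.sort.partitions.threshold", "0")
    .config("spark.gluten.sql.columnar.shuffle.sort.columns.threshold", "0")
    .getOrCreate()
)
```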
