[BLAZE-287] Integrate Blaze with Celeborn #596

Merged 1 commit on Sep 27, 2024

RexXiong
Contributor

Which issue does this PR close?

Closes #287.

What changes are included in this PR?

Celeborn is an intermediate data service for big data compute engines (e.g. ETL, OLAP, and streaming engines) that improves performance, stability, and flexibility. Intermediate data typically includes shuffle and spilled data. Integrating Blaze with Celeborn (0.5.1, the latest stable version) will benefit Blaze + Spark in terms of shuffle performance and stability, allowing for a greater focus on optimizing computations.

Are there any user-facing changes?

No

Testing

Tested Blaze + Spark 3.4 + Celeborn 0.5.1 using some TPC-DS queries with the following configurations:

  • spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.celeborn.BlazeCelebornShuffleManager
  • spark.celeborn.master.endpoints master-1-1:9097
  • spark.blaze.enable true
  • spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
  • spark.memory.offHeap.enabled false
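
For anyone reproducing the test setup, the settings above could be passed to `spark-submit` as `--conf` flags. The following is a minimal sketch, not part of this PR: the helper `spark_submit_args` and the application file name `tpcds_query.py` are hypothetical, and the master endpoint is the test value listed above, so it would need to be replaced with your own Celeborn master address.

```python
import shlex

# Settings copied from the configuration list above.
# "master-1-1:9097" is the endpoint used in the PR's test environment.
BLAZE_CELEBORN_CONF = {
    "spark.shuffle.manager": (
        "org.apache.spark.sql.execution.blaze.shuffle.celeborn."
        "BlazeCelebornShuffleManager"
    ),
    "spark.celeborn.master.endpoints": "master-1-1:9097",
    "spark.blaze.enable": "true",
    "spark.sql.extensions": "org.apache.spark.sql.blaze.BlazeSparkSessionExtension",
    "spark.memory.offHeap.enabled": "false",
}


def spark_submit_args(conf, app):
    """Return a spark-submit argv list with one --conf key=value pair per setting.

    Hypothetical helper for illustration only.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # "tpcds_query.py" is a placeholder application file name.
    print(shlex.join(spark_submit_args(BLAZE_CELEBORN_CONF, "tpcds_query.py")))
```

Running the script prints a single command line that can be pasted into a shell. The same key/value pairs could equally go into `spark-defaults.conf`, one pair per line.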

@richox richox merged commit 6918f75 into kwai:master Sep 27, 2024
Development

Successfully merging this pull request may close these issues.

Supports apache-celeborn