Skip to content

Commit

Permalink
[SPARK-49447][K8S] Fix spark.kubernetes.allocation.batch.delay to p…
Browse files Browse the repository at this point in the history
…revent small values less than 100

### What changes were proposed in this pull request?

This PR aims to fix `spark.kubernetes.allocation.batch.delay` to prevent small values less than 100 from Apache Spark 4.0.0.

### Why are the changes needed?

The default value is `1s` (=1000). Usually, a small value like `1` happens due to the missing unit, `s`, when users do mistakes. We had better prevent this. In addition, a misconfigured value like `1` can cause a high frequency traffic from Spark drivers to K8s control plan accidentally.

### Does this PR introduce _any_ user-facing change?

For the misconfigured values, Spark will complain and fail from Apache Spark 4.0.0.

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47913 from dongjoon-hyun/SPARK-49447.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
dongjoon-hyun authored and himadripal committed Oct 19, 2024
1 parent 46e83ed commit 9e8b3c1
Show file tree
Hide file tree
Showing 2 changed files with 9 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -457,7 +457,7 @@ private[spark] object Config extends Logging {
.doc("Time to wait between each round of executor allocation.")
.version("2.3.0")
.timeConf(TimeUnit.MILLISECONDS)
.checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
.checkValue(value => value > 100, "Allocation batch delay must be greater than 0.1s.")
.createWithDefaultString("1s")

val KUBERNETES_ALLOCATION_DRIVER_READINESS_TIMEOUT =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,14 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
when(persistentVolumeClaimList.getItems).thenReturn(Seq.empty[PersistentVolumeClaim].asJava)
}

test("SPARK-49447: Prevent small values less than 100 for batch delay") {
val m = intercept[IllegalArgumentException] {
val conf = new SparkConf().set(KUBERNETES_ALLOCATION_BATCH_DELAY.key, "1")
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
}.getMessage
assert(m.contains("Allocation batch delay must be greater than 0.1s."))
}

test("SPARK-41210: Window based executor failure tracking mechanism") {
var _exitCode = -1
val _conf = conf.clone
Expand Down

0 comments on commit 9e8b3c1

Please sign in to comment.