From 90139f678860ce74b934a919b5bcd0635df348f4 Mon Sep 17 00:00:00 2001
From: prasha2
Date: Tue, 15 Oct 2019 22:22:37 -0700
Subject: [PATCH] [SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR updates the validation check on `length` from `length >= 0` to `length >= -1` so that `-1` can be passed to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) added `class FileBlock` with a default `length` value of `-1`. However, there is no way to set only `filePath` while keeping `length` at `-1`:

```scala
  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    require(length >= 0, s"length ($length) cannot be negative")
    inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
  }
```

For compressed files (such as GZ), the split length can be -1. This was allowed up to Spark 2.1 but regressed starting with Spark 2.2.x. Note that a split length of -1 means the length is unknown, which is a valid scenario. Thus, a split length of -1 should be accepted, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This change only relaxes a corner case of the requirement check. The code was checked manually.

Closes #26123 from praneetsharma/fix-SPARK-27259.
Authored-by: prasha2
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 57edb4258254fa582f8aae6bfd8bed1069e8155c)
Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152d4dee2..1beb085db27d9 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
     require(filePath != null, "filePath cannot be null")
     require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-    require(length >= 0, s"length ($length) cannot be negative")
+    require(length >= -1, s"length ($length) cannot be smaller than -1")
     inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
   }
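
The effect of the relaxed `require` check in the diff above can be tried outside Spark with a minimal sketch. `FileBlockSketch` and `FileBlockHolderSketch` are hypothetical stand-ins, not Spark classes: the real `FileBlock` stores the path as a `UTF8String`, and the real holder keeps its state in an `InheritableThreadLocal`, which this sketch replaces with a single `AtomicReference` for brevity. The file paths are illustrative only.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.util.Try

// Hypothetical stand-in for Spark's private FileBlock; -1 marks "unknown".
final case class FileBlockSketch(filePath: String = "", startOffset: Long = -1L, length: Long = -1L)

object FileBlockHolderSketch {
  // Assumption: one shared reference instead of Spark's per-thread holder.
  private val inputBlock = new AtomicReference[FileBlockSketch](FileBlockSketch())

  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    // -1 is the "unknown length" sentinel (e.g. a compressed split), so it must pass.
    require(length >= -1, s"length ($length) cannot be smaller than -1")
    inputBlock.set(FileBlockSketch(filePath, startOffset, length))
  }

  def get: FileBlockSketch = inputBlock.get()
}

object FileBlockHolderSketchDemo {
  def main(args: Array[String]): Unit = {
    // Accepted after the change: the length of a compressed split is unknown.
    FileBlockHolderSketch.set("/data/part-00000.gz", 0L, -1L)
    println(FileBlockHolderSketch.get.length) // prints -1

    // Still rejected: anything below the -1 sentinel fails the require check.
    val rejected = Try(FileBlockHolderSketch.set("/data/part-00001.gz", 0L, -2L))
    println(rejected.isFailure) // prints true
  }
}
```

Under the old `length >= 0` check, the first `set` call would have thrown an `IllegalArgumentException`; the new bound keeps `-1` as the only accepted negative value, so genuinely invalid lengths are still caught.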