-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27259][CORE] Allow setting -1 as length for FileBlock #26123
Conversation
For compressed files, the size of split can be set to -1. This was allowed till Spark 2.1 but regressed starting spark 2.2.x This commit removes the validation on length being at least 0.
Hi, @praneetsharma .
I'm just wondering if we have an official reference for the following.
|
ok to test |
Incorporating comment of modifying the validation to be `length >= -1`
Formatting fix
Test build #112132 has finished for PR 26123 at commit
|
Test build #112133 has finished for PR 26123 at commit
|
Test build #112131 has finished for PR 26123 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @praneetsharma .
Merged to master/3.0-preview/2.4
.
### What changes were proposed in this pull request? This PR aims to update the validation check on `length` from `length >= 0` to `length >= -1` in order to allow set `-1` to keep the default value. ### Why are the changes needed? At Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) adds `class FileBlock` with the default `length` value, `-1`, initially. There is no way to set `filePath` only while keeping `length` is `-1`. ```scala def set(filePath: String, startOffset: Long, length: Long): Unit = { require(filePath != null, "filePath cannot be null") require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative") require(length >= 0, s"length ($length) cannot be negative") inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length)) } ``` For compressed files (like GZ), the size of split can be set to -1. This was allowed till Spark 2.1 but regressed starting with spark 2.2.x. Please note that split length of -1 also means the length was unknown - a valid scenario. Thus, split length of -1 should be acceptable like pre Spark 2.2. ### Does this PR introduce any user-facing change? No ### How was this patch tested? This is updating the corner case on the requirement check. Manually check the code. Closes #26123 from praneetsharma/fix-SPARK-27259. Authored-by: prasha2 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 57edb42) Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to update the validation check on `length` from `length >= 0` to `length >= -1` in order to allow set `-1` to keep the default value. ### Why are the changes needed? At Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) adds `class FileBlock` with the default `length` value, `-1`, initially. There is no way to set `filePath` only while keeping `length` is `-1`. ```scala def set(filePath: String, startOffset: Long, length: Long): Unit = { require(filePath != null, "filePath cannot be null") require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative") require(length >= 0, s"length ($length) cannot be negative") inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length)) } ``` For compressed files (like GZ), the size of split can be set to -1. This was allowed till Spark 2.1 but regressed starting with spark 2.2.x. Please note that split length of -1 also means the length was unknown - a valid scenario. Thus, split length of -1 should be acceptable like pre Spark 2.2. ### Does this PR introduce any user-facing change? No ### How was this patch tested? This is updating the corner case on the requirement check. Manually check the code. Closes #26123 from praneetsharma/fix-SPARK-27259. Authored-by: prasha2 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 57edb42) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR aims to update the validation check on
length
fromlength >= 0
tolength >= -1
in order to allow set-1
to keep the default value.Why are the changes needed?
At Apache Spark 2.2.0, SPARK-18702 adds
class FileBlock
with the defaultlength
value,-1
, initially.There is no way to set
filePath
only while keepinglength
is-1
.For compressed files (like GZ), the size of split can be set to -1. This was allowed till Spark 2.1 but regressed starting with spark 2.2.x. Please note that split length of -1 also means the length was unknown - a valid scenario. Thus, split length of -1 should be acceptable like pre Spark 2.2.
Does this PR introduce any user-facing change?
No
How was this patch tested?
This is updating the corner case on the requirement check. Manually check the code.