-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read concatenated byte stream in LZ4BlockInputStream #105
Conversation
Thanks for your contribution! |
ok, I'll update soon. |
ok, fixed. check again? Thanks! |
stopOnEmptyBlock = true; | ||
} | ||
|
||
protected void setStopOnEmptyBlock(boolean stopOnEmptyBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be protected, or public? I guess the use case you assume is to extend this class to enable the new behavior, but isn't it simpler to just make setStopOnEmptyBlock()
public so that users can just call it if they want the new behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reconsider this and I think how about setting this in the constructor? I couldn't imagine users change this flag during the decompression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It makes more sense to have new constructors.
@@ -147,7 +152,19 @@ public long skip(long n) throws IOException { | |||
} | |||
|
|||
private void refill() throws IOException { | |||
readFully(compressedBuffer, HEADER_LENGTH); | |||
if (finished || o < originalLen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check necessary? I don't find any execution path on which this check is evaluated to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok. my bad. I forgot removing this. Thanks.
*/ | ||
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum) { | ||
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum, Boolean stopOnEmptyBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so familiar with the latest Java best practices, but is there any reason for Boolean
, not boolean
?
|
||
/** | ||
* Create a new instance using {@link XXHash32} for checksuming. | ||
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
||
/** | ||
* Create a new instance using {@link XXHash32} for checksuming. | ||
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean) | ||
* @see StreamingXXHash32#asChecksum() | ||
*/ | ||
public LZ4BlockInputStream(InputStream in, Boolean stopOnEmptyBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Thanks a lot for your contribution! |
Thanks! Do we have a new release in near future? Any blocker? |
## What changes were proposed in this pull request? This pr updated `lz4-java` to the latest (v1.4.0) and removed custom `LZ4BlockInputStream`. We currently use custom `LZ4BlockInputStream` to read concatenated byte stream in shuffle. But, this functionality has been implemented in the latest lz4-java (lz4/lz4-java#105). So, we might update the latest to remove the custom `LZ4BlockInputStream`. Major diffs between the latest release and v1.3.0 in the master are as follows (lz4/lz4-java@62f7547...6d4693f); - fixed NPE in XXHashFactory similarly - Don't place resources in default package to support shading - Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed - Try to load lz4-java from java.library.path, then fallback to bundled - Add ppc64le binary - Add s390x JNI binding - Add basic LZ4 Frame v1.5.0 support - enable aarch64 support for lz4-java - Allow unsafeInstance() for ppc64le archiecture - Add unsafeInstance support for AArch64 - Support 64-bit JNI build on Solaris - Avoid over-allocating a buffer - Allow EndMark to be incompressible for LZ4FrameInputStream. - Concat byte stream ## How was this patch tested? Existing tests. Author: Takeshi Yamamuro <[email protected]> Closes #18883 from maropu/SPARK-21276.
This pr fixed issues when reading concatenated byte stream in
LZ4BlockInputStream
.This comes from #76.