Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read concatenated byte stream in LZ4BlockInputStream #105

Merged
merged 4 commits into from
Jun 30, 2017
Merged

Read concatenated byte stream in LZ4BlockInputStream #105

merged 4 commits into from
Jun 30, 2017

Conversation

maropu
Copy link
Contributor

@maropu maropu commented Jun 26, 2017

This pr fixed issues when reading concatenated byte stream in LZ4BlockInputStream.
This comes from #76.

@odaira
Copy link
Member

odaira commented Jun 27, 2017

Thanks for your contribution!
Doesn't this change the default behavior when reading an empty block? Would you have any good reason for the change? As you might know, #76 kept the default behavior and added setStopOnEmptyBlock() to control the behavior.

@maropu
Copy link
Contributor Author

maropu commented Jun 28, 2017

ok, I'll update soon.

@maropu
Copy link
Contributor Author

maropu commented Jun 28, 2017

ok, fixed. check again? Thanks!

stopOnEmptyBlock = true;
}

protected void setStopOnEmptyBlock(boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be protected, or public? I guess the use case you assume is to extend this class to enable the new behavior, but isn't it simpler to just make setStopOnEmptyBlock() public so that users can just call it if they want the new behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reconsider this and I think how about setting this in the constructor? I couldn't imagine users change this flag during the decompression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It makes more sense to have new constructors.

@@ -147,7 +152,19 @@ public long skip(long n) throws IOException {
}

private void refill() throws IOException {
readFully(compressedBuffer, HEADER_LENGTH);
if (finished || o < originalLen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check necessary? I don't find any execution path on which this check is evaluated to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. my bad. I forgot removing this. Thanks.

*/
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum) {
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum, Boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so familiar with the latest Java best practices, but is there any reason for Boolean, not boolean?


/**
* Create a new instance using {@link XXHash32} for checksuming.
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


/**
* Create a new instance using {@link XXHash32} for checksuming.
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
* @see StreamingXXHash32#asChecksum()
*/
public LZ4BlockInputStream(InputStream in, Boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@odaira odaira merged commit 6480bd9 into lz4:master Jun 30, 2017
@odaira
Copy link
Member

odaira commented Jun 30, 2017

Thanks a lot for your contribution!

@maropu
Copy link
Contributor Author

maropu commented Jul 1, 2017

Thanks! Do we have a new release in near future? Any blocker?

asfgit pushed a commit to apache/spark that referenced this pull request Aug 9, 2017
## What changes were proposed in this pull request?
This pr updated `lz4-java` to the latest (v1.4.0) and removed custom `LZ4BlockInputStream`. We currently use custom `LZ4BlockInputStream` to read concatenated byte stream in shuffle. But, this functionality has been implemented in the latest lz4-java (lz4/lz4-java#105). So, we might update the latest to remove the custom `LZ4BlockInputStream`.

Major diffs between the latest release and v1.3.0 in the master are as follows (lz4/lz4-java@62f7547...6d4693f);
- fixed NPE in XXHashFactory similarly
- Don't place resources in default package to support shading
- Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
- Try to load lz4-java from java.library.path, then fallback to bundled
- Add ppc64le binary
- Add s390x JNI binding
- Add basic LZ4 Frame v1.5.0 support
- enable aarch64 support for lz4-java
- Allow unsafeInstance() for ppc64le archiecture
- Add unsafeInstance support for AArch64
- Support 64-bit JNI build on Solaris
- Avoid over-allocating a buffer
- Allow EndMark to be incompressible for LZ4FrameInputStream.
- Concat byte stream

## How was this patch tested?
Existing tests.

Author: Takeshi Yamamuro <[email protected]>

Closes #18883 from maropu/SPARK-21276.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants