Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Content Size information exposition to LZ4FrameInputStream #144

Merged
merged 3 commits into from
Sep 11, 2019

Conversation

BastienF
Copy link
Contributor

This PR to allow users to retrieve Content Size information before uncompressing file.
This is especially usefully when working with AWS S3 filesystem.
Because of the S3 streaming API limitation that requires the file size before starting to write it, access to Content Size a priori is mandatory to avoid loading the file in memory.

As workaround we are currently forced to directly read the Content Size header bytes from the lz4 file byte array.

@BastienF
Copy link
Contributor Author

Travis build is unstable but the PR is OK (Cf. https://travis-ci.org/lz4/lz4-java/builds/550391117)

@odaira
Copy link
Member

odaira commented Sep 11, 2019

@BastienF Thanks for your contribution. I basically agree with your proposal, but I would like to apply some changes, so I made a PR to your branch. Could you review and merge my PR?

BastienF#1

The important change is to introduce readSingleFrame to LZ4FrameInputStream's constructors so that the instance reads only the first frame from the stream. And getExpectedContentSize() is valid only when readSingleFrame is true. This is because when LZ4FrameInputStream reads multiple concatenated frames, a user of LZ4FrameInputStream does not know where a new frame begins, so a call to getExpectedContentSize() does not make sense.

I guess in your use case an lz4-compressed file consists of only one frame, so I expect this change makes sense.

Introduce readSingleFrame to read only the first frame
@odaira odaira merged commit 71fafd1 into lz4:master Sep 11, 2019
@odaira
Copy link
Member

odaira commented Dec 20, 2019

@BastienF I am thinking of deferring nextFrameInfo() until an actual read() is called, as proposed in #146. This change requires adding throws IOException to getExpectedContentSize() and isExpectedContentSizeDefined(). I appreciate it if you could let me know, if you have any concern. I should have added throws IOException when we discussed the API, because these methods depend on the contents read by I/O, so in principle they can throw IOException at any time.

@BastienF
Copy link
Contributor Author

@odaira Sorry for the late response, no problem for me. Thanks for the concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants