-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zlib: support concatenated archives #5120
Conversation
267b15e
to
017e172
Compare
cc @indutny or @bnoordhuis |
@@ -254,11 +253,22 @@ class ZCtx : public AsyncWrap { | |||
ctx->err_ = Z_NEED_DICT; | |||
} | |||
} | |||
if (ctx->strm_.avail_in >= 2 && ctx->mode_ == GUNZIP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic constants need an explaining comment and should preferably be true, well-named constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought this was a little silly. It should probably check to make sure that avail_in
is at least equal to the gzip header size. I'll fix this.
Thanks for the review @bnoordhuis. I've addressed some individual comments and will be updating this PR to fix all the issues mentioned. |
Can you post a comment when you're done? GH doesn't send notifications for new or updated commits. |
017e172
to
30be769
Compare
I've updated this in line with the comments from @bnoordhuis. |
// member in the same archive, or just trailing garbage. | ||
// Check the header to find out. | ||
if (*ctx->strm_.next_in != GZIP_HEADER_ID1 || | ||
*(ctx->strm_.next_in + 1) != GZIP_HEADER_ID2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little more idiomatic to write it as ctx->strm_.next_in[1] != GZIP_HEADER_ID2
. Suggestion: use indexed notation for the first byte too, for consistency; that's what we do elsewhere.
LGTM with a style nit. Perhaps it would be good to add a test for an archive that looks like an concatenated archive but isn't, i.e., with trailing junk that just happens to start with the right magic bytes. CI: https://ci.nodejs.org/job/node-test-pull-request/1580/ (currently private) |
I thought about including such a test, but it's pretty unclear to me what the expected behaviour is. Technically, trailing garbage in a gzipped file is forbidden by the spec. It seems however that in practice some files are padded with '\0', and I've accounted for this in my implementation. Currently, if a file happens to contain trailing garbage that happens to begin with the gzip header id bytes it will not decompress correctly and an error will be thrown. EDIT: I just checked if |
30be769
to
30f5b38
Compare
Addressed the style nit and force pushed. |
@kthelgason About the trailing |
@Fishrock123 no, both implementations simply ignore the junk bytes at the end. |
@kthelgason Seems fine to me then; let's discuss the trailing bytes thing in a different issue if it is important. |
|
||
assert.equal(zlib.gunzipSync(data).toString(), 'abcdef'); | ||
|
||
zlib.gunzip(data, (_, result) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't ignore the error. You can do assert.ifError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to wrap this kind of callback in common.mustCall
to make sure the assertions are really executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why common.mustCall
exists! :)
30f5b38
to
f7b67e4
Compare
@thefourtheye @targos @Fishrock123 fixed. Regarding the trailing garbage bytes, I think opening another issue to discuss that (if necessary) would be a good idea. |
LGTM |
LGTM. Is there much to discuss w.r.t. trailing garbage? Emitting an error is the obvious thing to do, right? |
@bnoordhuis to me it seems that way yes. Esp. given that |
Exactly. I'd add the test case to this PR. Might as well land it with full coverage from the start. |
Thanks for the comment @rvagg and for the review @bnoordhuis. I've squashed the commits as recommended, and CI looks good (apart from the ARM failure but as pointed out by @jbergstroem I can't see that it's related to this at all). |
Thanks Kári, landed in f380db2. |
According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue #4306 is an example of this occurring in daily use. Fixes: #4306 PR-URL: #5120 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue #4306 is an example of this occurring in daily use. Fixes: #4306 PR-URL: #5120 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Notable changes: * **governance**: The following members have been added as collaborators: - Andreas Madsen (@AndreasMadsen) - Benjamin Gruenbaum (@benjamingr) - Claudio Rodriguez (@claudiorodriguez) - Glen Keane (@thekemkid) - Jeremy Whitlock (@whitlockjc) - Matt Loring (@matthewloring) - Phillip Johnsen (@phillipj) * **lib**: copy arguments object instead of leaking it (Nathan Woltman) #4361 * **zlib**: add support for concatenated members (Kári Tristan Helgason) #5120 PR-URL: #5702
According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue #4306 is an example of this occurring in daily use. Fixes: #4306 PR-URL: #5120 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Notable changes: * **governance**: The following members have been added as collaborators: - Andreas Madsen (@AndreasMadsen) - Benjamin Gruenbaum (@benjamingr) - Claudio Rodriguez (@claudiorodriguez) - Glen Keane (@thekemkid) - Jeremy Whitlock (@whitlockjc) - Matt Loring (@matthewloring) - Phillip Johnsen (@phillipj) * **lib**: copy arguments object instead of leaking it (Nathan Woltman) #4361 * **zlib**: add support for concatenated members (Kári Tristan Helgason) #5120 PR-URL: #5702
Notable changes: * **governance**: The following members have been added as collaborators: - Andreas Madsen (@AndreasMadsen) - Benjamin Gruenbaum (@benjamingr) - Claudio Rodriguez (@claudiorodriguez) - Glen Keane (@thekemkid) - Jeremy Whitlock (@whitlockjc) - Matt Loring (@matthewloring) - Phillip Johnsen (@phillipj) * **lib**: copy arguments object instead of leaking it (Nathan Woltman) #4361 * **src**: allow combination of -i and -e cli flags (Rich Trott) #5655 * **zlib**: add support for concatenated members (Kári Tristan Helgason) #5120 PR-URL: #5702
Notable changes: * **contextify**: Fixed a memory consumption issue related to heavy use of `vm.createContext` and `vm.runInNewContext`. (Ali Ijaz Sheikh) #5392 * **governance**: The following members have been added as collaborators: - Andreas Madsen (@AndreasMadsen) - Benjamin Gruenbaum (@benjamingr) - Claudio Rodriguez (@claudiorodriguez) - Glen Keane (@thekemkid) - Jeremy Whitlock (@whitlockjc) - Matt Loring (@matthewloring) - Phillip Johnsen (@phillipj) * **lib**: copy arguments object instead of leaking it (Nathan Woltman) #4361 * **src**: allow combination of -i and -e cli flags (Rich Trott) #5655 * **v8**: backport fb4ccae from v8 upstream (Vladimir Krivosheev) #4231 - breakout events from v8 to offer better support for external debuggers * **zlib**: add support for concatenated members (Kári Tristan Helgason) #5120 PR-URL: #5702
Notable changes: * **contextify**: Fixed a memory consumption issue related to heavy use of `vm.createContext` and `vm.runInNewContext`. (Ali Ijaz Sheikh) #5392 * **governance**: The following members have been added as collaborators: - Andreas Madsen (@AndreasMadsen) - Benjamin Gruenbaum (@benjamingr) - Claudio Rodriguez (@claudiorodriguez) - Glen Keane (@thekemkid) - Jeremy Whitlock (@whitlockjc) - Matt Loring (@matthewloring) - Phillip Johnsen (@phillipj) * **lib**: copy arguments object instead of leaking it (Nathan Woltman) #4361 * **src**: allow combination of -i and -e cli flags (Rich Trott) #5655 * **v8**: backport fb4ccae from v8 upstream (Vladimir Krivosheev) #4231 - breakout events from v8 to offer better support for external debuggers * **zlib**: add support for concatenated members (Kári Tristan Helgason) #5120 PR-URL: #5702
This PR adds support for gunzipping archives that consists of multiple concatenated members.
This has not been supported by node's implementation so far but is a part of the spec rfc1952.
The problem was raised in #4306, and there is more detailed discussion on this feature in the issue.
I also refactored some of
lib/zlib.js
but I will submit that in a separate PR later.