Node's zlib.ungzip does not support concatenated files #4306
Comments
Your example gzip file actually consists of multiple streams, i.e. it is the concatenation of more than one “raw” gzip file. See e.g. the gzip(1) manpage for some explanations on how the gzip command line program handles this. Node.js, however, ignores anything after the first stream, which in your case is only 14 bytes long. A bit more simplified:
var abc = zlib.gzipSync('ABC');
var def = zlib.gzipSync('DEF');
var decompressed = zlib.gunzipSync(Buffer.concat([abc, def]));
console.log(decompressed.toString()); // => ABC (!)
There was a fix for this a while ago, which resulted in a regression and was then reverted again. The story so far (as far as I can tell): |
I understand. |
I’d totally be in favour of supporting concatenated input files. Maybe someone here has the time to look into this. If people here agree that this would be the intended behaviour, I’d be really interested in looking into it, but I’m not sure I can make any promises on when I’d find time to do that. |
cc @chrisdickinson, @indutny, @trevnorris, @bnoordhuis -- You all seemed to have dealt with the aforementioned issues, can any of you give more context? @addaleax Thanks for triaging this! :) |
IIRC the problem was that the streams paradigm doesn't work so well for concatenated archives? For the sync versions, we could create an overload or alternative that returns an array or an iterator. For the streaming versions, I'm not so sure; maybe an event for the unconsumed data that you can then feed back into a new zlib stream? |
Not sure whether this helps, but there might be a little confusion between terms here. This is not about providing multiple files to the end user (i.e. it is not about archives in the sense of a single file which in some way contains multiple files), but rather it’s about encoding a single file as the result of multiple, independent compression operations. As far as I can tell, the problem with the original patch was that trailing zero bytes (as padding) caused decompression to fail. I’d still say that the default behaviour should be to allow decompression of a single file as concatenated streams, following the conventions set by the command line tools (and besides, it’s a cool feature of the compression format). |
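To make the "single file, multiple independent compression operations" point concrete, here is a minimal sketch (not the eventual patch) that compares decompressing each member on its own against decompressing the concatenation in one call. The variable names are illustrative only. On the Node.js version discussed in this thread the single call stops after the first member; versions that include a fix are expected to return the full data, so the last line reports the comparison rather than assuming a result.

```javascript
const zlib = require('zlib');

// Two independent compression operations, as in the simplified example above.
const members = [zlib.gzipSync('ABC'), zlib.gzipSync('DEF')];

// Reference result: decompress every member separately and join the outputs.
// This is what gzip(1) produces for the concatenated file.
const expected = Buffer.concat(members.map((m) => zlib.gunzipSync(m)));

// Decompress the concatenation in a single call. Whether this yields only
// the first member or all of them depends on the Node.js version in use.
const actual = zlib.gunzipSync(Buffer.concat(members));

console.log('member by member:', expected.toString());
console.log('single call:     ', actual.toString());
console.log('follows gzip(1) convention:', actual.equals(expected));
```

If the last line prints `false`, decompressing member by member (as done for `expected`) only works when the member boundaries are already known, which is not the case for a file read from disk.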
Ah, I think I remember this. The terms are confusing, but @bnoordhuis has it right, I think. There was a patch to make the zlib module compatible with concatenated zlib datastreams, but the problem was that it broke existing code. IIRC this was because:
The workaround, if you're interested in taking it on, is to expose either:
|
@chrisdickinson Yes, you are correct; however, the underlying assumption in npm’s tarball handling was not that only the first datastream from a gzip file would be extracted. npm expects the ability to extract .tar.gz files which are padded with trailing So:
I agree that it’s a good idea to make this optional, but it should be active by default. And yes, if it’s okay with you, I’d be interested in working on this, but I can’t make any promises on when I’d get this done (probably within the next 1 or 2 weeks). |
I'm definitely 👍 on getting support into core for this, and would love to see a PR for it — I'll note re:
The problem while defaulting this way is that the existing behavior has been out in the wild for a long time now, and folks have built programs that expect that behavior — it could potentially cause a lot of breakage without a commensurate upside. We're primarily beholden to the code that exists in the wild by way of our users, and we'd have to have a very good reason to release a version (even a major version) that breaks their code. For something like this we'd have to message changing the default way in advance (i.e., start warning that we're going to do it in the next major version, then actually flip the default in the major version after that.) |
Okay, I’ll start working on it! :) And I get that this would be semver-major, I just can’t really imagine that there’s any application which would rely on the current first-stream-only behaviour. But I do see your point, and if you say this is the way it has to be done, then it’s like that. |
@addaleax have you started working on this issue? If you have difficulty finding the time I can also work on this. |
@kthelgason Feel free to do that – I was busier than expected in the last weeks, sorry |
@addaleax Great, I'd love to. You've made it really approachable for me with all the groundwork laid out in this thread 😄 |
If I understand correctly, an issue with the previous implementation of this feature is that NULL-padded .gz files caused problems. According to RFC 1952 the gzip file format is as follows (emphasis mine):
Doesn't this explicitly disallow trailing garbage? @addaleax you state above that padding with |
@kthelgason Yes, I’d agree that trailing Then again, the I have no idea how many |
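As a rough illustration of the member structure under discussion: per RFC 1952, every gzip member begins with the bytes ID1 = 0x1f, ID2 = 0x8b and a compression method byte (8 for deflate), and ends with a 4-byte little-endian ISIZE field holding the uncompressed length modulo 2^32. The sketch below uses a hypothetical helper, `looksLikeGzipMember`, to show how a decoder could in principle tell a further member apart from zero padding. It is not how Node.js or zlib implement this, just a demonstration under those assumptions.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of Node's zlib API): reports whether the
// bytes at `offset` look like the start of a gzip member per RFC 1952.
function looksLikeGzipMember(buf, offset) {
  return buf.length - offset >= 3 &&
    buf[offset] === 0x1f &&      // ID1
    buf[offset + 1] === 0x8b &&  // ID2
    buf[offset + 2] === 0x08;    // CM = 8 (deflate)
}

const abc = zlib.gzipSync('ABC');
const def = zlib.gzipSync('DEF');

// Two members followed by 16 zero bytes of padding.
const padded = Buffer.concat([abc, def, Buffer.alloc(16)]);

// The second member starts right where the first one ends.
const secondMember = looksLikeGzipMember(padded, abc.length);

// The zero padding after the second member does not carry the magic bytes.
const padding = looksLikeGzipMember(padded, abc.length + def.length);

// ISIZE: the last four bytes of a member hold its uncompressed size.
const isize = abc.readUInt32LE(abc.length - 4);

console.log({ secondMember, padding, isize });
```

A decoder that continues only when the magic bytes are present would handle both concatenated members and zero-padded files, at the cost of silently ignoring any other kind of trailing data.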
According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue nodejs#4306 is an example of this occurring in daily use.
According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue #4306 is an example of this occurring in daily use. Fixes: #4306 PR-URL: #5120 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Why does zlib.ungzip operate incorrectly?
It works in C#/.NET,
but it does not work in Node.js (Windows 7, Node v5.2.0).
Most files decompress fine, but a problem appears with some of them.
In C# (with ICSharpCode)
In Node.js
Why is the size reduced?
Here's the file in question.
http://static.kafra.kr/file/patch.rgz
Sorry for my bad English.