This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

zlib auto-reset on trailing data causes repeated flushing #8962

Closed
chrisdickinson opened this issue Jan 2, 2015 · 11 comments

Comments

@chrisdickinson

6f6a979 changed the behavior of the zlib module's decompression transform (specifically Zlib.prototype._processChunk): if an incoming chunk would end the transform, but has trailing data, the transform resets itself and re-sends the chunk, optimistically assuming that the chunk represents a concatenated stream. However, it looks like this behavior can hang Node, and breaks some tests in npm. I'm looking into this at present, but want to make others aware!

@chrisdickinson
Author

Per @othiym23, this seems like a pretty severe regression:

We should get this resolved before v0.11.15 is released, because npm install will break for a substantial number of users immediately.

@othiym23

othiym23 commented Jan 2, 2015

It seems that any file in an npm tarball that's more than one stream-chunk long will trigger this behavior in v0.11.15-pre, so any attempt to unpack or install the affected package causes npm to hang, potentially using a large amount of CPU. This probably affects most other applications that use zlib as part of a pipeline as well.

@chrisdickinson
Author

A semi-related reproduction. Before the patch, this code would write once and then halt -- afterward, it errors (so, that's one breaking change).

I've come around to the idea that zlib's transform streams shouldn't transparently handle decompressing concatenated compressed streams -- really, an outer stream should be in charge of that, and all Node should be concerned with is giving the client the leftover bytes so they can manually .reset() or create a new zlib transform and re-write the remaining bytes.

@chrisdickinson
Author

Update: an actual reproduction! Any bytes trailing a compressed stream being decompressed with Unzip cause it to blow up.

@chrisdickinson
Author

Made a patch to fix the reproduction. Unfortunately npm still blows up with this patch, though at least it's ending the stream now.

npm encounters the following error:

56 error argv "/Users/chris/projects/iojs/node" "/Users/chris/projects/iojs/npm/bin/npm-cli.js" "install"
57 error node v0.11.15-pre
58 error npm  v2.1.17
59 error code Z_DATA_ERROR
60 error errno -3
61 error unknown compression method
62 error If you need help, you may report this error at:
62 error     <http://github.com/npm/npm/issues>
63 verbose exit [ -3, true ]

Since there are trailing "zero" bytes in the tar.gz it's trying to decompress, it blows up when trying to restart the stream.

Edit: those trailing zeros may be due to block-stream.

@chrisdickinson
Author

/cc @luismreis, who originally authored the patch.

@chrisdickinson
Author

I suppose the remaining questions are:

  1. Do we back this out for now and attempt to reintroduce later?
  2. When re-introducing this, do we want to reintroduce this (roughly) as-is, or as an additional option for the zlib decompression stream constructors?

@rmg

rmg commented Jan 6, 2015

I had to roll back my CI environments to work around this. I couldn't run npm install to completion on any module being tested. That seems like a pretty critical breakage.

@chrisdickinson your proposal to revert for now (0.11.15, I'm guessing?) and reintroduce later as an opt-in option sounds reasonable to me.

chrisdickinson added a commit to chrisdickinson/node that referenced this issue Jan 6, 2015
Revert "src: fix windows build error" and "zlib: support
concatenated gzip files". Treating subsequent data as a
concatenated stream breaks npm install.

This reverts commits 93533e9
and 6f6a979.

Fixes: nodejs#8962
PR-URL: nodejs#8985
Reviewed-By: Julien Gilli <[email protected]>
chrisdickinson added a commit that referenced this issue Jan 7, 2015
Revert "src: fix windows build error" and "zlib: support
concatenated gzip files". Treating subsequent data as a
concatenated stream breaks npm install.

This reverts commits 93533e9
and 6f6a979.

Fixes: #8962
PR-URL: #8985
Reviewed-By: Julien Gilli <[email protected]>
@chrisdickinson
Author

Fixed by c8ef97e.

@eendeego

eendeego commented Jan 8, 2015

Hi, can you tell why the gunzip transform stream is being fed data that is not gzipped? According to the gzip spec, we should read it as if it were a new gzipped stream, but that seems not to be the case here.

Does it make sense to trim the stream on the npm side?

I would like to help, but I'm feeling a bit lost here.

chrisdickinson added a commit to nodejs/node that referenced this issue Jan 9, 2015
Revert "src: fix windows build error" and "zlib: support
concatenated gzip files".

This reverts commits be413ac
and 1183ba4.

Treating subsequent bytes as a concatenated zlib stream
breaks npm install.

Conflicts:
	test/parallel/test-zlib-from-multiple-gzip-with-garbage.js
	test/parallel/test-zlib-from-multiple-gzip.js
	test/parallel/test-zlib-from-multiple-huge-gzip.js

Fixes: nodejs/node-v0.x-archive#8962
PR-URL: #240
Reviewed-By: Ben Noordhuis <[email protected]>
@eendeego

@chrisdickinson, @othiym23 I'd like to fix this and have a proper patch committed into node/iojs.

The example referenced in #8962 (comment), converted to bash is something like:

#!/bin/bash
echo -ne 'hello world' | gzip > example.gz
echo -ne "\xde\xad\xbe\xef" >> example.gz
gunzip < example.gz

Which yields the following output:

$ ./example.sh
hello worldgunzip: (stdin): trailing garbage ignored

The equivalent code in Perl:

#!/usr/bin/env perl

use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use IO::File;

my $input = new IO::File "<example.gz"
    or die "Cannot open 'example.gz': $!\n";
my $buffer;
gunzip $input => \$buffer, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print $buffer;

Output:

$ ./gunzip.pl | hexdump -C
00000000  68 65 6c 6c 6f 20 77 6f  72 6c 64 de ad be ef     |hello world....|
0000000f

So, what should the proper behaviour of this patch be? Should it try to decode the next stream? Should it pass it along transparently? Should it ignore it?

Note that if we don't allow chained streams, we won't be implementing gunzipping according to the spec.
