res.write not sending chunks until res.end() is called #56

Closed
bijoythomas opened this issue Oct 20, 2015 · 13 comments

@bijoythomas

I'm using Node v0.12.0 with Express (v4.4.1) and compression (v1.6.0).

I'm sending back about 80MB of dynamically generated data (not from FS or DB) in multiple res.write() calls. When I add the compression middleware (with no options passed), I don't see any traffic (using Wireshark) from the Node server until res.end() is called. When res.end() is called, there is a sudden burst of chunked data.

But when the compression module is not used, I see chunked responses going out on the wire. The size of each of these chunked fragments seems to be between 10K and 16K (based on Wireshark).

The only header information I am setting happens before the res.write() calls and is:

res.setHeader('Content-Type', 'application/json');

Any reason why the data is getting buffered before the res.end() call?
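
For reference, here is a minimal sketch of the setup described; someLargeNumber, generateChunk(), and the route path are placeholders, not my actual code:

var express = require('express');
var compression = require('compression');

var app = express();
app.use(compression()); // removing this line restores the visible chunked traffic

app.get('/data', function (req, res) {
  res.setHeader('Content-Type', 'application/json');
  for (var i = 0; i < someLargeNumber; i++) {
    // generateChunk() stands in for the dynamic data generation
    res.write(JSON.stringify(generateChunk(i)));
  }
  res.end();
});

app.listen(3000);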

@dougwilson
Contributor

This is just how gzip compression works: to get small output, gzip needs to accumulate the payload so that it can do substring searches and replacements.

If you want to stream, you can call res.flush() between writes, but the compression will be much less efficient.

An example can be found at the bottom of the readme.
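
The pattern there is roughly like this (a from-memory sketch, not the readme verbatim):

app.use(compression());

app.get('/events', function (req, res) {
  res.setHeader('Content-Type', 'text/event-stream');
  var count = 0;
  var timer = setInterval(function () {
    res.write('data: ' + count++ + '\n\n');
    res.flush(); // push out whatever gzip has buffered so far
    if (count === 10) {
      clearInterval(timer);
      res.end();
    }
  }, 1000);
});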

@dougwilson
Contributor

Basically you are just describing to me exactly how gzip compression and the Node.js core zlib stream work; that is not actually a bug. Let me know if my answer above helps at all. If you think there is a legitimate bug, let me know the specifics and perhaps even send a PR :)

@dougwilson dougwilson self-assigned this Oct 20, 2015
@bijoythomas
Author

Thanks. I added the call to flush the response after each write, but it is still behaving the same: everything seems to arrive only when res.end() is called. I am calling res.write() [and now res.flush()] in batches. After about 10 calls, I see this:

(node) warning: possible EventEmitter memory leak detected. 11 drain listeners added. Use emitter.setMaxListeners() to increase limit

Trace
    at Gzip.addListener (events.js:179:15)
    at Gzip.Readable.on (_stream_readable.js:671:33)
    at Gzip.once (events.js:204:8)
    at Gzip.Zlib.flush (zlib.js:451:10)
    at ServerResponse.flush (/Users/bijoythomas/Zulu/SchedulingMVP/app_node/node_modules/compression/index.js:70:16)

@dougwilson
Contributor

Hi! The warning you are getting is a known and fixed Node.js core bug; if you upgrade Node.js to the latest, it should be fixed (or go back to 0.10, which I know was before that regression).

As for your issue, I'm not able to replicate it, and neither are the tests running in Travis CI.

If you can provide a PR for the fix or at least provide full code and instructions on how I can reproduce it, I can probably find some time in the next week to debug it. A lot of times people can have a proxy between them and their app, which can cause issues, so be sure you are running your server on localhost to narrow it down to this module. Another common issue is app.use of another module that is buffering your response, so you can try removing all other middleware to narrow it down.

@bijoythomas
Author

Thanks for the suggestions. I am running on localhost with no proxies in between. So far, I can confirm that the difference comes down to one line: app.use(compression()).

When it is commented out, I can see the data streaming, and the network tab in Chrome dev tools shows me the streaming information (sizes and timing).

When it is enabled, I see a long pause (Pending) and then finally the data coming in.

I will try to narrow it down to an example.

@dougwilson
Contributor

Hi @bijoythomas, you cannot use the Chrome network tab to measure gzipped responses, as far as I know. We get lots of issues in this repo from Chrome reporting misleading numbers. I thought you said you were using Wireshark to verify this? Did you use Wireshark to verify your most recent update, or just Chrome?

@bijoythomas
Author

Both. Wireshark and Chrome.

@dougwilson
Contributor

Cool. Yeah, I'm not able to reproduce it for whatever reason; all the tests in Travis CI, on all their Node.js versions, get streaming compressed responses. So I'm not sure how I can proceed without your help in getting me a reproducible case, or a PR to address the issue.

@dougwilson
Contributor

It's possible there is some kind of Express bug (I see you are using a version from June 2, 2014, well over a year old) or even some kind of bug in the Node.js you are using (from February). Have you tried upgrading either of those two? I've only been testing with the latest versions from those lines (Express 4.13.3 and Node.js 0.12.7) to rule out any bugs that may be in them.

@dougwilson
Contributor

In addition, I know you said it was the addition of app.use(compression()) that caused it, but there have been several issues where this module, used in combination with another one, triggered a bug in that other module. So if you haven't already, you should still try removing all other middleware to rule out the issue being a combination of this module and another. I don't have your application, so if there is an interaction like this going on, it could very well be another reason why I'm not able to reproduce the issue myself.

Without being able to see the issue, I'm not able to debug the issue and determine the case or a fix :(

@bijoythomas
Author

Sure. I will try to narrow down the middleware and see if things change.

I did a little bit of debugging by adding console.log statements at several points in compression/index.js in my node_modules directory.

I can see that in the res.flush() function we do have a 'gzip' stream and that it is getting called, but nothing goes onto the wire.

I also added a log statement in the res.end function there, and only once it is called does the actual res.write (on the underlying http response object) happen. I had added logging in the stream.on('data') handler as well.

I know it's hard to follow without seeing where I put the logs, but essentially what I'm seeing is a bunch of flush calls with no data on the wire, followed by the res.end() call, and only AFTER that do I see 'data' events from the stream, with the underlying res.write() calls happening, returning true/false, and pausing/resuming the stream. All of this activity happens only after res.end() is called; it's like the flush has no effect.

@dougwilson
Contributor

Gotcha. From what you are describing, it sounds like a Node.js bug (perhaps just in 0.12.0).

Essentially, whenever the Node.js core zlib stream emits a data event, we call res.write with that chunk. Every time you call res.write, we call stream.write on that core stream, and when you call res.flush we just call stream.flush on it as well. If that doesn't cause the stream to emit a data event, then we have nothing to call res.write with, of course.
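
In simplified form, the wiring is something like this (a sketch, not the module's actual source; originalWrite stands for the saved core res.write):

var stream = zlib.createGzip(opts);

// gzip output -> the real response
stream.on('data', function (chunk) {
  originalWrite.call(res, chunk);
});

// the app's res.write -> gzip input
res.write = function (chunk, encoding) {
  return stream.write(chunk, encoding);
};

// res.flush -> ask zlib to emit whatever it has buffered
res.flush = function () {
  stream.flush(); // if this emits no 'data' event, nothing reaches the wire
};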

@bijoythomas
Author

So I figured out what is happening. The same issue exists in the latest version of Node too. The way I am sending data is something like this:

for (var i = 0; i < some_large_number; i++) {
  populate_data_in_array(arr);
  if (arr.length === 500) {
    res.write(JSON.stringify(arr));
    res.flush();
    arr = [];
  }
}

// Finally, if there is anything left in the array, send it across too

A look at the zlib.js source (https://github.com/nodejs/node/blob/master/lib/zlib.js, line 448) shows that the flush() call is not synchronous. Instead, it sets up a 'drain' listener to execute later. But with my for-loop consuming the thread, this listener never gets a chance to run. In fact, the loop keeps adding 'drain' listeners, which explains why I was getting the warning below:
(node) warning: possible EventEmitter memory leak detected. 11 drain listeners added

I changed my code to allow the listener to execute by making use of the callback that Zlib.flush() takes, and to trampoline between data generation and sending. It looks something like this:

function populateArray() {
  if (iterCount < maxCount) {
    arr.push(generate_some_data());
  }
  send();
}

function send() {
  if (iterCount === maxCount) {
    if (arr.length > 0) {
      res.write(JSON.stringify(arr));
    }
    res.end();
    return;
  }

  if (arr.length === 500) {
    res.write(JSON.stringify(arr));
    res.flush(function () { // <--- callback to flush, which on invocation resumes the array population
      arr = [];
      iterCount++;
      populateArray();
    });
  } else {
    iterCount++;
    populateArray();
  }
}

In order to get this to work, I had to change the flush implementation to take a callback:

res.flush = function flush(cb) {
  if (stream) {
    stream.flush(opts.flush, cb);
  }
};

Where opts is the variable that captures the options passed into the compression() function. The zlib flush implementation takes a flush mode, but I'm not aware of a practical use case for using multiple flush modes in a single compression run. If there is one, the function could take an additional flush-mode argument, but the user would have to make sure the right mode is passed in.
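
For illustration, a variant that lets the caller override the mode per call might look like this (hypothetical; not what I actually changed):

var zlib = require('zlib');

res.flush = function flush(mode, cb) {
  if (typeof mode === 'function') { // allow the flush(cb) shorthand
    cb = mode;
    mode = undefined;
  }
  if (stream) {
    stream.flush(mode || opts.flush || zlib.Z_SYNC_FLUSH, cb);
  }
};

// Usage: force a full flush for this one batch
res.write(JSON.stringify(arr));
res.flush(zlib.Z_FULL_FLUSH, function () {
  // safe to keep generating data here
});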
