net: WIP run close callbacks in correct eloop phase #6802

trevnorris · 2014-01-04T07:05:30Z

Instead of running the close callbacks seemingly synchronously, run them
once the handle has actually been closed by libuv, in the
uv__run_closing_handles() phase of the eloop.

There's an issue with the cluster module overriding the close() callback that prevents the callbacks from being executed at the correct time.

NOTE: This is missing tests and not fully functional. Don't merge yet.

trevnorris · 2014-01-04T07:29:20Z

@tjfontaine This PR is basically what I'm talking about, where the close callbacks should actually be run in the proper phase of the eloop.

tjfontaine · 2014-01-04T15:47:20Z

There are certainly some cleaner paths here. I'm just worried about the subtle semantic changes we would be introducing without necessarily having a reason to change the code right now.

trevnorris · 2014-01-04T23:13:15Z

Other than this just being TheRightWay(tm), I don't see a need (i.e. a
bug to fix) to implement this.

The only semantic change introduced is that if setTimeout(..., 0) and
setImmediate(...) were both called in the close callback, in this one case
the setTimeout() would run first, since the closing handles phase of the
eloop occurs after the phase where setImmediate is run.

As far as performance, there is zero impact.

IMO it's in our best interest to run the user's callbacks in the phase of
the eloop where they'd be expected. As you can see, doing this I was able
to get rid of some of the cluster hackery around closing at the wrong time.
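
The ordering argument above can be checked against a toy model of the loop. This is only a sketch, not Node's or libuv's implementation: the phase order follows the libuv design documentation, while `simulate`, the per-phase queues, and the assumption that a zero-delay timer is already due when the timers phase next runs are all illustrative.

```javascript
// Toy model of the libuv loop phases; not Node's or libuv's real code.
const PHASES = ['timers', 'pending', 'idle', 'prepare', 'poll', 'check', 'closing'];

// Run a hypothetical user callback in `startPhase` and record the order in
// which the setTimeout-like and setImmediate-like callbacks it schedules run.
function simulate(startPhase) {
  const queues = {};
  for (const phase of PHASES) queues[phase] = [];
  const order = [];

  queues[startPhase].push(() => {
    // Assumption: the zero-delay timer is already due by the next timers phase.
    queues.timers.push(() => order.push('setTimeout'));
    queues.check.push(() => order.push('setImmediate'));
  });

  // Two iterations are enough for everything scheduled above to run.
  for (let iteration = 0; iteration < 2; iteration++) {
    for (const phase of PHASES) {
      // Snapshot the queue so work added during a phase waits for its next turn.
      const ready = queues[phase].splice(0);
      for (const cb of ready) cb();
    }
  }
  return order;
}

console.log(simulate('poll'));
console.log(simulate('closing'));
```

In this model a callback run from the poll phase sees setImmediate first, while one run from the closing phase sees setTimeout first. In a real process the timer may not yet be due when the timers phase runs, so the observed order can differ.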

tjfontaine · 2014-01-04T23:16:13Z

We don't necessarily know at the moment what the performance difference is; we don't really have numbers to classify it.

trevnorris · 2014-01-05T00:23:00Z

Really? I don't throw around performance implications lightly, and don't
appreciate terse statements contradicting what I've spent so many hours
programming and testing.

trevnorris · 2014-01-05T00:24:36Z

If you want my tests and performance numbers then please just ask for them,
instead of saying we don't have anything to back up what I'm saying.

tjfontaine · 2014-01-05T00:45:20Z

I'm not being dismissive, and it wasn't meant to upset you. I am not aware of any numbers or the benchmarks used to determine them. My point is merely that if we're going to make claims about performance then they need to come with numbers and the cases measured, so we can better qualify the decisions instead of making blanket statements.

I am mostly worried that implications of this change won't be completely obvious because of the subtlety of the change, similar to how core and its tests work fine with s/nextTick/setImmediate/ but in practice in real applications it's not quite that simple of a qualification.

To do the best by node we need to act from a point where we know there are problems, and have them understood as well as we can, instead of making assumptions and acting upon them.

trevnorris · 2014-01-05T01:00:30Z

The changes I know of: some of the emit('close') events currently run in process.nextTick(), which prevents the eloop from continuing before the event callbacks execute. There is also the following case:

var s = require('net').createServer();
s.listen(8000, function() {
  s.close(function() {
    setTimeout(function() {
      console.log('setTimeout');
    }, 0);
    setImmediate(function() {
      console.log('setImmediate');
    });
  });
});

Where before the patch output would be:

setTimeout
setImmediate

and after the patch:

setImmediate
setTimeout

I dismissed both of these as potential problems because, as I understand it, your long-standing belief is that callback execution order of this type shouldn't be relied upon.

Those were the only cases I could think of. Please let me know if you can immediately think of any others.

trevnorris · 2014-01-05T04:17:50Z

@tjfontaine While the following is not a real world example, I do believe it illustrates some flaws in Node's architecture:

var net = require('net');

var cntr = 0;
(function doConnect() {
  cntr++;
  var s = net.createServer();
  s.listen(8000, function() {
    s.close(doConnect);
  });
}());

setInterval(function() {
  console.log('cntr: ' + cntr);
  cntr = 0;
}, 2000);

In current master it will just spin out of control and eventually crash. This is because the close() callback is executed via process.nextTick(). Whereas with this patch we get approximately the following:

...
cntr: 26831
cntr: 26882
14.35user 1.74system 0:16.02elapsed 100%CPU (46500maxresident)k

So, an alternative to get around the process.nextTick() issue is to setImmediate() the call instead. Doing this to lib/net.js we'll get the following:

...
cntr: 23851
cntr: 23998
13.25user 1.31system 0:14.48elapsed 100%CPU (47100maxresident)k

While the performance impact is not significant, setImmediate() is slightly slower (I used a more in-depth analysis than the simple numbers shown here), and it would still introduce a "subtle change".

So, IMO, the issue that needs to be fixed is that there are paths in Node that can crash your application because of non-obvious implementation details. And while we're at it, we might as well make sure the implementation is done correctly.
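
The difference between the two scheduling strategies can be seen without opening any sockets. The following is a minimal sketch, assuming a current Node runtime; `starve`, `schedule`, and `rounds` are hypothetical names, and a bounded recursion stands in for the unbounded listen/close cycle above.

```javascript
// Hypothetical helper: re-schedule a callback `rounds` times with `schedule`
// and report how many rounds had run before unrelated loop work got a turn.
function starve(schedule, rounds, done) {
  let ticks = 0;
  let ticksWhenOtherWorkRan = -1;

  // Stand-in for any other work waiting on the event loop.
  setImmediate(() => { ticksWhenOtherWorkRan = ticks; });

  (function again() {
    if (++ticks < rounds) return schedule(again);
    // Report from a later immediate so the stand-in above has already run.
    setImmediate(() => done(ticksWhenOtherWorkRan));
  }());
}

starve(process.nextTick, 1000, (n) => console.log('nextTick: other work ran after', n, 'rounds'));
starve(setImmediate, 1000, (n) => console.log('setImmediate: other work ran after', n, 'rounds'));
```

With process.nextTick the whole chain drains before the loop advances, so the other work only runs after all rounds have completed; with setImmediate it gets its turn after the first round. An unbounded nextTick chain therefore never lets the loop continue.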

trevnorris · 2014-01-05T04:40:11Z

Same issue can be seen w/ UDP:

var dgram = require('dgram');

var cntr = 0;
(function doBind() {
  cntr++;
  var s = dgram.createSocket('udp4');
  s.bind(8000, function() {
    s.close();
  });
  s.on('close', doBind);
}());

setInterval(function() {
  console.log('cntr: ' + cntr);
  cntr = 0;
}, 2000);

With patch:

...
cntr: 133264
cntr: 133573
5.34user 1.32system 0:06.67elapsed 100%CPU (15796maxresident)k

And w/o patch Node will crash. If we place the emit() in a setImmediate() we get the following:

cntr: 102547
cntr: 102627
9.36user 2.06system 0:11.42elapsed 100%CPU (0avgtext+0avgdata 16084maxresident)k

Side note: Why is the callback interface for UDP and TCP different?
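
On the side note: both net.Server and dgram.Socket emit a 'close' event, so one way to paper over the differing close() signatures is to rely on the event rather than on a callback argument. A minimal sketch; `closeThen` is a hypothetical helper, not part of Node's API.

```javascript
// Hypothetical helper: close any handle that emits 'close' and run
// `onClosed` once that event fires, regardless of close()'s own signature.
function closeThen(handle, onClosed) {
  handle.once('close', onClosed);
  handle.close();
  return handle;
}
```

Under that assumption the TCP and UDP examples above could both end with closeThen(s, doConnect) or closeThen(s, doBind), leaving the timing of the 'close' event as the only difference between them.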

trevnorris · 2014-02-04T01:48:47Z

@tjfontaine I'm still of the opinion that we should be running the close() callback in the proper phase of the event loop. What's your opinion here?

trevnorris · 2015-06-23T22:58:12Z

Still think it should be done this way, but not taking the time to update the code.

trevnorris added 5 commits January 6, 2014 11:39

net: run close callbacks in correct eloop phase

5e73f2b

Instead of running the close callbacks seemingly synchronously, run them once the handle has actually been closed by libuv, in the uv__run_closing_handles() phase of the eloop.

cluster: remove custom close callback

c903c7e

Now that the net close callbacks don't run until after libuv has had a chance to properly close the uv_handle_t, the custom close callback in cluster is no longer necessary.

dgram: emit close when handle is closed

44ad789

child_process: emit close in proper phase of eloop

d5f2a8f

zlib: emit after properly closed

5121959

trevnorris mentioned this pull request Aug 20, 2014

clarify the 'close' event #8209

Closed

trevnorris closed this Jun 23, 2015
