Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

net: WIP run close callbacks in correct eloop phase #6802

Closed
wants to merge 5 commits into from
Closed

net: WIP run close callbacks in correct eloop phase #6802

wants to merge 5 commits into from

Conversation

trevnorris
Copy link

Instead of running the close callbacks seemingly synchronously instead
of when the handle has actually been closed by libuv, instead run the
callbacks in the uv__run_closing_handles() phase of the eloop.

There's an issue with the cluster module overriding the close() callback that prevents the callbacks from being executed at the correct time.

NOTE: This is missing tests and not fully functional. Don't merge yet.

@trevnorris
Copy link
Author

@tjfontaine This PR is basically what I'm talking about, where the close callbacks should actually be run in the proper phase of the eloop.

@tjfontaine
Copy link

there are certainly some cleaner paths here, I'm just worried about the subtle semantic changes we would be introducing here without necessarily having a reason to change the code to this right now

@trevnorris
Copy link
Author

Other than this is just TheRightWay(tm) I don't ever see a need (i.e. to
fix a bug) to implement this.

The only semantic change introduced is that if a setTimeout(..., 0); and
setImmediate(...); were called in the close callback in this one case the
setTimeout() would run first. Since the closing handles phase of the event
eloop occurs after the phase where setImmediate is run.

As far as performance, there is zero impact.

IMO it's in our best interest to run the user's callbacks in the phase of
the eloop where they'd be expected. As you can see, doing this I was able
to get rid of some of the cluster hackery around closing at the wrong time.

@tjfontaine
Copy link

We don't necessarily know at the moment what the performance difference is, we don't really have numbers to classify it.

@trevnorris
Copy link
Author

Really? I don't throw around performance implications lightly, and don't
appreciate terse statements contradicting what I've spent so many hours
programming and testing.

@trevnorris
Copy link
Author

If you want my tests and performance numbers than please just ask for them.
Instead of saying we don't have anything to back up what I'm saying.

@tjfontaine
Copy link

I'm not being dismissive, and it wasn't meant to upset you, I am not aware of any numbers or the benchmarks used to determine them. My point is merely that if we're going to make claims about performance then they do need to come with numbers and the cases so we can better qualify the decisions instead of making blanket statements.

I am mostly worried that implications of this change won't be completely obvious because of the subtlety of the change, similar to how core and its tests work fine with s/nextTick/setImmediate/ but in practice in real applications it's not quite that simple of a qualification.

To do the best by node we need to act from a point where we know there are problems, and have them understood as well as we can, instead of making assumptions and acting upon them.

@trevnorris
Copy link
Author

Other than some of the emit('close') events in process.nextTick(), which would prevent the eloop from continuing before event callback execution. Also the following case:

var s = require('net').createServer();
s.listen(8000, function() {
  s.close(function() {
    setTimeout(function() {
      console.log('setTimeout');
    }, 0);
    setImmediate(function() {
      console.log('setImmediate');
    });
  });
});

Where before the patch output would be:

setTimeout
setImmediate

and after the patch:

setImmediate
setTimeout

I dismissed both of these as potential problems because as I've interpreted your long standing belief that callback execution order of this type shouldn't be relied upon.

Those were the only cases I could think of. Please let me know if you can immediately think of any others.

@trevnorris
Copy link
Author

@tjfontaine While the following is not a real world example, I do believe it serves as a point of some flaws in Node architecture:

var net = require('net');

var cntr = 0;
(function doConnect() {
  cntr++;
  var s = net.createServer();
  s.listen(8000, function() {
    s.close(doConnect);
  });
}());

setInterval(function() {
  console.log('cntr: ' + cntr);
  cntr = 0;
}, 2000);

In current master it will just spin out of control and eventually crash. This is because the close() callback is executed via process.nextTick(). Whereas with this patch we get approximately the following:

...
cntr: 26831
cntr: 26882
14.35user 1.74system 0:16.02elapsed 100%CPU (46500maxresident)k

So, an alternative to get around the process.nextTick() issue is to setImmediate() the call instead. Doing this to lib/net.js we'll get the following:

...
cntr: 23851
cntr: 23998
13.25user 1.31system 0:14.48elapsed 100%CPU (47100maxresident)k

While not a significant performance impact, it is slight that setImmediate() is slower (I used a more in depth analysis than the simple numbers shown here), and it would still introduce a "subtle change".

So, IMO, the issue needs to be fixed that there are paths in Node that can cause your application to crash because of non-obvious implementation details. And since we're at it, might as well make sure the implementation is done correctly.

@trevnorris
Copy link
Author

Same issue can be seen w/ UDP:

var dgram = require('dgram');

var cntr = 0;
(function doBind() {
  cntr++;
  var s = dgram.createSocket('udp4');
  s.bind(8000, function() {
    s.close();
  });
  s.on('close', doBind);
}());

setInterval(function() {
  console.log('cntr: ' + cntr);
  cntr = 0;
}, 2000);

With patch:

...
cntr: 133264
cntr: 133573
5.34user 1.32system 0:06.67elapsed 100%CPU (15796maxresident)k

And w/o patch Node will crash. If we place the emit() in a setImmediate() we get the following:

cntr: 102547
cntr: 102627
9.36user 2.06system 0:11.42elapsed 100%CPU (0avgtext+0avgdata 16084maxresident)k

Side note: Why is the callback interface for UDP and TCP different?

Instead of running the close callbacks seemingly synchronously instead
of when the handle has actually been closed by libuv, instead run the
callbacks in the uv__run_closing_handles() phase of the eloop.
Now that the net close callbacks don't run until after libuv has had a
chance to properly close the uv_handle_t the custom close callback in
cluster is no longer necessary.
@trevnorris
Copy link
Author

@tjfontaine I'm still of the opinion that we should be running the close() callback in the proper phase of the event loop. What's your opinion here?

@trevnorris
Copy link
Author

Still think it should be done this way, but not taking the time to update the code.

@trevnorris trevnorris closed this Jun 23, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants