Better Signal Handling #29480
Replies: 24 comments
-
I have finished typing! Reopening! |
Beta Was this translation helpful? Give feedback.
-
It's worth noting that SIGINT doesn't necessarily mean the program will exit. It means the system has requested that the program interrupt whatever work it's actively doing. For example in the REPL, sending SIGINT stops whatever is currently evaluating, it doesn't close the REPL. This is also why SIGINT doesn't guarantee the I think a key point here is that SIGINT is top-level, big public behavior, so individual functions shouldn't have a concept of SIGINT for cleanup, since other things could also cause cleanup. You'd want to bundle that all into some sort of "cleanup" interface and then flow your various conditions for cleanup into that. Also, there is also some work going into defining better ways of cleaning up resources: https://github.com/tc39/proposal-explicit-resource-management |
Beta Was this translation helpful? Give feedback.
-
The specific example of console.log(`PID ${process.pid}`);
process.on("exit", (code, signal) => {
console.log("exit called!", { code, signal });
});
setInterval(() => {
console.log('Still alive!')
}, 1000);
setTimeout(() => {
console.log("Goodbye!");
process.exit(0);
}, 10000);
process.stdin.resume(); If you run this and then send |
Beta Was this translation helpful? Give feedback.
-
I was there for the presentation, but I don't think it helps. It's not that |
Beta Was this translation helpful? Give feedback.
-
your example there is interesting. in general we can't fire an event for sigint exits if js is still running, the idea is to interrupt it after all, and js doesn't really work with reentrancy. I would, however, expect the event to be fired it if happens while waiting on the event loop. Perhaps it doesn't for consistency reasons. |
Beta Was this translation helpful? Give feedback.
-
Right, that's why I added the |
Beta Was this translation helpful? Give feedback.
-
What should libraries do? Let's say, you want to provide a async withTmpDir(callback) {
let dir = createTmpDir();
try {
return await callback(dir);
} finally {
removeTmpDir(dir);
}
} We already know that this doesn't cleanup the temp directory if you exit/SIGINT/SIGTERM before The nature of the cleanup work here isn't really that important, you can substitute "removing a temp directory" with a different task, like shutting down detached child processes, closing database connections, etc. It would be great if Node can provide the coordination needed to "bundle that all into some sort of "cleanup" interface. If the exit event does fire on SIGINT and SIGTERM, that is perhaps sufficient for this particular case. |
Beta Was this translation helpful? Give feedback.
-
ah i see the pattern you're going for now. yes, we wouldn't deliver such occurrences as exceptions because of a sort of "cardinal rule" which the ecosystem generally follows which is that the world isn't supposed to mutate while js is running, you'd have to wait until after the js finishes running to know if the outside world requested that the js stop running.
It's a good question. I've had the same issue working on libraries that mount virtual filesystems, and will easily crash the OS if not unmounted correctly. Lots of things actually have this issue, for example I have vim set up to run eslint automatically, and every once in a while eslint leaves some orphaned processes behind because i killed vim. Here is prior art on trying to make this less messy that I searched up, hopefully it can provide additional information: #20804 #2853 #9050 |
Beta Was this translation helpful? Give feedback.
-
There's no real solution here. Either you accept that:
If (2) is an acceptable middle ground for you, then you can already accomplish that through the The one thing Node.js can (perhaps/arguably) do better is wait for promises to resolve on normal exit, when there is no more work to do. That's being discussed in #29355 and doesn't need changes to Node.js core, strictly speaking. |
Beta Was this translation helpful? Give feedback.
-
Even if you accept (2), or a softer variant of (2) (up to some time limit, etc), I'm don't think it can really be done today. For one thing, signals and explicit exit does not emit 'beforeExit'. It's still the case that I believe there is no way to "do some cleanup work if the SIGINT is going to cause termination, but do nothing otherwise". |
Beta Was this translation helpful? Give feedback.
-
For the record: that's by design. SIGINT works the same way (unless you handle it) because that's the traditional/expected behavior for a UNIX application. |
Beta Was this translation helpful? Give feedback.
-
Correct, |
Beta Was this translation helpful? Give feedback.
-
Well, the "real solution" here is promise cancellation, which would have solved this - unfortuunately that was rejected by tc39 for a more explcit proposal that does no let us do this. Other languages do "creative" things like running I would also like to challenge the assumption we can't fix this - we can (as the platform and from the V8 side) throw an On a side note good issue and reproducing example @chancancode thanks 🙏 |
Beta Was this translation helpful? Give feedback.
-
The more I think about this the more I become convinced this is a footgun in how we deal with async functiions. If in this case: async function someTask() {
try {
await someOtherTask();
} finally {
// MUST PERFORM VERY IMPORTANT CLEAN UP!
}
} Sending a SIGINT (when no SIGINT signal handler is set up) could reject the promise returned from |
Beta Was this translation helpful? Give feedback.
-
I agree that we should continue looking for creative ways to solve this. Generally though, we should avoid growing the event loop (in fact we ignore new tasks during a normal exit), so i don't think we can rely on the promises themselves in the exit flow. |
Beta Was this translation helpful? Give feedback.
-
@devsnek I am really not sure what we can or should do but I am really not a fan of the user stories in our current behavior and I would love to see how this can be fixed from the language side and not from the platform side because this is essentially the same issues web-pages have if the user navigates from them during an operation or when a browser stops running JavaScript on a page to save power. |
Beta Was this translation helpful? Give feedback.
-
Perhaps one route could be registering https://github.com/tc39/proposal-explicit-resource-management scopes in a big list and calling them. We still end up with the problem that we can't dispatch this in a sync context due to design and implementation choices, and we can't dispatch this in an async context because we can't put tasks on the queue... |
Beta Was this translation helpful? Give feedback.
-
I mean, I get why we might not want to do either and I am not even sure we should propose either (I opened an issue in that repo) but I am not sure why we can't do either? |
Beta Was this translation helpful? Give feedback.
-
the decisions made in the past are: we don't want to do it synchronously because the state of the world can't change while js is running. we don't want to do it asynchronously because queuing more tasks means the process lives longer and we're trying to end the process. Obviously we can change our minds about these decisions, but they've been in place for a while. |
Beta Was this translation helpful? Give feedback.
-
Ok, so I think I understand, your point is that cleanup for stuff like this is typically asynchronous in nature anyway and we cannot (safely) delay the process exiting by waiting for asynchronous actions to happen. That makes sense and I agree. We theoretically could also have different behavior for a "clean" exit (like a SIGINT) and a "dirty" exit (like a process.exit(1)) but given that cleanup is "best effort" anyway I can totally also see not wanting to do anything here anyway. It appears that Python does the same thing we do in async functions. (You would do loop.add_signal_handler() on your event loop and handle it there but it doesn't get stuff thrown onto the async function). In C# you would be expected to pass a CancelToken throughout your app in a patten that looks like this: const token = { cancel() { throw new CancelError('cancelled')} };
async function someTask(token) {
try {
await someOtherTask(token);
} finally {
// MUST PERFORM VERY IMPORTANT CLEAN UP!
}
} And your interrupt handler would be And users would be expected to know to filter the CancelError (well CancelException since this example is in C#) in their instrumentation and logs. |
Beta Was this translation helpful? Give feedback.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
-
I think at its heart this issue is about the guarantees provided by the In Python, we love using the I am still new to this ecosystem, but the question whether I should do the same or not in NodeJS/JavaScript was one of the first ones I researched more deeply. I found the following discussion to be incredibly insightful: https://esdiscuss.org/topic/garbage-collection-in-generators Andreas Rosberg said about
that it is a
and Benjamin Gruenbaum (@benjamingr) was kind of representing my perspective in this discussion, saying that
and that
The fundamental difference between the approaches might be as of garbage collection, seemingly, as Mark S miller says:
but on the other hand Andreas Rossberg said
I don't think the discussion was concluded in a really satisfying way, but my big takeaway was one should not rely on the |
Beta Was this translation helpful? Give feedback.
-
I am not sure whether this is a question, bug report, or feature request, but here we go.
At the moment, it is really difficult to handle exits/signals correctly. Consider this:
This async function does not work "correctly". If the process received a signal causing it to terminate, or, for some reason, node decided to exit due to the event loop becoming empty while waiting for the other task, the
finally
block do not run at all.In "synchronous" languages like Java, Ruby, Python, signals that are reasonable to handle are sometimes converted into exceptions and delivered to the "currently running code". However, maybe that doesn't quite work in Node/JavaScript's model, since the function above is in fact not "running", and it's unclear who should be getting these exceptions.
I understand that there is no such thing as "surefire cleanup", since your process can always be terminated forcefully without a chance to run any code (SIGKILL, etc), so you must be prepared to deal with the consequence of an unclean exit anyway.
However, the user pressing Ctrl-C (SIGINT) isn't that unusual of a situation, and is usually expected to be as a "clean" exit, so it would be nice if we can properly cleanup after our code. Maybe in our function, we created a PID file that we would like to remove, or maybe we are supposed to shutdown some children processes.
Okay, so maybe we try something like this:
But now, we caused another problem: when we do receive these signals (e.g. Ctrl-C), the program no longer terminates. This is because by installing any handler for these signal events, it suppresses the default handler's behavior (terminate the program) and, as far as I know, there is no way to invoke that deafult behavior.
Okay, so maybe we can just mimic the behavior of the default handler ourselves?
This caused multiple new problems.
First, the default behavior of the signal handlers actually exit with a non-zero exit code, according to the docs, 128 + signal value. Presumably, this is for good reason and a behavior we want to match. Maybe this is not such a big deal, we could probably read the docs or source code to copy the logic.
But more importantly, by calling
process.exit(...)
in the signal handler, the rest of the handlers (if any) will be skipped over! So that means if more than one async task wanted to perform cleanup in this manner, only one of them can succeed.More over, maybe some other code intentionally installed a signal handler here to avoid exiting in the first place. This, in practice, means that it is unsafe for libraries to install signal handlers to perform cleanup.
Okay, so since all we really care about is to perform our cleanup on exit, so perhaps this is all we needed:
Unfortunately, this does not work either, because the exit event is not fired when the process is terminated by the default SIGINT handler (maybe this is a bug?).
So at this point, I feel like I am not left with any good option to do async cleanup properly, especially if you are writing a library and do not want to influence the global shutdown behavior just to accomplish your clanup goals.
As far as concrete suggestions on what should be done, I'm not quite sure, but here are a few points to consider.
From a DX perspective, it would be extremely nice if the async
try { ... } finally { ... }
Just Worked™ as it would in other environments. I understand why this is perhaps a poor fit for the Node/JavaScript programming model though.It would have been nice if the events handlers here are designed like the browser's event APIs, where you are just attaching some extra logic to run, and does not cancel the browser's default behavior by default, but there are ways to do that if that's what you wish to do (
preventDefault
/stopPropagation
). However, that ship has probably sailed.Under the current design, some way to invoke what-would-have-been the default handler's behavior would be nice. The current behavior is like
click
handlers in the browser havepreventDefault()
on by default with no way to opt back in.However, while the above is useful for "apps", where get to decide what shutdown behavior you want, and just want to run some extra logic before handing things back to the default handler, it is not enough to solve the problem for libraries.
For libraries, you really need a way to say "Okay, I did what I needed to do, now just keep going as if I wasn't here", which means 1) calling the rest of the signal handlers and 2) exit if no one else objects.
Maybe the last point can be accomplished by just firing the
exit
event more consistently. But I'm worried that there are other edge cases that causes theexit
event to not fire, other than theSIGINT
example I discovered.It is also possible that I just missed some common patterns or idioms to solve this problem?
Beta Was this translation helpful? Give feedback.
All reactions