Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGINT/SIGTERM handling doesn't wait for promises #4931

Closed
Sytten opened this issue Feb 15, 2021 · 16 comments · Fixed by #4991
Closed

SIGINT/SIGTERM handling doesn't wait for promises #4931

Sytten opened this issue Feb 15, 2021 · 16 comments · Fixed by #4991

Comments

@Sytten
Copy link

Sytten commented Feb 15, 2021

Since #4273, there is a hook for serverWillStop that is called when the SIGTERM or SIGINT signals are received. For reasons which I still don't understand after reading the PR #4450, the code does the following:

const handler: NodeJS.SignalsListener = async () => {
await this.stop();
process.kill(process.pid, signal);
};
process.once(signal, handler);
this.toDispose.add(async () => {
process.removeListener(signal, handler);
});

If you actually test the thing with a server like express, you will quickly see that a promise that requires the event loop to go around (like let's say the apollo agent..........) will NOT, repeat NOT be executed and the process will be killed by the default signal handler before it can send the data over the network.

Now the fix is pretty simple, you have to capture the signal with process.on instead of process.once, remove the process.removeListener and call process.exit instead on process.kill.
This should take 5m to fix and it should be done ASAP @glasser

@glasser
Copy link
Member

glasser commented Feb 15, 2021

Can you be more clear as to what you mean by "a promise that requires the event loop to go around"?

The signal handler awaits the call to server.stop which includes any plugin serverWillStop handlers before going forward with shutdown. When you say "the apollo agent" do you mean the usage reporting plugin? It should block in serverWillStop until the final report is sent, right? This stuff is subtle and quite like has bugs, but I don't understand what you're describing. Can you share some code that demonstrates the issue?

@Sytten
Copy link
Author

Sytten commented Feb 15, 2021

It means anything that needs another loop of the event loop like IO or setImmediate. Just launch a plain apollo server on express and place a console log before the process.kill, it is never called. As soon as you yield the thread the process gets killed. I can give you a full example tomorrow.

@glasser
Copy link
Member

glasser commented Feb 15, 2021 via email

@Sytten
Copy link
Author

Sytten commented Feb 15, 2021

Maybe that is the case, because sending only a SIGINT using kill to the process seems to work.
But doing one CTRL+C (which also sends a SIGINT AFAIK), kills the process immediately unless the process handles the SIGINT with a process.on. This is weird.

@Sytten
Copy link
Author

Sytten commented Feb 15, 2021

Let's compare using this repo: https://github.com/Sytten/repo-apollo-kill

@Sytten
Copy link
Author

Sytten commented Feb 15, 2021

Interesting enough, yesterday my test showed that it didn't make any difference it was node or ts-node but now I have a different result (not sure why, will investigate).
Seems like the CTRL+C correctly handles the stop if running node directly but not ts-node.

EDIT: it also doesn't work (CTRL+C) if using tsconfig-paths/register with node, but it always works correctly if doing kill -SIGTERM PID.

@Sytten
Copy link
Author

Sytten commented Feb 15, 2021

So my takeaway is that for some reason, it might be that the SIGTERM is sent more than once so using a process.once is less reliable than using a process.on (which is standard anyway). So while it probably works ok in a container world where a nice SIGTERM is sent only once, it might not work well for setups that use pm2 or nodemon in production.
Sorry for being quite rude last night, I was aggravated by this but I still think this should be changed.

@Sytten
Copy link
Author

Sytten commented Feb 18, 2021

I did more tests and I found that doing a gzip doesn't work at all with the current system even when sending the signal with kill and removed other parameters (tsconfig-paths and ts-node):

import zlib from 'zlib';
import util from 'util';
const gzip = util.promisify(zlib.gzip);
...
const output = JSON.stringify(data);
const compressed = await gzip(Buffer.from(output));

This tells me that is should be fixed @glasser

@glasser
Copy link
Member

glasser commented Feb 23, 2021

Hi! Back from vacation and want to look at this soon.

I did a process.once because the idea was that we should handle the signal by starting 'stop' and then re-signal so that the death can still be "from the signal". However I think you're right that things get funky if the signal gets sent twice. Probably I should use process.on, make the handler do nothing if it's not the first invocation, and removeListener when done.

I can work on this soon, but I'm pretty confused by your final comment. What does gzip have to do with anything? In what context are you running the code in question?

@Sytten
Copy link
Author

Sytten commented Feb 23, 2021

Hey!

Yup I think this is the better approach.

Not sure to be honest, but I saw a difference in my tests between having only an IO await (like doing a POST) and doing a gzip await. I have no idea why, but doing a kill -SIGTERM $PID correctly waited for the IO but not for the gzip await???
All my code examples ran in a plugin serverWillStop.

glasser added a commit that referenced this issue Mar 6, 2021
Two improvements to the SIGINT and SIGTERM signal handlers that are installed by
default in non-test mode if you don't pass `stopOnTerminationSignals: false`.

- If another SIGINT or SIGTERM is recieved while awaiting `stop()`, ignore
  it (ie, prevent the process from exiting if this was the only handler for that
  signal). This is implemented by switching the `process.once` to `process.on`
  and explicitly tracking if a signal has been received. Fixes #4931.

- If `await this.stop()` throws, make sure to shut down the process anyway after
  logging. This does `process.exit(1)` instead of re-signaling because it can't
  be sure that the signal handler has been removed.
@glasser
Copy link
Member

glasser commented Mar 6, 2021

@Sytten How does #4991 look?

glasser added a commit that referenced this issue Mar 6, 2021
Two improvements to the SIGINT and SIGTERM signal handlers that are installed by
default in non-test mode if you don't pass `stopOnTerminationSignals: false`.

- If another SIGINT or SIGTERM is recieved while awaiting `stop()`, ignore
  it (ie, prevent the process from exiting if this was the only handler for that
  signal). This is implemented by switching the `process.once` to `process.on`
  and explicitly tracking if a signal has been received. Fixes #4931.

- If `await this.stop()` throws, make sure to shut down the process anyway after
  logging. This does `process.exit(1)` instead of re-signaling because it can't
  be sure that the signal handler has been removed.
@Sytten
Copy link
Author

Sytten commented Mar 6, 2021

Seems fine to me, a bit convoluted with the call of kill but it should do the trick (and it is well documented).
I am guessing this is to give a chance to other potential hooks to also do their cleaning?

@glasser
Copy link
Member

glasser commented Mar 6, 2021

Yeah, and I kinda like that the process will exit with an exit status indicating it died due to signal rather than an exit code. Maybe that's silly though.

@Sytten
Copy link
Author

Sytten commented Mar 6, 2021

Make sense I guess since by default this is on (not sure I agree but thats another topic) so it should be transparent for a user. If it was disabled by default then a standard exit would be correct.

glasser added a commit that referenced this issue Mar 9, 2021
Two improvements to the SIGINT and SIGTERM signal handlers that are installed by
default in non-test mode if you don't pass `stopOnTerminationSignals: false`.

- If another SIGINT or SIGTERM is recieved while awaiting `stop()`, ignore
  it (ie, prevent the process from exiting if this was the only handler for that
  signal). This is implemented by switching the `process.once` to `process.on`
  and explicitly tracking if a signal has been received. Fixes #4931.

- If `await this.stop()` throws, make sure to shut down the process anyway after
  logging. This does `process.exit(1)` instead of re-signaling because it can't
  be sure that the signal handler has been removed.
glasser added a commit that referenced this issue Mar 9, 2021
Two improvements to the SIGINT and SIGTERM signal handlers that are installed by
default in non-test mode if you don't pass `stopOnTerminationSignals: false`.

- If another SIGINT or SIGTERM is recieved while awaiting `stop()`, ignore
  it (ie, prevent the process from exiting if this was the only handler for that
  signal). This is implemented by switching the `process.once` to `process.on`
  and explicitly tracking if a signal has been received. Fixes #4931.

- If `await this.stop()` throws, make sure to shut down the process anyway after
  logging. This does `process.exit(1)` instead of re-signaling because it can't
  be sure that the signal handler has been removed.
@glasser
Copy link
Member

glasser commented Mar 16, 2021

I've published a release with this to version 2.21.2-alpha.0. My plan is to release 2.21.2 in a day or two. If you want to test that this fix works before then, try running the prerelease yourself, and provide feedback on #5037.

@Sytten
Copy link
Author

Sytten commented Mar 17, 2021

Will do I will test with my plugin!

kodiakhq bot added a commit to ProjectXero/dbds that referenced this issue Mar 17, 2021
Bumps apollo-datasource from 0.7.2 to 0.7.3.

Changelog
Sourced from apollo-datasource's changelog.

CHANGELOG
The version headers in this history reflect the versions of Apollo Server itself.  Versions of other packages (e.g., those which are not actual HTTP integrations; packages not prefixed with "apollo-server", or just supporting packages) may use different versions.
🆕 Please Note!: 🆕 The @apollo/federation and @apollo/gateway packages now live in the apollographql/federation repository.

@apollo/gateway
@apollo/federation

vNEXT

The changes noted within this vNEXT section have not been released yet.  New PRs and commits which introduce changes should include an entry in this vNEXT section as part of their development.  With few exceptions, the format of the entry should follow convention (i.e., prefix with package name, use markdown backtick formatting for package names and code, suffix with a link to the change-set à la [PR #YYY](https://link/pull/YYY), etc.).  When a release is being prepared, a new header will be (manually) created below and the appropriate changes within that release will be moved into the new section.


apollo-server-core: The SIGINT and SIGTERM signal handlers installed by default (when not disabled by stopOnTerminationSignals: false) now stay active (preventing process termination) while the server shuts down, instead of letting a second signal terminate the process. The handlers still re-signal the process after this.stop() concludes. Also, if this.stop() throws, the signal handlers will now log and exit 1 instead of throwing an uncaught exception. [Issue #4931](apollographql/apollo-server#4931)
apollo-server-lambda: (UPDATE THIS MESSAGE BEFORE RELEASE; we are not sure if this actually helps nodejs14 compatibility or if it's just a nice refactor.) Support the nodejs14 runtime by changing the handler to be an async handler. (For backwards compatibility, if the handler receives a callback, it still acts like a non-async handler.) [Issue #1989](apollographql/apollo-server#1989) [PR #5004](apollographql/apollo-server#5004)

v2.21.1

apollo-server-lambda: The onHealthCheck option did not previously work. Additionally, health checks (with onHealthCheck or without) didn't work in all Lambda contexts, such as behind Custom Domains; the path check is now more flexible. [Issue #3999](apollographql/apollo-server#3999) [PR #4969](apollographql/apollo-server#4969) [Issue #4891](apollographql/apollo-server#4891) [PR #4892](apollographql/apollo-server#4892)
The debug option to new ApolloServer (which adds stack traces to errors) now affects errors that come from requests executed with server.executeOperation (and its wrapper apollo-server-testing), instead of just errors that come from requests executed over HTTP. [Issue #4107](apollographql/apollo-server#4107) [PR #4948](apollographql/apollo-server#4948)
Bump version of @apollographql/graphql-playground-html to v1.6.27 and @apollographql/graphql-playground-react to v1.7.39 to resolve incorrectly rendered CDN URL when Playground version was false-y.  [PR #4932](apollographql/apollo-server#4932) [PR #4955](apollographql/apollo-server#4955) [Issue #4937](apollographql/apollo-server#4937)

v2.21.0

Apollo Server can now be installed with graphql@15 without causing peer dependency errors or warnings. (Apollo Server has a file upload feature which was implemented as a wrapper around the graphql-upload package. We have been unable to upgrade our dependency on that package due to backwards-incompatible changes in later versions, and the version we were stuck on did not allow graphql@15 as a peer dependency. We have now switched to a fork of that old version called @apollographql/graphql-upload-8-fork that allows graphql@15.) Also bump the graphql-tools dependency from 4.0.0 to 4.0.8 for graphql@15 support. [Issue #4865](apollographql/apollo-server#4865)

v2.20.0

apollo-server: Previously, ApolloServer.stop() functioned like net.Server.close() in that it did not close idle connections or close active connections after a grace period. This meant that trying to await ApolloServer.stop() could hang indefinitely if there are open connections. Now, this method closes idle connections, and closes active connections after 10 seconds. The grace period can be adjusted by passing the new stopGracePeriodMillis option to new ApolloServer, or disabled by passing Infinity (though it will still close idle connections). Note that this only applies to the "batteries-included" ApolloServer in the apollo-server package with its own built-in Express and HTTP servers. [PR #4908](apollographql/apollo-server#4908) [Issue #4097](apollographql/apollo-server#4097)
apollo-server-core: When used with ApolloGateway, ApolloServer.stop now invokes ApolloGateway.stop. (This makes sense because ApolloServer already invokes ApolloGateway.load which is what starts the behavior stopped by ApolloGateway.stop.) Note that @apollo/gateway 0.23 will expect to be stopped in order for natural program shutdown to occur. [PR #4907](apollographql/apollo-server#4907) [Issue #4428](apollographql/apollo-server#4428)
apollo-server-core: Avoid instrumenting schemas for the old graphql-extensions library unless extensions are provided. [PR #4893](apollographql/apollo-server#4893) [Issue #4889](apollographql/apollo-server#4889)
[email protected]: The shouldReadFromCache and shouldWriteToCache hooks were always documented as returning ValueOrPromise<boolean> (ie, that they could be either sync or async), but they actually only worked if they returned a bool. Now they can be either sync or async as intended. [PR #4890](apollographql/apollo-server#4890) [Issue #4886](apollographql/apollo-server#4886)
[email protected]: The RESTDataSource.trace method is now protected instead of private to allow more control over logging and metrics. [PR #3940](apollographql/apollo-server#3940)

v2.19.2

apollo-server-express: types: Export ExpressContext from main module. [PR #4821](apollographql/apollo-server#4821) [Issue #3699](apollographql/apollo-server#3699)
apollo-server-env: types: The first parameter to fetch is now marked as required, as intended and in accordance with the Fetch API specification. [PR #4822](apollographql/apollo-server#4822) [Issue #4741](apollographql/apollo-server#4741)
apollo-server-core: Update graphql-tag package to latest, now with its graphql-js peerDependencies expanded to include ^15.0.0 [PR #4833](apollographql/apollo-server#4833)

v2.19.1

apollo-server-core: The debugPrintReports option to ApolloServerPluginUsageReporting now prints traces as well. [PR #4805](apollographql/apollo-server#4805)

v2.19.0

apollo-server-testing: types: Allow generic variables usage of query and mutate functions. [PR #4383](apollograpqh/apollo-server#4383)
apollo-server-express: Export the GetMiddlewareOptions type. [PR #4599](apollograpqh/apollo-server#4599)
apollo-server-lambda: Fix file uploads - ignore base64 decoding for multipart queries. [PR #4506](apollographql/apollo-server#4506)
apollo-server-core: Do not send  operation documents that cannot be executed to Apollo Studio. Instead, information about these operations will be combined into one "operation" for parse failures, one for validation failures, and one for unknown operation names.



... (truncated)


Commits

c212627 Release
See full diff in compare view




Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants