Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Add more unix signal handlers #2486

Closed
wants to merge 5 commits into from

Conversation

mstallmo
Copy link
Contributor

Issue Addressed

Resolves #2114

Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT.

Proposed Changes

  • Swap out the ctrlc crate for tokio signals for unix signal handing
  • Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT

Additional Info

I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running lighthouse bn --network pyrmont |& tee -a pyrmont_bn.log or lighthouse bn --network pyrmont 2>&1 | tee -a pyrmont_bn.log none of the above signals are sent to the lighthouse program in a way I was able to observe.

The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior.

Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like kill.

@CLAassistant
Copy link

CLAassistant commented Jul 31, 2021

CLA assistant check
All committers have signed the CLA.

@paulhauner paulhauner added ready-for-review The code is ready for review v1.5.1 To be included in the v1.5.1 relase labels Aug 2, 2021
@mstallmo mstallmo force-pushed the fix_graceful_shutdown branch from 4b5a93a to 9de3875 Compare August 3, 2021 05:21
@mstallmo
Copy link
Contributor Author

mstallmo commented Aug 3, 2021

Pushed a couple of updates based on the CI failures from earlier. The current version of the PR should pass CI now.

@mstallmo mstallmo force-pushed the fix_graceful_shutdown branch from d357df7 to 3fa865b Compare August 18, 2021 05:40
@paulhauner
Copy link
Member

Thanks @mstallmo, this is looking great! We'll get to a review soon. New PRs are blocked until we release v1.5.0 (probably next week) and we've been busy with the Pyrmont fork.

I just wanted to let you know that it's on our radar and not going stale :)

@michaelsproul michaelsproul added v1.5.2 The release after v1.5.1 v2.0.0 Altair on mainnet release (v2.0.0) and removed v1.5.1 To be included in the v1.5.1 relase v1.5.2 The release after v1.5.1 labels Aug 26, 2021
@michaelsproul michaelsproul self-requested a review August 30, 2021 00:48
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

I spent some time digging into why the pipe commands from the original issue don't seem to work and I think they are actually working correctly, but we just can't see the output.

If tee receives the SIGINT from the shell, then it immediately shuts down and closes its stdin file descriptor which lighthouse was writing to. That means no more terminal output after tee exits, because both stderr and stdout from Lighthouse are redirected to tee. Lighthouse also receives the SIGINT that knocked out tee, and runs its own signal handler to begin shutting down. Anything that Lighthouse logs from this point on is lost, because it has nowhere to write it to. It exits with a 0 exit code, indicating that the process exited successfully (you need to echo ${PIPESTATUS[0]} in bash to see it). Further, if you modify Lighthouse's signal handler so that it panics, then Lighthouse exits it with a 101 exit code (the standard panic exit code), demonstrating that the signal handler does indeed still run. Even better, if you run tee with -i to stop it from exiting then it will record Lighthouse's shutdown messages before exiting.

I stumbled around testing things for a while before realising this, and I found this article incredibly helpful: https://www.cons.org/cracauer/sigint.html

Thanks again!

bors r+

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Aug 30, 2021
bors bot pushed a commit that referenced this pull request Aug 30, 2021
## Issue Addressed

Resolves #2114 

Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT.

## Proposed Changes

- Swap out the ctrlc crate for tokio signals for unix signal handing
- Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT

## Additional Info

I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running `lighthouse bn --network pyrmont |& tee -a pyrmont_bn.log` or `lighthouse bn --network pyrmont 2>&1 | tee -a pyrmont_bn.log` none of the above signals are sent to the lighthouse program in a way I was able to observe. 

The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior.

Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like `kill`.
@bors bors bot changed the title Add more unix signal handlers [Merged by Bors] - Add more unix signal handlers Aug 30, 2021
@bors bors bot closed this Aug 30, 2021
bors bot pushed a commit that referenced this pull request Sep 3, 2021
## Proposed Changes

Remove the SIGPIPE handler added in #2486.

We saw some of the testnet nodes running under `systemd` being stopped due to `journald` restarts. The systemd docs state:

> If systemd-journald.service is stopped, the stream connections associated with all services are terminated. Further writes to those streams by the service will result in EPIPE errors. In order to react gracefully in this case it is recommended that programs logging to standard output/error ignore such errors. If the SIGPIPE UNIX signal handler is not blocked or turned off, such write attempts will also result in such process signals being generated, see signal(7).

From https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html

## Additional Info

It turned out that the issue described in #2114 was due to `tee`'s behaviour rather than Lighthouse's, so the SIGPIPE handler isn't required for any current use case. An alternative to disabling it all together would be to exit with a non-zero code so that systemd knows to restart the process, but it seems more desirable to tolerate journald glitches than to restart frequently.
@mstallmo mstallmo deleted the fix_graceful_shutdown branch September 11, 2021 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v2.0.0 Altair on mainnet release (v2.0.0)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants