-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shutdown gracefully on SIGPIPE, SIGHUP #2114
Labels
Comments
Would love to take a stab at resolving this if no one is working on it already! |
@mstallmo That would be awesome! There's nobody working on it currently, it's all yours! |
michaelsproul
added
v1.5.2
The release after v1.5.1
v2.0.0
Altair on mainnet release (v2.0.0)
and removed
v1.5.1
To be included in the v1.5.1 relase
v1.5.2
The release after v1.5.1
labels
Aug 26, 2021
bors bot
pushed a commit
that referenced
this issue
Aug 30, 2021
## Issue Addressed Resolves #2114 Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT. ## Proposed Changes - Swap out the ctrlc crate for tokio signals for unix signal handing - Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT ## Additional Info I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running `lighthouse bn --network pyrmont |& tee -a pyrmont_bn.log` or `lighthouse bn --network pyrmont 2>&1 | tee -a pyrmont_bn.log` none of the above signals are sent to the lighthouse program in a way I was able to observe. The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior. Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like `kill`.
Finally got to the bottom of this here: #2486 (review) For anyone wanting to see the shutdown output in |
bors bot
pushed a commit
that referenced
this issue
Sep 3, 2021
## Proposed Changes Remove the SIGPIPE handler added in #2486. We saw some of the testnet nodes running under `systemd` being stopped due to `journald` restarts. The systemd docs state: > If systemd-journald.service is stopped, the stream connections associated with all services are terminated. Further writes to those streams by the service will result in EPIPE errors. In order to react gracefully in this case it is recommended that programs logging to standard output/error ignore such errors. If the SIGPIPE UNIX signal handler is not blocked or turned off, such write attempts will also result in such process signals being generated, see signal(7). From https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html ## Additional Info It turned out that the issue described in #2114 was due to `tee`'s behaviour rather than Lighthouse's, so the SIGPIPE handler isn't required for any current use case. An alternative to disabling it all together would be to exit with a non-zero code so that systemd knows to restart the process, but it seems more desirable to tolerate journald glitches than to restart frequently.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Description
When running Lighthouse in a shell pipeline it fails to exit gracefully because of the lack of signal handler for
SIGPIPE
. I noticed that my Lighthouse nodes wouldn't shutdown gracefully from my runner script, and it took a bit of digging to find the cause.This command reproduces the issue (
bash
assumed):Or equivalently:
Both of these commands redirect lighthouse's stderr (
2
) to its stdout (&1
), and then send stdout totee
for writing to the terminal and a file simultaneously (a common UNIX pattern). If you hit^C
while running either of these commands Lighthouse will exit immediately, without running the shutdown handlers, runningdrop
on theBeaconChain
, or printing these messages:Version
Lighthouse v1.0.4
Steps to resolve
Currently we use the
ctrlc
crate to register a signal handler forSIGINT
andSIGTERM
. Unfortunately it seems this crate doesn't allow us to register handlers for other signals.lighthouse/lighthouse/environment/src/lib.rs
Lines 382 to 392 in 1abc70e
Using another crate (or maybe
tokio::signal
), we should ensure that this handler runs for all of the signals we can reasonably handle. Based on the list here I think that should just beSIGINT
,SIGTERM
, plus:SIGPIPE
: The SIGPIPE signal is sent to a process when it attempts to write to a pipe without a process connected to the other end.SIGHUP
: The SIGHUP signal is sent to a process when its controlling terminal is closed.The text was updated successfully, but these errors were encountered: