Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle SIGPIPE #7040

Closed
rossmcdonald opened this issue Jul 20, 2016 · 12 comments
Closed

Gracefully handle SIGPIPE #7040

rossmcdonald opened this issue Jul 20, 2016 · 12 comments
Assignees
Milestone

Comments

@rossmcdonald
Copy link
Contributor

Earlier versions of journald would send a SIGPIPE to any systemd processes when journald was restarted. So far we have bypassed this issue by redirecting STDOUT and STDERR to the traditional log paths, however since we're no longer doing this it would make sense to handle SIGPIPEs in a graceful manner to ensure the service doesn't crash when journald is restarted.

A similar fix was applied to Docker here: jwhonce/docker@b7f60f7

@jwilder jwilder added this to the 1.0.0 milestone Jul 21, 2016
@jsternberg jsternberg self-assigned this Jul 21, 2016
@jsternberg
Copy link
Contributor

I concur with @nathanielc's conclusion here for Kapacitor: influxdata/kapacitor#737 (comment)

@rossmcdonald I think the proper answer is to upgrade systemd since it seems to be their problem. Unfortunately, we can't modify Go to notify us of a SIGPIPE on only fd 1 or 2 so that we can exit gracefully with an error exit code so systemd causes us to restart and I don't think ignoring SIGPIPE is correct. If this is absolutely necessary though, we can still just ignore the signal and warn people that InfluxDB will not stop if the file descriptor dies. Either way, there's going to need to be a note in the documentation about this issue with a caveat.

@rossmcdonald
Copy link
Contributor Author

rossmcdonald commented Jul 21, 2016

I'm actually seeing this issue on both systemd 208 and 219, so this might be something we need to investigate further. Here is the output from 219:

$ systemctl restart systemd-journald
$ systemctl status influxdb
● influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/usr/lib/systemd/system/influxdb.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2016-07-21 19:58:25 CEST; 1min 37s ago
     Docs: https://docs.influxdata.com/influxdb/
 Main PID: 2573 (code=killed, signal=PIPE)

Depending on how we want to handle this, I think we can either:

  • Go the Docker route, and simply ignore SIGPIPE signals. I'm concerned that this means that logs may not be emitted correctly if journald is restarted, though.
  • Add a Restart=always to the unit file, so that InfluxDB will be restarted even in the event of a SIGPIPE. The downside here is that users will have to use systemctl to stop the daemon, as opposed to sending a SIGKILL SIGTERM.
  • Add a RestartForceExitStatus=SIGPIPE to the unit file, which tells the service to interpret SIGPIPE as a failure status and restart the process.

I'm thinking the third option is the cleanest, as we're not changing any other behavior apart from making SIGPIPE an exit status.

@jsternberg
Copy link
Contributor

I would favor the third option, but for the second option, I don't think we should encourage people to use SIGKILL. It's generally very bad to use that signal. I don't think it would cause any corruption, but it's still not something we want to encourage people to use.

@rossmcdonald
Copy link
Contributor Author

Oops, sorry, I meant SIGTERM. Agreed that people shouldn't be using SIGKILL on a normal basis. 😄

@sparrc
Copy link
Contributor

sparrc commented Jul 22, 2016

So what is the conclusion? Add RestartForceExitStatus=SIGPIPE to the systemd file? If anyone goes ahead and does this please apply patch to telegraf and kapacitor as well :)

cc @nathanielc

@rossmcdonald
Copy link
Contributor Author

Well it turns out the RestartForceExitStatus= option was not added until systemd 215, so the third option might not be very helpful for users stuck on older versions. I think in that case Restart=always might be the better option.

@jsternberg
Copy link
Contributor

I think Restart=always should be fine. We shouldn't encourage people to kill the daemon with SIGTERM but tell them to use the appropriate service manager anyway.

@nathanielc
Copy link
Contributor

Which versions are affected by this issue? I just tried it with systemd 229 and no issue. Restarting journald did not send a SIGPIPE and logs are still arriving in the journal after the restart.

Why can't we instruct uses with broken versions of systemd to add RestartForceExitStatus=SIGPIPE or Restart=always as an override config?

http://askubuntu.com/questions/659267/how-do-i-override-or-configure-systemd-services

Seems like a simple solution that doesn't risk creating issues for our wider user base.

@rossmcdonald
Copy link
Contributor Author

So far I've only tested with versions 208 and 219, which both exhibited this issue. We can certainly add some documentation around overriding the default unit file if using below a specific systemd version, I was just hoping to find a clean out-of-the-box fix we could use that would have minimal/no impact to current and outdated systemd users alike.

Since this isn't as simple of a fix as I was thinking it might be, we might just want to move this over as a docs issue.

@jwilder
Copy link
Contributor

jwilder commented Jul 25, 2016

@rossmcdonald @jsternberg @nathanielc Should we close this an re-open as a docs issue?

@rossmcdonald
Copy link
Contributor Author

Moving this to a doc issue here: influxdata/docs.influxdata.com-ARCHIVE#548

We can create a separate issue if we want to change the default restart behavior when running on systemd.

@TaborKelly
Copy link

Moving this to a doc issue here: influxdata/docs.influxdata.com#548

We can create a separate issue if we want to change the default restart behavior when running on systemd.

But you never actually changed the influxdb.service. That is, RestartForceExitStatus=SIGPIPE was never added to the service file, so this issue still occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants