-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harden shutdown logic #1037
Harden shutdown logic #1037
Commits on Apr 20, 2015
-
Revert "fix ugly error message when killing commands"
This reverts commit f74e71f. The 'Online' flag of the command context does not seem to be set in any code paths, at least not when running commands such as 'ipfs daemon' or 'ipfs ping'. The result after f74e71f is that we never shutdown cleanly, as we'll always os.Exit(0) from the interrupt handler. The os.Exit(0) itself is also dubious, as conceptually the interrupt handler should ask whatever is stalling to stop stalling, so that main() can return like normal. Exiting with -1 in error cases where the interrupt handler is unable to stop the stall is fine, but the normal case of interrupting cleanly should exit through main().
Configuration menu - View commit details
-
Copy full SHA for 61efc4d - Browse repository at this point
Copy the full SHA 61efc4dView commit details -
corehttp: log when server takes a long time to shut down
The server may stay alive for quite a while due to waiting on open connections to close before shutting down. We should find ways to terminate these connections in a more controlled manner, but in the meantime it's helpful to be able to see why a shutdown of the ipfs daemon is taking so long.
Configuration menu - View commit details
-
Copy full SHA for cc830ff - Browse repository at this point
Copy the full SHA cc830ffView commit details -
corehttp: ensure node closing/teardown waits for server termination
When closing a node, the node itself only takes care of tearing down its own children. As corehttp sets up a server based on a node, it needs to also ensure that the server is accounted for when determining if the node has been fully closed.
Configuration menu - View commit details
-
Copy full SHA for c9d3084 - Browse repository at this point
Copy the full SHA c9d3084View commit details -
corehttp: disable HTTP keep-alive when shutting down server
Once the server is asked to shut down, we stop accepting new connections, but the 'manners' graceful shutdown will wait for all existing connections closed to close before finishing. For keep-alive connections this will never happen unless the client detects that the server is shutting down through the ipfs API itself, and closes the connection in response. This is a problem e.g. with the webui's connections visualization, which polls the swarm/peers endpoint once a second, and never detects that the API server was shut down. We can mitigate this by telling the server to disable keep-alive, which will add a 'Connection: close' header to the next HTTP response on the connection. A well behaving client should then treat that correspondingly by closing the connection. Unfortunately this doesn't happen immediately in all cases, presumably depending on the keep-alive timeout of the browser that set up the connection, but it's at least a step in the right direction.
Configuration menu - View commit details
-
Copy full SHA for 6fe8549 - Browse repository at this point
Copy the full SHA 6fe8549View commit details -
main: wait for interrupt to finish before ending command invocation
If a command invocation such as 'daemon' is interrupted, the interrupt handler asks the node to close. The closing of the node will result in the command invocation finishing, and possibly returning from main() before the interrupt handler is done. In particular, the info logging that a graceful shutdown was completed may never reach reach stdout. As the whole point of logging "Gracefully shut down." is to give confidence when debugging that the shutdown was clean, this is slightly unfortunate. The interrupt handler needs to be set up in main() instead of Run(), so that we can defer the closing of the interrupt handler until just before returning from main, not when Run() returns with a streaming result reader.
Configuration menu - View commit details
-
Copy full SHA for 00a6e59 - Browse repository at this point
Copy the full SHA 00a6e59View commit details -
Sync up request context with root context just before calling command
The context passed on from main() may change before we hit callCommand, so setting it in Parse is a bit optimistic.
Configuration menu - View commit details
-
Copy full SHA for 3d05764 - Browse repository at this point
Copy the full SHA 3d05764View commit details -
Teach http client to cancel request on context cancellation
The context may be cancelled while a request is in flight. We need to handle this and cancel the request. The code is based on the ideas from https://blog.golang.org/context
Configuration menu - View commit details
-
Copy full SHA for 661fb0a - Browse repository at this point
Copy the full SHA 661fb0aView commit details -
Teach http client to abort channel streaming on context cancellation
When the response includes the X-Chunked-Output header, we treat that as channel output, and fire up a goroutine to decode the chunks. This routine need to look for context cancellation so that it can exit cleanly.
Configuration menu - View commit details
-
Copy full SHA for cc45e21 - Browse repository at this point
Copy the full SHA cc45e21View commit details -
Handle ipfs command interruption by cancelling the command context
Instead of assuming the command is the daemon command and closing the node, which resulted in bugs like ipfs#1053, we cancel the context and let the context children detect the cancellation and gracefully clean up after themselves. The shutdown logging has been moved into the daemon command, where it makes more sense, so that commands like ping will not print out the same output on cancellation.
Configuration menu - View commit details
-
Copy full SHA for cf6a268 - Browse repository at this point
Copy the full SHA cf6a268View commit details -
Remove daemon InitDone guard in interrupt handler
Instead of just terminating right there and then, we cancel the context, and let the daemon exit cleanly. This make take a few seconds, as the node builder and its child processes do not care too much about the context state while building nodes, but this can be improved by injecting checks for ctx.Done() before time-consuming steps.
Configuration menu - View commit details
-
Copy full SHA for bfd1211 - Browse repository at this point
Copy the full SHA bfd1211View commit details