Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden shutdown logic #1037

Merged
merged 10 commits into from
Apr 21, 2015
Merged

Harden shutdown logic #1037

merged 10 commits into from
Apr 21, 2015

Commits on Apr 20, 2015

  1. Revert "fix ugly error message when killing commands"

    This reverts commit f74e71f.
    
    The 'Online' flag of the command context does not seem to be set in
    any code paths, at least not when running commands such as 'ipfs daemon'
    or 'ipfs ping'. The result after f74e71f is that we never shutdown
    cleanly, as we'll always os.Exit(0) from the interrupt handler.
    
    The os.Exit(0) itself is also dubious, as conceptually the interrupt
    handler should ask whatever is stalling to stop stalling, so that
    main() can return like normal. Exiting with -1 in error cases where
    the interrupt handler is unable to stop the stall is fine, but the
    normal case of interrupting cleanly should exit through main().
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    61efc4d View commit details
    Browse the repository at this point in the history
  2. corehttp: log when server takes a long time to shut down

    The server may stay alive for quite a while due to waiting on
    open connections to close before shutting down. We should
    find ways to terminate these connections in a more controlled
    manner, but in the meantime it's helpful to be able to see
    why a shutdown of the ipfs daemon is taking so long.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    cc830ff View commit details
    Browse the repository at this point in the history
  3. corehttp: ensure node closing/teardown waits for server termination

    When closing a node, the node itself only takes care of tearing down
    its own children. As corehttp sets up a server based on a node, it
    needs to also ensure that the server is accounted for when determining
    if the node has been fully closed.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    c9d3084 View commit details
    Browse the repository at this point in the history
  4. corehttp: disable HTTP keep-alive when shutting down server

    Once the server is asked to shut down, we stop accepting new
    connections, but the 'manners' graceful shutdown will wait for
    all existing connections closed to close before finishing.
    
    For keep-alive connections this will never happen unless the
    client detects that the server is shutting down through the
    ipfs API itself, and closes the connection in response.
    
    This is a problem e.g. with the webui's connections visualization,
    which polls the swarm/peers endpoint once a second, and never
    detects that the API server was shut down.
    
    We can mitigate this by telling the server to disable keep-alive,
    which will add a 'Connection: close' header to the next HTTP
    response on the connection. A well behaving client should then
    treat that correspondingly by closing the connection.
    
    Unfortunately this doesn't happen immediately in all cases,
    presumably depending on the keep-alive timeout of the browser
    that set up the connection, but it's at least a step in the
    right direction.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    6fe8549 View commit details
    Browse the repository at this point in the history
  5. main: wait for interrupt to finish before ending command invocation

    If a command invocation such as 'daemon' is interrupted, the interrupt
    handler asks the node to close. The closing of the node will result in
    the command invocation finishing, and possibly returning from main()
    before the interrupt handler is done. In particular, the info logging
    that a graceful shutdown was completed may never reach reach stdout.
    
    As the whole point of logging "Gracefully shut down." is to give
    confidence when debugging that the shutdown was clean, this is
    slightly unfortunate.
    
    The interrupt handler needs to be set up in main() instead of Run(),
    so that we can defer the closing of the interrupt handler until just
    before returning from main, not when Run() returns with a streaming
    result reader.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    00a6e59 View commit details
    Browse the repository at this point in the history
  6. Sync up request context with root context just before calling command

    The context passed on from main() may change before we hit callCommand,
    so setting it in Parse is a bit optimistic.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    3d05764 View commit details
    Browse the repository at this point in the history
  7. Teach http client to cancel request on context cancellation

    The context may be cancelled while a request is in flight. We need to
    handle this and cancel the request. The code is based on the ideas
    from https://blog.golang.org/context
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    661fb0a View commit details
    Browse the repository at this point in the history
  8. Teach http client to abort channel streaming on context cancellation

    When the response includes the X-Chunked-Output header, we treat that
    as channel output, and fire up a goroutine to decode the chunks. This
    routine need to look for context cancellation so that it can exit
    cleanly.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    cc45e21 View commit details
    Browse the repository at this point in the history
  9. Handle ipfs command interruption by cancelling the command context

    Instead of assuming the command is the daemon command and closing
    the node, which resulted in bugs like ipfs#1053, we cancel the context
    and let the context children detect the cancellation and gracefully
    clean up after themselves.
    
    The shutdown logging has been moved into the daemon command, where
    it makes more sense, so that commands like ping will not print out
    the same output on cancellation.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    cf6a268 View commit details
    Browse the repository at this point in the history
  10. Remove daemon InitDone guard in interrupt handler

    Instead of just terminating right there and then, we cancel the
    context, and let the daemon exit cleanly. This make take a few
    seconds, as the node builder and its child processes do not
    care too much about the context state while building nodes,
    but this can be improved by injecting checks for ctx.Done()
    before time-consuming steps.
    torarnv authored and Tor Arne Vestbø committed Apr 20, 2015
    Configuration menu
    Copy the full SHA
    bfd1211 View commit details
    Browse the repository at this point in the history