Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: remove listener closing race #253

Merged
merged 1 commit into from
Aug 1, 2023

Commits on Jul 7, 2023

  1. daemon: remove listener closing race

    We use http.Server with a number of custom net.Listener instances.
    
    This patch makes some changes to the daemon.Stop(...) function.
    
    In order for the server instance to Shutdown() gracefully, the listener
    termination must be performed by the shutdown process itself. The
    server shutdown process is thread-safe, and tracks the listeners in
    use. Thread safety is important because the http.Server has two places
    where listeners can be terminated:
    
    1. The thread hosting the Serve() method, using a deferred mutex
       protected closing function for all registered listeners.
    
    2. The Shutdown() method (with mutex protection), which will close
       listeners async resuling in an error from the listener Accept(),
       but this special case is handled in Serve() and returns the
       ErrServerClosed error which means shutdown was done cleanly.
    
    However, in daemon.go we also pre-close all the listeners before the
    call to Shutdown(), which causes two kinds of errors:
    
    1. accept tcp [::]:<port>: use of closed network connection
    
    This error is actually masked because of tomb.Kill() (when state is dying)
    for the termination of the daemon. See the daemon.Start() exit code.
    
    2. close tcp [::]:<port>: use of closed network connection
    
    This is the actual race condition symptom that can sometimes be seen in
    CI logs. Since the listeners are pre-closed, this error is expected. However,
    errors discovered during deferred closure in the Serve() thread are discarded
    while errors in Shutdown() as reported.
    
    The race is between the two closing paths. If the Shutdown path wins the
    sync.Once race, we see the error. If the deferred Serve() path wins, we see
    nothing.
    
    This race is only triggered because we manually close the listener first.
    
    A standalone simulation of the problem can be seen here:
    
    https://gist.github.com/flotter/483d233d38f0961a52f7ef7405887012
    
    The following changes are introduced:
    
    - Remove the early manual listener closing code so its properly done by
      the Shutdown() call.
    flotter committed Jul 7, 2023
    Configuration menu
    Copy the full SHA
    60be0c1 View commit details
    Browse the repository at this point in the history