Skip to content

Commit

Permalink
daemon: remove listener closing race
Browse files Browse the repository at this point in the history
We use http.Server with a number of custom net.Listener instances.

This patch makes some changes to the daemon.Stop(...) function.

In order for the server instance to Shutdown() gracefully, the listener
termination must be performed by the shutdown process itself. The
server shutdown process is thread-safe, and tracks the listeners in
use. Thread safety is important because the http.Server has two places
where listeners can be terminated:

1. The thread hosting the Serve() method, using a deferred mutex
   protected closing function for all registered listeners.

2. The Shutdown() method (with mutex protection), which will close
   listeners async resuling in an error from the listener Accept(),
   but this special case is handled in Serve() and returns the
   ErrServerClosed error which means shutdown was done cleanly.

However, in daemon.go we also pre-close all the listeners before the
call to Shutdown(), which causes two kinds of errors:

1. accept tcp [::]:<port>: use of closed network connection

This error is actually masked because of tomb.Kill() calls used for
the termination of the daemon. See the daemon.Start() Serve() exit
code.

2. close tcp [::]:<port>: use of closed network connection

This is the actual race condition that can sometimes be seen in CI
logs. This race is caused by the two closing mechanisms discussed
above:

The race is handled correctly by ensuring only one close() path can
win (using sync.Do), but the problem is that the Serve() deferred Close()
discards the error, while the Shutdown() Close() records the error. If
The Shutdown() wins the Close(), we see an error.

This race is only triggered because we manually close the listener first.

A standalone simulation of the problem can be seen here:

https://gist.github.com/flotter/483d233d38f0961a52f7ef7405887012

The following changes are introduced:

- Remove the early manual listener closing code so its properly done by
  the Shutdown() call.
  • Loading branch information
flotter committed Jul 5, 2023
1 parent 8e96c74 commit 27f44c0
Showing 1 changed file with 0 additions and 9 deletions.
9 changes: 0 additions & 9 deletions internals/daemon/daemon.go
Original file line number Diff line number Diff line change
Expand Up @@ -568,17 +568,8 @@ func (d *Daemon) Stop(sigCh chan<- os.Signal) error {
restartSocket := d.restartSocket
d.mu.Unlock()

d.generalListener.Close()
d.standbyOpinions.Stop()

if d.untrustedListener != nil {
d.untrustedListener.Close()
}

if d.httpListener != nil {
d.httpListener.Close()
}

if restartSystem {
// give time to polling clients to notice restart
time.Sleep(rebootNoticeWait)
Expand Down

0 comments on commit 27f44c0

Please sign in to comment.