Skip to content

Commit

Permalink
server: (re-)fix data race in server initialization
Browse files Browse the repository at this point in the history
Fixes cockroachdb#91414.
Fixes  cockroachdb#101010.
Fixes cockroachdb#100902.

There was a race between registerEnginesForDiskStatsMap and
SetPebbleMetricsProvider, the latter making use of a map constructed by
the former but appeared in the opposite order in code.

Release note: None
  • Loading branch information
irfansharif committed Apr 10, 2023
1 parent 82e09ed commit 2ef89c9
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions pkg/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -1788,6 +1788,15 @@ func (s *Server) PreStart(ctx context.Context) error {
// stores)
s.node.waitForAdditionalStoreInit()

// Connect the engines to the disk stats map constructor. This needs to
// wait until after waitForAdditionalStoreInit returns since it realizes on
// wholly initialized stores (it reads the StoreIdentKeys). It also needs
// to come before the call into SetPebbleMetricsProvider, which internally
// uses the disk stats map we're initializing.
if err := s.node.registerEnginesForDiskStatsMap(s.cfg.Stores.Specs, s.engines); err != nil {
return errors.Wrapf(err, "failed to register engines for the disk stats map")
}

// Stores have been initialized, so Node can now provide Pebble metrics.
//
// Note that all existing stores will be operational before Pebble-level
Expand All @@ -1798,13 +1807,6 @@ func (s *Server) PreStart(ctx context.Context) error {
// to bypass admission control.
s.storeGrantCoords.SetPebbleMetricsProvider(ctx, s.node, s.node)

// Connect the engines to the disk stats map constructor.
// This also needs to wait until after `waitForAdditionalStoreInit` returns,
// as the store IDs may not be known until then.
if err := s.node.registerEnginesForDiskStatsMap(s.cfg.Stores.Specs, s.engines); err != nil {
return errors.Wrapf(err, "failed to register engines for the disk stats map")
}

// Once all stores are initialized, check if offline storage recovery
// was done prior to start and record any actions appropriately.
logPendingLossOfQuorumRecoveryEvents(workersCtx, s.node.stores)
Expand Down

0 comments on commit 2ef89c9

Please sign in to comment.