From ffaf845335b50abfcc49b8ac56d0677cba20f542 Mon Sep 17 00:00:00 2001 From: Raphael 'kena' Poss Date: Wed, 9 Jan 2019 11:37:23 +0100 Subject: [PATCH] storage: bump RaftDelaySplitToSuppressSnapshotTicks The admin split path must accommodate a scenario where a range with not-yet-replicated followers is being manually split multiple times (eg. IMPORT during TPCC test fixtures). This scenario results in a bunch of replicas that all need to be populated with snapshots. To avoid backing up the raft snapshot queue, a heuristic was put in place (#32594) to delay the admin split if there is another snapshot being applied already. As shown by investigation in a failing test, there is a mismatch between the configured max delay for this heuristic (20s) and the actual duration of the snapshot application - the latter is limited by the max bandwidth for snapshots, 2 MB/s resulting in 32s applications in the worst case. We (Tobias and I) suspect that the heuristic thus fails to wait long enough to have the protective effect it was designed to provide. The current patch increases this delay to exceed this snapshot application duration estimate to about 50s. Note that this scenario is not likely to occur now that #32782 has been merged (this reduces the need for raft snapshots during splits); however in release-2.1 where that patch was not applied, the scenario is more likely. Release note (bug fix): resolve a cluster degradation scenario that could occur during IMPORT/RESTORE operations, manifested through a high number of pending Raft snapshots. --- pkg/base/config.go | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/pkg/base/config.go b/pkg/base/config.go index 4e4d972bab03..a13a0ec96098 100644 --- a/pkg/base/config.go +++ b/pkg/base/config.go @@ -533,10 +533,16 @@ func (cfg *RaftConfig) SetDefaults() { } if cfg.RaftDelaySplitToSuppressSnapshotTicks == 0 { - // A total of 100 ticks is >18s which experimentally has been shown to - // allow the small pile (<100) of Raft snapshots observed at the - // beginning of an import/restore to be resolved. - cfg.RaftDelaySplitToSuppressSnapshotTicks = 100 + // The Raft Ticks interval defaults to 200ms, and an election is 15 + // ticks. Add a generous amount of ticks to make sure even a backed up + // Raft snapshot queue is going to make progress when a (not overly + // concurrent) amount of splits happens. + // The generous amount should result in a delay sufficient to + // transmit at least one snapshot with the slow delay, which + // with default settings is max 64MB at 2MB/s, ie 32 seconds. + // + // The resulting delay configured here is about 50s. + cfg.RaftDelaySplitToSuppressSnapshotTicks = 3*cfg.RaftElectionTimeoutTicks + 200 } }