
[dbnode] Shards assignment improvements during cluster topology changes #3425

Merged
merged 24 commits from linasn/assign-new-shards-fix into master on Apr 22, 2021

Conversation

soundvibe
Collaborator

@soundvibe soundvibe commented Apr 16, 2021

What this PR does / why we need it:

Currently, during cluster topology changes, the db node assigns new shards without taking already-running background processes into account. This behaviour is problematic because some db node background processes (e.g. cold flush, warm flush, snapshotting) rely heavily on retrieving owned shards from namespaces. These background processes usually call db.IsBootstrapped() to make sure that the retrieved shards are bootstrapped and consistent. That check was not fully sufficient because, in the current implementation, new shards might be assigned after the db.IsBootstrapped() check returns true, making background processes unpredictable and inconsistent.
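
As a rough illustration of that guard (the function below is hypothetical; only db.IsBootstrapped() comes from the description above, and the storage.Database parameter type is an assumption):

func runBackgroundFileOp(db storage.Database) {
  if !db.IsBootstrapped() {
    // Not every owned shard is guaranteed to be bootstrapped yet,
    // so skip this run and try again on the next tick.
    return
  }
  // ... retrieve owned shards from each namespace and run the file op
  // (cold flush, warm flush, snapshot) against them ...
}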

This PR solves the issue by enqueueing the new shardSet update and executing it only when it is safe, i.e. when no other background processes are running. We also wait until bootstrap has actually started when new shards are received. This ensures that new, not-yet-bootstrapped shards start bootstrapping (and the whole db enters the Bootstrapping state) before background processes resume, so the next db.IsBootstrapped() check in a background process returns false and its execution is skipped until the node is fully bootstrapped.
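
A minimal sketch of the enqueue-when-safe flow described above, using the mediator's EnqueueMutuallyExclusiveFn discussed later in this review (the surrounding method shape and error handling are an approximation, not the final implementation):

func (d *db) AssignShardSet(shardSet sharding.ShardSet) {
  // Defer the assignment so it runs only when no file ops
  // (cold/warm flush, snapshotting) are in flight.
  if err := d.mediator.EnqueueMutuallyExclusiveFn(func() {
    d.assignShardSet(shardSet)
  }); err != nil {
    if errors.Is(err, errMediatorNotOpen) {
      // Mediator not open yet: this is the initial assignment.
      d.assignShardSet(shardSet)
      return
    }
    // Unexpected error: log it.
  }
}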
Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


…ise warm and cold flushes might fail because some shards might still be not bootstrapped.
* master:
  Start (#3396)
  [query] Graphite fix exponentialMovingAverage seeded constant when num steps used for window size (#3395)
  [query] Fix Graphite exponentialMovingAverage to use correct EMA const and return single series per input series (#3391)
  [DOCS] Add contribution guide for documentation (#3365)
  [dbnode] Skip bootstrapping shards from aggregation (#3394)
…ssigned.

check for bootstrapped shards when doing cold flush cleanups.
* master:
  [DOCS] Configuration and component section overhaul (#3324)
  Pass a context to the query worker pool (#3350)
…it is better to make a check before calling cleanup.
* master:
  [dbnode] Decoder: fix handling of values requiring 64bit precision (#3406)
  Revert "Revert "[dbnode] Improve m3tsz decoding performance (#3358)" (#3403)" (#3405)
  [dbnode] TestReaderIteratorDecodingRegression (#3404)
  Revert "[dbnode] Improve m3tsz decoding performance (#3358)" (#3403)
  Update generated file (#3402)
…shards get assigned when file ops are not running.
…ely so that `IsBootstrappedAndDurable()` won't return true when db was previously bootstrapped and new bootstrap is enqueued.
@codecov

codecov bot commented Apr 16, 2021

Codecov Report

Merging #3425 (4f035bd) into master (4f035bd) will not change coverage.
The diff coverage is n/a.

❗ Current head 4f035bd differs from the pull request's most recent head 003ff06. Consider uploading reports for commit 003ff06 to get more accurate results.


@@           Coverage Diff           @@
##           master    #3425   +/-   ##
=======================================
  Coverage    72.1%    72.1%           
=======================================
  Files        1100     1100           
  Lines      103600   103600           
=======================================
  Hits        74780    74780           
  Misses      23663    23663           
  Partials     5157     5157           
Flag         Coverage Δ
aggregator   76.8% <0.0%> (ø)
cluster      84.9% <0.0%> (ø)
collector    84.3% <0.0%> (ø)
dbnode       78.3% <0.0%> (ø)
m3em         74.4% <0.0%> (ø)
m3ninx       73.5% <0.0%> (ø)
metrics      19.7% <0.0%> (ø)
msg          74.5% <0.0%> (ø)
query        66.9% <0.0%> (ø)
x            80.3% <0.0%> (ø)

Flags with carried forward coverage won't be shown.


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f035bd...003ff06.

* master:
  [dbnode] Set default values for BootstrapPeersConfiguration (#3420)
  [integration-tests] Use explicit version for quay.io/m3db/prometheus_remote_client_golang (#3422)
  [dtest] Fix dtest docker compose config: env => environment (#3421)
  Fix broken links to edit pages (#3419)
  [dbnode] Fix races in source_data_test.go (#3418)
  [coordinator] add more information to processed count metric (#3415)
  [dbnode] Avoid use of grow on demand worker pool for fetchTagged and aggregate (#3416)
  [docs] Fix m3aggregagtor typo (#3417)
  [x/log] Bump zap version and add logging encoder configuration (#3377)
  Do not use buffer channels if growOnDemand is true (#3414)
  [dbnode] Fix TestSeriesWriteReadParallel datapoints too far in past with -race flag (#3413)
  [docs] Update m3db operator docs with v0.13.0 features (#3397)
  [aggregator] Fix followerFlushManager metrics (#3411)
  [query] Restore optimization to skip initial fetch for timeShift and unary fns (#3408)
@notbdu
Contributor

notbdu commented Apr 20, 2021

Approach and core logic LGTM.

* master:
  [query] Add Graphite find limit config integration test (#3428)
  [docs] Fix typo in consistency level description (#3431)
  Wrap errors from the m3 remote storage (#3427)
  [query] Add Graphite find and render limit option overrides (#3426)
  [coordinator] support for augmenting prom quantiles (#3424)
Comment on lines 158 to 161
asyncResult.bootstrapStarted.Done()

// Keep performing bootstraps until none pending and no error returned.
var result BootstrapResult
Collaborator


Perhaps we can use a defer here? Just in case another code path returns early.

Suggested change

Before:

  asyncResult.bootstrapStarted.Done()

  // Keep performing bootstraps until none pending and no error returned.
  var result BootstrapResult

After:

  // Keep performing bootstraps until none pending and no error returned.
  var result BootstrapResult
  asyncResult.bootstrapStarted.Done()
  defer func() {
    asyncResult.bootstrapResult = result
    asyncResult.bootstrapCompleted.Done()
  }()

Collaborator


(Would need to remove the bootstrapResult = result assignment and the bootstrapCompleted.Done() call at the bottom of the method, obviously.)

@@ -409,13 +407,11 @@ func (d *clusterDB) analyzeAndReportShardStates() {
count := d.bootstrapCount[id]
if count != len(namespaces) {
// Should never happen if bootstrapped and durable.
Collaborator


nit: Update this comment to say that this can temporarily occur due to a race condition?

Comment on lines 465 to 468
if errors.Is(err, errMediatorNotOpen) {
// initial assignment.
d.assignShardSet(shardSet)
} else {
Collaborator


Can we special case this rather than return an error and proceed anyway?

i.e.

if !d.mediator.Open() {
  // Initial assignment
  d.assignShardSet(shardSet)
  return
}

if err := d.mediator.EnqueueMutuallyExclusiveFn(func() {
  d.assignShardSet(shardSet)
}); err != nil {
  // log invariant error
}

Comment on lines 495 to 498
for len(b.externalFnCh) > 0 {
externalFn := <-b.externalFnCh
externalFn()
}
Collaborator


This is racy if there are any other readers on the channel. It is better (and more idiomatic/common) to simply loop and break when we can't read anymore:

drainLoop:
for {
  select {
  case fn := <-b.externalFnCh:
    fn()
  default:
    break drainLoop // Break from loop, no more to read (a plain break would only exit the select).
  }
}

* master:
  [dbnode] Adaptive WriteBatch allocations (#3429)
Collaborator

@robskillington robskillington left a comment


LGTM

@soundvibe soundvibe merged commit a4cee97 into master Apr 22, 2021
@soundvibe soundvibe deleted the linasn/assign-new-shards-fix branch April 22, 2021 13:59