[dbnode] Shards assignment improvements during cluster topology changes #3425
Conversation
…ise warm and cold flushes might fail because some shards might still be not bootstrapped.
* master:
  Start (#3396)
  [query] Graphite fix exponentialMovingAverage seeded constant when num steps used for window size (#3395)
  [query] Fix Graphite exponentialMovingAverage to use correct EMA const and return single series per input series (#3391)
  [DOCS] Add contribution guide for documentation (#3365)
  [dbnode] Skip bootstrapping shards from aggregation (#3394)
…ssigned. check for bootstrapped shards when doing cold flush cleanups.
…it is better to make a check before calling cleanup.
* master:
  [dbnode] Decoder: fix handling of values requiring 64bit precision (#3406)
  Revert "Revert "[dbnode] Improve m3tsz decoding performance (#3358)" (#3403)" (#3405)
  [dbnode] TestReaderIteratorDecodingRegression (#3404)
  Revert "[dbnode] Improve m3tsz decoding performance (#3358)" (#3403)
  Update generated file (#3402)
…shards get assigned when file ops are not running.
…ely so that `IsBootstrappedAndDurable()` won't return true when db was previously bootstrapped and new bootstrap is enqueued.
Codecov Report
```
@@           Coverage Diff            @@
##           master    #3425   +/-  ##
=======================================
  Coverage    72.1%    72.1%
=======================================
  Files        1100     1100
  Lines      103600   103600
=======================================
  Hits        74780    74780
  Misses      23663    23663
  Partials     5157     5157
```
* master:
  [dbnode] Set default values for BootstrapPeersConfiguration (#3420)
  [integration-tests] Use explicit version for quay.io/m3db/prometheus_remote_client_golang (#3422)
  [dtest] Fix dtest docker compose config: env => environment (#3421)
  Fix broken links to edit pages (#3419)
  [dbnode] Fix races in source_data_test.go (#3418)
  [coordinator] add more information to processed count metric (#3415)
  [dbnode] Avoid use of grow on demand worker pool for fetchTagged and aggregate (#3416)
  [docs] Fix m3aggregagtor typo (#3417)
  [x/log] Bump zap version and add logging encoder configuration (#3377)
  Do not use buffer channels if growOnDemand is true (#3414)
  [dbnode] Fix TestSeriesWriteReadParallel datapoints too far in past with -race flag (#3413)
  [docs] Update m3db operator docs with v0.13.0 features (#3397)
  [aggregator] Fix followerFlushManager metrics (#3411)
  [query] Restore optimization to skip initial fetch for timeShift and unary fns (#3408)
Approach and core logic LGTM.
* master:
  [query] Add Graphite find limit config integration test (#3428)
  [docs] Fix typo in consistency level description (#3431)
  Wrap errors from the m3 remote storage (#3427)
  [query] Add Graphite find and render limit option overrides (#3426)
  [coordinator] support for augmenting prom quantiles (#3424)
src/dbnode/storage/bootstrap.go
Outdated
```go
asyncResult.bootstrapStarted.Done()

// Keep performing bootstraps until none pending and no error returned.
var result BootstrapResult
```
Perhaps we can use a defer here? Just in case another code path returns early.
Suggested change:

```diff
-asyncResult.bootstrapStarted.Done()
 // Keep performing bootstraps until none pending and no error returned.
 var result BootstrapResult
+asyncResult.bootstrapStarted.Done()
+defer func() {
+	asyncResult.bootstrapResult = result
+	asyncResult.bootstrapCompleted.Done()
+}()
```
(Would need to remove the `bootstrapResult = result` assignment and `bootstrapCompleted.Done()` calls at the bottom of the method, obviously.)
```
@@ -409,13 +407,11 @@ func (d *clusterDB) analyzeAndReportShardStates() {
	count := d.bootstrapCount[id]
	if count != len(namespaces) {
		// Should never happen if bootstrapped and durable.
```
nit: Update this comment to say that this can temporarily occur due to a race condition?
src/dbnode/storage/database.go
Outdated
```go
if errors.Is(err, errMediatorNotOpen) {
	// initial assignment.
	d.assignShardSet(shardSet)
} else {
```
Can we special case this rather than return an error and proceed anyway?
i.e.
```go
if !d.mediator.Open() {
	// Initial assignment
	d.assignShardSet(shardSet)
	return
}

if err := d.mediator.EnqueueMutuallyExclusiveFn(func() {
	d.assignShardSet(shardSet)
}); err != nil {
	// log invariant error
}
```
src/dbnode/storage/mediator.go
Outdated
```go
for len(b.externalFnCh) > 0 {
	externalFn := <-b.externalFnCh
	externalFn()
}
```
This is racy if there are any other readers on the channel. It is better (and more idiomatic/common) to simply loop and break when nothing more can be read:
```go
drain:
	for {
		select {
		case fn := <-b.externalFnCh:
			fn()
		default:
			break drain // Break from the loop, no more to read.
		}
	}
```
* master: [dbnode] Adaptive WriteBatch allocations (#3429)
LGTM
What this PR does / why we need it:
Currently, during cluster topology changes, the db node assigns new shards without taking already running background processes into account. This behaviour is problematic because some db node background processes (e.g. cold flush, warm flush, snapshotting) rely heavily on retrieving owned shards from namespaces. Usually these background processes use `db.IsBootstrapped()` to make sure that the retrieved shards are bootstrapped and consistent. This check was not fully sufficient because, in the current implementation, new shards might be assigned after the `db.IsBootstrapped()` check returns `true`, making background processes unpredictable and inconsistent.

This PR solves the issue by enqueueing the new shardSet update and executing it only when it is safe, i.e. when no other background processes are running. We also wait until bootstrap has actually started when new shards are received. This ensures that new, not-yet-bootstrapped shards start bootstrapping (and the whole `db` is in the `Bootstrapping` state) before background processes are resumed, so the next `db.IsBootstrapped()` check made by a background process returns `false` and its execution is skipped until the node is fully bootstrapped.
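Below is a minimal, hypothetical Go sketch of this flow. All types here are simplified stand-ins; the names (`mediator`, `EnqueueMutuallyExclusiveFn`, `assignShardSet`, `bootstrapStarted`) echo the ones discussed in this PR but are not the actual m3db implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// shardSet is a stand-in for the real sharding.ShardSet.
type shardSet []uint32

// mediator is a simplified stand-in for the dbnode mediator that owns the
// background file operations (cold/warm flush, snapshotting).
type mediator struct {
	mu     sync.Mutex
	open   bool
	queued []func()
}

func (m *mediator) IsOpen() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.open
}

// EnqueueMutuallyExclusiveFn queues fn to run when no file operations are in
// flight; in this sketch it is simply appended to a slice.
func (m *mediator) EnqueueMutuallyExclusiveFn(fn func()) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queued = append(m.queued, fn)
	return nil
}

type db struct {
	mediator     *mediator
	shards       shardSet
	bootstrapped bool
}

// AssignShardSet applies the initial assignment directly and defers any later
// topology change until no background file operations are running.
func (d *db) AssignShardSet(ss shardSet) {
	if !d.mediator.IsOpen() {
		// Initial assignment: the mediator (and its file ops) is not running yet.
		d.assignShardSet(ss)
		return
	}
	_ = d.mediator.EnqueueMutuallyExclusiveFn(func() {
		d.assignShardSet(ss)
	})
}

func (d *db) assignShardSet(ss shardSet) {
	d.shards = ss
	d.bootstrapped = false

	var bootstrapStarted sync.WaitGroup
	bootstrapStarted.Add(1)
	go func() {
		bootstrapStarted.Done() // signal that bootstrapping has begun
		// ... bootstrap the newly assigned shards ...
		d.bootstrapped = true
	}()
	// Only return once bootstrapping has started, so the next IsBootstrapped()
	// check made by a resumed background process sees false.
	bootstrapStarted.Wait()
}

func main() {
	d := &db{mediator: &mediator{}}
	d.AssignShardSet(shardSet{0, 1, 2})
	fmt.Println("shards assigned:", len(d.shards))
}
```

In the actual PR the deferral is done via the mediator's mutually exclusive function queue (see the review discussion above), which guarantees the shard assignment and the file operations never interleave.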
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: