Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Skip entries for unowned shards in commit log bootstrapper #2145

Merged
merged 9 commits into from
Feb 24, 2020

Conversation

justinjc
Copy link
Collaborator

What this PR does / why we need it:

Fixes #2129

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@robskillington
Copy link
Collaborator

robskillington commented Feb 16, 2020

Hm, believe the naive case of this was fixed with PR #2052.

Although I guess this could be plausible if a bootstrap is already happening and then a further topology change happens, therefore the current bootstrap is bootstrapping data for shards that was valid at beginning of bootstrap but is invalidated by a consequent topology change.

If so, this is a good catch! And unfortunate I introduced it for this upcoming release (in current release candidate) =[

If that's what we're protecting against here, this can also happen in the peer bootstrapper which also uses accumulator.CheckoutSeries...(...) interface.

Maybe we should introduce a third return to the CheckoutSeries...(...) methods that causes the caller to deal with the series no longer being owned.

i.e.

	CheckoutSeriesWithoutLock(
		shardID uint32,
		id ident.ID,
		tags ident.TagIterator,
	) (result CheckoutSeriesResult, owned bool, err error)

	CheckoutSeriesWithLock(
		shardID uint32,
		id ident.ID,
		tags ident.TagIterator,
	) (result CheckoutSeriesResult, owned bool, err error)

This should force all existing call sites to be updated (like the peers bootstrapper call site which I don't believe is updated with this change) and any new ones to need to consider the case when the series is no longer owned.

I'm thinking maybe something like this:

result, owned, err := accumulator.CheckoutSeriesWithoutLock(...)
if err != nil {
 // .. normal err case
}

if !owned {
  // deal with the series not being owned
}

// otherwise use result

Thoughts?

@justinjc
Copy link
Collaborator Author

@robskillington yeah that makes sense. Unfortunately we did hit this issue during multiple topology changes that happened in relatively quick succession.

Your proposed interface changes sound good - let me do some refactoring.

@codecov
Copy link

codecov bot commented Feb 19, 2020

Codecov Report

Merging #2145 into master will increase coverage by 7.0%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2145      +/-   ##
=========================================
+ Coverage    65.2%   72.3%    +7.0%     
=========================================
  Files         991    1016      +25     
  Lines      102460   87969   -14491     
=========================================
- Hits        66885   63654    -3231     
+ Misses      30735   20047   -10688     
+ Partials     4840    4268     -572     
Flag Coverage Δ
#aggregator 82.0% <ø> (+0.7%) ⬆️
#cluster 85.3% <ø> (+7.6%) ⬆️
#collector 82.8% <ø> (+31.9%) ⬆️
#dbnode 79.2% <82.2%> (+7.1%) ⬆️
#m3em 74.4% <ø> (+29.5%) ⬆️
#m3ninx 74.2% <ø> (+7.0%) ⬆️
#m3nsch 51.1% <ø> (-48.9%) ⬇️
#metrics 17.6% <ø> (-82.4%) ⬇️
#msg 74.8% <ø> (+0.4%) ⬆️
#query 68.3% <ø> (-31.7%) ⬇️
#x 83.1% <ø> (-16.9%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cd86c7...b222f59. Read the comment docs.

@robskillington robskillington changed the title Skip entries for unowned shards in commit log bootstrapper [dbnode] Skip entries for unowned shards in commit log bootstrapper Feb 23, 2020
Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I changed the commit log source bootstrapper code a tiny bit to be more explicit rather than assume any series with a namespace not set needs to be skipped (would rather that panics as should be since unexpected for that to happen and would be a new code bug introduced)

@robskillington robskillington merged commit de568ba into master Feb 24, 2020
@robskillington robskillington deleted the juchan/commit-log-wrong-shard branch February 24, 2020 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nodes could fail to bootstrap using the commit log bootstrapper due to topology changes
2 participants