
Improve how M3DB handles data durability during topology changes #1183

Merged: 58 commits merged into master from ra/fix-placement-peer-bootstrap on Dec 14, 2018

Conversation

richardartoul (Contributor) commented Nov 16, 2018

Previously, M3DB implemented durability during topology changes by forcing the peers bootstrapper to write out a snapshot file for any mutable data that was streamed in from a peer. This ensured that if the node crashed after the topology change, the commitlog bootstrapper could restore all the data the node was expected to have.

We're currently trying to move M3DB to a model where every set of snapshot files indicates that all the data that had been received by the node up until the beginning of the snapshot could be recovered from a combination of the data files, snapshot files, and commitlog files. The existing implementation clashes with that idea because it was writing out individual snapshot files that were not tied to any large snapshot process.

As a result, this PR modifies M3DB to eschew writing out snapshot files for mutable data in the peers bootstrapper. Instead, it prevents the clustered database from marking shards as available until all bootstrapping for the topology change has completed and the last successful snapshot began AFTER the last bootstrap completed. This is a much simpler model that is robust against future changes to the database.

codecov (bot) commented Nov 16, 2018

Codecov Report

Merging #1183 into master will increase coverage by 0.2%.
The diff coverage is 72%.


@@           Coverage Diff            @@
##           master   #1183     +/-   ##
========================================
+ Coverage    70.8%   71.1%   +0.2%     
========================================
  Files         748     737     -11     
  Lines       63055   61944   -1111     
========================================
- Hits        44653   44043    -610     
+ Misses      15527   15042    -485     
+ Partials     2875    2859     -16
Flag Coverage Δ
#aggregator 81.6% <ø> (-0.1%) ⬇️
#cluster 85.6% <ø> (-0.1%) ⬇️
#collector 78.1% <ø> (-0.3%) ⬇️
#dbnode 80.7% <72%> (-0.2%) ⬇️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 75.4% <ø> (+0.2%) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 18.3% <ø> (+0.1%) ⬆️
#msg 74.9% <ø> (-0.1%) ⬇️
#query 61.7% <ø> (+1.4%) ⬆️
#x 74.4% <ø> (-1.7%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4386243...8cfaefb

@richardartoul richardartoul changed the title [WIP] - Fix how M3DB handles data durability during topology changes Fix how M3DB handles data durability during topology changes Nov 16, 2018
@richardartoul richardartoul changed the title Fix how M3DB handles data durability during topology changes [WIP] - Fix how M3DB handles data durability during topology changes Nov 16, 2018
@richardartoul richardartoul changed the title [WIP] - Fix how M3DB handles data durability during topology changes Improve how M3DB handles data durability during topology changes Nov 19, 2018
// is going on are both handled correctly. In addition, this will ensure that we hold onto both
// sets of data durably after topology changes and that the node can be properly bootstrapped
// from just the filesystem and commitlog in a later portion of the test.
seriesToWriteDuringPeerStreaming := []string{
Collaborator:
Should we just add this as a separate test? Seems like we're repurposing this one and changing it quite substantially?

richardartoul (Contributor, Author):

There are two separate tests; they just call into this shared codepath. I think it's fine: if I separated them out completely they'd be almost complete copies of each other, and the existing test benefits from this additional check as well. Even if you don't verify the commitlog behavior, when doing a node add you probably want to make sure the joining node keeps track of the data it receives from its peer, as well as all the data it's receiving while actually joining.

Collaborator:

I see, yeah makes sense.

// We expect consistency errors because we're only running with
// R.F = 2 and one node is leaving and one node is joining for
// each of the shards that is changing hands.
if !client.IsConsistencyResultError(err) {
Collaborator:

Hm maybe it's better to make it just RF=3?

richardartoul (Contributor, Author):

I honestly just didn't do it because it would probably take a few hours to rewrite all the sharding logic and fix any little issues that crop up, and it doesn't really make the test any better. I can change it if you feel strongly.

Collaborator:

Np, that's fine.

d.queueBootstrapWithLock()
}

func (d *db) hasReceivedNewShards(incoming sharding.ShardSet) bool {
Collaborator:

nit: Rename to hasReceivedNewShardsWithLock?

func (d *db) hasReceivedNewShards(incoming sharding.ShardSet) bool {
var (
existing = d.shardSet
existingSet = map[uint32]struct{}{}
Collaborator:

nit: existingSet = make(map[uint32]struct{}, len(existing.AllIDs()))

robskillington (Collaborator) left a comment:

LGTM other than some nits and comment about only setting last bootstrapped time if multiErr.FinalError() == nil for the bootstrap.

@richardartoul richardartoul merged commit cff28c1 into master Dec 14, 2018
@justinjc justinjc deleted the ra/fix-placement-peer-bootstrap branch January 7, 2019 19:30