Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: clean up preemptive snapshot when receiving replica ID as learner #40907

Conversation

ajwerner
Copy link
Contributor

This commit adds an annotation to raft request messages to
indicate that the sender believes the current replica is a
learner. If the current replica on the recipient was created as a
preemptive snapshot (it's initialized but not in the range) then
we should remove that replica immediately.

The first commit is #40892.

Release Justification: Release blocker, required for backwards compatibility.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 683c35c to b45f805 Compare September 19, 2019 15:32
@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 6 times, most recently from 40b4b30 to 0fb88ee Compare September 20, 2019 15:24
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @nvanbenschoten)


pkg/storage/raft.proto, line 47 at r2 (raw file):

  // TODO(ajwerner): remove in 20.2 once we ensure that preemptive snapshots can
  // no longer be present and that we're never talking to a 19.2 node.
  optional bool to_is_learner = 7 [(gogoproto.nullable) = false];

Is RaftHeartbeat really the only place we need this?


pkg/storage/store.go, line 3304 at r2 (raw file):

// learnerType exists to avoid allocating on every coalesced beat to a learner.
var learnerType = roachpb.LEARNER

s/var/const/? Or are consts not addressable in the way we need?


pkg/storage/store.go, line 4063 at r2 (raw file):

) (_ *Replica, created bool, err error) {
	var (
		removeReplica = func(repl *Replica) error {

I guess if we're going to potentially delete the replica here I withdraw my comment in the other PR about not wanting a write here.


pkg/storage/store_snapshot.go, line 758 at r2 (raw file):

			// our currently initialized range is due to a preemptive snapshot and
			// we've now assigned that range a replica ID. Fundamentally at this
			// point we want to clear out the pre-emptive snapshot because applying

So do we clear out the snapshot in this case (where?), or do we just "want to"? What changes in 20.1?

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 0fb88ee to 611a624 Compare September 23, 2019 05:04
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)


pkg/storage/raft.proto, line 47 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Is RaftHeartbeat really the only place we need this?

Yes. We send the ReplicaDescriptor in the other raft messages.


pkg/storage/store.go, line 3304 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/var/const/? Or are consts not addressable in the way we need?

consts are not addressable.


pkg/storage/store.go, line 4063 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I guess if we're going to potentially delete the replica here I withdraw my comment in the other PR about not wanting a write here.

We were potentially deleting the replica here in the other PR already.


pkg/storage/store_snapshot.go, line 758 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

So do we clear out the snapshot in this case (where?), or do we just "want to"? What changes in 20.1?

Good catch. I typed this comment to myself before typing the fix.

We now synchronously delete the preemptive snaphot on disk before accepting this learner snapshot. Updated.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 3 times, most recently from 1dbee4c to 63c7882 Compare September 23, 2019 13:25
@tbg tbg requested a review from bdarnell September 23, 2019 14:40
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 28 of 28 files at r1, 23 of 23 files at r3, 9 of 9 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @nvanbenschoten)


pkg/storage/client_raft_test.go, line 4878 at r2 (raw file):

			Exec("SET CLUSTER SETTING version = crdb_internal.node_executable_version();")
		require.NoError(t, err)
		// We need to ensure that all nodes have received the cluster setting change.

Comment seems redundant with the ones below.


pkg/storage/store_snapshot.go, line 755 at r4 (raw file):

			}
			// If the current range is initialized then we need to accept this
			// this snapshot. We also know that this initialized range must be

s/this//


pkg/storage/store_snapshot.go, line 757 at r4 (raw file):

			// this snapshot. We also know that this initialized range must be
			// initialized as this replica as we'll clear out any preemptive
			// snapshots when detecting a learner snapshot in getOrCreateReplica.

I don't understand this part.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 63c7882 to 163e95b Compare September 23, 2019 16:36
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/client_raft_test.go, line 4878 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Comment seems redundant with the ones below.

Removed.


pkg/storage/store_snapshot.go, line 755 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

s/this//

Done.


pkg/storage/store_snapshot.go, line 757 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I don't understand this part.

I guess it's not worth talking about here. Removed.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 8 times, most recently from 05b9487 to 390000d Compare September 24, 2019 15:46
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/client_raft_test.go, line 5101 at r6 (raw file):

// the preemptive snapshot otherwise the replica may try to catch up across
// commands which are not safe (namely merges).
// Before learner replicas we would not add a store to a range until we had

Based on our offline conversation, there was more to why this was safe. It sounds like the key was that:

  1. "the grab range desc, preemptive snapshot, cput range desc" dance ensured that all preemptive snapshots were serialized with merges
  2. always preemptively snapshotting before adding a replica prevented a replica from being added and using an existing preemptive snapshot.

Together, these two requirements meant that the entire process of "preemptively snapshotting and adding a replica" was always fully serialized with any range merges, which prevented us from ever catching up across a merge.

Is this understanding missing anything? Is any of that written or tested anywhere?

Learners allow us to violate this second invariant, so the whole thing collapses. For this fix to be correct, it must also ensure that the preemptive snapshot -> learner path always serializes with all merges. I think this relies on the fact that during the process of adding a replica, it will at some point be contacted by a Raft message that indicates that the replica is a learner (either explicitly through the to_is_learner or implicitly through the repl desc). Are we sure we're guaranteed to receive at least one of these messages before the Raft leader promotes the replica to a full voter?


pkg/storage/store.go, line 3384 at r6 (raw file):

		}
		if beat.ToIsLearner {
			beatReqs[i].ToReplica.Type = &learnerType

We can replace this with roachpb.ReplicaTypeLearner()


pkg/storage/store.go, line 4270 at r6 (raw file):

	}
	// replicaID must be non-zero if isLearner.
	if repl.mu.replicaID != 0 || !repl.isInitializedRLocked() || !isLearner {

Can repl.mu.replicaID == 0 && !repl.isInitializedRLocked()? If not, I'd remove the mention of initialization because it makes this harder to reason about.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 390000d to bf84464 Compare September 24, 2019 21:21
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/client_raft_test.go, line 5101 at r6 (raw file):

Are we sure we're guaranteed to receive at least one of these messages before the Raft leader promotes the replica to a full voter?

Yes, the learner snapshot is one of these such messages. We will always attempt to send the learner snapshot before attempting to promote. We will only promote if we successfully apply the learner snapshot.


pkg/storage/store.go, line 3384 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can replace this with roachpb.ReplicaTypeLearner()

That function is unfortunate in that it allocates a new value.


pkg/storage/store.go, line 4270 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can repl.mu.replicaID == 0 && !repl.isInitializedRLocked()? If not, I'd remove the mention of initialization because it makes this harder to reason about.

Yes, this can happen if the RHS was created due to a split or merge trigger. The merge trigger seems unlikely due to the merge protocol but if the RHS was just created due to a split you could be here. I'll add commentary.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 2 times, most recently from c1e450a to c4767ba Compare September 25, 2019 15:01
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/store.go, line 4270 at r6 (raw file):

Previously, ajwerner wrote…

Yes, this can happen if the RHS was created due to a split or merge trigger. The merge trigger seems unlikely due to the merge protocol but if the RHS was just created due to a split you could be here. I'll add commentary.

Okay upon further reflection, in general this can't happen except in one gnarly case. Usually the split will hold the raftMu until the split applies. Imagine however the following wild series of events:

  • LHS is going to split to create RHS
  • We're on a follower with the LHS prior to the split
  • Follower hears about RHS prior to processing split at replica ID 3
  • Follower gets removed from RHS and re-added as replica ID 4, writing a tombstone at replica ID 4
  • Follower crashes, forgetting about RHS
  • Split trigger runs and grabs the raftMu for the RHS, creating the replica at ID 0
  • Split trigger detects that the RHS is too new and removes the data for the RHS but leaves the RHS replica object in memory.
  • RHS receives a raft message as replica ID 4 (or it could be 5, doesn't really matter)

Here are two proposals for avoiding this case:

  1. we could delete the in-memory object for the RHS (but don't delete its hard state data) in the split trigger case or
  2. we could delete the in-memory object for the RHS and not delete its hard state right here.

Currently we don't have a mechanism to delete an uninitialized replica that doesn't remove its hard state. Perhaps we should preserve the hard state in Replica.destroyRaftMuLocked() if r.mu.replicaID == 0. Then we would feel safe removing the uninitialized replica in either of the above cases.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 2 times, most recently from 41a8f44 to 31b9e11 Compare September 25, 2019 15:42
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @nvanbenschoten, and @tbg)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @tbg)


pkg/storage/client_raft_test.go, line 5101 at r6 (raw file):

Previously, ajwerner wrote…

Are we sure we're guaranteed to receive at least one of these messages before the Raft leader promotes the replica to a full voter?

Yes, the learner snapshot is one of these such messages. We will always attempt to send the learner snapshot before attempting to promote. We will only promote if we successfully apply the learner snapshot.

Ack.


pkg/storage/store.go, line 4270 at r6 (raw file):

Previously, ajwerner wrote…

Okay upon further reflection, in general this can't happen except in one gnarly case. Usually the split will hold the raftMu until the split applies. Imagine however the following wild series of events:

  • LHS is going to split to create RHS
  • We're on a follower with the LHS prior to the split
  • Follower hears about RHS prior to processing split at replica ID 3
  • Follower gets removed from RHS and re-added as replica ID 4, writing a tombstone at replica ID 4
  • Follower crashes, forgetting about RHS
  • Split trigger runs and grabs the raftMu for the RHS, creating the replica at ID 0
  • Split trigger detects that the RHS is too new and removes the data for the RHS but leaves the RHS replica object in memory.
  • RHS receives a raft message as replica ID 4 (or it could be 5, doesn't really matter)

Here are two proposals for avoiding this case:

  1. we could delete the in-memory object for the RHS (but don't delete its hard state data) in the split trigger case or
  2. we could delete the in-memory object for the RHS and not delete its hard state right here.

Currently we don't have a mechanism to delete an uninitialized replica that doesn't remove its hard state. Perhaps we should preserve the hard state in Replica.destroyRaftMuLocked() if r.mu.replicaID == 0. Then we would feel safe removing the uninitialized replica in either of the above cases.

So just to make sure I'm following, this case is caused by our handling of splits when the RHS runs into a tombstone? I think your first solution is the right one then. We should delete the in-memory object at the same point that we delete the RHS data. Mind opening an issue to track this? Or is fixing this part of the work to remove replica ID 0?


pkg/storage/store.go, line 2586 at r7 (raw file):

func (s *Store) removeReplicaRaftMuLocked(
	ctx context.Context, rep *Replica, nextReplicaID roachpb.ReplicaID, opts RemoveOptions,
) (err error) {

No need to name this now.


pkg/storage/store.go, line 4240 at r7 (raw file):

		DestroyData: true,
	}); err != nil {
		log.Fatal(ctx, 1, err)

What is the 1 here? Depth?


pkg/storage/store.go, line 4297 at r7 (raw file):

	// carries a replicaID of 0. This case requires a rare sequence of events
	// whereby the RHS of a split is known to be destroyed prior to the LHS
	// processing the split and at some before receiving this message the node

"at some before"

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r6, 10 of 11 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)


pkg/storage/replica_raft.go, line 1583 at r7 (raw file):

) (func(), error) {
	rightRng, _, err := r.store.getOrCreateReplica(ctx, split.RightDesc.RangeID,
		0 /* replicaID */, nil /* creatingReplica */, false /* isLearner */)

If r is a learner, don't we need to create the new RHS replica as a learner too? (ditto for the merge lock)


pkg/storage/store.go, line 4270 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

So just to make sure I'm following, this case is caused by our handling of splits when the RHS runs into a tombstone? I think your first solution is the right one then. We should delete the in-memory object at the same point that we delete the RHS data. Mind opening an issue to track this? Or is fixing this part of the work to remove replica ID 0?

Why do split triggers create replicas at replica ID zero? Isn't the new RHS replica ID known by the split trigger?

If the replica was created with ID zero how does the split trigger detect that the RHS is "too new"?


pkg/storage/store.go, line 4103 at r7 (raw file):

	replicaID roachpb.ReplicaID,
	creatingReplica *roachpb.ReplicaDescriptor,
	isLearner bool,

It might be nice to create a type and a pair of constants here instead of a bare bool that requires a comment at each call site.


pkg/storage/store.go, line 4229 at r7 (raw file):

// tryGetOrCreateRemoveReplica is a helper to remove a replica underneath
// tryGetOrCreate. It assumes repl.mu and repl.raftMu are held.

It's important to comment that this method temporarily releases repl.mu. That's a dangerous pattern and we'll need to make sure that we re-check anything relating to repl.mu after calling this method. It might be better for this method to require that repl.mu is not held and have that lock management in the top-level method so that the break in the critical section is visible.


pkg/storage/store.go, line 4230 at r7 (raw file):

// tryGetOrCreateRemoveReplica is a helper to remove a replica underneath
// tryGetOrCreate. It assumes repl.mu and repl.raftMu are held.
func (s *Store) tryGetOrCreateRemoveReplica(

These names are weird with the conflicting verbs CreateRemove. I'd like to have an underscore or something separating the tryGetOrCreate "namespace" from the method-specific name, but I don't think the linters allow that. I'm not convinced that these methods add clarity in comparison to just inlining everything in the main method. Closures allocate, but I think we can avoid closures by explicitly releasing locks instead of defer, etc.

The ones that are good candidates for being separate functions should be nameable without tying them so closely to tryGetOrCreate. For example, I think I'd prefer a refactoring where tryGetOrCreateHandleToReplicaTooOld were replaced with isToReplicaToOld, and then the "handling" were inlined into the top level.


pkg/storage/store.go, line 4240 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What is the 1 here? Depth?

The varargs (non-f) forms of the logging methods seem like a trap - do we ever use those on purpose with more than one argument?


pkg/storage/store.go, line 4266 at r7 (raw file):

// Detect if the replicaID is newer than repl indicating that repl needs to be
// removed. If so, remove the replica and return errRetry.
// Assumes that repl.mu and repl.raftMu are both held.

Comments about the temporary release of repl.mu need to be propagated up the stack here too.


pkg/storage/store.go, line 4285 at r7 (raw file):

// preemptive snapshot and needs to be removed. If so remove the replica and
// return errRetry. Assumes that repl.mu and repl.raftMu are both held.
func (s *Store) tryGetOrCreateHandleNonVoterWithPreemptiveSnapshot(

It's weird that this has NonVoter in the name and takes an isLearner parameter. I'd expect that given the name, the caller is responsible for putting it in an if isLearner instead of passing that flag in.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch 2 times, most recently from e900e35 to 0a7afd8 Compare September 25, 2019 20:52
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked the decidedly odd helpers from tryGetOrCreate. PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/replica_raft.go, line 1583 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If r is a learner, don't we need to create the new RHS replica as a learner too? (ditto for the merge lock)

It's a good point. Technically we don't use this value to do anything to the replica other than destroy it but it does seem more correct to set this to true if the current replica is a learner. This will of course be another case we'll have to think about in the code I was discussing with nathan below about how to have an uninitialized replica with a zero ID (which is fine, I was going to retain that code anyway).

On the other hand, we can't possibly have a preemptive snapshot on the RHS here because this replica on the LHS would be covering it.

Anyway, I've updated the code to always use the information if we have it.


pkg/storage/store.go, line 4270 at r6 (raw file):

Why do split triggers create replicas at replica ID zero?
The pre-existing code always passed zero. Theory might say that a preemptive snapshot could safely process a split. In practice that can't happen because we'd fail to do the CPut on the range descriptor in the old change replicas protocol and would send a new snapshot after the split. It also can't happen in the case where we sent a preemptive snapshot before the upgrade and then upgraded to a learner because of this PR.

Isn't the new RHS replica ID known by the split trigger?

Yes except the preemptive snapshot case which doesn't happen so yes. The code now provides the replica ID in both cases.

If the replica was created with ID zero how does the split trigger detect that the RHS is "too new"?

It detects that it is too new because of the tombstone but won't know the right replica ID.


pkg/storage/store.go, line 4103 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It might be nice to create a type and a pair of constants here instead of a bare bool that requires a comment at each call site.

I could use roachpb.ReplicaType but in the case of heartbeats I'll have to pass VOTER_FULL when it might be VOTER_OUTGOING or VOTER_INCOMING.

Now that I've reworked the split and merge code it never needs the comment so I'm leaving as it is for now.


pkg/storage/store.go, line 4229 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's important to comment that this method temporarily releases repl.mu. That's a dangerous pattern and we'll need to make sure that we re-check anything relating to repl.mu after calling this method. It might be better for this method to require that repl.mu is not held and have that lock management in the top-level method so that the break in the critical section is visible.

okay, I'll rework.


pkg/storage/store.go, line 4230 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

These names are weird with the conflicting verbs CreateRemove. I'd like to have an underscore or something separating the tryGetOrCreate "namespace" from the method-specific name, but I don't think the linters allow that. I'm not convinced that these methods add clarity in comparison to just inlining everything in the main method. Closures allocate, but I think we can avoid closures by explicitly releasing locks instead of defer, etc.

The ones that are good candidates for being separate functions should be nameable without tying them so closely to tryGetOrCreate. For example, I think I'd prefer a refactoring where tryGetOrCreateHandleToReplicaTooOld were replaced with isToReplicaToOld, and then the "handling" were inlined into the top level.

done.


pkg/storage/store.go, line 4240 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The varargs (non-f) forms of the logging methods seem like a trap - do we ever use those on purpose with more than one argument?

Done.


pkg/storage/store.go, line 4266 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Comments about the temporary release of repl.mu need to be propagated up the stack here too.

Reworked as suggested.


pkg/storage/store.go, line 4285 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's weird that this has NonVoter in the name and takes an isLearner parameter. I'd expect that given the name, the caller is responsible for putting it in an if isLearner instead of passing that flag in.

Reworked this code.


pkg/storage/store.go, line 4290 at r7 (raw file):

	repl.raftMu.AssertHeld()
	repl.mu.AssertHeld()
	// replicaID must be non-zero if isLearner.

This assertion is removed. It doesn't feel natural in the refactor. I can add it back if you want.


pkg/storage/store.go, line 4297 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"at some before"

This code is gone.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 0a7afd8 to 90d8173 Compare September 25, 2019 21:39
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/replica_raft.go, line 1583 at r7 (raw file):

Previously, ajwerner wrote…

It's a good point. Technically we don't use this value to do anything to the replica other than destroy it but it does seem more correct to set this to true if the current replica is a learner. This will of course be another case we'll have to think about in the code I was discussing with nathan below about how to have an uninitialized replica with a zero ID (which is fine, I was going to retain that code anyway).

On the other hand, we can't possibly have a preemptive snapshot on the RHS here because this replica on the LHS would be covering it.

Anyway, I've updated the code to always use the information if we have it.

Alright actually this turns out that passing the real replica ID is problematic. I've added more commentary. See how it makes you feel.

@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 90d8173 to 17511f9 Compare September 25, 2019 21:54
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/store.go, line 4289 at r8 (raw file):

}

// isPreemptiveSnapshot returns true if repl is an initialized replica which

I think I'll just inline this into the above function. Ben sorry for stepping on your reviewing toes.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 11 files at r7, 1 of 2 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/replica_raft.go, line 1583 at r7 (raw file):

Previously, ajwerner wrote…

Alright actually this turns out that passing the real replica ID is problematic. I've added more commentary. See how it makes you feel.

LGTM (but the same logic doesn't apply to acquireMergeLock?)


pkg/storage/store.go, line 4103 at r7 (raw file):

Previously, ajwerner wrote…

I could use roachpb.ReplicaType but in the case of heartbeats I'll have to pass VOTER_FULL when it might be VOTER_OUTGOING or VOTER_INCOMING.

Now that I've reworked the split and merge code it never needs the comment so I'm leaving as it is for now.

OK (I agree that with the new split and merge code the types would be counterproductive)


pkg/storage/store.go, line 4127 at r9 (raw file):

		toTooOld := toReplicaIsTooOld(repl, replicaID)
		isPreemptiveSnapshot := repl.mu.replicaID == 0 && repl.isInitializedRLocked()
		removePreemetiveSnapshot := isPreemptiveSnapshot && isLearner

s/Preemet/preempt/g

This is a good place for commentary about why we need to remove our preemptive snapshot when becoming a learner.


pkg/storage/store.go, line 4134 at r9 (raw file):

				log.Infof(ctx, "found message for replica ID %d which is newer than %v",
					replicaID, repl)
			} else if shouldLog /* && removePreemptiveSnapshot */ {

Remove or uncomment?

…arner

This commit adds an annotation to raft request messages to
indicate that the sender believes the current replica is a
learner. If the current replica on the recipient was created as a
preemptive snapshot (it's initialized but not in the range) then
we should remove that replica immediately.

Release Justification: Release blocker, required for backwards compatibility.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch from 17511f9 to 99a031d Compare September 25, 2019 22:10
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @nvanbenschoten, and @tbg)


pkg/storage/replica_raft.go, line 1583 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

LGTM (but the same logic doesn't apply to acquireMergeLock?)

Correct, it is an invariant that when a merge lock is acquired the right replica both exist and be a member of the range. I updated the comment slightly and referenced it in tryGetOrCreateReplica.


pkg/storage/store.go, line 4127 at r9 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/Preemet/preempt/g

This is a good place for commentary about why we need to remove our preemptive snapshot when becoming a learner.

Done.


pkg/storage/store.go, line 4134 at r9 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove or uncomment?

Sure, it's implied to be true but I liked having it to clarify to the reader that it's why we're in this branch. I uncommented.

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @nvanbenschoten, and @tbg)

craig bot pushed a commit that referenced this pull request Sep 25, 2019
40907: storage: clean up preemptive snapshot when receiving replica ID as learner r=ajwerner a=ajwerner

This commit adds an annotation to raft request messages to
indicate that the sender believes the current replica is a
learner. If the current replica on the recipient was created as a
preemptive snapshot (it's initialized but not in the range) then
we should remove that replica immediately.

The first commit is #40892.

Release Justification: Release blocker, required for backwards compatibility.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
@craig
Copy link
Contributor

craig bot commented Sep 25, 2019

Build succeeded

@craig craig bot merged commit 99a031d into cockroachdb:master Sep 25, 2019
@ajwerner ajwerner deleted the ajwerner/deal-with-upgrade-from-preemptive-snapshots branch March 1, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants