server: allow healing ranges whose replicas are all lost #56333
Conversation
Force-pushed from fef63fe to 0a9a71a
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
a discussion (no related file):
Should update RangeLog (best effort) as well. RangeLog is updated whenever a range is updated (upreplicated, etc.); it's the system of record of operations / an audit log.
Should populate regular LogEntries as well.
pkg/kv/kvserver/store_snapshot.go, line 922 at r1 (raw file):
eng := storage.NewInMem(ctx, roachpb.Attributes{}, 1<<20 /* cacheSize */)
defer eng.Close()
Create engine to use as buffer for empty snapshot
pkg/kv/kvserver/store_snapshot.go, line 927 at r1 (raw file):
var ms enginepb.MVCCStats
if err := storage.MVCCPutProto(
	ctx, eng, &ms, keys.RangeDescriptorKey(desc.StartKey), now, nil /* txn */, &desc,
Seeding empty range
pkg/kv/kvserver/store_snapshot.go, line 931 at r1 (raw file):
	return err
}
ms, err := stateloader.WriteInitialReplicaState(
Seeding empty range
pkg/kv/kvserver/store_snapshot.go, line 944 at r1 (raw file):
}
sl := stateloader.Make(desc.RangeID)
Use the stateloader to load the state out of the in-memory engine created earlier.
pkg/kv/kvserver/store_snapshot.go, line 963 at r1 (raw file):
snapUUID, sl, SnapshotRequest_RAFT,
May want a separate SnapshotRequest type for recovery that always goes through; bypass all throttling so these snapshots cannot be declined. We don't want our operation to be held up behind a long-running snapshot; we want this to go through quickly.
pkg/kv/kvserver/store_snapshot.go, line 966 at r1 (raw file):
eng, desc.RangeID, raftentry.NewCache(1),
Won't actually be used.
pkg/kv/kvserver/store_snapshot.go, line 967 at r1 (raw file):
desc.RangeID, raftentry.NewCache(1), func(func(SideloadStorage) error) error { return nil },
Used for sstables, but not needed here as there are no logs.
pkg/kv/kvserver/store_snapshot.go, line 975 at r1 (raw file):
}
from := to
These are the same because we have to send this from somewhere; it makes it look like we are sending the snapshot from ourselves. We have to send it from a member of the RangeDescriptor, otherwise it would be malformed if it came from a non-member.
pkg/roachpb/api.proto, line 2201 at r1 (raw file):
}
message UnsafeHealRangeRequest {
TODO: Add comment
Pass in updated RangeDescriptor (instead of RangeID only) because this is a lower level primitive; not expecting the user to call this directly.
The caller updates meta2 and then calls this RPC to send the snapshot.
Two cases:
A. All replicas are lost
B. One or more replicas remaining, resuscitate range
If this request only took a range ID, we would have to look up the range descriptor. If recovering from an existing replica, we want to know where we are recovering from.
Higher level abstractions could hit meta2 or use a survivor for the range descriptor.
--
node_id and store_id identify the designated survivor, or the designated place where we are recreating the replica
--
mention somewhere why we are covering both use cases in one RPC
--
Also, work with @tbg to explain how this can be safely used and how the orchestrator around this will work. In particular, restarting with a missing node vs a missing store.
pkg/roachpb/api.proto, line 2256 at r1 (raw file):
rpc RangeFeed (RangeFeedRequest) returns (stream RangeFeedEvent) {}
rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {}
rpc UnsafeHealRange (UnsafeHealRangeRequest) returns (UnsafeHealRangeResponse) {}
Does not make sense to return a new RangeDescriptor, because the higher level controller computes it, so it should be passed in.
pkg/server/node.go, line 1088 at r1 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
@tbg We discussed how we wanted the higher level abstraction to include the RangeDescriptor in the RPC call, rather than just the RangeID. However, I think we still have to scan through all ranges to find the expValue as we need it for CPut. Do you think it makes sense to have the higher level abstraction also pass in expValue?
CPut would live at caller, if necessary.
Low level RPC should just send snapshot. Want it to be as specific to the range as possible (nothing distributed). Other logic should be done somewhere that has more insight into state of the cluster.
(Add this comment to protobuf definition)
pkg/server/node.go, line 1109 at r1 (raw file):
rpc.DefaultClass
Use rpc.SystemClass to avoid throttling (add code comment)
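A minimal sketch of that suggestion, assuming a *nodedialer.Dialer is in scope (the exact call site in node.go may differ):

```go
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/rpc"
	"github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
	"google.golang.org/grpc"
)

// dialForRecovery dials the target node on the system connection class so
// that recovery traffic is not throttled or queued behind regular RPCs
// that use rpc.DefaultClass.
func dialForRecovery(
	ctx context.Context, dialer *nodedialer.Dialer, nodeID roachpb.NodeID,
) (*grpc.ClientConn, error) {
	return dialer.Dial(ctx, nodeID, rpc.SystemClass)
}
```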
Reviewed 5 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @tbg, and @TheSamHuang)
pkg/kv/kvserver/raft_transport.go, line 661 at r1 (raw file):
sent func(),
) error {
nit: remove this empty line
pkg/kv/kvserver/raft_transport.go, line 667 at r1 (raw file):
	}
}()
return sendSnapshot(
nit: maybe add one here
pkg/kv/kvserver/store_snapshot.go, line 914 at r1 (raw file):
// Formerly HackSendSnapshot
func SendEmptySnapshot(
request to add a descriptive comment above the function
also comments throughout the body to explain the main phases of the control flow
pkg/server/node.go, line 1058 at r1 (raw file):
}
func (n *Node) UnsafeHealRange(
Add a comment on top of this function.
Also suggestion to add comments throughout the function body, to indicate the general "phases" of the control flow.
pkg/server/node.go, line 1100 at r1 (raw file):
for _, rd := range dead {
	desc.RemoveReplica(rd.NodeID, rd.StoreID)
nit: avoid stray newlines
Force-pushed from 0a9a71a to 19ef42f
@tbg @knz Added first draft of a unit test for UnsafeHealRange.
👍 when you get to addressing the comments you posted from our group review, can you make sure to mark the ones as done that should be addressed? That helps your reviewers know what to expect (right now I'm expecting that none of them were addressed, just, as you address them, do check them off). Thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/recovery_test.go, line 21 at r3 (raw file):
)
func TestRecoverRangeWithNoReplicas(t *testing.T) {
Add a comment to this test explaining what it does: set up two node cluster, isolate a range to n2, stop n2, check that the range is unavailable, use UnsafeHealRange to resuscitate the range, check that it is available again.
pkg/kv/kvserver/recovery_test.go, line 32 at r3 (raw file):
k := tc.ScratchRange(t)
desc, err := tc.AddReplicas(k, tc.Target(1))
You need to update desc to refer to the post-change descriptor as we discussed.
pkg/kv/kvserver/recovery_test.go, line 36 at r3 (raw file):
require.NoError(t, tc.TransferRangeLease(desc, tc.Target(1)))
svr := tc.Server(0)
nit: we usually abbreviate srv
pkg/kv/kvserver/recovery_test.go, line 42 at r3 (raw file):
// Sanity check that requests to the ScratchRange time out.
cCtx, cancel := context.WithTimeout(ctx, 10000*time.Millisecond)
50*time.Millisecond
pkg/kv/kvserver/recovery_test.go, line 47 at r3 (raw file):
require.Equal(t, context.DeadlineExceeded, cCtx.Err())
// TODO: Figure out: is this necessary?
No, this is boilerplate for checking that UnsafeHealRange does a CPut, but it's intentionally no longer doing so
pkg/kv/kvserver/recovery_test.go, line 80 at r3 (raw file):
	return nil
}); err != nil {
	// TODO: update this
you can require.NoError(t, srv.....)
pkg/kv/kvserver/recovery_test.go, line 85 at r3 (raw file):
storeID, nodeID := store.Ident.StoreID, store.Ident.NodeID
// TODO: Figure out: is this necessary?
Yes. UnsafeHealRangeRequest won't update the meta ranges, but that's what's used for routing requests, so you need to play orchestrator here and write them manually. I think we will get the existing code for updating the meta records in better shape so that we can use it here and then later in the orchestrator. It's basically this code:
cockroach/pkg/kv/kvserver/addressing.go
Lines 53 to 57 in f80ec9c
// updateRangeAddressing overwrites the meta1 and meta2 range addressing
// records for the descriptor.
func updateRangeAddressing(b *kv.Batch, desc *roachpb.RangeDescriptor) error {
	return rangeAddressing(b, desc, putMeta)
}
You can call that here directly, see if that works. (srv has a DB() which can make a txn, and a txn can run a batch).
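For concreteness, a minimal sketch of that suggestion (assuming `desc` is the rewritten descriptor and the test can reach `updateRangeAddressing`, e.g. through a test-only export):

```go
// Rewrite the meta1/meta2 addressing records for desc in a single txn,
// playing the orchestrator's role for this test.
require.NoError(t, srv.DB().Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	b := txn.NewBatch()
	if err := updateRangeAddressing(b, &desc); err != nil {
		return err
	}
	return txn.Run(ctx, b)
}))
```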
pkg/kv/kvserver/recovery_test.go, line 97 at r3 (raw file):
)
require.NoError(t, err)
log.Info(ctx, "snapshot applied")
This doesn't help much, just remove it
pkg/kv/kvserver/recovery_test.go, line 99 at r3 (raw file):
log.Info(ctx, "snapshot applied")
require.Error(t, svr.DB().Put(cCtx, k, "baz"))
But wait, we want there to be no error (and even if we expected an error, we would want to check the precise error and not just "any")
You're also using the timed-out ctx here, so your error is probably context cancelled
Force-pushed from 90bb033 to 1a714ef
Absolutely! Thank you for the comments, Tobi and Rafael!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/recovery_test.go, line 21 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add a comment to this test explaining what it does: set up two node cluster, isolate a range to n2, stop n2, check that the range is unavailable, use UnsafeHealRange to resuscitate the range, check that it is available again.
Done.
pkg/kv/kvserver/recovery_test.go, line 42 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
50*time.Millisecond
Done.
pkg/kv/kvserver/recovery_test.go, line 47 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
No, this is boilerplate for checking that UnsafeHealRange does a CPut, but it's intentionally no longer doing so
Done.
pkg/kv/kvserver/recovery_test.go, line 80 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
you can require.NoError(t, srv.....)
Done.
pkg/kv/kvserver/recovery_test.go, line 97 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This doesn't help much, just remove it
Done.
pkg/kv/kvserver/recovery_test.go, line 99 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
But wait, we want there to be no error (and even if we expected an error, we would want to check the precise error and not just "any")
You're also using the timeouted ctx here, so your error is probably context cancelled
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/recovery_test.go, line 85 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Yes. UnsafeHealRangeRequest won't update the meta ranges, but that's what's used for routing requests, so you need to play orchestrator here and write them manually. I think we will get the existing code for updating the meta records in better shape so that we can use it here and then later in the orchestrator. It's basically this code:
cockroach/pkg/kv/kvserver/addressing.go
Lines 53 to 57 in f80ec9c
// updateRangeAddressing overwrites the meta1 and meta2 range addressing
// records for the descriptor.
func updateRangeAddressing(b *kv.Batch, desc *roachpb.RangeDescriptor) error {
	return rangeAddressing(b, desc, putMeta)
}
You can call that here directly, see if that works. (srv has a DB() which can make a txn, and a txn can run a batch).
Got it, I added this in.
ok github.com/cockroachdb/cockroach/pkg/kv/kvserver 8.355s
However, since this test is in the kvserver_test package (putting it in kvserver leads to an import cycle), updateRangeAddressing is inaccessible because it is not exported. What are your thoughts on exporting updateRangeAddressing? I added a temporary workaround for the time being.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/recovery_test.go, line 32 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You need to update desc to refer to the post-change descriptor as we discussed.
Done. desc is updated below.
Force-pushed from 336e58e to 1c0d9f7
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/store_snapshot.go, line 914 at r1 (raw file):
Previously, knz (kena) wrote…
request to add a descriptive comment above the function
also comments throughout the body to explain the main phases of the control flow
Added some comments, please let me know what you think.
pkg/server/node.go, line 1058 at r1 (raw file):
Previously, knz (kena) wrote…
Add a comment on top of this function.
Also suggestion to add comments throughout the function body, to indicate the general "phases" of the control flow.
Added some comments, please let me know what you think.
Force-pushed from 23ddc43 to b43412b
Looks good. We need to address some of the TODOs but it's getting close to merging.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/addressing.go, line 60 at r6 (raw file):
// TODO (thesamhuang): Remove this temporary export
func UpdateRangeAddressing(b *kv.Batch, desc *roachpb.RangeDescriptor) error {
You can put this into helpers_test.go, that way it's exported only during tests.
pkg/kv/kvserver/recovery_test.go, line 10 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Remove this empty line, it'll run into lint failures.
crlfmt maybe not properly set up, instructions at https://cockroachlabs.slack.com/archives/C4VM697PC/p1540908498001700
pkg/kv/kvserver/recovery_test.go, line 112 at r9 (raw file):
//}
func TestResetQuorum(t *testing.T) {
Write a top-level comment like the other test had.
pkg/kv/kvserver/recovery_test.go, line 171 at r9 (raw file):
_, err = srv.Node().(*server.Node).ResetQuorum(
	ctx, &roachpb.ResetQuorumRequest{
Let's also rename this file to reset_quorum_test.go.
pkg/kv/kvserver/recovery_test.go, line 178 at r9 (raw file):
require.NoError(t, err)
t.Logf("resetting quorum complete")
Could you look into adding a few bells and whistles to this test:
- if there's an existing replica and we send the recovery snapshot to that replica, we still get the quorum reset and end up with an empty range.
- if there's an existing replica and we send the recovery snapshot to another node that doesn't have that replica, we also still get the reset, and replicaGC removes the original survivor when triggered.
I think the best way to write these is to factor out a helper from your existing test which takes as inputs the node count and the nodes that have a replica (a rough skeleton is sketched below). That helper basically does everything up to (and excluding) the StopServer call in your test here. Then the current test becomes tc := helper(t, 2 /* nodes */, 3 /* only n2 has a replica */).
- The first case becomes helper(t, 2, 1, 2), i.e. two nodes and they both have a replica, and then you'll stop n2 and send to n1.
- The second case becomes helper(t, 3, 1, 2), i.e. three nodes with n1 and n2 holding a replica; you stop n2 and send to n3. Then check that n1's replica goes away after a short time (you can trigger it by getting to the *Store and calling MustForceReplicaGCScanAndProcess; after that store.LookupReplica should come back empty).
Btw, I'd suggest doing that in a follow-up PR. I'd like to see this one merged soon. But I'd like to see it done, as it gives us good confidence that the resulting cli command will work in a variety of useful situations.
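A rough skeleton of such a helper, reusing only calls already present in this test; the signature and placement logic are illustrative, not the final code:

```go
import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
	"github.com/stretchr/testify/require"
)

// setupResetQuorumTest starts a cluster with numNodes nodes and places
// replicas of a scratch range on the given node indexes. Callers then stop
// nodes and invoke ResetQuorum, as in the existing test, and are
// responsible for stopping the cluster.
func setupResetQuorumTest(
	t *testing.T, numNodes int, replicaNodes ...int,
) (*testcluster.TestCluster, roachpb.Key, roachpb.RangeDescriptor) {
	tc := testcluster.StartTestCluster(t, numNodes, base.TestClusterArgs{
		ReplicationMode: base.ReplicationManual,
	})
	k := tc.ScratchRange(t)
	var desc roachpb.RangeDescriptor
	var err error
	for _, n := range replicaNodes {
		if n == 0 {
			continue // the scratch range starts out with a replica on n1
		}
		desc, err = tc.AddReplicas(k, tc.Target(n))
		require.NoError(t, err)
	}
	// Any further placement (e.g. removing n1's replica to isolate the
	// range, transferring the lease) would go here, mirroring the
	// existing test body up to the StopServer call.
	return tc, k, desc
}
```

The two new scenarios would then differ only in the arguments passed to the helper and in which node receives the recovery snapshot.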
pkg/kv/kvserver/store_snapshot.go, line 963 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
nit: make this TODO(thesamhuang) a TODO(tbg).
pkg/kv/kvserver/store_snapshot.go, line 971 at r9 (raw file):
eng, desc.RangeID, raftentry.NewCache(1), // Cache is not used.
nit // cache is not used
pkg/kv/kvserver/store_snapshot.go, line 972 at r9 (raw file):
desc.RangeID, raftentry.NewCache(1), // Cache is not used.
func(func(SideloadStorage) error) error { return nil }, // This is used for sstables, not needed here as there are no logs.
ditto, inline comments are lower case w/o punctuation (kind of wish this wasn't a rule, but it is)
pkg/server/node.go, line 1040 at r9 (raw file):
// Update range descriptor and update meta ranges for the descriptor.
// Remove all (dead) replicas.
We're removing all, not all dead ones.
pkg/server/node.go, line 1045 at r9 (raw file):
	desc.RemoveReplica(rd.NodeID, rd.StoreID)
}
// Add current node as new replica.
this is still using req.StoreID, it should pick a store from the local node.
pkg/server/node.go, line 1049 at r9 (raw file):
// We should increment the generation so that the various
// caches will recognize this descriptor as newer. This
// may not be necessary, but it can't hurt.
Remove this last sentence - it's necessary now. It wasn't that necessary in the test, but in general there are invariants around the descriptor that we need to uphold here, and one of them is that any replication change, split, or merge bumps the generation. Here, we are basically forcing a replication change.
pkg/server/node.go, line 1057 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Yup, I think it's necessary for safety.
+1
Note that you have to hold on to the original bytes of the descriptor (kvs[i].Value) for that.
pkg/server/node.go, line 1079 at r9 (raw file):
func (n *Node) unsafeHealRange(
	ctx context.Context, nodeID roachpb.NodeID, storeID roachpb.StoreID, desc roachpb.RangeDescriptor, to roachpb.ReplicaDescriptor,
) error {
Honestly what's here isn't even worth it as a separate method, just inline this at the caller.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/addressing.go, line 60 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You can put this into helpers_test.go, that way it's exported only during tests.
I didn't add this, would you still like me to move it? The function I had previously added (and removed since it was no longer necessary) was UpdateRangeAddressing, which just exported updateRangeAddressing.
pkg/kv/kvserver/raft_transport.go, line 651 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This is a somewhat hollowed out helper; all it does is forward its args to sendSnapshot. We could simply inline the definition at the caller.
Done.
pkg/kv/kvserver/recovery_test.go, line 10 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
crlfmt maybe not properly set up, instructions at https://cockroachlabs.slack.com/archives/C4VM697PC/p1540908498001700
Done.
pkg/kv/kvserver/recovery_test.go, line 19 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Remember to delete this commented out test and repurpose the top-level comment for below.
Done.
pkg/kv/kvserver/recovery_test.go, line 112 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Nice test!
Thanks!
pkg/kv/kvserver/recovery_test.go, line 112 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Write a top-level comment like the other test had.
Done.
pkg/kv/kvserver/recovery_test.go, line 124 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Define n1, n2 := 0, 1 and use those throughout for readability.
Done.
pkg/kv/kvserver/recovery_test.go, line 171 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Let's also rename this file to reset_quorum_test.go.
Done.
pkg/kv/kvserver/store_snapshot.go, line 908 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
We could just avoid mentioning the helper and talk in general terms what this function is responsible for.
Done.
pkg/kv/kvserver/store_snapshot.go, line 972 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
ditto, inline comments are lower case w/o punctuation (kind of wish this wasn't a rule, but it is)
Done.
pkg/kv/kvserver/store_snapshot.go, line 1026 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Add a comment here.
Done. Please let me know if this isn't correct/appropriate.
pkg/roachpb/api.proto, line 2093 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Rename all references to UnsafeHealRangeRequest.
Done.
pkg/roachpb/api.proto, line 2094 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This comment needs to be updated. This RPC is now the entry point called by the user directly, and internally it does the heavy lifting around updating meta2 and retrieving store IDs, etc. I think blocks of this comment will find a better home elsewhere.
Done.
pkg/roachpb/api.proto, line 2112 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Remove this TODO for now, it's already well fleshed out.
Done.
pkg/server/node.go, line 1000 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
// ResetQuorum implements the roachpb.InternalServer interface.
Done.
pkg/server/node.go, line 1032 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Yup, good idea to have these. We could write super tiny tests for these, invoking this RPC directly and seeing that the right errors are returned.
I added the following but don't believe it's correct:
// Check range has actually lost quorum.
livenessMap := n.storeCfg.NodeLiveness.GetIsLiveMap()
available := desc.Replicas().CanMakeProgress(func(rDesc roachpb.ReplicaDescriptor) bool {
	return livenessMap[rDesc.NodeID].IsLive
})
if available {
	return nil, errors.Errorf("targeted range to recover has not lost quorum.")
}
When running the test case, available is true. Will be working on debugging this now, so please let me know if you have any pointers or suggestions. Thanks!
pkg/server/node.go, line 1039 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
// Update the range descriptor and update meta ranges for the descriptor, removing all (dead) replicas.
Done.
pkg/server/node.go, line 1040 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
We're removing all, not all dead ones.
Done.
pkg/server/node.go, line 1049 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Remove this last sentence - it's necessary now. It wasn't that necessary in the test but in general there are invariants around the descriptor that we need to uphold here, and one of them is that the generation goes up with any replication change, split, or merge bumps the generation. Here, we are basically forcing a replication change.
Done.
pkg/server/node.go, line 1057 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
+1
Note that you have to hold on to the original bytes of the descriptor (kvs[i].Value) for that.
Changed to CPut but running into an unexpected value error now. Will be trying to work this out.
pkg/server/node.go, line 1063 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Let's rename this lower level primitive as well, it's not "unsafe" anymore, is it? 😛
Done.
pkg/server/node.go, line 1070 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
There's no need to list out the args. Comments like these get stale quickly as things change, and it's repeating in no more succinct terms what's already visible below.
Done.
pkg/server/node.go, line 1071 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This block is no longer applicable.
Done.
pkg/server/node.go, line 1076 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Talk instead about how we then rely on crdb internal upreplication/rebalancing mechanisms to create further replicas from this fresh snapshot.
Done.
pkg/server/node.go, line 1079 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Honestly what's here isn't even worth it as a separate method, just inline this at the caller.
Done.
Force-pushed from e279e9b to 30cf688
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)
pkg/server/node.go, line 1032 at r9 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
I added the following but don't believe it's correct:
// Check range has actually lost quorum.
livenessMap := n.storeCfg.NodeLiveness.GetIsLiveMap()
available := desc.Replicas().CanMakeProgress(func(rDesc roachpb.ReplicaDescriptor) bool {
	return livenessMap[rDesc.NodeID].IsLive
})
if available {
	return nil, errors.Errorf("targeted range to recover has not lost quorum.")
}
When running the test case, available is true. Will be working on debugging this now, so please let me know if you have any pointers or suggestions. Thanks!
Decreasing LivenessDuration in the TestClusterArgs and waiting for n2 liveness to expire allows the test case to pass.
pkg/server/node.go, line 1057 at r9 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
Changed to CPut but running into an unexpected value error now. Will be trying to work this out.
Using expValue.TagAndDataBytes() instead seems to have solved the problem.
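For the record, roughly what the CPut ends up looking like (a sketch; `metaKey`, `expValue`, and `newDesc` stand for the meta2 key, the previously read roachpb.Value, and the rewritten descriptor):

```go
// Conditionally overwrite the meta2 record for the range: the expected
// value must be the tag+data bytes of what we read earlier (no checksum
// header), which is what expValue.TagAndDataBytes() returns.
if err := txn.CPut(ctx, metaKey, &newDesc, expValue.TagAndDataBytes()); err != nil {
	return err
}
```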
We still have to remove the inclusion of StoreID from the ResetQuorum request type. Tobi's prototyped it here, let's pick it up.
Should update RangeLog (best effort) as well. RangeLog is updated when a range is updated (upreplicated, etc.), it's a system of record of operations / audit log. Should populate regular LogEntries as well.
We're not planning to do these in this PR. Let's instead leave a TODO (addressed to myself or Tobi) so we don't forget.
Reviewed 2 of 4 files at r8, 1 of 6 files at r9, 6 of 7 files at r10, 2 of 2 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @tbg, and @TheSamHuang)
pkg/kv/kvserver/reset_quorum_test.go, line 33 at r11 (raw file):
Knobs: base.TestingKnobs{
	NodeLiveness: kvserver.NodeLivenessTestingKnobs{
		LivenessDuration: 3000 * time.Millisecond, // set duration so n2 liveness expires shortly after stopping.
[nit + arbitrary, questionable convention alert] we don't want a period after an inline comment. Could also simply elevate the comment to a proper one.
pkg/kv/kvserver/reset_quorum_test.go, line 65 at r11 (raw file):
tc.StopServer(n2)
// Wait for n2 liveness to expire.
time.Sleep(3001 * time.Millisecond)
We can simply use the same constant (and let's define it explicitly).
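For example (the constant name is illustrative):

```go
// livenessDuration is shared by the NodeLiveness testing knob and the
// wait after stopping n2, so the two can't drift apart.
const livenessDuration = 3000 * time.Millisecond
```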
pkg/kv/kvserver/store_snapshot.go, line 1026 at r9 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
Done. Please let me know if this isn't correct/appropriate.
// noopStorePool is a hollowed out StorePool that does not throttle. It's used in recovery scenarios
pkg/kv/kvserver/store_snapshot.go, line 918 at r11 (raw file):
) error {
	// Create an engine to use as a buffer for the empty snapshot.
	eng := storage.NewInMem(ctx, roachpb.Attributes{}, 1<<20 /* cacheSize */)
eng := storage.NewDefaultInMem()
pkg/server/node.go, line 1039 at r9 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
Done.
We want a space after the "//"
pkg/server/node.go, line 1035 at r10 (raw file):
}
// Check range has actually lost quorum.
"Check that we've actually lost quorum."
"Check that we're not a metaX range"
Store ID still needs to be removed and the test updated to that effect, but pretty much there other than that!
Dismissed @knz from 2 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @TheSamHuang)
pkg/kv/kvserver/addressing.go, line 60 at r6 (raw file):
Previously, TheSamHuang (Sam Huang) wrote…
I didn't add this, would you still like me to move it? The function I had previously added (and removed since it was no longer necessary) was UpdateRangeAddressing, which just exported updateRangeAddressing.
No, this is fine
pkg/kv/kvserver/recovery_test.go, line 178 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Could you look into adding a few bells and whistles to this test:
- if there's an existing replica and we send the recovery snapshot to that replica, we still get the quorum reset and end up with an empty range.
- if there's an existing replica and we send the recovery snapshot to another node that doesn't have that replica, we also still get the reset, and replicaGC removes the original survivor when triggered.
I think the best way to write these is to factor out a helper from your existing test which takes as inputs the node count and the nodes that have a replica. That helper basically does everything up to (and excluding) the StopServer call in your test here. Then the current test becomes tc := helper(t, 2 /* nodes */, 3 /* only n2 has a replica */).
- The first case becomes helper(t, 2, 1, 2), i.e. two nodes and they both have a replica, and then you'll stop n2 and send to n1.
- The second case becomes helper(t, 3, 1, 2), i.e. three nodes with n1 and n2 holding a replica; you stop n2 and send to n3. Then check that n1's replica goes away after a short time (you can trigger it by getting to the *Store and calling MustForceReplicaGCScanAndProcess; after that store.LookupReplica should come back empty).
Btw, I'd suggest doing that in a follow-up PR. I'd like to see this one merged soon. But I'd like to see it done, as it gives us good confidence that the resulting cli command will work in a variety of useful situations.
You can close this comment if you add TODOs to your test.
pkg/kv/kvserver/reset_quorum_test.go, line 20 at r11 (raw file):
// TestResetQuorum tests the ResetQuorum method to restore quorum for a
// range whose replicas are all lost. It starts a cluster with two nodes,
// n1 and n2 with a range isolated to n2. Then, it stops n2 and checks
n1 and n2,
pkg/kv/kvserver/reset_quorum_test.go, line 21 at r11 (raw file):
// range whose replicas are all lost. It starts a cluster with two nodes,
// n1 and n2 with a range isolated to n2. Then, it stops n2 and checks
// the range is unavailable. Next, it updates the meta2 range descriptor
checks that the range
pkg/kv/kvserver/reset_quorum_test.go, line 22 at r11 (raw file):
// n1 and n2 with a range isolated to n2. Then, it stops n2 and checks
// the range is unavailable. Next, it updates the meta2 range descriptor
// entry and uses ResetQuorum to make the range accessible again.
It doesn't update meta2 any more, this is now done in ResetQuorum. But let's make this test read meta2 and check the result.
pkg/kv/kvserver/reset_quorum_test.go, line 57 at r11 (raw file):
metaKey := keys.RangeMetaKey(desc.EndKey).AsRawKey()
// Read the meta2 which we are about to update below while n2
which ResetQuorum will be updating
pkg/kv/kvserver/reset_quorum_test.go, line 65 at r11 (raw file):
Previously, irfansharif (irfan sharif) wrote…
We can simply use the same constant (and let's define it explicitly).
Can you avoid messing with the liveness duration + sleeps if you call, on n1, IncrementEpoch?
pkg/kv/kvserver/reset_quorum_test.go, line 97 at r11 (raw file):
&roachpb.ResetQuorumRequest{
	RangeID: int32(desc.RangeID),
	StoreID: int32(store.StoreID()),
This shouldn't be here any more.
pkg/roachpb/api.proto, line 2095 at r11 (raw file):
// ResetQuorumRequest makes a range that is unavailable due to lost quorum
// available again. ResetQuorumRequest first uses meta2 to identify the
available again, at the cost of losing all of the data in the range. Any existing replica, even one residing on the target node, will irrevocably be removed.
pkg/roachpb/api.proto, line 2097 at r11 (raw file):
// available again. ResetQuorumRequest first uses meta2 to identify the
// range descriptor. Then, it removes all replicas from the range descriptor
// and adds the current node as the one designated survivor replica. This
a store from the target node
pkg/roachpb/api.proto, line 2098 at r11 (raw file):
// range descriptor. Then, it removes all replicas from the range descriptor
// and adds the current node as the one designated survivor replica. This
// change is then written to meta2 and sent as a snapshot to the current
to a store local to the target node.
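To make the final request shape concrete, this is roughly how the test invokes the RPC once store_id is dropped (a sketch assembled from the snippets quoted in this review, not the exact final code):

```go
// Only the range ID is needed now; the node picks one of its own stores
// as the designated survivor.
_, err = srv.Node().(*server.Node).ResetQuorum(
	ctx,
	&roachpb.ResetQuorumRequest{
		RangeID: int32(desc.RangeID),
	},
)
require.NoError(t, err)
```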
pkg/server/node.go, line 1006 at r11 (raw file):
defer func() {
	if r := recover(); r != nil {
		rErr = errors.Errorf("%v", r)
?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @tbg, and @TheSamHuang)
pkg/kv/kvserver/reset_quorum_test.go, line 20 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
n1 and n2,
Done.
pkg/kv/kvserver/reset_quorum_test.go, line 21 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
checks that the range
Done.
pkg/kv/kvserver/reset_quorum_test.go, line 33 at r11 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit + arbitrary, questionable convention alert] we don't want a period after an inline comment. Could also simply elevate the comment to a proper one.
Done.
pkg/kv/kvserver/reset_quorum_test.go, line 57 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
which ResetQuorum will be updating
Done.
pkg/kv/kvserver/reset_quorum_test.go, line 97 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This shouldn't be here any more.
Done.
pkg/kv/kvserver/store_snapshot.go, line 1026 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
// noopStorePool is a hollowed out StorePool that does not throttle. It's used in recovery scenarios
Done.
pkg/kv/kvserver/store_snapshot.go, line 918 at r11 (raw file):
Previously, irfansharif (irfan sharif) wrote…
eng := storage.NewDefaultInMem()
Done.
pkg/roachpb/api.proto, line 2117 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Reminder to stop including a store_id parameter here.
Done.
pkg/roachpb/api.proto, line 2095 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
available again, at the cost of losing all of the data in the range. Any existing replica, even one residing on the target node, will irrevocably be removed.
Done.
pkg/roachpb/api.proto, line 2097 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
a store from the target node
Done.
pkg/roachpb/api.proto, line 2098 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
to a store local to the target node.
Done.
pkg/server/node.go, line 1039 at r9 (raw file):
Previously, irfansharif (irfan sharif) wrote…
We want a space after the "//"
Done.
pkg/server/node.go, line 1045 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
this is still using req.StoreID, it should pick a store from the local node.
Done.
pkg/server/node.go, line 1035 at r10 (raw file):
Previously, irfansharif (irfan sharif) wrote…
"Check that we've actually lost quorum."
"Check that we're not a metaX range"
Done.
Force-pushed from 30cf688 to fe62de7
mod removing some debugging code you've added and adding the testing TODO. Thanks for taking it over the finish line, pretty happy with where it all ended up!
Reviewed 5 of 5 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg and @TheSamHuang)
pkg/kv/kvserver/reset_quorum_test.go, line 65 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you avoid messing with the liveness duration + sleeps if you call, on n1, IncrementEpoch?
I don't think so, the IncrementEpoch operation also checks to see that the target node is non-live:
cockroach/pkg/kv/kvserver/liveness/liveness.go
Lines 1133 to 1135 in 54396df
if liveness.IsLive(nl.clock.Now().GoTime()) {
	return errors.Errorf("cannot increment epoch on live node: %+v", liveness)
}
pkg/server/node.go, line 1006 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
?
Let's remove this debugging code.
Force-pushed from 501da0c to e9f915e
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @tbg)
pkg/kv/kvserver/recovery_test.go, line 178 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You can close this comment if you add TODOs to your test.
Done.
pkg/kv/kvserver/reset_quorum_test.go, line 22 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It doesn't update meta2 any more, this is now done in ResetQuorum. But let's make this test read meta2 and check the result.
Done.
pkg/server/node.go, line 1006 at r11 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Let's remove this debugging code.
Done.
Force-pushed from fe765df to 2ed0016
Introduces and implements a `ResetQuorumRequest` RPC. `ResetQuorumRequest` takes in the range ID of a range that is unavailable due to lost quorum and makes it available again, at the cost of losing all of the data in that range. Any existing replica, even one residing on the target node, will irrevocably be removed. ResetQuorumRequest first uses meta2 to identify the range descriptor. Then, it removes all replicas from the range descriptor and adds a store from the target node as the one designated survivor replica. This change is then written to meta2 and sent as a snapshot to a store local to the target node in order to use crdb internal upreplication and rebalancing mechanisms to create further replicas from this fresh snapshot. This RPC is meant to be called by the user directly. It will not work for ranges that have not lost quorum or for a meta range. Release note: None
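A loose sketch of the descriptor rewrite the commit message describes; `RemoveReplica` appears in the diff above, while `deadReplicas`, `localNodeID`/`localStoreID`, `AddReplica`, `IncrementGeneration`, and `roachpb.VOTER_FULL` are assumptions about the surrounding code and roachpb API:

```go
// Drop every (lost) replica from the descriptor, then install a store on
// the target node as the sole designated survivor and bump the generation,
// since this is effectively a forced replication change.
for _, rd := range deadReplicas { // assumed: all replicas currently in desc
	desc.RemoveReplica(rd.NodeID, rd.StoreID)
}
to := desc.AddReplica(localNodeID, localStoreID, roachpb.VOTER_FULL)
desc.IncrementGeneration()
// `to` then becomes the target of the empty snapshot described above.
_ = to
```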
Force-pushed from 2ed0016 to c3c4c66
bors r=irfansharif
Build succeeded.
56379: workload/schemachange: improve error screening r=ajwerner a=jayshrivastava

Several ops will be updated to screen for errors (#56119):
- dropTable
- dropView
- createSequence
- renameSequence
- dropSequence
- insertRow

Sequences were also updated to have more interesting cases: non-existing sequences can randomly be returned and sequences can be owned by columns. Furthermore, this change fixes a bug where non-existing tables were getting returned instead of existing ones. Finally, error screening logic is updated to ignore transaction retry errors. This fixes one of the issues in #56230.

57034: cli: add reset-quorum command r=knz a=TheSamHuang

This adds `reset-quorum` as a command in the debug CLI. This command utilizes the functionality added in #56333 to recover unavailable ranges where all replicas are lost.

57272: roachtest: Add missing gossiped StoreDescriptor to expected relocate errors r=nvanbenschoten a=nvanbenschoten

Closes #57191. This seems possible shortly after the cluster starts up, if gossip hasn't propagated by the time the `ALTER TABLE _ EXPERIMENTAL_RELOCATE _` statement is run.

57274: roachtest: remove initial disk space check from drop test r=nvanbenschoten a=nvanbenschoten

Fixes #56040. Will need to be backported. This assertion seemed like a good idea, but it was repeatedly (dec148a) a source of flakiness when fixtures changed. Now that we're using IMPORT for TPC-C, the check is even harder to get right without making it so small as to be useless. It doesn't seem to be worth the instability, so remove it.

57454: util/log: de-flake TestSecondaryGC and some SQL tests r=irfansharif a=knz

Fixes #57442
Fixes #57444

See individual commits for details.

Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Sam Huang <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Introduces and implements a ResetQuorumRequest RPC. ResetQuorumRequest takes in the range ID of a range that is unavailable due to lost quorum and makes it available again, at the cost of losing all of the data in that range. Any existing replica, even one residing on the target node, will irrevocably be removed. ResetQuorumRequest first uses meta2 to identify the range descriptor. Then, it removes all replicas from the range descriptor and adds a store from the target node as the one designated survivor replica. This change is then written to meta2 and sent as a snapshot to a store local to the target node in order to use crdb internal upreplication and rebalancing mechanisms to create further replicas from this fresh snapshot.

This RPC is meant to be called by the user directly via the CLI command being introduced in #57034. It will not work for ranges that have not lost quorum or for a meta range.