kvserver/loqrecovery: record and post replica recovery events #73785
Conversation
Partial review (meeting coming up), also I didn't look at the first two commits since they're from different PRs (that was the intention right?)
This PR links two PRs, could you also link the tracking issue(s) directly?
Reviewed 25 of 25 files at r1, 7 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/keys/constants.go, line 181 at r3 (raw file):
// LocalStoreCachedSettingsKeyMax is the end of span of possible cached settings keys. LocalStoreCachedSettingsKeyMax = LocalStoreCachedSettingsKeyMin.PrefixEnd() // LocalStoreUnsafeReplicaRecoverySuffix suffix for record entries put
is a
pkg/keys/constants.go, line 183 at r3 (raw file):
// LocalStoreUnsafeReplicaRecoverySuffix suffix for record entries put // when loss of quorum recovery operations are performed offline on the store. localStoreUnsafeReplicaRecoverySuffix = []byte("loqr")
The list here is sorted, so move this diff up in front of `stng`.
pkg/keys/constants.go, line 185 at r3 (raw file):
localStoreUnsafeReplicaRecoverySuffix = []byte("loqr") // LocalStoreUnsafeReplicaRecoveryKeyMin is the start of keyspace used to store // replica LOQ recovery keys.
consider expanding LOQ since it's not a widely-known abbreviation.
pkg/keys/constants.go, line 188 at r3 (raw file):
LocalStoreUnsafeReplicaRecoveryKeyMin = MakeStoreKey(localStoreUnsafeReplicaRecoverySuffix, nil) // LocalStoreUnsafeReplicaRecoveryKeyMax is the end of keyspace used to store replica // LOQ recovery keys.
ditto
pkg/keys/doc.go, line 224 at r3 (raw file):
StoreLastUpKey, // "uptm" StoreCachedSettingsKey, // "stng" StoreReplicaUnsafeRecoveryKey, // "loqr"
ditto order, also looks like uptm and stng are in wrong order so flip them as well while we're here.
pkg/keys/keys.go, line 126 at r3 (raw file):
} // StoreReplicaUnsafeRecoveryKey creates a key for replica LOQ recovery entry.
LOQ
Also feel like short usage might be helpful, you can then refer to this method in other comments and don't have to re-explain all the time.
// These keys are written during `cockroach debug recover apply-plan` and are translated into structured log events when the store next boots up, to leave an audit trail of the recovery operation. The key encodes a sequence number whose only purpose is to generate unique keys.
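For concreteness, a minimal sketch of what sequence-suffixed record keys could look like (the prefix, helper names, and encoding here are made up for illustration, not the actual pkg/keys implementation):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Hypothetical store-local prefix for recovery records (the "loqr" suffix in the PR).
var recoveryKeyPrefix = []byte("/Local/Store/loqr/")

// makeRecoveryRecordKey appends a big-endian sequence number to the prefix so
// that successive records sort in write order and never collide.
func makeRecoveryRecordKey(seq uint64) []byte {
	key := append([]byte(nil), recoveryKeyPrefix...)
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], seq)
	return append(key, buf[:]...)
}

// decodeRecoveryRecordKey recovers the sequence number from a record key.
func decodeRecoveryRecordKey(key []byte) (uint64, error) {
	if !bytes.HasPrefix(key, recoveryKeyPrefix) || len(key) != len(recoveryKeyPrefix)+8 {
		return 0, fmt.Errorf("not a recovery record key: %q", key)
	}
	return binary.BigEndian.Uint64(key[len(recoveryKeyPrefix):]), nil
}

func main() {
	k := makeRecoveryRecordKey(3)
	seq, _ := decodeRecoveryRecordKey(k)
	fmt.Println(seq) // 3
}
```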
pkg/keys/keys.go, line 131 at r3 (raw file):
} // DecodeStoreReplicaUnsafeRecoverKey decodes index out of replica LOQ recovery entry.
How do you feel about "sequence number" instead of "index"? In a SQL database, the word "index" has an existing meaning and I want to avoid confusion.
// DecodeStoreReplicaUnsafeRecoverKey decodes the sequence number for the loss of quorum recovery operation represented by this key.
Code quote:
// DecodeStoreReplicaUnsafeRecoverKey decodes index out of replica LOQ recovery entry.
pkg/kv/kvserver/loqrecovery/apply.go, line 79 at r3 (raw file):
) (PrepareStoreReport, error) { var report PrepareStoreReport updateTs := timeutil.Now().UnixNano()
Is it ok to introduce non-determinism here? I primarily think of this method as a library method and would want it to be deterministic on its inputs.
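For illustration, a sketch of the deterministic alternative, with simplified stand-in names and types (not the actual apply.go signatures): the caller supplies the timestamp instead of the library reading the clock.

```go
package loqrecoverysketch

import "time"

// recoveryRecord is a stand-in for loqrecoverypb.ReplicaRecoveryRecord.
type recoveryRecord struct {
	TimestampNanos int64
}

// stampRecords is deterministic on its inputs: the caller decides what "now"
// means (e.g. plan application time) rather than the library calling timeutil.Now().
func stampRecords(records []recoveryRecord, updateTime time.Time) {
	nanos := updateTime.UnixNano()
	for i := range records {
		records[i].TimestampNanos = nanos
	}
}

// A caller (such as the CLI apply-plan command) would then do:
//
//	stampRecords(recs, timeutil.Now())
```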
pkg/kv/kvserver/loqrecovery/record.go, line 55 at r3 (raw file):
return errors.Wrap(err, "failed to marshal update record entry") } fmt.Printf("Writing replica update record to store.")
Just checking if it is ok to use `Printf` here. Seems a little odd; again, I'm thinking of this code as a library, and in the method below you pass `logF`.
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
// RegisterOfflineRecoveryEvents checks if recovery data was captured in the store and writes // appropriate structured entries to the log. Temporary storage local keys for those event // records are removed once they are processed.
On top of that issue, I think it would be helpful to just not delete those keys and to include the count in a metric (a gauge). That way, we can quickly, forever, from a debug.zip, see that the recovery has taken place, and this will never expire.
If this sounds good, let's file an issue to track the metric, and remove the deletion here (adding an `EmittedStructuredLogEntry bool`). cc @erikgrinaker.
Code quote:
73679
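Roughly what such a metric could look like, assuming the usual Metadata-based constructors in pkg/util/metric (the name and wiring here are made up; the PR later landed a counter instead):

```go
package kvserversketch

import "github.com/cockroachdb/cockroach/pkg/util/metric"

// Hypothetical metadata for a gauge exposing how many recovery records
// exist on the store, so the fact of a recovery survives in debug.zip
// and time series even after log entries rotate away.
var metaLossOfQuorumRecoveryRecords = metric.Metadata{
	Name: "range.loss_of_quorum.recovery_records",
	Help: "Number of loss of quorum recovery records present on the store",
}

// The store would set this gauge to the number of records it found at startup.
var lossOfQuorumRecoveryRecords = metric.NewGauge(metaLossOfQuorumRecoveryRecords)
```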
pkg/kv/kvserver/loqrecovery/record.go, line 75 at r3 (raw file):
eventCount := 0 iter := readWriter.NewMVCCIterator(storage.MVCCKeyIterKind, storage.IterOptions{
Why not `readWriter.MVCCIterate`? Always harder to spot bugs in manual iterator code IMO.
pkg/kv/kvserver/loqrecovery/record.go, line 86 at r3 (raw file):
for ; valid && err == nil; valid, err = iter.Valid() { key := iter.Key() recoverEntryIndex, _ := keys.DecodeStoreReplicaUnsafeRecoverKey(iter.UnsafeRawKey())
Surely you want to check the error?
pkg/kv/kvserver/loqrecovery/record.go, line 103 at r3 (raw file):
No caps here I think
Also feels like this could explain what's going on better.
emitted %d log entries to reflect recent use of `debug recover apply-plan`
pkg/kv/kvserver/loqrecovery/record.go, line 108 at r3 (raw file):
} // findFirstAvailableRecoveryEventIndex finds first unused index of replica recovery events
Stopping review here
Reviewed 22 of 22 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
pkg/kv/kvserver/loqrecovery/apply.go, line 84 at r3 (raw file):
// Map contains a set of store names that were found in plan for this node, but were not // configured in this command invocation. missing := make(map[roachpb.StoreID]struct{})
I'd think you'd load the nextUpdateRecordIndex (or nextUpdateRecordSeqNo? ;-) ) here.
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
On top of that issue, I think it would be helpful to just not delete those keys and to include the count in a metric (a gauge). That way, we can quickly, forever, from a debug.zip, see that the recovery has taken place, and this will never expire.
If this sounds good, let's file an issue to track the metric, and remove the deletion here (adding an `EmittedStructuredLogEntry bool`). cc @erikgrinaker.
Caveat: decommissioning nodes out of the cluster would lose this state. So if we're serious about keeping this forever, moving it to a SQL table sounds better. But I'm not sure introducing a separate retention policy to the rangelog is the way we should go about this. Anyway, none of this is too important, we can discuss separately.
pkg/kv/kvserver/loqrecovery/utils.go, line 59 at r3 (raw file):
// NewUpdatableStore creates new UpdatableStore from storage.Engine func NewUpdatableStore(storage storage.Engine) (*UpdatableStore, error) { ind, err := findFirstAvailableRecoveryEventIndex(storage)
Why is this done here? You can load the index when you need it, wouldn't this keep things simpler overall?
pkg/kv/kvserver/loqrecovery/loqrecoverypb/recovery.proto, line 72 at r3 (raw file):
// downstream destinations. message ReplicaRecoveryRecord { // timestamp is unix time in nanos
// The timestamp of the record creation (i.e. when the record was instantiated, prior to being committed to the storage engine). Expressed as nanoseconds since the Unix epoch.
pkg/kv/kvserver/loqrecovery/loqrecoverypb/recovery.proto, line 76 at r3 (raw file):
int64 range_id = 2 [(gogoproto.customname) = "RangeID", (gogoproto.casttype) = "github.com/cockroachdb/cockroach/pkg/roachpb.RangeID", (gogoproto.moretags) = "yaml:\"RangeID\""];
throughout: `yaml:"X"` requires less escaping and is thus preferable.
This actually goes for all of this file, so could as well be a separate cleanup PR after this lands, or a prepended commit here, your call.
pkg/server/server.go, line 1726 at r3 (raw file):
s.node.waitForAdditionalStoreInit() // Once all stores are initialized, check if offline storage recovery
We strictly don't have to wait, since a store that is just getting bootstrapped now couldn't have meaningfully participated in a recovery event (since it was empty then, didn't even have a store ident). I still think it's fine to keep this here, but maybe add a comment. You could also move this earlier if you can think of a better home. The earlier we log these records out, the better. Where we are now, we won't log until cluster is available, which may never happen.
Reviewed 25 of 25 files at r1, 7 of 7 files at r2, 22 of 22 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/kv/kvserver/loqrecovery/apply.go, line 79 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is it ok to introduce non-determinism here? I primarily think of this method as a library method and would want it to be deterministic on its inputs.
+1
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Caveat, decomissioning nodes out of the cluster would lose this state. So if we're serious about keeping this forever, moving it to a SQL table sounds better. But I'm not sure introducing a separate retention policy to the rangelog is the way we should go about this. Anyway, none of this is too important, we can discuss separately.
Yeah, we should have a permanent record of this. Unless anyone has better ideas, maybe we'll just move them to the replicated keyspace and export them in debug.zips, along with a metric (assuming we can store metrics independently of nodes/stores being removed)? I think it's fine to remove the store-local record once it's stored permanently somewhere else. We should also record an entry in the range log, even though it's going to get GCed.
pkg/kv/kvserver/loqrecovery/record.go, line 75 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
why not `readWriter.MVCCIterate`? Always harder to spot bugs in manual iterator code IMO.
`MVCCIterate` may not be long for this world: #66828
pkg/kv/kvserver/loqrecovery/record.go, line 85 at r3 (raw file):
valid, err := iter.Valid() for ; valid && err == nil; valid, err = iter.Valid() { key := iter.Key()
nit: this can use `UnsafeKey()`. Also, `key` is only used in one place, so we can call it directly.
pkg/kv/kvserver/loqrecovery/record.go, line 90 at r3 (raw file):
record := loqrecoverypb.ReplicaRecoveryRecord{} if err = iter.ValueProto(&record); err != nil { return errors.Wrapf(err, "failed to deserialize replica recovery event at index %d", recoverEntryIndex)
nit: keep within 100 columns. I think you can set up your editor to show or enforce that.
pkg/kv/kvserver/loqrecovery/record.go, line 92 at r3 (raw file):
return errors.Wrapf(err, "failed to deserialize replica recovery event at index %d", recoverEntryIndex) } logF(ctx, record.AsStructuredLog())
Instead of `logF`, consider making this more general and passing in the record, since we'll probably want to record it in other places as well (Tobi mentioned metrics, for example).
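A sketch of the more general shape (types and names here are illustrative stand-ins, not the actual record.go API): the registration callback receives the record and can fan out to logging, metrics, or anything else.

```go
package loqrecoverysketch

import "context"

// ReplicaRecoveryRecord is a stand-in for loqrecoverypb.ReplicaRecoveryRecord.
type ReplicaRecoveryRecord struct {
	RangeID int64
}

// RegisterEventFn consumes one recovery record; returning an error aborts
// processing (and leaves the store-local key in place for a retry).
type RegisterEventFn func(context.Context, ReplicaRecoveryRecord) error

// registerOfflineRecoveryEvents calls register for every record it finds,
// leaving it to the caller to decide what "registering" means.
func registerOfflineRecoveryEvents(
	ctx context.Context, records []ReplicaRecoveryRecord, register RegisterEventFn,
) (int, error) {
	count := 0
	for _, rec := range records {
		if err := register(ctx, rec); err != nil {
			return count, err
		}
		count++
	}
	return count, nil
}
```

The server could then pass a callback that both emits the structured log event and bumps a metric.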
pkg/kv/kvserver/loqrecovery/record.go, line 103 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
No caps here I think
Also feels like this could explain what's going on better.
emitted %d log entries to reflect recent use of `debug recover apply-plan`
I'd leave it up to the caller to log this, instead returning the number of events recorded. That would also remove the need to detect the store ID from the events, since the caller already knows the store ID.
pkg/kv/kvserver/loqrecovery/record.go, line 110 at r3 (raw file):
// findFirstAvailableRecoveryEventIndex finds first unused index of replica recovery events // store local key range. func findFirstAvailableRecoveryEventIndex(reader storage.Reader) (uint64, error) {
s/first/next. If there is a single record at index 2, this will pick index 3, not 0 (which would be the first available index).
pkg/kv/kvserver/loqrecovery/loqrecoverypb/recovery.proto, line 72 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
// The timestamp of the record creation (i.e. when the record was instantiated, prior to being committed to the storage engine). Expressed as nanoseconds since the Unix epoch.
Or even clearer, when the recovery was performed.
pkg/server/server.go, line 1738 at r3 (raw file):
// We don't want to abort server if we can't record recovery // as it is the last thing we need if cluster is already unhealthy. log.Errorf(ctx, "failed to record store recovery info: %v", err)
Consider including "loss of quorum" in this message.
pkg/util/log/eventpb/debug_events.proto, line 53 at r3 (raw file):
int64 store_id = 4 [(gogoproto.customname) = "StoreID"]; int32 original_replica_id = 5 [(gogoproto.customname) = "OriginalReplicaID"]; int32 updated_replica_id = 6 [(gogoproto.customname) = "UpdatedReplicaID"];
I find the terms "original" and "updated" replicas to be a bit unclear. In the CLI I believe we use the term "promoted" replica, and "survivor" has also been used in the code/RFC. The plan itself uses `Old` and `New`. Let's pick a consistent clear term and use it everywhere.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/kv/kvserver/loqrecovery/utils.go, line 59 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Why is this done here? You can load the index when you need it, wouldn't this keep things simpler overall?
I don't like the idea of doing a scan for every message that we try to write. For that we need to either do it upfront or lazily on first change. Lazy seems complicated and was already reviewed away for batches on a previous PR.
pkg/server/server.go, line 1726 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
We strictly don't have to wait, since a store that is just getting bootstrapped now couldn't have meaningfully participated in a recovery event (since it was empty then, didn't even have a store ident). I still think it's fine to keep this here, but maybe add a comment. You could also move this earlier if you can think of a better home. The earlier we log these records out, the better. Where we are now, we won't log until cluster is available, which may never happen.
I was thinking we may need to push this to a later point actually if we want to add it to rangelog or shared keyspace. I think we have a point where we already connected but not accepting client connections yet.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/kv/kvserver/loqrecovery/utils.go, line 59 at r3 (raw file):
Previously, aliher1911 (Oleg) wrote…
I don't like the idea of doing scan for every message that we try to write. For that we need to either do it upfront or lazily on first change. Lazy seem complicated and was already reviewed away for batches on previous PR.
Something just bugs me about this, we're doing what feels like too much code work picking these unique indices and any bug will likely result in overwriting entries we wanted to have kept (and there are unspoken assumptions about there not being concurrency, etc). How about we scrap all that and put a random uuid into the key?
pkg/server/server.go, line 1726 at r3 (raw file):
Previously, aliher1911 (Oleg) wrote…
I was thinking we may need to push this to a later point actually if we want to add it to rangelog or shared keyspace. I think we have a point where we already connected but not accepting client connections yet.
Yeah, it makes sense to do this relatively late when we are moving things to replicated storage. For putting it to structured logs alone, it makes sense to do it as early as possible. For simplicity's sake doing it late seems preferable so that we don't have to maintain two pieces of code.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/keys/constants.go, line 183 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The list here is sorted, so move this diff up in front of `stng`.
"cver", "goss", "hlcu", "iden", "ntmb", "uptm", "dlre", "stng" ... hmmm :) Let me sort that out.
pkg/keys/keys.go, line 126 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
LOQ
Also feel like short usage might be helpful, you can then refer to this method in other comments and don't have to re-explain all the time.
// These keys are written during `cockroach debug recover apply-plan` and are translated into structured log events when the store next boots up, to leave an audit trail of the recovery operation. The key encodes a sequence number whose only purpose is to generate unique keys.
What is the right terminology here, do we write keys or values? Writing keys sounds like the key itself has a meaningful value rather than being just an address on how to find data.
pkg/keys/keys.go, line 131 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
How do you feel about "sequence number" instead of "index"? In a SQL database, the word "index" has an existing meaning and I want to avoid confusion.
// DecodeStoreReplicaUnsafeRecoverKey decodes the sequence number for the loss of quorum recovery operation represented by this key.
It's unfortunate that "index" is already hijacked by SQL folks. At least we kept it in the raft log!
Calling it a sequence seems a bit of a stretch. We write values sequentially, but the sequence could vanish if all entries are successfully consumed and it starts from 0 again.
Needs some proper explanations then.
pkg/kv/kvserver/loqrecovery/apply.go, line 79 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
+1
Fair point, it should be supplied externally. Should we have a timestamp as the time when the plan was created and then embed it in the plan itself? Superficially this seems excessive, and the actual application time is of interest as well.
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Yeah, we should have a permanent record of this. Unless anyone have better ideas, maybe we'll just move them to the replicated keyspace and export them in debug.zips, along with a metric (assuming we can store metrics independently of nodes/stores being removed)? I think it's fine to remove the store-local record once it's stored permanently somewhere else. We should also record an entry in the range log, even though it's going to get GCed.
Until we have a better storage place for those records besides the rangelog and structured events, should we post them again and again on every restart, or mark them as consumed, or keep the last saved event index as a separate key?
pkg/util/log/eventpb/debug_events.proto, line 53 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I find the terms "original" and "updated" replicas to be a bit unclear. In the CLI I believe we use the term "promoted" replica, and "survivor" has also been used in the code/RFC. The plan itself uses `Old` and `New`. Let's pick a consistent clear term and use it everywhere.
I think I scrapped all "promoted" usage and it is just updated and new_replica, while the original replica is called survivor.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/loqrecovery/utils.go, line 59 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Something just bugs me about this, we're doing what feels like too much code work picking these unique indices and any bug will likely result in overwriting entries we wanted to have kept (and there are unspoken assumptions about there not being concurrency, etc). How about we scrap all that and put a random uuid into the key?
UUID + sequence number? We can use a timestamp for that. That would allow iteration in sequence, so events won't be spewed randomly but in key order, or even recovery attempt then key order.
Looking at the rangelog: we have a helper "chain" that updates the rangelog and is called as part of replica changes. It is very much tailored to normal updates initiated from a replica, and so are the types and reasons for changes. We can ignore that because it is very specific and private, and add a new one. I can see that the admin API that queries RangeLog in
Reviewed 5 of 41 files at r4, 22 of 22 files at r11, 21 of 21 files at r12, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/cli/debug_recover_loss_of_quorum.go, line 440 at r12 (raw file):
updateTime := timeutil.Now() updateUUID, _ := uuid.NewV4()
Why are we ignoring the error here?
pkg/keys/constants.go, line 166 at r12 (raw file):
// when loss of quorum recovery operations are performed offline on the store. // See StoreReplicaUnsafeRecoveryKey for details. localStoreUnsafeReplicaRecoverySuffix = []byte("loqr")
Why do we use "UnsafeReplica" here and "ReplicaUnsafe" in `StoreReplicaUnsafeRecoveryKey`? Let's pick one.
pkg/keys/keys.go, line 133 at r12 (raw file):
// there are no key clashes when creating records. Each run of apply will generate UUID and // write records with sequential keys under this UUID. func StoreReplicaUnsafeRecoveryKey(detailPrefix roachpb.Key, sequenceNumber uint64) roachpb.Key {
I'm not sure where we landed on the keys here, I know there was talk of using timestamps too -- but if we're going with UUIDs then let's call the parameter `uuid` or `id`.
Also, is a sequence number really necessary with UUIDs? UUIDv1 is almost guaranteed to be unique, since it includes the MAC address and local timestamp, while the randomly generated UUIDv4 has such a low collision probability that I think for our purposes here we can ignore it.
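For illustration, a small sketch of UUID-keyed records using a gofrs-style uuid API (the key layout and helper name here are assumptions, not the PR's actual code):

```go
package loqrecoverysketch

import (
	"fmt"

	"github.com/gofrs/uuid"
)

// makeAppliedRecordKey builds a store-local key such as
// /Local/Store/lossOfQuorumRecovery/applied/<uuid>; uniqueness comes from the
// UUID alone, with no sequence number to track.
func makeAppliedRecordKey() (string, error) {
	id, err := uuid.NewV4()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("/Local/Store/lossOfQuorumRecovery/applied/%s", id), nil
}
```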
pkg/kv/kvserver/metrics.go, line 506 at r12 (raw file):
} metaRangeLossOfQuorumRecoveries = metric.Metadata{ Name: "range.loss.of.quorum.recoveries",
Use `loss_of_quorum` -- the `.` separator is for logical grouping, not for words.
pkg/kv/kvserver/loqrecovery/record.go, line 72 at r12 (raw file):
ctx context.Context, readWriter storage.ReadWriter, registerEvent func(context.Context, loqrecoverypb.ReplicaRecoveryRecord) bool,
This should probably return an error rather than a bool.
pkg/kv/kvserver/loqrecovery/record.go, line 85 at r12 (raw file):
iter.SeekGE(storage.MVCCKey{Key: keys.LocalStoreUnsafeReplicaRecoveryKeyMin}) valid, err := iter.Valid() for ; valid && err == nil; valid, err = iter.Valid() {
The `err` here needs to be checked after the loop, or returned along with `eventCount`.
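A sketch of the loop shape being asked for, with a simplified iterator interface standing in for the storage iterator (the method set here is an assumption):

```go
package loqrecoverysketch

// iterator is a simplified stand-in for the storage iterator used in record.go.
type iterator interface {
	Valid() (bool, error)
	Next()
	UnsafeKey() []byte
}

// countRecoveryEvents shows the error-handling pattern: the error from Valid()
// terminates the loop and must also be surfaced to the caller, alongside the
// count of events processed so far.
func countRecoveryEvents(iter iterator) (int, error) {
	eventCount := 0
	valid, err := iter.Valid()
	for ; valid && err == nil; valid, err = iter.Valid() {
		// ... decode and register the record at iter.UnsafeKey() ...
		eventCount++
		iter.Next()
	}
	if err != nil {
		// Don't swallow an iteration error: return it with the partial count.
		return eventCount, err
	}
	return eventCount, nil
}
```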
pkg/kv/kvserver/loqrecovery/record.go, line 90 at r12 (raw file):
return eventCount, errors.Wrapf( err, "failed to deserialize replica recovery event at key %s", iter.UnsafeRawKey())
Any reason for using `RawKey` rather than `Key` here? Won't `Key` have better formatting, since `RawKey` is just a byte slice?
pkg/kv/kvserver/loqrecovery/utils.go, line 59 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
That sounds good, keys sorted by (wall_time, uuid) sound like they'd do the right thing. The uuid will be largely unnecessary but it'll let us sleep soundly at night. We could also use a rand uint64 instead if that's easier.
A timestamp is probably fine. UUIDv1 incorporates the MAC address, a timestamp, and a sequence number, so we could use that too and not worry about duplicates.
pkg/ts/catalog/chart_catalog.go, line 1564 at r12 (raw file):
}, { Title: "Unsafe Loss Of Quorum Recoveries",
s/Of/of/
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/keys/constants.go, line 166 at r12 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Why do we use "UnsafeReplica" here and "ReplicaUnsafe" in `StoreReplicaUnsafeRecoveryKey`? Let's pick one.
yes sir!
pkg/keys/keys.go, line 133 at r12 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I'm not sure where we landed on the keys here, I know there was talk of using timestamps too -- but if we're going with UUIDs then let's call the parameter `uuid` or `id`. Also, is a sequence number really necessary with UUIDs? UUIDv1 is almost guaranteed to be unique, since it includes the MAC address and local timestamp, while the randomly generated UUIDv4 has such a low collision probability that I think for our purposes here we can ignore it.
So the keys would go like /Local/Store/lossOfQuorumRecovery/applied/123becd2-5839-4edd-ab37-176521bfbaad, and since UUIDv1 has the timestamp at the front, we should have them ordered by creation time so that it matches our presented application order. So I changed them as you suggested, not to use a seq but just a UUID instead.
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I think it would make sense to have a separate prefix for consumed keys and to move keys over as they get emitted. So `/to-send/<ts><salt>` will be moved to `/already-emitted/<ts><salt>`. This is easy and tides us over well, without having to worry about code paths repeatedly touching the keys.
So I made it /Local/Store/lossOfQuorumRecovery/applied/123becd2-5839-4edd-ab37-176521bfbaad for now, and once the change is there to preserve records after processing instead of deleting them, we can use /Local/Store/lossOfQuorumRecovery/archived/123becd2-5839-4edd-ab37-176521bfbaad for that. I think we can have it as a separate PR. We have two distinct ranges, so it would be future-proof this way.
Reviewed 11 of 27 files at r13, 2 of 2 files at r14.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/keys/keys.go, line 133 at r12 (raw file):
Previously, aliher1911 (Oleg) wrote…
So the keys would go like /Local/Store/lossOfQuorumRecovery/applied/123becd2-5839-4edd-ab37-176521bfbaad and since UUIDv1 has timestamp at the front, we should have them ordered in creation time so that it would match our presented application order. So I changed them as you suggested not to use seq bug just uuid instead.
Don't think that's true, UUIDv1 stores low-order bytes of the timestamp first: https://www.rfc-editor.org/rfc/rfc4122.html#section-4.1.2
But I also don't think it matters. Do we care about these being ordered?
pkg/kv/kvserver/loqrecovery/record.go, line 61 at r3 (raw file):
Previously, aliher1911 (Oleg) wrote…
So I made it /Local/Store/lossOfQuorumRecovery/applied/123becd2-5839-4edd-ab37-176521bfbaad for now, and once the change is there to preserve records after processing instead of deleting them, we can use /Local/Store/lossOfQuorumRecovery/archived/123becd2-5839-4edd-ab37-176521bfbaad for that. I think we can have it as a separate PR. We have two distinct ranges, so it would be future-proof this way.
Sure, that works.
Couple of minor issues, otherwise this looks great!
Reviewed 12 of 35 files at r5, 2 of 23 files at r6, 2 of 21 files at r7, 23 of 23 files at r15, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/keys/printer.go, line 197 at r15 (raw file):
{"/nodeTombstone", localStoreNodeTombstoneSuffix}, {"/cachedSettings", localStoreCachedSettingsSuffix}, {"/lossOfQuorumRecovery/applied", localStoreUnsafeReplicaRecoverySuffix},
Huh, any idea why these have a lowercased initial letter?
pkg/kv/kvserver/loqrecovery/record.go, line 28 at r15 (raw file):
// log the data and preserve this information for subsequent debugging as // needed. // Record keys have an index suffix. Every recovery run will find first unused
This is outdated.
pkg/kv/kvserver/loqrecovery/record.go, line 95 at r15 (raw file):
iter.UnsafeRawKey()) } if err := registerEvent(ctx, record); err == nil {
We need to do something with the error here, e.g. return it up the stack or something.
pkg/kv/kvserver/loqrecovery/testdata/replica_changes_recorded, line 83 at r15 (raw file):
# We now try to dump stores once again to verify that record # consumer cleans up events if it passed them on successfully. dump-events stores=(1,2)
nit: it's a little odd that `dump-events` also modifies the events -- it sounds like a read-only action. Consider renaming it to e.g. `register-events` or `dump-and-remove-events` or something.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/keys/keys.go, line 133 at r12 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Don't think that's true, UUIDv1 stores low-order bytes of the timestamp first: https://www.rfc-editor.org/rfc/rfc4122.html#section-4.1.2
But I also don't think it matters. Do we care about these being ordered?
It is more about consistency when events are emitted in the reverse of what you saw in the plan or the application process. Since those actions are spread in time and done by different people, it shouldn't matter. Only if you run tests where you fail, recover, and verify events manually does it make sense, but that doesn't justify extra machinery to ensure the order. Let's keep it like that.
pkg/kv/kvserver/loqrecovery/record.go, line 95 at r15 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
We need to do something with the error here, e.g. return it up the stack or something.
I changed everything to bail on the first error, regardless of whether it's an iteration, unmarshalling, or processing failure. That is at least consistent across the board.
pkg/kv/kvserver/loqrecovery/testdata/replica_changes_recorded, line 83 at r15 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: it's a little odd that `dump-events` also modifies the events -- it sounds like a read-only action. Consider renaming it to e.g. `register-events` or `dump-and-remove-events` or something.
Looking at it, I'll just make an argument to remove conditionally.
@tbg want to have a final look? I think all of the pending questions have been ironed out by now.
Reviewed 16 of 23 files at r15, 13 of 13 files at r16, all commit messages.
Dismissed @erikgrinaker from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
pkg/kv/kvserver/metrics.go, line 506 at r16 (raw file):
} metaRangeLossOfQuorumRecoveries = metric.Metadata{ Name: "range.loss-of-quorum.recoveries",
nit: perhaps
"range.splits"
"range.merges"
"range.adds"
"range.removes"
"range.recoveries" <--
Then we also sidestep the awkward question of whether to use `-` or `_` (unfortunately we use both).
pkg/kv/kvserver/metrics.go, line 507 at r16 (raw file):
metaRangeLossOfQuorumRecoveries = metric.Metadata{ Name: "range.loss-of-quorum.recoveries", Help: "Number of offline loss of quorum recovery operations performed on ranges",
"Count of offline loss of quorum recovery operations performed on ranges. <Explain here when this metric increments. It's when we log out the operations after the designated survivor for the range restarts, right? Also, perhaps>"
Code quote:
Number of offline loss of quorum recovery operations performed on ranges
pkg/ts/catalog/chart_catalog.go, line 1565 at r16 (raw file):
{ Title: "Unsafe Loss of Quorum Recoveries", Metrics: []string{"range.loss-of-quorum.recoveries"},
Here you also need to make a change if we end up changing the metric name.
🚅 choo choo!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/ts/catalog/chart_catalog.go, line 1565 at r16 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Here you also need to make a change if we end up changing the metric name.
Thanks for saving me 1h of life! 😀 Otherwise I'd have to wait for TC to tell me lint has failed.
bors r=tbg,erikgrinaker
Build failed:
bors r=tbg,erikgrinaker
Trying to force bors to build on top of the pending fix for the merge failures.
Build succeeded:
Previously, when a replica was rewritten offline to recover from loss
of quorum, no trace of this event was kept in the system.
This is not ideal, as it is not possible to understand what happened
unless the person performing the recovery documented their actions and
then informed the person doing the investigation.
This diff adds records of such actions. Records are created offline
in the store and then propagated to server logs when the node is restarted.
Release note: None
Fixes #73281