-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: don't use marshaled proto for liveness CPut #49135
Conversation
2d72d9f
to
cb0f02a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 182 at r1 (raw file):
// update to the liveness record: the raw value will act as the expected value. // This way the proto's encoding can change without the CPut failing. type LivenessRecordEx struct {
nit: just LivenessRecord
?
pkg/kv/kvserver/node_liveness.go, line 324 at r1 (raw file):
return false, errors.Wrap(err, "invalid liveness record") } //if (oldLiveness == storagepb.Liveness{}) {
Remove or uncomment this? (remove, I think)
pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):
// the database has for this liveness record in addition to the decoded liveness // proto. func (nl *NodeLiveness) GetLivenessEx(nodeID roachpb.NodeID) (LivenessRecordEx, error) {
Do we really need two different versions of these methods? I'd rather just use the version that includes the raw value everywhere.
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
log.Fatalf(ctx, "failed to marshall proto: %s", err) } // Make a copy of the expected value. We can't pass oldRaw because b.CPut()
Really? Eww.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)
pkg/kv/kvserver/node_liveness.go, line 408 at r1 (raw file):
// liveness record. It has both the new and the old version of the proto. type livenessUpdate struct { new storagepb.Liveness
how about "updated"? Using keywords (new
) tickles my OCD. It also tends to throw off syntax highlighters, as it does in reviewable.
pkg/kv/kvserver/node_liveness.go, line 419 at r1 (raw file):
// if they don't correspond, the CPut is considered to have failed. // // When ignoreCache is set, oldLRaw needs to be set as well.
oldRaw
pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Do we really need two different versions of these methods? I'd rather just use the version that includes the raw value everywhere.
+1
pkg/kv/kvserver/node_liveness.go, line 814 at r1 (raw file):
// another successful heartbeat, and a second increment to come in // after that) func (nl *NodeLiveness) IncrementEpoch(ctx context.Context, liveness storagepb.Liveness) error {
shouldn't this one be the augmented one with the raw bytes?
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Really? Eww.
Is it this?
Line 456 in f80ec9c
ev.ClearChecksum() |
That seems like bad code we would want to go away. I wonder why we need to clear the checksum - doesn't the checksum match expVal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 814 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
shouldn't this one be the augmented one with the raw bytes?
That's what I wanted initially, but the thing is that this is called from the lease handling code, which has a storagepb.LeaseStatus
. That guy is a proto (for no good reason) and currently doesn't include the raw bytes. I've considered changing that, but... I dunno. There's a bunch of stuff to change, and it's not really necessary given that this code doesn't set ignoreCache
- so we check the cache anyway, which has the raw bytes.
Thoughts?
cb0f02a
to
8d275dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Unsolicited) LGTM. Was looking around changing the liveness proto for random reasons and stumbled by this.
Reviewed 3 of 10 files at r1, 7 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 182 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
nit: just
LivenessRecord
?
done
pkg/kv/kvserver/node_liveness.go, line 324 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Remove or uncomment this? (remove, I think)
done
pkg/kv/kvserver/node_liveness.go, line 408 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
how about "updated"? Using keywords (
new
) tickles my OCD. It also tends to throw off syntax highlighters, as it does in reviewable.
Done.
pkg/kv/kvserver/node_liveness.go, line 419 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
oldRaw
Done.
pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
+1
Done.
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is it this?
Line 456 in f80ec9c
ev.ClearChecksum() That seems like bad code we would want to go away. I wonder why we need to clear the checksum - doesn't the checksum match
expVal
?
Yeah, this has bothered me greatly too. I spent some time digging.
So Batch.CPut()
takes an expected Value
. The value has a checksum, which includes the key corresponding to the respective key-value pair. So, if you read val
from k1
and then try to used it as the expected value for key k2
, the checksum needs to change.
At the moment, as we've seen, the CPut
code wipes the checksum that was given to it and recomputes it. I've tried to change it to deal with already-populated checksums by asserting that they're correct for the key currently being written - so then the contract would be that if you pass a value with a filled-in checksum to cput as the expected val, then it must have been read from the key you're currently trying to write. The caller would have the option to clear the checksum, in which case the CPut code will compute a good one.
This hit a problem because the liveness record values we're using for these CPuts often come from gossip, and gossip, alas, uses different damn keys when it broadcasts the liveness records (different from the keys where the records are in the db). So, in order to keep these gossiped liveness records immutable, we'd have to recompute their checksums when a node receives them. Which is doable, but pretty gross too.
So I'm taking opinions about what the right thing to do here is. Is a *roachpb.Value
even the right type for the expected val in the CPut
interface, or should it be simply []byte
(with no checksum)? The *roachpb.Value
has the advantage that it has a nice interface for encoding any type into it.
@petermattis I'm also curious if you have any insight into why gossip uses its own keys, different from the database keys (i.e. stuff in gossip/keys.go
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Yeah, this has bothered me greatly too. I spent some time digging.
SoBatch.CPut()
takes an expectedValue
. The value has a checksum, which includes the key corresponding to the respective key-value pair. So, if you readval
fromk1
and then try to used it as the expected value for keyk2
, the checksum needs to change.
At the moment, as we've seen, theCPut
code wipes the checksum that was given to it and recomputes it. I've tried to change it to deal with already-populated checksums by asserting that they're correct for the key currently being written - so then the contract would be that if you pass a value with a filled-in checksum to cput as the expected val, then it must have been read from the key you're currently trying to write. The caller would have the option to clear the checksum, in which case the CPut code will compute a good one.This hit a problem because the liveness record values we're using for these CPuts often come from gossip, and gossip, alas, uses different damn keys when it broadcasts the liveness records (different from the keys where the records are in the db). So, in order to keep these gossiped liveness records immutable, we'd have to recompute their checksums when a node receives them. Which is doable, but pretty gross too.
So I'm taking opinions about what the right thing to do here is. Is a
*roachpb.Value
even the right type for the expected val in theCPut
interface, or should it be simply[]byte
(with no checksum)? The*roachpb.Value
has the advantage that it has a nice interface for encoding any type into it.@petermattis I'm also curious if you have any insight into why gossip uses its own keys, different from the database keys (i.e. stuff in
gossip/keys.go
).
There's also the option of having the CPut
code clear the checksum without a fuss if it's not correct for the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
There's also the option of having the
CPut
code clear the checksum without a fuss if it's not correct for the key.
I've discussed this offline with @bdarnell; I'm going to change the CPut interface to simply take a read-only []byte
for the expected val, so that we don't need to worry about checksums any more.
8d275dc
to
80686e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 871 at r4 (raw file):
// cache. func (nl *NodeLiveness) updateLiveness( ctx context.Context, update livenessUpdate, handleCondFailed func(actual LivenessRecord) error,
Does this handleCondFailed
callback need a LivenessRecord
type, or could it to with a vanilla storagepb.Liveness
? I don't think the callers use the raw bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I've discussed this offline with @bdarnell; I'm going to change the CPut interface to simply take a read-only
[]byte
for the expected val, so that we don't need to worry about checksums any more.
The change changing the CPut API is #50408. Let's agree on that one and come back.
79d5435
to
951bd0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)
pkg/kv/kvserver/node_liveness.go, line 871 at r4 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Does this
handleCondFailed
callback need aLivenessRecord
type, or could it to with a vanillastoragepb.Liveness
? I don't think the callers use the raw bytes.
they do; the callback calls maybeUpdate()
(not new).
I've tried to make the callbacks not call maybeUpdate()
, and instead return the actual
from updateLiveness()
(since that caller needs to sometimes call maybeUpdate()
anyway), but it gets a bit gnarly because the callback can return errors too, and then it becomes unclear when the caller of updateLiveness
should call maybeUpdate
and when it shouldn't. So I've left updateLiveness()
return an empty LivenessRecord
when handleCondFailed()
is called.
951bd0e
to
6733b9e
Compare
Release note: None
The comment on UpdateLiveness claimed that the handleCondFailed() arg is optional, but it wasn't (not if ignoreCache was not set, at least). All callers pass that arg, so this patch removes the half-assed attempt at having it be optional. Release note: None
Fixes cockroachdb#38308 Updates to the liveness records are done as CPuts. Before this patch, the CPuts' expected value was the re-marshaled proto with the previous version. As cockroachdb#38308 explains, that's a bad practice since it prevents the proto's encoding to change in any way (e.g. fields can't be removed). Instead, this patch moves to keeping track of the raw bytes that have been read from the DB, besides the unmarshalled liveness protos. The raw bytes are used as the expected values. Release note: None
6733b9e
to
9c0dce9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)
Build succeeded |
50329: kvserver,cli,roachtest,sql: introduce a fully decommissioned bit r=tbg a=irfansharif Ignore first commit, cherry-picked from @andreimatei's #49135. --- This commit introduces a fully decommissioned bit to Cockroach. Previously our Liveness schema only contained a decommissioning bool, with consequently no ability to disamiguate between a node currently undergoing decommissioning, and a node that was fully decommissioned. We used some combination of store dead threshold to surface, in our UI, "fully decommissioned" nodes, but it was never quite so. We need this specificity for the Connect RPC. We wire up the new CommissionStatus enum that's now part of the liveness record. In doing so it elides usage of the decommissioning bool used in v20.1. We're careful to maintain an on-the-wire representation of the Liveness record that will be understood by v20.1 nodes (see Liveness.EnsureCompatible for details). We repurpose the AdminServer.Decommission RPC (which is now a misnomer, should be thought of being named Commission instead) to persist CommissionStatus states to KV through the lifetime of a node decommissioning/recommissioning. See cli/node.go for where that's done. For recommissioning a node, it suffices to simply persist a COMMISSIONED state. When decommissioning a node, since it's a longer running process, we first persist an in-progress DECOMMISSIONING state, and once we've moved off all the Replicas in the node, we finalize the decommissioning process by persisting the DECOMMISSIONED state. When transitioning between CommissionStatus states, we CPut against what's already there, disallowing illegal state transitions. The appropriate error codes are surfaced back to the user. An example would be in attempting to recommission a fully decommissioned node, in which case we'd error out with the following: > command failed: illegal commission status transition: can only > recommission a decommissioning node; n4 found to be decommissioned Note that this is a behavioral change for `cockroach node recommission`. Previously it was able to recommission any "decommissioned" node, regardless of how long ago it's was removed from the cluster. Now recommission serves to only cancel an accidental decommissioning process that wasn't finalized. The 'decommissioning' column in crdb_internal.gossip_liveness is now powered by this new CommissionStatus, and we introduce a new commission_status column to it. We also introduce the same column to the output generated by `cockroach node status --decommission`. The is_decommissioning column still exists, but is also powered by the CommissionStatus now. We also iron out the events plumbed into system.eventlog: it now has a dedicated event for "node decommissioning". Release note (general change): `cockroach node recommission` has new semantics.Previously it was able to recommission any decommissioned node, regardless of how long ago it's was fully removed from the cluster. Now recommission serves to only cancel an accidental decommissioning process that wasn't finalized. Release note (cli change): We introduce a commission_status column to the output generated by `cockroach node status --decommission`. It should be used in favor of the is_decommissioning column going forward. Co-authored-by: irfan sharif <[email protected]>
Fixes #38308
Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As #38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.
Release note: None