Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: don't use marshaled proto for liveness CPut #49135

Merged
merged 3 commits into from
Jun 29, 2020

Conversation

andreimatei
Copy link
Contributor

Fixes #38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As #38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch from 2d72d9f to cb0f02a Compare May 15, 2020 22:04
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 182 at r1 (raw file):

// update to the liveness record: the raw value will act as the expected value.
// This way the proto's encoding can change without the CPut failing.
type LivenessRecordEx struct {

nit: just LivenessRecord?


pkg/kv/kvserver/node_liveness.go, line 324 at r1 (raw file):

			return false, errors.Wrap(err, "invalid liveness record")
		}
		//if (oldLiveness == storagepb.Liveness{}) {

Remove or uncomment this? (remove, I think)


pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):

// the database has for this liveness record in addition to the decoded liveness
// proto.
func (nl *NodeLiveness) GetLivenessEx(nodeID roachpb.NodeID) (LivenessRecordEx, error) {

Do we really need two different versions of these methods? I'd rather just use the version that includes the raw value everywhere.


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

			log.Fatalf(ctx, "failed to marshall proto: %s", err)
		}
		// Make a copy of the expected value. We can't pass oldRaw because b.CPut()

Really? Eww.

@tbg tbg requested a review from bdarnell May 19, 2020 08:29
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)


pkg/kv/kvserver/node_liveness.go, line 408 at r1 (raw file):

// liveness record. It has both the new and the old version of the proto.
type livenessUpdate struct {
	new storagepb.Liveness

how about "updated"? Using keywords (new) tickles my OCD. It also tends to throw off syntax highlighters, as it does in reviewable.


pkg/kv/kvserver/node_liveness.go, line 419 at r1 (raw file):

	// if they don't correspond, the CPut is considered to have failed.
	//
	// When ignoreCache is set, oldLRaw needs to be set as well.

oldRaw


pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Do we really need two different versions of these methods? I'd rather just use the version that includes the raw value everywhere.

+1


pkg/kv/kvserver/node_liveness.go, line 814 at r1 (raw file):

// another successful heartbeat, and a second increment to come in
// after that)
func (nl *NodeLiveness) IncrementEpoch(ctx context.Context, liveness storagepb.Liveness) error {

shouldn't this one be the augmented one with the raw bytes?


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Really? Eww.

Is it this?

ev.ClearChecksum()

That seems like bad code we would want to go away. I wonder why we need to clear the checksum - doesn't the checksum match expVal?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 814 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

shouldn't this one be the augmented one with the raw bytes?

That's what I wanted initially, but the thing is that this is called from the lease handling code, which has a storagepb.LeaseStatus. That guy is a proto (for no good reason) and currently doesn't include the raw bytes. I've considered changing that, but... I dunno. There's a bunch of stuff to change, and it's not really necessary given that this code doesn't set ignoreCache - so we check the cache anyway, which has the raw bytes.
Thoughts?

@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch from cb0f02a to 8d275dc Compare May 19, 2020 18:31
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Unsolicited) LGTM. Was looking around changing the liveness proto for random reasons and stumbled by this.

Reviewed 3 of 10 files at r1, 7 of 7 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 182 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

nit: just LivenessRecord?

done


pkg/kv/kvserver/node_liveness.go, line 324 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove or uncomment this? (remove, I think)

done


pkg/kv/kvserver/node_liveness.go, line 408 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

how about "updated"? Using keywords (new) tickles my OCD. It also tends to throw off syntax highlighters, as it does in reviewable.

Done.


pkg/kv/kvserver/node_liveness.go, line 419 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

oldRaw

Done.


pkg/kv/kvserver/node_liveness.go, line 777 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

+1

Done.


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is it this?

ev.ClearChecksum()

That seems like bad code we would want to go away. I wonder why we need to clear the checksum - doesn't the checksum match expVal?

Yeah, this has bothered me greatly too. I spent some time digging.
So Batch.CPut() takes an expected Value. The value has a checksum, which includes the key corresponding to the respective key-value pair. So, if you read val from k1 and then try to used it as the expected value for key k2, the checksum needs to change.
At the moment, as we've seen, the CPut code wipes the checksum that was given to it and recomputes it. I've tried to change it to deal with already-populated checksums by asserting that they're correct for the key currently being written - so then the contract would be that if you pass a value with a filled-in checksum to cput as the expected val, then it must have been read from the key you're currently trying to write. The caller would have the option to clear the checksum, in which case the CPut code will compute a good one.

This hit a problem because the liveness record values we're using for these CPuts often come from gossip, and gossip, alas, uses different damn keys when it broadcasts the liveness records (different from the keys where the records are in the db). So, in order to keep these gossiped liveness records immutable, we'd have to recompute their checksums when a node receives them. Which is doable, but pretty gross too.

So I'm taking opinions about what the right thing to do here is. Is a *roachpb.Value even the right type for the expected val in the CPut interface, or should it be simply []byte (with no checksum)? The *roachpb.Value has the advantage that it has a nice interface for encoding any type into it.

@petermattis I'm also curious if you have any insight into why gossip uses its own keys, different from the database keys (i.e. stuff in gossip/keys.go).

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yeah, this has bothered me greatly too. I spent some time digging.
So Batch.CPut() takes an expected Value. The value has a checksum, which includes the key corresponding to the respective key-value pair. So, if you read val from k1 and then try to used it as the expected value for key k2, the checksum needs to change.
At the moment, as we've seen, the CPut code wipes the checksum that was given to it and recomputes it. I've tried to change it to deal with already-populated checksums by asserting that they're correct for the key currently being written - so then the contract would be that if you pass a value with a filled-in checksum to cput as the expected val, then it must have been read from the key you're currently trying to write. The caller would have the option to clear the checksum, in which case the CPut code will compute a good one.

This hit a problem because the liveness record values we're using for these CPuts often come from gossip, and gossip, alas, uses different damn keys when it broadcasts the liveness records (different from the keys where the records are in the db). So, in order to keep these gossiped liveness records immutable, we'd have to recompute their checksums when a node receives them. Which is doable, but pretty gross too.

So I'm taking opinions about what the right thing to do here is. Is a *roachpb.Value even the right type for the expected val in the CPut interface, or should it be simply []byte (with no checksum)? The *roachpb.Value has the advantage that it has a nice interface for encoding any type into it.

@petermattis I'm also curious if you have any insight into why gossip uses its own keys, different from the database keys (i.e. stuff in gossip/keys.go).

There's also the option of having the CPut code clear the checksum without a fuss if it's not correct for the key.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

There's also the option of having the CPut code clear the checksum without a fuss if it's not correct for the key.

I've discussed this offline with @bdarnell; I'm going to change the CPut interface to simply take a read-only []byte for the expected val, so that we don't need to worry about checksums any more.

@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch from 8d275dc to 80686e8 Compare May 21, 2020 21:52
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 871 at r4 (raw file):

// cache.
func (nl *NodeLiveness) updateLiveness(
	ctx context.Context, update livenessUpdate, handleCondFailed func(actual LivenessRecord) error,

Does this handleCondFailed callback need a LivenessRecord type, or could it to with a vanilla storagepb.Liveness? I don't think the callers use the raw bytes.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 950 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I've discussed this offline with @bdarnell; I'm going to change the CPut interface to simply take a read-only []byte for the expected val, so that we don't need to worry about checksums any more.

The change changing the CPut API is #50408. Let's agree on that one and come back.

@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch 3 times, most recently from 79d5435 to 951bd0e Compare June 26, 2020 19:19
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)


pkg/kv/kvserver/node_liveness.go, line 871 at r4 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Does this handleCondFailed callback need a LivenessRecord type, or could it to with a vanilla storagepb.Liveness? I don't think the callers use the raw bytes.

they do; the callback calls maybeUpdate() (not new).
I've tried to make the callbacks not call maybeUpdate(), and instead return the actual from updateLiveness() (since that caller needs to sometimes call maybeUpdate() anyway), but it gets a bit gnarly because the callback can return errors too, and then it becomes unclear when the caller of updateLiveness should call maybeUpdate and when it shouldn't. So I've left updateLiveness() return an empty LivenessRecord when handleCondFailed() is called.

@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch from 951bd0e to 6733b9e Compare June 27, 2020 00:21
Release note: None
The comment on UpdateLiveness claimed that the handleCondFailed() arg is
optional, but it wasn't (not if ignoreCache was not set, at least). All
callers pass that arg, so this patch removes the half-assed attempt at
having it be optional.

Release note: None
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
@andreimatei andreimatei force-pushed the cleanup.liveness-cput branch from 6733b9e to 9c0dce9 Compare June 29, 2020 14:33
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @irfansharif, @nvanbenschoten, @petermattis, and @tbg)

@craig
Copy link
Contributor

craig bot commented Jun 29, 2020

Build succeeded

@craig craig bot merged commit 74d7021 into cockroachdb:master Jun 29, 2020
@andreimatei andreimatei deleted the cleanup.liveness-cput branch June 29, 2020 21:58
craig bot pushed a commit that referenced this pull request Jul 11, 2020
50329:  kvserver,cli,roachtest,sql: introduce a fully decommissioned bit r=tbg a=irfansharif

Ignore first commit, cherry-picked from @andreimatei's #49135.

---

This commit introduces a fully decommissioned bit to Cockroach.
Previously our Liveness schema only contained a decommissioning bool,
with consequently no ability to disamiguate between a node currently
undergoing decommissioning, and a node that was fully decommissioned. We
used some combination of store dead threshold to surface, in our UI,
"fully decommissioned" nodes, but it was never quite so. We need this
specificity for the Connect RPC.

We wire up the new CommissionStatus enum that's now part of the liveness
record. In doing so it elides usage of the decommissioning bool used in
v20.1. We're careful to maintain an on-the-wire representation of the
Liveness record that will be understood by v20.1 nodes (see
Liveness.EnsureCompatible for details).

We repurpose the AdminServer.Decommission RPC (which is now a misnomer,
should be thought of being named Commission instead) to persist
CommissionStatus states to KV through the lifetime of a node
decommissioning/recommissioning. See cli/node.go for where that's done.
For recommissioning a node, it suffices to simply persist a COMMISSIONED
state. When decommissioning a node, since it's a longer running process,
we first persist an in-progress DECOMMISSIONING state, and once we've
moved off all the Replicas in the node, we finalize the decommissioning
process by persisting the DECOMMISSIONED state.

When transitioning between CommissionStatus states, we CPut against
what's already there, disallowing illegal state transitions. The
appropriate error codes are surfaced back to the user. An example would
be in attempting to recommission a fully decommissioned node, in which
case we'd error out with the following:

> command failed: illegal commission status transition: can only
> recommission a decommissioning node; n4 found to be decommissioned

Note that this is a behavioral change for `cockroach node recommission`.
Previously it was able to recommission any "decommissioned" node,
regardless of how long ago it's was removed from the cluster. Now
recommission serves to only cancel an accidental decommissioning process
that wasn't finalized.

The 'decommissioning' column in crdb_internal.gossip_liveness is now
powered by this new CommissionStatus, and we introduce a new
commission_status column to it. We also introduce the same column to
the output generated by `cockroach node status --decommission`. The
is_decommissioning column still exists, but is also powered by the
CommissionStatus now.

We also iron out the events plumbed into system.eventlog: it now has a
dedicated event for "node decommissioning".

Release note (general change): `cockroach node recommission` has new
semantics.Previously it was able to recommission any decommissioned node,
regardless of how long ago it's was fully removed from the cluster. Now
recommission serves to only cancel an accidental decommissioning process
that wasn't finalized.

Release note (cli change): We introduce a commission_status column to
the output generated by `cockroach node status --decommission`. It
should be used in favor of the is_decommissioning column going forward.

Co-authored-by: irfan sharif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stop using protos as the expected value in CPut
5 participants