server: always create a liveness record before starting up #53842
Conversation
Force-pushed from 07ed6be to 2f908bd
+cc @tbg, mind taking a look here as well? I'm still not sure if there's a simpler workaround for the --locality-advertise-addr issue; I feel like I must be missing something obvious. That said, writing to the store directly during bootstrap would work around that problem if nothing else is possible. See the comments in the branch around epoch={0,2}; I wanted your thoughts there.
We probably don't care about having the liveness
Force-pushed from 2f908bd to b976f92
This came out well!
Reviewed 8 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/node_liveness.go, line 461 at r1 (raw file):
// records after starting up, and incrementing to epoch=2 when doing so, at
// which point we'll set an appropriate expiration timestamp, gossip the
// liveness record, and update our in-memory representation of it.
Add/reword that an existing liveness record would not be overwritten but instead an error returned from this method.
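For context, a minimal, self-contained sketch of the semantics being asked for in this comment, with a plain Go map standing in for KV and an invented `errRecordExists`; the real `CreateLivenessRecord` presumably does a conditional put against the liveness key rather than a map lookup:

```go
package main

import (
	"errors"
	"fmt"
)

var errRecordExists = errors.New("liveness record already exists")

// liveness is a stripped-down stand-in for the real liveness proto.
type liveness struct {
	NodeID int32
	Epoch  int64
}

// store stands in for the KV liveness range.
var store = map[int32]liveness{}

// createLivenessRecord writes a brand-new liveness record for nodeID.
// An existing record is never overwritten; the caller gets an error
// back instead, which is the behavior the doc comment should spell out.
func createLivenessRecord(nodeID int32) error {
	if _, ok := store[nodeID]; ok {
		return errRecordExists
	}
	store[nodeID] = liveness{NodeID: nodeID, Epoch: 0}
	return nil
}

func main() {
	fmt.Println(createLivenessRecord(1)) // <nil>
	fmt.Println(createLivenessRecord(1)) // liveness record already exists
}
```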
pkg/kv/kvserver/node_liveness_test.go, line 174 at r1 (raw file):
defer tc.Stopper().Stop(ctx)
// Verify liveness records exist for all nodes.
So at this point, something in StartTestCluster has waited for all nodes to become live (otherwise this test is just flaky)? Add a comment.
pkg/server/node.go, line 383 at r1 (raw file):
// We're joining via gossip, so we don't have a liveness record for
// ourselves yet. Let's create one while here.
if err := n.storeCfg.NodeLiveness.CreateLivenessRecord(ctx, nodeID); err != nil {
Just reasoning this out to myself: we're not in the case in which "we are the cluster" because that only happens when this is the node that just got bootstrapped, but in that case it has a nodeID.
pkg/server/node.go, line 1160 at r1 (raw file):
// We create a liveness record here for the joining node while here. This
// way nodes are always guaranteed to have a liveness record present before
// fully starting up.
Add that this is required for long-running migrations correctness. (This might be a place where folks in the future might muck with this stuff and not know why it was the way it is now in the first place).
Force-pushed from e60d516 to 0187c16
TFTR! CI had failed on #54079, which is a bit strange (+ funny). I think something else is going on there, and I'm curious to know what. I was also missing an update to some logic test.
I'm a bit sad about not being able to use epoch=0, for no good reason. I think it's still possible to do so safely. I'll merge this when green, and might try lowering it again on master alone just to get back to the previous behavior.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/kv/kvserver/node_liveness.go, line 461 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add/reword that an existing liveness record would not be overwritten but instead an error returned from this method.
Done.
pkg/kv/kvserver/node_liveness_test.go, line 174 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
So at this point, something in StartTestCluster has waited for all nodes to become live (otherwise this test is just flaky)? Add a comment.
Done.
pkg/server/node.go, line 1160 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add that this is required for long-running migrations correctness. (This might be a place where folks in the future might muck with this stuff and not know why it was the way it is now in the first place).
Done.
Force-pushed from 0187c16 to ca1fd4e
bors r+
bors r-
Canceled.
Force-pushed from ca1fd4e to 6380a7a
bors r+
Build failed (retrying...):
bors r-
Canceled.
Force-pushed from 6380a7a to 8963e89
Hm, looks like I'll actually need to go with the epoch=0 initial write. Here's why: the existing node liveness tests use multiTestContext and all its quirks (I made use of TestCluster in the test I just added). One of the quirks is that it makes do without using Servers, which also means it makes do without the join RPC added in #52526. As a result, all nodes except for the bootstrap node will not actually have a liveness record persisted in KV.

Let's say we write our initial liveness record at epoch=1. For the bootstrap node, given the NodeLiveness component starts off knowing about this liveness record, we'll increment it to epoch=2 during our first heartbeat (cockroach/pkg/kv/kvserver/node_liveness.go, lines 761 to 763 at d0ead9a). For all other nodes, given we don't already have a liveness record persisted, after our first heartbeat we'll find ourselves at epoch=1 (cockroach/pkg/kv/kvserver/node_liveness.go, lines 756 to 758 at d0ead9a). The tests that want to assert epoch=1 for all liveness records, without having seen restarts, will not be too happy about this discrepancy (cockroach/pkg/kv/kvserver/node_liveness_test.go, lines 152 to 154 and lines 613 to 615 at d0ead9a).

This is another instance where having #8299 would have come in handy. I went ahead and kept the initial write at epoch=0 so we retain the previous epoch=1 pattern for the very first start. That way at least the node liveness tests that make use of multiTestContext pass. As for the changes introduced in this PR, TestNodeLivenessAppearsAtStart sufficiently exercises those code paths. I can try and follow up this month/next with migrating some of these tests over to use TestCluster (partially addressing #8299), but I'm inclined to not hold this PR up for it. I'll bors it once you take another look, @tbg.
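To make the epoch bookkeeping above concrete, here is a toy sketch (a hypothetical helper, not the actual heartbeat code) of the epoch a node lands on after its first heartbeat, depending on what was persisted at startup:

```go
package main

import "fmt"

// liveness is a stripped-down stand-in for the real liveness proto.
type liveness struct {
	Epoch int64
}

// firstHeartbeatEpoch models the epoch a node ends up at after its
// first successful heartbeat, given what (if anything) was already
// persisted for it. It mirrors the reasoning in the comment above.
func firstHeartbeatEpoch(existing *liveness) int64 {
	if existing == nil {
		// No record persisted yet (the old multiTestContext behavior):
		// the first heartbeat writes a fresh record at epoch=1.
		return 1
	}
	// A record already exists but hasn't been heartbeat yet, so the
	// node bumps the epoch before it can consider itself live.
	return existing.Epoch + 1
}

func main() {
	fmt.Println(firstHeartbeatEpoch(nil))                 // 1: no initial write
	fmt.Println(firstHeartbeatEpoch(&liveness{Epoch: 0})) // 1: initial write at epoch=0
	fmt.Println(firstHeartbeatEpoch(&liveness{Epoch: 1})) // 2: initial write at epoch=1
}
```

This is why the epoch=0 initial write preserves the existing "epoch=1 on very first start" pattern that the multiTestContext-based tests assert on.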
Force-pushed from 2443dd6 to 64c3b57
I suppose it's fine for me to manually write liveness records as part of multiTestContext.Start, which I just did. Doh. I still prefer starting off at epoch=0. I'm not too worried about the zero value of the liveness record because it isn't actually the zero value: the node ID is set. My motivation is primarily to retain the epoch=1 semantics that we have today. But I can change it again if asked.
I'm taking a look now, but already wanted to let you know that I'm fine writing at epoch zero. My initial inclination to avoid that was because you mentioned something tricky resulted from it (or so I remember). If that is not the case, perfect: epoch zero it is. Looking through the code now.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/kv/kvserver/node_liveness.go, line 797 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I don't think there are any migration concerns. By this phrasing I meant that this NodeLiveness guy hasn't yet discovered its liveness record (so the in-memory cache of liveness records needs updating).
In fact, if we did manually initialize here, it would actually do the same thing as we're doing explicitly. If we declared our oldLiveness to be empty, in trying to create a newLiveness our CPut attempt on the record below would fail. That would end up updating our in-memory cache with what was already found within KV, after which we'd have to retry anyway. I think writing it out in the way I did is a bit clearer (to me).
Separately, I think we actually want the error here to act as an assertion. Between us persisting a record during bootstrap, through the join rpc, and gossip code paths, there really should never be a missing liveness record once we get here. I'm inclined to let this bake on master for a bit as is to see what shakes out, and backporting to 20.2 after.
If a 20.1 node joins the cluster (at 20.1) and gets killed before persisting its liveness record, and then the cluster starts running 20.2 binaries, and the node comes back with the code of this PR, won't it eternally be unable to heartbeat itself? It's not going to get its liveness record created by the join rpc (it's not joining) nor bootstrap. I agree that this is sort of rare but if it's possible - we ought to handle it. What am I missing?
Good catch, hopefully the last time we (I) have to think about missing liveness records. Added it in #54216.
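For reference, a toy sketch of the CPut-and-retry shape discussed in this thread, with a Go map standing in for KV and invented helper names: a conditional put against a stale expected value fails, surfaces the actual record, and the caller refreshes its in-memory cache and retries.

```go
package main

import (
	"errors"
	"fmt"
)

var errConditionFailed = errors.New("condition failed")

// kv stands in for the liveness key; cache for NodeLiveness's
// in-memory copy of the record.
var (
	kv    = map[int32]int64{1: 3} // nodeID -> epoch actually persisted
	cache = map[int32]int64{}     // what this node thinks is persisted
)

// cput writes newEpoch only if the persisted epoch matches expected,
// returning the actual value on a mismatch (much like a CPut error).
func cput(nodeID int32, expected, newEpoch int64) (int64, error) {
	if actual := kv[nodeID]; actual != expected {
		return actual, errConditionFailed
	}
	kv[nodeID] = newEpoch
	return newEpoch, nil
}

func heartbeat(nodeID int32) {
	for {
		old := cache[nodeID] // possibly stale, or zero on a cache miss
		if actual, err := cput(nodeID, old, old+1); err != nil {
			// The CPut failed against what's really in KV: refresh the
			// cache with the actual record and retry, as described above.
			cache[nodeID] = actual
			continue
		}
		cache[nodeID] = old + 1
		return
	}
}

func main() {
	heartbeat(1)
	fmt.Println(kv[1], cache[1]) // 4 4
}
```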
In cockroachdb#53842 we introduced a change to always persist a liveness record on start up. As part of that change, we refactored how the liveness heartbeat codepath dealt with missing liveness records: it knew to fetch it from KV given we were now maintaining the invariant that it would always be present. Except that wasn't necessarily true, as demonstrated by the following scenario:
```
// - v20.1 node gets added to v20.1 cluster, and is quickly removed
// before being able to persist its liveness record.
// - The cluster is upgraded to v20.2.
// - The node from earlier is rolled into v20.2, and re-added to the
// cluster.
// - It's never able to successfully heartbeat (it didn't join
// through the join rpc, bootstrap, or gossip). Welp.
```
Though admittedly unlikely, we should handle it all the same instead of simply erroring out. We'll just fall back to creating the liveness record in-place as we did in v20.1 code. We can remove this fallback in 21.1 code.

Release note: None
54216: kvserver: address migration concern with node liveness r=irfansharif a=irfansharif

In #53842 we introduced a change to always persist a liveness record on start up. As part of that change, we refactored how the liveness heartbeat codepath dealt with missing liveness records: it knew to fetch it from KV given we were now maintaining the invariant that it would always be present. Except that wasn't necessarily true, as demonstrated by the following scenario:
```
// - v20.1 node gets added to v20.1 cluster, and is quickly removed
// before being able to persist its liveness record.
// - The cluster is upgraded to v20.2.
// - The node from earlier is rolled into v20.2, and re-added to the
// cluster.
// - It's never able to successfully heartbeat (it didn't join
// through the join rpc, bootstrap, or gossip). Welp.
```
Though admittedly unlikely, we should handle it all the same instead of simply erroring out. We'll just fall back to creating the liveness record in-place as we did in v20.1 code. We can remove this fallback in 21.1 code.

---

First commit is from #54224.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
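As a rough illustration of the fallback described in that change (invented names, a map in place of KV): a missing record during heartbeat is handled by creating the record in place, v20.1-style, rather than erroring out.

```go
package main

import (
	"errors"
	"fmt"
)

type liveness struct {
	NodeID int32
	Epoch  int64
}

// kv stands in for the liveness range.
var kv = map[int32]*liveness{}

var errMissingRecord = errors.New("missing liveness record")

// getLivenessFromKV models reading the record straight from KV.
func getLivenessFromKV(nodeID int32) (*liveness, error) {
	l, ok := kv[nodeID]
	if !ok {
		return nil, errMissingRecord
	}
	return l, nil
}

// heartbeat shows the 20.2-era fallback: if the invariant is violated
// (no record at all, e.g. the stray v20.1 node from the scenario
// above), create the record in place rather than erroring out.
func heartbeat(nodeID int32) *liveness {
	l, err := getLivenessFromKV(nodeID)
	if errors.Is(err, errMissingRecord) {
		// Fallback, to be removed in 21.1: behave like v20.1 and
		// create the record on the fly.
		l = &liveness{NodeID: nodeID, Epoch: 0}
		kv[nodeID] = l
	}
	l.Epoch++
	return l
}

func main() {
	fmt.Println(heartbeat(7)) // &{7 1}: record created in place, then bumped
}
```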
Now that we have cockroachdb#53842, we maintain the invariant that there always exists a liveness record for any given node. We can now simplify our handling of liveness records internally: where previously we had code to handle the possibility of empty liveness records (we created a new one on the fly), we can change them to assertions to verify that's no longer possible.

When retrieving the liveness record from our in-memory cache, it's possible for us to not find anything due to gossip delays. Instead of simply giving up then, now we can read the records directly from KV (and update our caches while in the area). This PR introduces this mechanism through usage of `getLivenessRecordFromKV`.

Finally, as a bonus, this PR also surfaces a better error when trying to decommission non-existent nodes. We're able to do this because now we can always assume that a missing liveness record, as seen in the decommission codepath, implies that the user is trying to decommission a non-existent node.

---

We don't intend to backport this to 20.2 due to the hazard described in cockroachdb#54216. We want this PR to bake on master and (possibly) trip up the assertions added above if we've missed anything. They're the only ones checking for the invariant we've introduced around liveness records. That invariant will be depended on for long running migrations, so better to shake things out early.

Release note: None
Now that we have cockroachdb#53842, we maintain the invariant that there always exists a liveness record for any given node. We can now simplify our handling of liveness records internally: where previously we had code to handle the possibility of empty liveness records (we created a new one on the fly), we change them to assertions that verify that empty liveness records are no longer flying around in the system.

When retrieving the liveness record from our in-memory cache, it was possible for us to not find anything due to gossip delays. Instead of simply giving up then, now we can read the records directly from KV (and eventually update our caches to store this newly read record). This PR introduces this mechanism through usage of `getLivenessRecordFromKV`. We should note that the existing cache structure within NodeLiveness is a look-aside cache, and that's not changed. It would further simplify things if it was a look-through cache where the update happened while fetching any record and failing to find it, but we defer that to future work. A TODO outlining this will be introduced in a future commit.

A note for ease of review: one structural change introduced in this diff is breaking down `ErrNoLivenessRecord` into `ErrMissingLivenessRecord` and `errLivenessRecordCacheMiss`. The former will be used in a future commit to generate better hints for users (it'll only ever surface when attempting to decommission/recommission non-existent nodes). The latter is used to represent cache misses. This too will be improved in a future commit, where instead of returning a specific error on cache access, we'll return a boolean instead.

---

We don't intend to backport this to 20.2 due to the hazard described in cockroachdb#54216. We want this PR to bake on master and (possibly) trip up the assertions added above if we've missed anything. They're the only ones checking for the invariant we've introduced around liveness records. That invariant will be depended on for long running migrations, so better to shake things out early.

Release note: None
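A small sketch of the look-aside arrangement described above, with simplified and partly invented signatures: a cache miss falls back to a direct KV read (the role `getLivenessRecordFromKV` plays), and the caller, not the cache, does the refresh; the look-through variant is the deferred future work.

```go
package main

import (
	"errors"
	"fmt"
)

type liveness struct {
	NodeID int32
	Epoch  int64
}

var (
	kv    = map[int32]liveness{2: {NodeID: 2, Epoch: 5}} // authoritative copy
	cache = map[int32]liveness{}                         // gossip-fed, may lag
)

var errLivenessRecordCacheMiss = errors.New("liveness record not found in cache")

// getLivenessLocked consults only the look-aside cache.
func getLivenessLocked(nodeID int32) (liveness, error) {
	if l, ok := cache[nodeID]; ok {
		return l, nil
	}
	return liveness{}, errLivenessRecordCacheMiss
}

// getLivenessRecordFromKV reads the record directly from KV, for when
// gossip hasn't caught up yet.
func getLivenessRecordFromKV(nodeID int32) (liveness, error) {
	l, ok := kv[nodeID]
	if !ok {
		// With the invariant from this PR, this shouldn't happen for a
		// real node ID.
		return liveness{}, errors.New("missing liveness record")
	}
	return l, nil
}

func getLiveness(nodeID int32) (liveness, error) {
	l, err := getLivenessLocked(nodeID)
	if errors.Is(err, errLivenessRecordCacheMiss) {
		if l, err = getLivenessRecordFromKV(nodeID); err == nil {
			cache[nodeID] = l // the caller updates the look-aside cache
		}
	}
	return l, err
}

func main() {
	fmt.Println(getLiveness(2)) // {2 5} <nil>, via KV on the first call
	fmt.Println(getLiveness(2)) // {2 5} <nil>, via the cache afterwards
}
```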
54544: kvserver: add assertions for invariants around liveness records r=irfansharif a=irfansharif

Now that we have #53842, we maintain the invariant that there always exists a liveness record for any given node. We can now simplify our handling of liveness records internally: where previously we had code to handle the possibility of empty liveness records (we created a new one on the fly), we can change them to assertions to verify that's no longer possible.

When retrieving the liveness record from our in-memory cache, it's possible for us to not find anything due to gossip delays. Instead of simply giving up then, now we can read the records directly from KV (and update our caches while in the area). This PR introduces this mechanism through usage of `getLivenessRecordFromKV`.

Finally, as a bonus, this PR also surfaces a better error when trying to decommission non-existent nodes. We're able to do this because now we can always assume that a missing liveness record, as seen in the decommission codepath, implies that the user is trying to decommission a non-existent node.

---

We don't intend to backport this to 20.2 due to the hazard described in #54216. We want this PR to bake on master and (possibly) trip up the assertions added above if we've missed anything. They're the only ones checking for the invariant we've introduced around liveness records. That invariant will be depended on for long running migrations, so better to shake things out early.

Release note: None

54812: docker: Base the docker image on RedHat UBI r=bdarnell,DuskEagle a=jlinder

Before: The docker image was based on Debian 9.12 slim.

Why: This change will help on-prem customers from a security and compliance perspective. It also aligns with our publishing images into the RedHat Marketplace.

Now: Published docker images are based on the RedHat UBI 8 base image.

Fixes: #49643

Release note (backward-incompatible change): CockroachDB Docker images are now based on the RedHat ubi8/ubi base image instead of Debian 9.12 slim. This will help on-prem customers from a security and compliance perspective.

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: James H. Linder <[email protected]>
Since cockroachdb#53842 we write an initial liveness record with a zero timestamp. This sometimes shows up as the following flake.
```
logic.go:2283:
testdata/logic_test/crdb_internal:356: SELECT node_id, regexp_replace(epoch::string, '^\d+$', '<epoch>') as epoch, regexp_replace(expiration, '^\d+\.\d+,\d+$', '<timestamp>') as expiration, draining, decommissioning, membership FROM crdb_internal.gossip_liveness WHERE node_id = 1
expected:
node_id  epoch    expiration   draining  decommissioning  membership
1        <epoch>  <timestamp>  false     false            active

but found (query options: "colnames") :
node_id  epoch    expiration  draining  decommissioning  membership
1        <epoch>  0,0         false     false            active
```
Release note: None
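The flake is easy to see from the rewrite pattern the logic test applies to the expiration column: a zero timestamp simply doesn't match it. A quick standalone check (not taken from the test harness):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The logic test rewrites expirations with this pattern before
	// comparing output.
	expiration := regexp.MustCompile(`^\d+\.\d+,\d+$`)

	// A heartbeat-written expiration matches and becomes <timestamp>...
	fmt.Println(expiration.MatchString("1600889167.123456789,0")) // true

	// ...but the zero timestamp written by the initial liveness record
	// does not, so it leaks through as "0,0" and the test flakes.
	fmt.Println(expiration.MatchString("0,0")) // false
}
```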
Previously it used to be the case that it was possible for a node to be
up and running, and for there to be no corresponding liveness record for
it. This was a very transient situation as liveness records are created
for a given node as soon as it sends out its first heartbeat. Still, given
that this could take a few seconds, it lent to a lot of complexity in
our handling of node liveness where we had to always anticipate the
possibility of there being no corresponding liveness record for a given
node (and thus creating it if necessary).
Having a liveness record for each node always present is a crucial
building block for long running migrations (#48843). There the intention
is to have the orchestrator process look towards the list of liveness
records for an authoritative view of cluster membership. Previously when
it was possible for an active member of the cluster to not have a
corresponding liveness record (no matter how unlikely or short-lived in
practice), we could not generate such a view.
This is an alternative implementation for #53805. Here we choose to
manually write the liveness record for the bootstrapping node when
writing initial cluster data. For all other nodes, we do it on the
server-side of the join RPC. We're also careful to do it in the legacy
codepath when joining a cluster through gossip.
Release note: None
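Pulling the description together, a schematic sketch (all names invented, a map in place of KV) of the invariant being established: every path into the cluster, whether bootstrap, the join RPC, or the legacy gossip codepath, ends with a liveness record persisted before the node finishes starting up.

```go
package main

import "fmt"

// joinPath enumerates the ways a node can come up, following the
// description above; the names here are illustrative only.
type joinPath int

const (
	bootstrap joinPath = iota // first node, writes initial cluster data
	joinRPC                   // nodes joining via the join RPC
	gossip                    // legacy codepath, pre-join-RPC clusters
)

// livenessRecords stands in for the KV liveness range.
var livenessRecords = map[int32]int64{} // nodeID -> epoch

func writeInitialLiveness(nodeID int32) {
	if _, ok := livenessRecords[nodeID]; !ok {
		livenessRecords[nodeID] = 0 // initial record, epoch=0
	}
}

// startNode sketches the invariant: no matter which path a node takes
// into the cluster, its liveness record exists before start-up completes.
func startNode(nodeID int32, path joinPath) {
	switch path {
	case bootstrap:
		// Written alongside the rest of the initial cluster data.
		writeInitialLiveness(nodeID)
	case joinRPC:
		// Written on the server side of the join RPC, on behalf of the
		// joining node.
		writeInitialLiveness(nodeID)
	case gossip:
		// Legacy path: the joining node writes its own record once it
		// has a node ID.
		writeInitialLiveness(nodeID)
	}
	fmt.Printf("n%d started, liveness epoch=%d\n", nodeID, livenessRecords[nodeID])
}

func main() {
	startNode(1, bootstrap)
	startNode(2, joinRPC)
	startNode(3, gossip)
}
```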