server: always create a liveness record before starting up #53805
Conversation
Deferred doing this in cockroachdb#52526. Probably a good idea to have it; it'll bring down the cluster convergence time (the time taken for all nodes to find out about the initialization) by a bit.
Release justification: low risk, high benefit changes to existing functionality
Release note: None
@tbg, I'm a bit confused. I'm not sure why this wasn't possible before. The only thing I can think of to make us not want to do this is the following scenario:
While shitty, this is also a bug that already exists today. I've actually seen it happen in https://github.com/cockroachlabs/support/issues/584, where a node was quickly added and removed before it was able to persist a liveness record for itself. That said, is there another reason why we shouldn't do what we're doing in this PR?
Previously it used to be the case that it was possible for a node to be up and running, and for there to be no corresponding liveness record for it. This was a very transient situation, as liveness records are created for a given node as soon as it sends out its first heartbeat. Still, given that this could take a few seconds, it lent itself to a lot of complexity in our handling of node liveness, where we had to always anticipate the possibility of there being no corresponding liveness record for a given node (and thus creating it if necessary).

It was originally thought we'd only be able to guarantee persisting a liveness record for a joining node if we had infrastructure like the join RPC, introduced in cockroachdb#52526, but staring at it a bit harder we can simply ensure we heartbeat our liveness record at least once before fully spinning up our server (aka opening the RPC floodgates).

Having a liveness record for each node always present is a crucial building block for long running migrations (cockroachdb#48843). There the intention is to have the orchestrator process look towards the list of liveness records for an authoritative view of cluster membership. Previously, when it was possible for an active member of the cluster to not have a corresponding liveness record (no matter how unlikely or short-lived in practice), we could not generate such a view.

Release justification: low risk, high benefit changes to existing functionality
Release note: None
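To make the ordering concrete, here is a minimal, self-contained Go sketch of the idea in the commit message above: block startup until one liveness heartbeat has succeeded, and only then open the RPC floodgates. All names here (`nodeLiveness`, `heartbeatOnce`, `startServer`) are illustrative stand-ins, not the actual cockroach code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// nodeLiveness is a stand-in for the real node-liveness subsystem.
type nodeLiveness struct{}

// heartbeatOnce persists/refreshes this node's liveness record a single time.
// In the real server this would write the liveness record to KV.
func (nl *nodeLiveness) heartbeatOnce(ctx context.Context) error {
	return nil
}

// startServer sketches the ordering: create the liveness record first, then
// open the "RPC floodgates" (start accepting external traffic).
func startServer(ctx context.Context) error {
	nl := &nodeLiveness{}

	// Retry until the first heartbeat lands or the caller gives up; a node
	// should not advertise itself as operational without a liveness record.
	for {
		if err := nl.heartbeatOnce(ctx); err == nil {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}

	fmt.Println("liveness record created; opening RPC floodgates")
	// ... start serving RPCs here ...
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := startServer(ctx); err != nil {
		fmt.Println("startup failed:", err)
	}
}
```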
Force-pushed from c8d2132 to ecc0abf.
I'm not sure why I'm seeing gossip/locality-address fail with my changes. I suspect it's due to #42741, but I could be wrong. Edit: hm, it looks like the node is unable to send itself batch requests (needed to heartbeat) given it's not yet operational. For the following setup:
This works:
But this doesn't:
Is there some grpc magic "don't send an external rpc if headed for 127.0.0.1" that's coming into play here? TIL. I see that the following also works:
// Begin the node liveness heartbeat. Add a callback which records the
// local store "last up" timestamp for every store whenever the liveness
// record is updated. We're sure to do this before we open RPC
// floodgates.
Not guaranteed since it's async, so add that it's best effort
// local store "last up" timestamp for every store whenever the liveness
// record is updated. We're sure to do this before we open RPC
// floodgates.
var livenessOnce sync.Once
nit: don't leak livenessOnce into the main scope; do
livenessRecordCreated := make(chan struct{}, 1)
{
// all the other stuff
}
if initialStart { ... }
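For illustration, here is a runnable expansion of the sketch above, under the assumption that the heartbeat callback may fire many times and only the first signal matters; the names (`onHeartbeat`, `initialStart`) are hypothetical, not the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	initialStart := true

	livenessRecordCreated := make(chan struct{}, 1)
	{
		// All the other liveness setup lives in this block, so nothing but
		// the channel escapes into the outer scope. The callback may fire
		// many times; only the first signal matters.
		var once sync.Once
		onHeartbeat := func() {
			once.Do(func() { livenessRecordCreated <- struct{}{} })
		}
		// Simulate the async heartbeat loop invoking the callback repeatedly.
		go func() {
			onHeartbeat()
			onHeartbeat()
		}()
	}

	if initialStart {
		// Block until the first liveness record has been written.
		<-livenessRecordCreated
		fmt.Println("liveness record created")
	}
}
```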
This PR is good and there's no good reason we haven't done this yet. But as you've found out, it also doesn't give us the invariant that we need, which is that the liveness record exists before the node can ever consider itself part of the cluster. I don't really understand the problem you're seeing regarding the "sending itself heartbeats". Until we're in modeOperational, the node cannot receive replicas, so it shouldn't ever be consulted for anything. Gossip should be active by this point, so the KV client should be able to route to the correct nodes. Can you post some more details?
Yes there is! See Line 481 in a3284b7
But if we rely on that for things working, there's a problem. Ah, I understand now - the issue is in the case in which this node is the one bootstrapping the local cluster. In that case, in the new code, we're trying to synchronously heartbeat the liveness record, but the gRPC server does not yet allow RPCs. Ok, makes sense. Add a comment to that effect and simply move the liveness record stuff past the
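As a generic illustration of the "local short-circuit" being referenced (not the actual `rpc.Context` code at that line), the pattern looks roughly like this: if the dial target matches the node's own advertised address, hand back an in-process handle to the server implementation rather than a real network connection, which is why self-addressed requests can work without going over the wire. All types below (`batchServer`, `nodeDialer`) are made up for the sketch.

```go
package main

import "fmt"

// batchServer is a stand-in for the internal RPC service implementation.
type batchServer interface {
	Batch(req string) (string, error)
}

type localServer struct{}

func (localServer) Batch(req string) (string, error) {
	return "handled locally: " + req, nil
}

type nodeDialer struct {
	ownAddr string
	local   batchServer
}

// dial returns an in-process handle for self-addressed targets; anything else
// would go through a real gRPC connection (elided in this sketch).
func (d *nodeDialer) dial(addr string) (batchServer, error) {
	if addr == d.ownAddr {
		return d.local, nil
	}
	return nil, fmt.Errorf("network dialing to %s elided in this sketch", addr)
}

func main() {
	d := &nodeDialer{ownAddr: "127.0.0.1:26257", local: localServer{}}
	c, err := d.dial("127.0.0.1:26257")
	if err != nil {
		panic(err)
	}
	resp, _ := c.Batch("heartbeat")
	fmt.Println(resp)
}
```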
Previously it used to be the case that it was possible for a node to be up and running, and for there to be no corresponding liveness record for it. This was a very transient situation, as liveness records are created for a given node as soon as it sends out its first heartbeat. Still, given that this could take a few seconds, it lent itself to a lot of complexity in our handling of node liveness, where we had to always anticipate the possibility of there being no corresponding liveness record for a given node (and thus creating it if necessary).

Having a liveness record for each node always present is a crucial building block for long running migrations (cockroachdb#48843). There the intention is to have the orchestrator process look towards the list of liveness records for an authoritative view of cluster membership. Previously, when it was possible for an active member of the cluster to not have a corresponding liveness record (no matter how unlikely or short-lived in practice), we could not generate such a view.

---

This is an alternative implementation for cockroachdb#53805. Here we choose to manually write the liveness record for the bootstrapping node when writing initial cluster data. For all other nodes, we do it on the server side of the join RPC. We're also careful to do it in the legacy codepath when joining a cluster through gossip.

Release note: None
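A hedged sketch of that alternative flow, with made-up types (`kvStore`, `livenessRecord`, `handleJoin`) standing in for the real bootstrap and join-RPC code paths: the bootstrapping node writes its own record as part of the initial cluster data, and every other node gets one written by the server handling its join RPC.

```go
package main

import (
	"fmt"
	"sync"
)

// livenessRecord is a minimal stand-in for the real liveness proto.
type livenessRecord struct {
	NodeID int
	Epoch  int
}

// kvStore fakes the KV writes that would back liveness records.
type kvStore struct {
	mu      sync.Mutex
	records map[int]livenessRecord
}

func (kv *kvStore) putLiveness(rec livenessRecord) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.records[rec.NodeID] = rec
}

// bootstrapCluster writes initial cluster data, including the first node's
// liveness record, so the record exists before that node starts serving.
func bootstrapCluster(kv *kvStore) {
	kv.putLiveness(livenessRecord{NodeID: 1, Epoch: 1})
}

// handleJoin sketches the server side of a join RPC: allocate a node ID and
// persist a liveness record for the joiner before acknowledging the join.
func handleJoin(kv *kvStore, nextNodeID int) int {
	kv.putLiveness(livenessRecord{NodeID: nextNodeID, Epoch: 1})
	return nextNodeID
}

func main() {
	kv := &kvStore{records: map[int]livenessRecord{}}
	bootstrapCluster(kv)
	newID := handleJoin(kv, 2)
	fmt.Printf("liveness records after join of n%d: %v\n", newID, kv.records)
}
```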
Abandoning this PR in favor of #53842.
53842: server: always create a liveness record before starting up r=irfansharif a=irfansharif

Previously it used to be the case that it was possible for a node to be up and running, and for there to be no corresponding liveness record for it. This was a very transient situation, as liveness records are created for a given node as soon as it sends out its first heartbeat. Still, given that this could take a few seconds, it lent itself to a lot of complexity in our handling of node liveness, where we had to always anticipate the possibility of there being no corresponding liveness record for a given node (and thus creating it if necessary).

Having a liveness record for each node always present is a crucial building block for long running migrations (#48843). There the intention is to have the orchestrator process look towards the list of liveness records for an authoritative view of cluster membership. Previously, when it was possible for an active member of the cluster to not have a corresponding liveness record (no matter how unlikely or short-lived in practice), we could not generate such a view.

---

This is an alternative implementation for #53805. Here we choose to manually write the liveness record for the bootstrapping node when writing initial cluster data. For all other nodes, we do it on the server side of the join RPC. We're also careful to do it in the legacy codepath when joining a cluster through gossip.

Release note: None

53994: cli: allow SQL commands to use password authn in more cases r=bdarnell,irfansharif,aaron-crl a=knz

First two commits from #53991.

Previously, SQL password authn was only allowed over TLS connections. With this change, password authn is allowed regardless of whether the connection uses TLS. This is implemented by only asking for a password interactively the first time the server complains that password auth has failed. This way, no password is ever requested interactively if the server "trusts" the connection (via HBA rules or `--insecure`).

Release justification: low risk, high benefit changes to existing functionality

54035: sql: emit more tracing events from the stats cache r=RaduBerinde a=RaduBerinde

The stats cache has various "slow" paths (where we need to query the system table). These are currently only logged if verbosity is high. This change switches to `VEvent` in most cases, so that these are visible during tracing (including in statement diagnostics bundles). This will allow us to diagnose slow planning times, e.g. due to the stats cache getting full.

Release justification: low-risk change to existing functionality, high potential benefit for debugging issues.
Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
… not just availability

Fixes cockroachdb#53515. We should have the autoupgrade process look at the fully decommissioned bit we added in cockroachdb#50329, instead of just looking at availability. It would avoid the hazard described in cockroachdb#53515. Previously the autoupgrade process was also looking at NodeStatusLiveness, which we've since soured upon (see cockroachdb#50478).

Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV. We add a helper to do this, which we'll also rely on in future PRs for other purposes. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the caching structure, but fetching records from KV is the world we're hoping to move towards going forward.

Release note: None
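To illustrate the membership check described above (the field and constant names are assumptions, not the actual proto), the autoupgrade loop would skip only nodes that are fully decommissioned, rather than reasoning about current availability:

```go
package main

import "fmt"

type membership string

const (
	active          membership = "active"
	decommissioning membership = "decommissioning"
	decommissioned  membership = "decommissioned"
)

// liveness is a simplified stand-in for a node liveness record.
type liveness struct {
	NodeID     int
	Membership membership
}

// nodesBlockingUpgrade returns the nodes the upgrade must still wait on:
// everything that is not fully decommissioned, regardless of availability.
func nodesBlockingUpgrade(records []liveness) []int {
	var ids []int
	for _, l := range records {
		if l.Membership != decommissioned {
			ids = append(ids, l.NodeID)
		}
	}
	return ids
}

func main() {
	records := []liveness{
		{NodeID: 1, Membership: active},
		{NodeID: 2, Membership: decommissioning},
		{NodeID: 3, Membership: decommissioned},
	}
	fmt.Println("still relevant for upgrade:", nodesBlockingUpgrade(records))
}
```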
Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV when wanting an up-to-date view of all nodes in the cluster. We add a helper to do as much, which we'll rely on in future commits. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the underlying look-aside caching structure, but fetching records from KV is the world we're hoping to start moving towards over time. Release note: None
To flesh out the migrations infrastructure (in future commits), we'll need a handle on all the nodes present in the system. Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV. We add a helper to do so. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the caching structure, but fetching records from KV is the world we're hoping to move towards going forward. This does mean that we're introducing a direct dependency on NodeLiveness in the sql layer, and there are improvements to be made here around interfaces delineating between "system tenant only" sql code and everything else. Only system tenants have the privilege to set cluster settings (or at least the version setting specifically), which is what this API will look to power. Release note: None
Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all records from KV when wanting an up-to-date view of all nodes that have ever been a part of the cluster. We add a helper to do as much, which we'll rely on when introducing long running migrations (cockroachdb#56107). It's a bit unfortunate that we're further adding on to the liveness API without changing the underlying look-aside cache structure, but fetching up-to-date records directly from KV is the world we're hoping to start moving towards over time. The TODO added in [1] outlines what the future holds. [1]: cockroachdb@d631239 Release note: None
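A rough sketch of what such a helper could look like under assumed types: the real GetLivenessesFromKV scans the node-liveness key range in KV and decodes each record, which is approximated here with a fake scanner so the example stays self-contained.

```go
package main

import (
	"context"
	"fmt"
)

// livenessRecord is a minimal stand-in for the real liveness proto.
type livenessRecord struct {
	NodeID int
	Epoch  int
}

// kvScanner abstracts "scan the node-liveness key range"; the real code would
// issue a KV scan over that span and unmarshal each row.
type kvScanner interface {
	ScanLiveness(ctx context.Context) ([]livenessRecord, error)
}

// fakeKV is an in-memory substitute used only for this sketch.
type fakeKV struct{ rows []livenessRecord }

func (f fakeKV) ScanLiveness(ctx context.Context) ([]livenessRecord, error) {
	return f.rows, nil
}

// getLivenessesFromKV returns the authoritative, up-to-date set of records
// straight from KV, bypassing the look-aside cache mentioned above.
func getLivenessesFromKV(ctx context.Context, kv kvScanner) ([]livenessRecord, error) {
	return kv.ScanLiveness(ctx)
}

func main() {
	kv := fakeKV{rows: []livenessRecord{{NodeID: 1, Epoch: 3}, {NodeID: 2, Epoch: 1}}}
	recs, err := getLivenessesFromKV(context.Background(), kv)
	if err != nil {
		panic(err)
	}
	fmt.Println(recs)
}
```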
56130: opt: updated normalization rules for folding IS expressions with a NULL predicate r=mgartner a=jayshrivastava

Previously, for statements such as `SELECT (foo,bar) IS DISTINCT FROM NULL FROM a_table`, the expression `(foo,bar) IS DISTINCT FROM NULL` would not be normalized to `true`. Similarly, if `IS NOT DISTINCT FROM NULL` were used, then the expression would not be normalized to `false`. The previous statement would only normalize if the tuple/array in the statement contained only constants. Given the updates in this commit, normalization will be applied when any arrays or tuples are provided in this situation.

Release note: None

56217: sql: clean up uses of Statement r=RaduBerinde a=RaduBerinde

This change cleans up the use of `sql.Statement` and reduces some allocations. Specifically:
- we create a `Statement` lower in the stack (in `execStmtInOpenState`), and pass only what we need in the higher layers;
- we change various functions to take a `tree.Statement` rather than an entire `Statement` when possible;
- we move down the `withStatement` context allocation, so that it is avoided in the implicit transaction state transition;
- we store a copy rather than a pointer to the Statement in the planner;
- we avoid directly using `stmt` fields from `func()` declarations that escape;
- we populate `Statement.AnonymizedStr` upfront. The anonymized string is always needed (to update statement stats).

```
name                                old time/op    new time/op    delta
EndToEnd/kv-read/EndToEnd            153µs ± 1%     154µs ± 2%      ~     (p=0.486 n=4+4)
EndToEnd/kv-read-no-prep/EndToEnd    216µs ± 1%     217µs ± 1%      ~     (p=0.886 n=4+4)
EndToEnd/kv-read-const/EndToEnd      111µs ± 1%     113µs ± 1%    +1.01%  (p=0.029 n=4+4)

name                                old alloc/op   new alloc/op   delta
EndToEnd/kv-read/EndToEnd           25.8kB ± 1%    25.5kB ± 1%      ~     (p=0.114 n=4+4)
EndToEnd/kv-read-no-prep/EndToEnd   32.2kB ± 1%    31.9kB ± 1%      ~     (p=0.686 n=4+4)
EndToEnd/kv-read-const/EndToEnd     21.0kB ± 1%    20.7kB ± 2%      ~     (p=0.200 n=4+4)

name                                old allocs/op  new allocs/op  delta
EndToEnd/kv-read/EndToEnd              252 ± 1%       250 ± 0%    -0.99%  (p=0.029 n=4+4)
EndToEnd/kv-read-no-prep/EndToEnd      332 ± 0%       330 ± 1%      ~     (p=0.229 n=4+4)
EndToEnd/kv-read-const/EndToEnd        214 ± 0%       212 ± 0%    -0.93%  (p=0.029 n=4+4)
```

Release note: None

56243: liveness: introduce GetLivenessesFromKV r=irfansharif a=irfansharif

Now that we always create a liveness record on start up (#53805), we can simply fetch all records from KV when wanting an up-to-date view of all nodes that have ever been a part of the cluster. We add a helper to do as much, which we'll rely on when introducing long running migrations (#56107). It's a bit unfortunate that we're further adding on to the liveness API without changing the underlying look-aside cache structure, but fetching up-to-date records directly from KV is the world we're hoping to start moving towards over time. The TODO added in [1] outlines what the future holds.

We'll also expose the GetLivenessesFromKV API we introduced earlier to pkg/sql. We'll rely on it when needing to plumb the liveness instance into the migration manager process (prototyped in #56107). It should be noted that this will be a relatively meatier form of a dependency on node liveness from pkg/sql than we have currently. Today the only uses are in DistSQL planning and in jobs [2]. As it relates to our multi-tenancy work, the real use of this API will happen only on the system tenant. System tenants alone have the privilege to set cluster settings (or at least the version setting specifically), which is what the migration manager will be wired into.

[1]: d631239
[2]: #48795

Release note: None

---

First commit is from #56221, and can be ignored here.

Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Previously it used to be the case that it was possible for a node to be
up and running, and for there to be no corresponding liveness record for
it. This was a very transient situation as liveness records are created
for a given node as soon as it sends out its first heartbeat. Still,
given that this could take a few seconds, it lent itself to a lot of
complexity in
our handling of node liveness where we had to always anticipate the
possibility of there being no corresponding liveness record for a given
node (and thus creating it if necessary).
It was originally thought we'd only be able to guarantee persisting a
liveness record for a joining node if we had infrastructure like the
join RPC, introduced in #52526, but staring at it a bit harder we can
simply ensure we heartbeat our liveness record at least once before
fully spinning up our server (aka opening the RPC floodgates).
Having a liveness record for each node always present is a crucial
building block for long running migrations (#48843). There the intention
is to have the orchestrator process look towards the list of liveness
records for an authoritative view of cluster membership. Previously when
it was possible for an active member of the cluster to not have a
corresponding liveness record (no matter how unlikely or short-lived in
practice), we could not generate such a view.
Release justification: low risk, high benefit changes to existing functionality
Release note: None
First commit is from #53713, both are necessary to remove the hack in the decommissioning roachtest.