-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: gossip: reintroduce GossipClientsKey
gossip
#103788
release-22.2: gossip: reintroduce GossipClientsKey
gossip
#103788
Conversation
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reminding myself of the history in #89613. I'm not sure this is worth fixing given the CPU overhead this is introducing in turn, but perhaps we could just publish a release note calling it out instead.
- We know exactly the 22.2.x/23.1 releases where
is_live
is a false positive, and those users can use newer patch releases of 22.2.x while still in their mixed version state and be fine. - 22.1.x is out of the maintenance window tomorrow (2023-05-24, https://www.cockroachlabs.com/docs/releases/release-support-policy.html). This
is_live
false positive, given it happens with 22.1 clusters present, could be spun as as a known issue in their release that's now out of support but is fixed in 22.2.3+.
What do you think?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
I'm inclined to agree. We could version gate this such that we stop gossiping in finalized 22.2 clusters while retaining 22.1 compatibility, and live with the <22.2.3 incompatibility. |
dc25d5c
to
5e935c7
Compare
I gated this on |
5e935c7
to
158b09f
Compare
In 2114678, we stopped gossiping the connectivity graph via `GossipClientsKey`. However, this broke backwards compatibility with `cockroach node status` which relied on this key to determine `is_live` liveness. This patch partially reverts that change, by reintroducing gossip of these keys in mixed 22.1/22.2 clusters, but otherwise not using them for anything. We stop gossiping them in finalized 22.2 clusters, since the cost of gossiping these is significant in large clusters (which motivated their removal in the first place). This breaks backwards compatibility with <22.2.3, we'll just have to live with that. Epic: none Release note (bug fix): Fixed a bug where `cockroach node status` could incorrectly report nodes as `is_live = false` in mixed 22.1/22.2 clusters. The bug still exists between 22.2 patch versions before and after 22.2.3.
158b09f
to
a6ee652
Compare
Verified the fix in a mixed-version clusters. TFTR! |
In 2114678, we stopped gossiping the connectivity graph via
GossipClientsKey
. However, this broke backwards compatibility withcockroach node status
which relied on this key to determineis_live
liveness.This patch partially reverts that change, by reintroducing gossip of these keys in mixed 22.1/22.2 clusters, but otherwise not using them for anything. We stop gossiping them in finalized 22.2 clusters. This breaks backwards compatibility with <22.2.3, we'll just have to live with that.
Touches #89613.
Touches #51838.
Epic: none
Release note (bug fix): Fixed a bug where
cockroach node status
could incorrectly report nodes asis_live = false
in mixed 22.1/22.2 clusters. The bug still exists between 22.2 patch versions before and after 22.2.3.