webui: update to handle removal of decommissioned nodes #61812
Comments
Better yet, we should use the contents of node liveness directly (not the ephemeral gossip state):
Related is #50707, which in hindsight is a very badly worded issue. Really we should purge all uses of these status entries in favor of liveness entries.
There is an internal Slack thread on the issue. I think the current presentation is actively dangerous. The UI tells users that nodes actively undergoing the decommissioning process are "decommissioned", which users may interpret as meaning the nodes can be removed from the cluster. This could lead to loss of quorum, since removing enough live nodes that house replicas (which a decommissioning node does) has exactly that effect. As a quick fix, we can hide this section altogether.
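To make the concern concrete, here is a minimal sketch of how a UI could keep the two states apart. It assumes three membership values (`active`, `decommissioning`, `decommissioned`); the type, the function name `labelFor`, and the `removable` flag are hypothetical and are not taken from the actual webui code.

```typescript
// Hypothetical membership values for illustration only.
type Membership = "active" | "decommissioning" | "decommissioned";

interface NodeLabel {
  label: string;
  // True only once decommissioning has completed (an assumption of this sketch).
  removable: boolean;
}

// Map a membership value to the label shown to the user.
// A node that is still decommissioning is never presented as removable.
function labelFor(membership: Membership): NodeLabel {
  switch (membership) {
    case "active":
      return { label: "Live", removable: false };
    case "decommissioning":
      return { label: "Decommissioning", removable: false };
    case "decommissioned":
      return { label: "Decommissioned", removable: true };
  }
}
```

The point of the sketch is only that "decommissioning" gets its own label, so an in-progress node is not shown under a heading that suggests it can be taken away.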
I ran into some confusion while going through the decommissioning lifecycle. @erikgrinaker can you advise? I started to decommission a node, but when the node reached zero replicas, it disappeared from the Node List on the Overview page. I would have expected it to remain on the Node List with a Additionally, querying This was tested using a local 4 node cluster running 21.2.3 via roachprod.
No, the decommissioning process starts out marking the node as
The way to interpret
in the latter case of a down node is that the rest of the cluster realizes that this node is not coming back, and updates its replica placement to make additional replicas elsewhere.
Is the documentation using different semantics? It seems like a conflicting definition is given:
Yes, that is outdated. I opened a docs issue to fix it way back when we changed this, but it hasn't been done yet: I'll ping the docs team about it.
In #56529 we remove a node's status entry once it's decommissioned. This means that it no longer shows up in either "Recently Decommissioned Nodes" or "Decommissioned Node History" in the UI. We may want to either remove these views, or change them to use ephemeral info from gossip liveness (e.g. `crdb_internal.gossip_liveness`) rather than the status entry (e.g. `crdb_internal.kv_node_status`).

Also requested a docs update in #61808; we should coordinate any follow-up actions.
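As a rough illustration of the second option, the sketch below builds a "Decommissioned Node History" list from liveness entries instead of status entries. The `LivenessEntry` shape, its field names, and the function `decommissionedHistory` are assumptions made for this example; they are not the real webui types or the exact columns of `crdb_internal.gossip_liveness`.

```typescript
// Hypothetical shape of a liveness entry as the UI might receive it.
interface LivenessEntry {
  nodeId: number;
  membership: string; // assumed values: "active", "decommissioning", "decommissioned"
  updatedAt: number;  // assumed: milliseconds since the Unix epoch
}

// Return only fully decommissioned nodes, most recently updated first.
// filter() produces a new array, so the caller's input is not reordered.
function decommissionedHistory(entries: LivenessEntry[]): LivenessEntry[] {
  return entries
    .filter((e) => e.membership === "decommissioned")
    .sort((a, b) => b.updatedAt - a.updatedAt);
}
```

Because the list depends only on liveness data, it would keep working after the status entry is removed, although gossip state is ephemeral, as noted above.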
Epic CRDB-10792