cli: let commands return partial information when run against unavailable/broken clusters #16489
Comments
An example - Cassandra gives status information in the following form:
Where U and D signal whether a node is reachable. The current CockroachDB behaviour when there is a serious problem and the quorum is lost is to hang indefinitely on |
Thanks @davibo-oc for the context. That's very helpful, and appreciated! |
Hope you guys are going to give this the focus it deserves. Admin functions that fail to respond along with everything else preclude its use in some of the crisis situations for which it was designed. |
@shawnrichards thanks for the additional nudge here. Unfortunately, this isn't going to make it into our 1.1 release, but it is slotted for investigation in the 1.2 release. |
@dianasaur323 As long as it is on you guys' radar, that is perfect. |
Draft Acceptance Criteria -- Accepting Comments
Rationale
Feature Scope
PM Acceptance Testing |
Also: time out after trying for a while. |
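A minimal sketch of the "time out after trying for a while" idea, in Python for brevity. It is not how the `cockroach` CLI is implemented; `call_with_timeout`, `fetch`, and `fallback` are hypothetical names introduced only for illustration. The status lookup runs in a daemon thread, and if it does not finish before the deadline the caller gets whatever partial information it already has, plus a flag saying the result is incomplete.

```python
import threading


def call_with_timeout(fetch, timeout_s, fallback):
    """Run fetch() with a deadline.

    Returns (result, True) if fetch() finished in time, otherwise
    (fallback, False). `fetch` and `fallback` are hypothetical stand-ins
    for "ask the cluster" and "what the local node already knows".
    """
    box = {}

    def worker():
        try:
            box["value"] = fetch()
        except Exception as exc:  # a failed lookup is treated like a timeout
            box["error"] = exc

    # Daemon thread: a lookup that hangs forever will not block process exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive() or "error" in box:
        return fallback, False
    return box["value"], True
```

A caller could print the fallback together with a note that the cluster did not answer in time, rather than hanging indefinitely.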
Thanks @tschottdorf! I've allocated time in our roadmap to close this out in 2.1, so modifying the milestone sounds good to me. |
@tschottdorf If no one is working on this, I can pick this up. |
@Nishant9 I think we're not quite at the point here where we know what exactly we want, so if you took this on you'd spend a lot of time stuck in limbo. However, there's another issue I could use your help with if you're interested. |
@nstewart who's in charge of this now on the PM side? We're basically ready to change how
Note to self: #20403 has some WIP. |
@piyush-singh is tackling this. @piyush-singh can you follow up here? |
Spoke to @tschottdorf and we'll prioritize this as part of the CLI team's upcoming two-day bug-fixing cycle |
Expand the `crdb_internal.gossip_liveness` and `crdb_internal.gossip_nodes` tables to include columns needed to satisfy the basic usage of `node status`. Specifically, added `address`, `build`, `started_at`, `updated_at` and `replicas` columns.

Changed `node status` to use `gossip_{liveness,nodes}` instead of `kv_node_status`. The latter table requires the range containing the consistent node status descriptors to be available, while `gossip_{liveness,nodes}` only retrieves info from gossip.

`node status` and `node status --decommission` will work on unavailable/broken clusters as long as the node they are pointed to is up. `node status {--stats,--ranges,--all}` continue to require a reasonably healthy cluster.

Fixes cockroachdb#16489

Release note (cli change): Enhance `node status` to work on unavailable/broken clusters.
28249: cli: allow `node status` to work in unavailable/broken clusters r=bdarnell a=petermattis

Expand the `crdb_internal.gossip_liveness` table to include columns needed to satisfy the basic usage of `node status`. Specifically, added `address`, `build`, `started_at`, `updated_at` and `replicas` columns.

Changed `node status` to use `gossip_liveness` instead of `kv_node_status`. The latter table requires the range containing the consistent node status descriptors to be available, while `gossip_liveness` only retrieves info from gossip.

`node status` and `node status --decommission` will work on unavailable/broken clusters as long as the node they are pointed to is up. `node status {--stats,--ranges,--all}` continue to require a reasonably healthy cluster.

Fixes #16489

Release note (cli change): Enhance `node status` to work on unavailable/broken clusters.

Co-authored-by: Peter Mattis <[email protected]>
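The split described in the commit message can be summarized as a small decision rule. The sketch below is in Python and only restates that rule; it is not taken from the CockroachDB source, and `requires_healthy_cluster` and `status_source` are hypothetical helper names. Flags not named in the commit message are assumed to behave like the basic invocation.

```python
# Flags that, per the commit message, still need a reasonably healthy cluster.
HEALTHY_CLUSTER_FLAGS = frozenset({"--stats", "--ranges", "--all"})


def requires_healthy_cluster(flags):
    """True if any requested flag still depends on kv_node_status."""
    return not HEALTHY_CLUSTER_FLAGS.isdisjoint(flags)


def status_source(flags):
    """Name the data source a `node status` invocation would rely on.

    "gossip" stands for the gossip-backed crdb_internal tables;
    "kv_node_status" needs the range holding the node status
    descriptors to be available.
    """
    return "kv_node_status" if requires_healthy_cluster(flags) else "gossip"
```

For instance, `status_source(["--decommission"])` yields `"gossip"`, matching the statement that `node status --decommission` works as long as the target node is up.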
Users expect to be able to get some information after running `cockroach node status`, even in the case of partitioning or loss of quorum.
It would be helpful to reveal some information from the local node (basically a warning that the node is no longer able to communicate with the rest of the cluster).
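A rough sketch of what such a local-node report could look like, in Python. Everything here is an assumption made for illustration: `local_status_report` is a hypothetical function, the output wording is invented, and the majority check is a simplified cluster-wide count (CockroachDB tracks quorum per range, which this does not model).

```python
def local_status_report(node_id, reachable_peers, cluster_size):
    """Summarize what one node can see, warning if it cannot reach a majority.

    Simplification: treats quorum as a majority of all nodes in the cluster.
    """
    # Count this node itself plus every distinct peer it can still reach.
    visible = 1 + len(set(reachable_peers) - {node_id})
    majority = cluster_size // 2 + 1

    lines = [f"node {node_id}: can reach {visible} of {cluster_size} nodes"]
    if visible < majority:
        lines.append(
            "WARNING: this node cannot reach a majority of the cluster; "
            "the information above is local and may be stale"
        )
    return "\n".join(lines)
```

The point of the sketch is that the report is built only from what the local node knows, so it can be produced even when the rest of the cluster is unreachable.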