-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: allow node status
to work in unavailable/broken clusters
#28249
Conversation
This needs a test. @knz do any of the cli tests start up multi-node clusters? If yes, can you point me towards where that is done. |
36fd8d9
to
1197589
Compare
1197589
to
17dd686
Compare
17dd686
to
dea6e26
Compare
Rather than extend |
dea6e26
to
c445a94
Compare
This is done. Still need to add a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from tests. None of the CLI tests start a multi-node cluster (but test_init_command.tcl
comes closest). Or you could do it in a regular test using TestCluster (like I did for the remove-dead-replicas command). Not quite end-to-end, but close enough.
Reviewed 11 of 11 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the hassle to do this using a TCL test is comparable to that of a Go test, so in this case I would recommend a Go test.
It can also become an acceptance test instead of a CLI test. In acceptance tests we have multi-node clusters and we're properly testing end-to-end.
Reviewed 11 of 11 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/crdb_internal.go, line 1835 at r1 (raw file):
} // crdbInternalGossipLivenessTable exposes local information about the nodes' liveness.
- Extend this comment to explain the information can be stale / incomplete because gossip doesn't come with a guarantee of accuracy.
- please check if the query against this table in
workloadccl/fixture.go
is still correct with your changes.
c445a94
to
e5a9754
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I still need to add a test).
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/crdb_internal.go, line 1835 at r1 (raw file):
Extended the comment, though any time the term gossip is used, the reader should understand that consistency and freshness are not present.
please check if the query against this table in workloadccl/fixture.go is still correct with your changes.
Thanks for the pointer. That query is SELECT count(node_id) FROM "".crdb_internal.gossip_liveness
. Do you have any reason to suspect it wouldn't continue to work?
e5a9754
to
a546939
Compare
74b641c
to
aef0c35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a cli/node-status
roachtest and also added that roachtest to the list that is run on every CI. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained
aef0c35
to
7a35838
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r2, 9 of 9 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained
bors r=bdarnell |
Build failed |
Expand the `crdb_internal.gossip_liveness` and `crdb_internal.gossip_nodes` tables to include columns needed to satisfy the basic usage of `node status`. Specifically, added `address`, `build`, `started_at`, `updated_at` and `replicas` columns. Changed `node status` to use `gossip_{liveness,nodes}` instead of `kv_node_status`. The latter table requires the range containing the consistent node status descriptors to be available, while `gossip_{liveness,nodes}` only retrieves info from gossip. `node status` and `node status --decommission` will work on unavailable/broken clusters as long as the node they are pointed to is up. `node status {--stats,--ranges,--all}` continue to require a reasonably healthy cluster. Fixes cockroachdb#16489 Release note (cli change): Enhance `node status` to work on unavailable/broken clusters.
7a35838
to
bf33234
Compare
bors r=bdarnell |
28249: cli: allow `node status` to work in unavailable/broken clusters r=bdarnell a=petermattis Expand the `crdb_internal.gossip_liveness` table to include columns needed to satisfy the basic usage of `node status`. Specifically, added `address`, `build`, `started_at`, `updated_at` and `replicas` columns. Changed `node status` to use `gossip_liveness` instead of `kv_node_status`. The latter table requires the range containing the consistent node status descriptors to be available, while `gossip_liveness` only retrieves info from gossip. `node status` and `node status --decommission` will work on unavailable/broken clusters as long as the node they are pointed to is up. `node status {--stats,--ranges,--all}` continue to require a reasonably healthy cluster. Fixes #16489 Release note (cli change): Enhance `node status` to work on unavailable/broken clusters. Co-authored-by: Peter Mattis <[email protected]>
Build succeeded |
Expand the
crdb_internal.gossip_liveness
table to include columnsneeded to satisfy the basic usage of
node status
. Specifically, addedaddress
,build
,started_at
,updated_at
andreplicas
columns. Changed
node status
to usegossip_liveness
instead ofkv_node_status
. The latter table requires the range containing theconsistent node status descriptors to be available, while
gossip_liveness
only retrieves info from gossip.node status
andnode status --decommission
will work onunavailable/broken clusters as long as the node they are pointed to is
up.
node status {--stats,--ranges,--all}
continue to require areasonably healthy cluster.
Fixes #16489
Release note (cli change): Enhance
node status
to work onunavailable/broken clusters.