Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: allow node status to work in unavailable/broken clusters #28249

Merged
merged 1 commit into from
Aug 10, 2018

Conversation

petermattis
Copy link
Collaborator

Expand the crdb_internal.gossip_liveness table to include columns
needed to satisfy the basic usage of node status. Specifically, added
address, build, started_at, updated_at and replicas
columns. Changed node status to use gossip_liveness instead of
kv_node_status. The latter table requires the range containing the
consistent node status descriptors to be available, while
gossip_liveness only retrieves info from gossip.

node status and node status --decommission will work on
unavailable/broken clusters as long as the node they are pointed to is
up. node status {--stats,--ranges,--all} continue to require a
reasonably healthy cluster.

Fixes #16489

Release note (cli change): Enhance node status to work on
unavailable/broken clusters.

@petermattis petermattis requested a review from a team as a code owner August 3, 2018 16:44
@petermattis petermattis requested review from a team August 3, 2018 16:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis petermattis requested a review from knz August 3, 2018 16:44
@petermattis
Copy link
Collaborator Author

This needs a test. @knz do any of the cli tests start up multi-node clusters? If yes, can you point me towards where that is done.

@petermattis petermattis requested a review from a team August 3, 2018 16:57
@petermattis petermattis requested a review from a team as a code owner August 3, 2018 17:06
@petermattis
Copy link
Collaborator Author

Rather than extend gossip_liveness, perhaps I should be adding these new fields to gossip_nodes and then joining with that table.

@petermattis
Copy link
Collaborator Author

Rather than extend gossip_liveness, perhaps I should be adding these new fields to gossip_nodes and then joining with that table.

This is done. Still need to add a test.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from tests. None of the CLI tests start a multi-node cluster (but test_init_command.tcl comes closest). Or you could do it in a regular test using TestCluster (like I did for the remove-dead-replicas command). Not quite end-to-end, but close enough.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the hassle to do this using a TCL test is comparable to that of a Go test, so in this case I would recommend a Go test.

It can also become an acceptance test instead of a CLI test. In acceptance tests we have multi-node clusters and we're properly testing end-to-end.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/crdb_internal.go, line 1835 at r1 (raw file):

}

// crdbInternalGossipLivenessTable exposes local information about the nodes' liveness.
  1. Extend this comment to explain the information can be stale / incomplete because gossip doesn't come with a guarantee of accuracy.
  2. please check if the query against this table in workloadccl/fixture.go is still correct with your changes.

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I still need to add a test).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/crdb_internal.go, line 1835 at r1 (raw file):
Extended the comment, though any time the term gossip is used, the reader should understand that consistency and freshness are not present.

please check if the query against this table in workloadccl/fixture.go is still correct with your changes.

Thanks for the pointer. That query is SELECT count(node_id) FROM "".crdb_internal.gossip_liveness. Do you have any reason to suspect it wouldn't continue to work?

@petermattis petermattis requested a review from a team August 10, 2018 14:50
@petermattis petermattis force-pushed the pmattis/node-ls branch 3 times, most recently from 74b641c to aef0c35 Compare August 10, 2018 14:58
Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a cli/node-status roachtest and also added that roachtest to the list that is run on every CI. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 6 files at r2, 9 of 9 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@petermattis
Copy link
Collaborator Author

bors r=bdarnell

@craig
Copy link
Contributor

craig bot commented Aug 10, 2018

Build failed

Expand the `crdb_internal.gossip_liveness` and
`crdb_internal.gossip_nodes` tables to include columns needed to satisfy
the basic usage of `node status`. Specifically, added `address`,
`build`, `started_at`, `updated_at` and `replicas` columns. Changed
`node status` to use `gossip_{liveness,nodes}` instead of
`kv_node_status`. The latter table requires the range containing the
consistent node status descriptors to be available, while
`gossip_{liveness,nodes}` only retrieves info from gossip.

`node status` and `node status --decommission` will work on
unavailable/broken clusters as long as the node they are pointed to is
up. `node status {--stats,--ranges,--all}` continue to require a
reasonably healthy cluster.

Fixes cockroachdb#16489

Release note (cli change): Enhance `node status` to work on
unavailable/broken clusters.
@petermattis
Copy link
Collaborator Author

bors r=bdarnell

craig bot pushed a commit that referenced this pull request Aug 10, 2018
28249: cli: allow `node status` to work in unavailable/broken clusters r=bdarnell a=petermattis

Expand the `crdb_internal.gossip_liveness` table to include columns
needed to satisfy the basic usage of `node status`. Specifically, added
`address`, `build`, `started_at`, `updated_at` and `replicas`
columns. Changed `node status` to use `gossip_liveness` instead of
`kv_node_status`. The latter table requires the range containing the
consistent node status descriptors to be available, while
`gossip_liveness` only retrieves info from gossip.

`node status` and `node status --decommission` will work on
unavailable/broken clusters as long as the node they are pointed to is
up. `node status {--stats,--ranges,--all}` continue to require a
reasonably healthy cluster.

Fixes #16489

Release note (cli change): Enhance `node status` to work on
unavailable/broken clusters.

Co-authored-by: Peter Mattis <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 10, 2018

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants