Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-19.2: cockroach init and /health fixes #46477

Merged
merged 2 commits into from
Mar 24, 2020

Conversation

knz
Copy link
Contributor

@knz knz commented Mar 24, 2020

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

Prior to this patch, `cockroach init` would fail with an error in
various situations:

- when the server was not started already;
- when the server was started but not yet started accepting TCP
  connections;
- when the server was started but not yet started listening for RPC
  requests.

In particular the error for the last case is rather difficult
to interpret (e.g. `snappy compression disabled`) and makes
problems hard(er) to troubleshoot.

This made it cumbersome to use `cockroach init` in scripts to
guarantee that the cluster would be initialized when it's ready.

Additionally, using `cockroach init` in a loop is reported unsafe: as
explained in
cockroachdb#19753 (comment)
it's theoretically possible for two `init` invocations to target two
different nodes in rapid succession, thereby initializing two separate
clusters.

To clarify this situation, this patch changes `cockroach init` to wait
until the node is ready to accept a `Bootstrap` request, by waiting
until its `/health` endpoint (`NodeDetails`) is ready.

Release note (cli change): The `cockroach init` command now waits
until the node at the provided server address is ready to accept
initialization. This also waits for network readiness. This makes it
easier to implement initialization scripts by removing the need for a
loop. (Such a loop was operationally unsafe anyway and should be
discouraged for correctness.)

Release note (backward-incompatible change): `cockroach init` now
waits for server readiness and thus does not fail quickly with an
error any more when a mistaken server address is provided.

Release justification: Bug fix to existing functionality
@knz knz requested review from ajwerner and tbg March 24, 2020 13:30
@knz knz requested a review from a team as a code owner March 24, 2020 13:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, 15 of 15 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

@knz knz force-pushed the backport19.2-43904-45119 branch from 5a966af to 094f41b Compare March 24, 2020 14:49
This patch fixes `/health` by ensuring it does not require
authentication, does not perform KV operations and does not expose
sensitive details.

Reminder, for posterity:

- `/_status/details` exposes health, readiness and node details,
  which is potentially privileged information. This requires an
  authenticated admin user. It also performs a KV operation for
  authentication and thus subject to cluster availability.

- `/_admin/v1/health` only exposes health and readiness. It
  is non-privileged and does not require authentication. It does
  not perform KV operations and is thus not subject to cluster
  availability.

- `/health` is now an alias for `/_admin/v1/health`.  (It used to be a
  non-authenticated alias for `/_status/details/local` which was both a
  UX and security bug. See release notes below for details.)

- both `/health` and `/_admin/v1/health` accept a boolean flag e.g.
  via `?ready=1`. When `ready` is specified, a HTTP error is returned if
  the node is up but not able to accept client connections, or not live,
  or shutting down. When `ready` is *not* specified, a HTTP success is
  always returned if the node is up, even when it cannot accept client
  connections or is not live.

Release justification: Category 3: Fixes for high-priority or high-severity bugs in existing functionality

Release note (security update): The non-authenticated `/health` HTTP
endpoint was previously exposing the private IP address of the node,
which can be privileged information in some deployments. This has been
corrected. Deployments using automation to retrieve a node build
details, address details etc should use `/_status/details/local`
instead and use a valid admin authentication cookie.

Release note (bug fix): Accesses to `/health` using a non-root
authentication token do not hang any more when a node is currently
under load or if a system range is unavailable.

Release justification: Bug fix to existing functionality
@knz knz force-pushed the backport19.2-43904-45119 branch from 094f41b to 17a1f6d Compare March 24, 2020 16:17
@knz knz merged commit 26511d2 into cockroachdb:release-19.2 Mar 24, 2020
@knz knz deleted the backport19.2-43904-45119 branch March 24, 2020 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants