release-2.0: cli: fix `cockroach quit` #26163

Merged
craig bot merged 1 commit into `cockroachdb:release-2.0` from `backport2.0-26158` on Jun 8, 2018

Conversation

knz
Contributor

@knz knz commented May 29, 2018

Backport 1/1 commits from #26158.

This `cockroach quit` bug is likely to affect production users, so we need to backport it.

/cc @cockroachdb/release

This patch fixes the following:

- the logic in `doShutdown()` aims to ignore errors caused by attempts
  to connect to a server that is closing its gRPC channels, but it was
  missing one case of such errors: the initial check of whether
  the node is running. This patch causes gRPC "closed connection"
  errors to be ignored in that case too.

- previously, a transient gRPC error during a hard shutdown made
  `cockroach quit` fail unconditionally, even in cases where the
  shutdown could still succeed. This patch makes `cockroach quit`
  retry a hard shutdown.

- the warning messages are now emitted on stderr (via `log.Warningf`)
  instead of stdout.

Release note (bug fix): fix a bug where `cockroach quit` would
erroneously fail even though the node already successfully shut down.

Release note (cli change): `cockroach quit` now emits warning messages
on its standard error stream, not standard output.
@knz knz requested a review from tbg May 29, 2018 12:28
@knz knz requested a review from a team as a code owner May 29, 2018 12:28
@cockroach-teamcity
Member

This change is Reviewable

@tbg
Member

tbg commented May 29, 2018

LGTM. I assume you were able to verify (otherwise I would suggest leaving it to bake on master for a few days before merging this)

@knz
Contributor Author

knz commented May 29, 2018

Let's let it bake on master before merging then.

Contributor

@bdarnell bdarnell left a comment


Is the bug present in release-2.0? I thought our theory was that it was related to the GRPC upgrade on master.

@@ -1042,6 +1042,9 @@ func doShutdown(ctx context.Context, c serverpb.AdminClient, onModes []int32) er

```go
	// out, or perhaps drops the connection while waiting). To that end, we first
	// run a noop DrainRequest. If that fails, we give up.
	if err := checkNodeRunning(ctx, c); err != nil {
		if grpcutil.IsClosedConnection(err) {
			return nil
```
Contributor


I think if you try to run `cockroach quit` against a node that's not running, we still want to fail instead of swallowing the error.

Contributor Author


The command still fails in that case: `IsClosedConnection` does not return true on `ECONNREFUSED`.

@knz
Contributor Author

knz commented May 29, 2018

> Is the bug present in release-2.0? I thought our theory was that it was related to the GRPC upgrade on master.

This PR fixes an orthogonal bug: `cockroach quit` should not fail if `checkNodeRunning` encounters an "RPC connection is closed" error; the other paths already ignore that error.

@bdarnell
Contributor

We ignore "RPC connection is closed" errors on the other paths because the Drain RPC causes the connection to be closed. It's not an expected error from `checkNodeRunning`.

@knz
Contributor Author

knz commented May 29, 2018

> It's not an expected error from `checkNodeRunning`.

It is expected when `checkNodeRunning` is run the second time, for a hard shutdown.

@bdarnell
Contributor

Ah, I missed that this whole routine was called twice.

LGTM

@knz
Contributor Author

knz commented Jun 8, 2018

This hasn't hurt us on master, so merging this.

bors r+

craig bot pushed a commit that referenced this pull request Jun 8, 2018
26163: release-2.0: cli: fix `cockroach quit` r=knz a=knz

Backport 1/1 commits from #26158.

This `cockroach quit` bug is likely to affect production users, so we need to back-port it.

/cc @cockroachdb/release


Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Contributor

craig bot commented Jun 8, 2018

Build succeeded

@craig craig bot merged commit f0fc0d5 into cockroachdb:release-2.0 Jun 8, 2018
@knz knz deleted the backport2.0-26158 branch June 8, 2018 17:51
Labels
None yet
Projects
None yet

4 participants