-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-2.0: cli: fix cockroach quit
#26163
Conversation
This patch fixes the following: - the logic in `doShutdown()` aims to ignore errors caused by attempts connect to a server which is closing its gRPC channels, but was missing one case of such errors: during the initial check whether the node was running. This patch causes gRPC "closed connection" errors to become also ignored in that case. - previously if there was a transient gRPC error during a hard shutdown whereby the shutdown could still succeed, then `cockroach quit` would fail no matter what. This patch makes `cockroach quit` retry a hard shutdown. - the warning messages are now emitted on stderr (via `log.Warningf`) instead of stdout. Release note (bug fix): fix a bug where `cockroach quit` would erroneously fail even though the node already successfully shut down. Release note (cli change): `cockroach quit` now emits warning message on its standard error stream, not standard output.
LGTM. I assume you were able to verify (otherwise I would suggest leaving it to bake on master for a few days before merging this) |
Let's let it bake on master before merging then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the bug present in release-2.0? I thought our theory was that it was related to the GRPC upgrade on master.
@@ -1042,6 +1042,9 @@ func doShutdown(ctx context.Context, c serverpb.AdminClient, onModes []int32) er | |||
// out, or perhaps drops the connection while waiting). To that end, we first | |||
// run a noop DrainRequest. If that fails, we give up. | |||
if err := checkNodeRunning(ctx, c); err != nil { | |||
if grpcutil.IsClosedConnection(err) { | |||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you try to run cockroach quit
against a node that's not running, we still want to fail instead of swallowing the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing still fails. IsClosedConnection
does not return true on ECONNREFUSED.
This PR fixes an orthogonal bug: |
We ignore "RPC connection is closed" errors on the other paths because the Drain RPC causes the connection to be closed. It's not an expected error from checkNodeRunning. |
it is expected when |
Ah, I missed that this whole routine was called twice. LGTM |
This hasn't hurt us on master, so merging this. bors r+ |
26163: release-2.0: cli: fix `cockroach quit` r=knz a=knz Backport 1/1 commits from #26158. This `cockroach quit` bug is likely to affect production users, so we need to back-port it. /cc @cockroachdb/release Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |
Backport 1/1 commits from #26158.
This
cockroach quit
bug is likely to affect production users, so we need to back-port it./cc @cockroachdb/release