Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.2: colrpc: propagate the flow cancellation as ungraceful for FlowStream RPC #73958

Merged
merged 3 commits into from
Dec 29, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Dec 17, 2021

Backport 1/1 commits from #73887.
Backport 2/3 commits from #74163.

/cc @cockroachdb/release


colrpc: propagate the flow cancellation as ungraceful for FlowStream RPC

This commit fixes an oversight in the cancellation protocol of the
vectorized inbox/outbox communication. Previously, when the flow context
of the inbox host has been canceled (indicating that the whole query
should be canceled) we would propagate it as a graceful completion of
the FlowStream RPC which would result in the outbox cancelling only
its own subtree on the remote node. However, what we ought to do is to
propagate such cancellation as the ungraceful RPC completion so that the
outbox would also cancel the flow context of its own host.

In some rare cases the old behavior could result in some flows being
stuck forever (until a node is restarted) because they would get blocked
on producing the data when their consumer has already exited.

The behavior in this fix is what we already have in place for the
row-by-row engine (see processInboundStreamHelper in
flowinfra/inbound.go).

Fixes: https://github.com/cockroachlabs/support/issues/1326.
Fixes: #72445.

Release note (bug fix): The bug with the ungraceful shutdown of the
distributed queries in some rare cases has been fixed. "Ungraceful" here
means because of the statement_timeout (most likely) or because a node
crashed.

colrpc: deflake TestOutboxInbox

In flowCtxCancel scenario there are two possible errors that might be
returned to the reader depending on the exact sequence of events:

  • QueryCanceledError is used when the flow ctx cancellation is
    observed before the stream arrived
  • wrapped context.Canceled error is used when the inbox handler
    goroutine notices the cancellation first and ungracefully shuts down
    the stream.

Previously, we assumed that only the latter could occur, and we allow
for either.

Release note: None

colflow: fix the shutdown for good

This commit fixes a long standing bug in the distributed vectorized
query shutdown where in case of a graceful completion of the flow on one
node, we might get an error on another node resulting in the ungraceful
termination of the query. This was caused by the fact that on remote
nodes the last outbox to exit would cancel the flow context; however,
when instantiating FlowStream RPC the outboxes used a child context of
the flow context, so that "graceful" cancellation of the flow context
would cause the inbox to get an ungraceful termination of the gRPC
stream. As a result, the whole query could get "context canceled" error.

I believe this bug was introduced by me over two years ago because
I didn't fully understand how the shutdown should work, and in
particular I was missing that when an inbox observes the flow context
cancellation, it should terminate the FlowStream RPC ungracefully in
order to propagate the ungracefullness to the other side of the stream.
This shortcoming was fixed in the previous commit.

Another possible bug was caused by the outbox canceling its own context
in case of a graceful shutdown. As mentioned above, FlowStream RPC was
issued using the outbox context, so there was a possibility of a race
between CloseSend call being delivered to the inbox (graceful
termination) and the context of the RPC being canceled (ungraceful
termination).

Both of these problems are now fixed, and the shutdown protocol now is as
follows:

  • on the gateway node we keep on canceling the flow context at the very
    end of the query execution. It doesn't matter whether the query resulted
    in an error or not, and doing so allows us to ensure that everything
    exits on the gateway node. This behavior is already present.
  • due to the fix in a previous commit, that flow context cancellation
    terminates ungracefully all still open gRPC streams for FlowStream RPC
    for which the gateway node is the inbox host.
  • the outboxes on the remote nodes get the ungraceful termination and
    cancel the flow context of their hosts. This, in turn, would trigger
    propagation of the ungraceful termination on other gRPC streams, etc.
  • whenever an outbox exits gracefully, it cancels its own context, but
    the gRPC stream uses the flow context, so the stream is still alive.
    I debated a bit whether we want to keep this outbox context cancellation
    in case of a graceful completion and decided to keep it to minimize the
    scope of changes.

Release note (bug fix): Previously, CockroachDB could return a spurious
"context canceled" error for a query that actually succeeded in
extremely rare cases, and this has now been fixed.

This commit fixes an oversight in the cancellation protocol of the
vectorized inbox/outbox communication. Previously, when the flow context
of the inbox host has been canceled (indicating that the whole query
should be canceled) we would propagate it as a graceful completion of
the `FlowStream` RPC which would result in the outbox cancelling only
its own subtree on the remote node. However, what we ought to do is to
propagate such cancellation as the ungraceful RPC completion so that the
outbox would also cancel the flow context of its own host.

In some rare cases the old behavior could result in some flows being
stuck forever (until a node is restarted) because they would get blocked
on producing the data when their consumer has already exited.

The behavior in this fix is what we already have in place for the
row-by-row engine (see `processInboundStreamHelper` in
`flowinfra/inbound.go`).

Release note (bug fix): The bug with the ungraceful shutdown of the
distributed queries in some rare cases has been fixed. "Ungraceful" here
means because of the `statement_timeout` (most likely) or because a node
crashed.
@blathers-crl
Copy link

blathers-crl bot commented Dec 17, 2021

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Dec 17, 2021

Would be good to investigate the recent test failures (#73962, #73966, #73788) before merging this

In `flowCtxCancel` scenario there are two possible errors that might be
returned to the reader depending on the exact sequence of events:
- `QueryCanceledError` is used when the flow ctx cancellation is
observed before the stream arrived
- wrapped `context.Canceled` error is used when the inbox handler
goroutine notices the cancellation first and ungracefully shuts down
the stream.

Previously, we assumed that only the latter could occur, and we allow
for either.

Release note: None
@yuzefovich
Copy link
Member Author

I think this is now safe to merge since there hasn't been any new fallout and the reasons for why we saw some failures just with the first commit are understood. PTAL @rytaft @jordanlewis

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)

This commit fixes a long standing bug in the distributed vectorized
query shutdown where in case of a graceful completion of the flow on one
node, we might get an error on another node resulting in the ungraceful
termination of the query. This was caused by the fact that on remote
nodes the last outbox to exit would cancel the flow context; however,
when instantiating `FlowStream` RPC the outboxes used a child context of
the flow context, so that "graceful" cancellation of the flow context
would cause the inbox to get an ungraceful termination of the gRPC
stream. As a result, the whole query could get "context canceled" error.

I believe this bug was introduced by me over two years ago because
I didn't fully understand how the shutdown should work, and in
particular I was missing that when an inbox observes the flow context
cancellation, it should terminate the `FlowStream` RPC ungracefully in
order to propagate the ungracefullness to the other side of the stream.
This shortcoming was fixed in the previous commit.

Another possible bug was caused by the outbox canceling its own context
in case of a graceful shutdown. As mentioned above, `FlowStream` RPC was
issued using the outbox context, so there was a possibility of a race
between `CloseSend` call being delivered to the inbox (graceful
termination) and the context of the RPC being canceled (ungraceful
termination).

Both of these problems are now fixed, and the shutdown protocol now is as
follows:
- on the gateway node we keep on canceling the flow context at the very
end of the query execution. It doesn't matter whether the query resulted
in an error or not, and doing so allows us to ensure that everything
exits on the gateway node. This behavior is already present.
- due to the fix in a previous commit, that flow context cancellation
terminates ungracefully all still open gRPC streams for `FlowStream` RPC
for which the gateway node is the inbox host.
- the outboxes on the remote nodes get the ungraceful termination and
cancel the flow context of their hosts. This, in turn, would trigger
propagation of the ungraceful termination on other gRPC streams, etc.
- whenever an outbox exits gracefully, it cancels its own context, but
the gRPC stream uses the flow context, so the stream is still alive.
I debated a bit whether we want to keep this outbox context cancellation
in case of a graceful completion and decided to keep it to minimize the
scope of changes.

Release note (bug fix): Previously, CockroachDB could return a spurious
"context canceled" error for a query that actually succeeded in
extremely rare cases, and this has now been fixed.
@yuzefovich
Copy link
Member Author

Force-pushed to trigger the CLA check.

@yuzefovich yuzefovich merged commit d4d0732 into cockroachdb:release-21.2 Dec 29, 2021
@yuzefovich yuzefovich deleted the backport21.2-73887 branch December 29, 2021 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants