Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: sql: fix cleanup of not exhausted pausable portals in some cases #110666

Merged
merged 3 commits into from
Sep 27, 2023

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Sep 14, 2023

Backport 2/2 commits from #110625 on behalf of @yuzefovich.
Backport 1/1 commits from #110807 on behalf of @yuzefovich.

/cc @cockroachdb/release


execinfra: relax RowSource.ConsumerClosed contract

This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.

This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).

Release note: None

sql: fix cleanup of not exhausted pausable portals in some cases

Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
ConsumerClosed method to be called on them, and this might not happen.
In particular, it was assumed that this method would get called by
execinfra.Run loop, but for pausable portal model we needed to adjust
it so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that ConsumerClosed is
always called.

This commit fixes this problem by teaching both vectorized and row-based
flows to always call ConsumerClosed on their "head" processor on
Cleanup (which is called when closing the portal). Now, Cleanup
calls ConsumerClosed unconditionally (when the flow might be running
in the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.

Fixes: #110486

Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
multiple_active_portals_enabled session variable) CockroachDB could
encounter an internal error like unexpected 40960 leftover bytes if
that portal wasn't fully consumed. This is now fixed.

flowinfra: fix recently introduced minor bug

This commit fixes a minor bug introduced in #110625. In particular, that
PR made so that we now unconditionally call ConsumerClosed on the
"head" processor to make sure that resources are properly closed in the
pausable portal model. However, ConsumerClosed is only valid to be
called only if Start was called, so this commit fixes that.
Additionally, to de-risk #110625 further we now only call that method
for local flows since pausable portals currently disable DistSQL.

Epic: None

Release note: None


Release justification: low-risk bug fix.

This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.

This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).

Release note: None
Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
`ConsumerClosed` method to be called on them, and this might not happen.
In particular, it was assumed that this method would get called by
`execinfra.Run` loop, but for pausable portal model we needed to adjust
it so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that `ConsumerClosed` is
always called.

This commit fixes this problem by teaching both vectorized and row-based
flows to always call `ConsumerClosed` on their "head" processor on
`Cleanup` (which is called when closing the portal). Now, `Cleanup`
calls `ConsumerClosed` unconditionally (when the flow _might_ be running
in the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.

Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
`multiple_active_portals_enabled` session variable) CockroachDB could
encounter an internal error like `unexpected 40960 leftover bytes` if
that portal wasn't fully consumed. This is now fixed.
@blathers-crl blathers-crl bot requested review from a team as code owners September 14, 2023 18:33
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-110625 branch from 4ec93e7 to 6ec6a0f Compare September 14, 2023 18:33
@blathers-crl blathers-crl bot requested review from lidorcarmel and removed request for a team September 14, 2023 18:33
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-110625 branch from 6ec6a0f to e405323 Compare September 14, 2023 18:33
@blathers-crl
Copy link
Author

blathers-crl bot commented Sep 14, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Sep 14, 2023
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-110625 branch from b7384ee to 7200e88 Compare September 14, 2023 18:33
@blathers-crl blathers-crl bot requested review from mgartner and removed request for a team September 14, 2023 18:33
@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Sep 14, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member

I'll hold off on merging this for a bit in case more things are exposed on master (need to backport #110807).

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Sep 18, 2023
This commit fixes a minor bug introduced in #110625. In particular, that
PR made so that we now unconditionally call `ConsumerClosed` on the
"head" processor to make sure that resources are properly closed in the
pausable portal model. However, `ConsumerClosed` is only valid to be
called only if `Start` was called, so this commit fixes that.
Additionally, to de-risk #110625 further we now only call that method
for local flows since pausable portals currently disable DistSQL.

Epic: None

Release note: None
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Sep 27, 2023
@yuzefovich
Copy link
Member

No new things came up, so I think it should be good to go.

@yuzefovich yuzefovich merged commit 6a18a04 into release-23.1 Sep 27, 2023
@yuzefovich yuzefovich deleted the blathers/backport-release-23.1-110625 branch September 27, 2023 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants