sql: update connExecutor logic for pausable portals #99663
Conversation
Nice! I think I have the last batch of comments and questions.
Reviewed 8 of 8 files at r1, 5 of 5 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)
pkg/sql/conn_executor.go
line 2400 at r1 (raw file):
```go
// portal. However, we update the result for each execution. Thus, we need
// to accumulate the number of affected rows before closing the result.
switch tcmd := cmd.(type) {
```
nit: since we only have a single case, an `if` would probably be better than a `switch`.
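For illustration, a minimal, self-contained sketch of that nit; the `command`/`execPortal` types below are made-up stand-ins, not the real connExecutor types:

```go
package main

import "fmt"

// Hypothetical stand-ins for the pgwire command types handled here.
type command interface{}
type execPortal struct{ portalName string }

func handle(cmd command) {
	// Original shape: switch tcmd := cmd.(type) { case execPortal: ... }
	// With only one case, a plain type assertion reads more directly.
	if tcmd, ok := cmd.(execPortal); ok {
		fmt.Println("accumulating affected rows for portal", tcmd.portalName)
	}
}

func main() { handle(execPortal{portalName: "p1"}) }
```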
pkg/sql/conn_executor_exec.go
line 208 at r1 (raw file):
```go
	pinfo *tree.PlaceholderInfo, canAutoCommit bool,
) (ev fsm.Event, payload fsm.EventPayload, err error) {
```
nit: since we're inspecting this error in the defer, it'd be better to do `s/err/retErr/g`.
pkg/sql/conn_executor_exec.go
line 443 at r1 (raw file):
```go
// in that jungle, we just overwrite them all here with an error that's
// nicer to look at for the client.
// Explaining why we need to check if the result has been released:
```
How do portals differ from the regular queries? I.e. is it something about `limitedCommandResult` that makes `res` be auto-released in some cases?
pkg/sql/conn_executor_exec.go
line 448 at r1 (raw file):
```go
// connExecutor. We should allow this case, so don't want to return an
// assertion error in resToPushErr.Err() when the result is released.
if res != nil && cancelQueryCtx.Err() != nil && !res.IsReleased() && res.Err() != nil {
```
Would it be cleaner to change `limitedCommandResult` a bit to avoid these `!IsReleased()` checks? In particular, I'm thinking that we'd need to add implementations of `Err` and `SetError` methods to `limitedCommandResult` - `Err` would be valid to be called even after `Close` has been called (even when `commandResult.released` is `true`) and `SetError` would store the error in `limitedCommandResult` directly (in addition to also storing it in the wrapped `commandResult`).
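Roughly, the shape being suggested might look like the sketch below; the types are simplified stand-ins rather than the actual pgwire `commandResult`/`limitedCommandResult` definitions, so treat it as an illustration of the idea only:

```go
package main

import (
	"errors"
	"fmt"
)

// commandResult is a simplified stand-in for the wrapped result type; its Err
// asserts that the result hasn't been released yet.
type commandResult struct {
	err      error
	released bool
}

func (c *commandResult) Err() error {
	if c.released {
		panic("Err called after the result was released")
	}
	return c.err
}

// limitedCommandResult is a simplified stand-in for the wrapper used by
// portals; it keeps its own copy of the error so that Err stays valid even
// after the wrapped result has been released.
type limitedCommandResult struct {
	*commandResult
	savedErr error
}

// SetError stores the error in the wrapper in addition to the wrapped result.
func (l *limitedCommandResult) SetError(err error) {
	l.savedErr = err
	l.commandResult.err = err
}

// Err is valid to call even after the wrapped result has been released.
func (l *limitedCommandResult) Err() error {
	return l.savedErr
}

func main() {
	res := &limitedCommandResult{commandResult: &commandResult{}}
	res.SetError(errors.New("boom"))
	res.released = true // simulate Close releasing the wrapped result
	fmt.Println(res.Err())
}
```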
pkg/sql/conn_executor_exec.go
line 630 at r1 (raw file):
```go
// the correct instrumentation helper for the paused portal.
ihToFinish := ih
if isPausablePortal && portal.pauseInfo.ihWrapper != nil {
```
nit: the second part of the conditional here is redundant, right? I.e. `ihWrapper` must always be non-nil here.
pkg/sql/conn_executor_exec.go
line 871 at r1 (raw file):
```go
	p.pausablePortal = portal
} else {
	p.cancelChecker.Reset(ctx)
```
nit: I'd just remove this call altogether (this resetting is done in `ex.resetPlanner` above).
pkg/sql/conn_executor_exec.go
line 1519 at r2 (raw file):
```go
// We only allow non-distributed plan for pausable portals.
if planner.pausablePortal != nil {
	distSQLMode = sessiondatapb.DistSQLOff
```
We can perform the check whether `curPlan` has any subqueries or postqueries here, before disabling DistSQL. This will make it so that if the stmt is not supported for pausable portals, we still execute the stmt with normal DistSQL mode.
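In sketch form, the suggested ordering might look like the following; the names (`pausablePortal`, `subqueryPlans`, `checkPlans`, `distSQLOff`) are assumptions for illustration and may not match the exact identifiers in the PR:

```go
package main

import "fmt"

// Hypothetical stand-ins for the planner state involved in the check.
type plan struct {
	subqueryPlans []struct{}
	checkPlans    []struct{} // stand-in for post-queries
}

type planner struct {
	pausablePortal *struct{}
	curPlan        plan
}

type distSQLMode int

const (
	distSQLAuto distSQLMode = iota
	distSQLOff
)

// chooseDistSQLMode only forces the local (off) mode when the portal is
// pausable AND the plan has no subqueries or postqueries; otherwise the
// statement keeps the normal DistSQL mode, per the suggestion above.
func chooseDistSQLMode(p *planner, mode distSQLMode) distSQLMode {
	if p.pausablePortal != nil &&
		len(p.curPlan.subqueryPlans) == 0 &&
		len(p.curPlan.checkPlans) == 0 {
		return distSQLOff
	}
	return mode
}

func main() {
	p := &planner{pausablePortal: &struct{}{}}
	fmt.Println(chooseDistSQLMode(p, distSQLAuto) == distSQLOff) // true
}
```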
pkg/sql/distsql_running.go
line 1643 at r1 (raw file):
```go
if p := planCtx.getPortalPauseInfo(); p != nil {
	if buildutil.CrdbTestBuild && planCtx.getPortalPauseInfo().flow == nil {
```
nit: can use `p` here.
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 0 at r1 (raw file):
nit: seems like an empty file in the first commit.
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 396 at r3 (raw file):
```
send
Query {"String": "COMMIT"}
```
nit: let's read a couple more rows from `p1` here before committing.
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 649 at r3 (raw file):
```
send
Query {"String": "SET statement_timeout='100s'"}
```
nit: why 100s? Do you want to disable the timeout here?
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 690 at r3 (raw file):
{"Type":"ReadyForQuery","TxStatus":"T"} send crdb_only
nit: some comments for what we're trying to check here would be good.
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 761 at r3 (raw file):
Bind {"DestinationPortal": "p1", "PreparedStatement": "q1"} Execute {"Portal": "p1", "MaxRows": 1} Query {"String": "SET CLUSTER SETTING sql.pgwire.multiple_active_portals.enabled = false"}
nit: do you just want to poison the txn with a non-retriable error here? Perhaps `crdb_internal.force_error` would be cleaner?
Thanks for reviewing!
In our pair programming last week, you mentioned that it's preferred to have a session variable to gate multiple active portals. Just wanted to confirm if we want to replace the cluster setting with the session var, or have them both.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)
pkg/sql/conn_executor_exec.go
line 443 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
How do portals differ from the regular queries? I.e. is it something about `limitedCommandResult` that makes `res` be auto-released in some cases?
Not really -- we close the result at the end of each execution at `connExecutor.execCmd()`:

`res.Close(ctx, stateToTxnStatusIndicator(ex.machine.CurState()))` (cockroach/pkg/sql/conn_executor.go, line 2367 in 3f113fc)
pkg/sql/conn_executor_exec.go
line 448 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Would it be cleaner to change `limitedCommandResult` a bit to avoid these `!IsReleased()` checks? In particular, I'm thinking that we'd need to add implementations of `Err` and `SetError` methods to `limitedCommandResult` - `Err` would be valid to be called even after `Close` has been called (even when `commandResult.released` is `true`) and `SetError` would store the error in `limitedCommandResult` directly (in addition to also storing it in the wrapped `commandResult`).
Good idea, but I'm not sure why we can't use the error in the wrapped `commandResult` directly?
pkg/sql/conn_executor_exec.go
line 1519 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We can perform the check whether `curPlan` has any subqueries or postqueries here, before disabling DistSQL. This will make it so that if the stmt is not supported for pausable portals, we still execute the stmt with normal DistSQL mode.
Done.
pkg/sql/pgwire/testdata/pgtest/multiple_active_portals
line 649 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: why 100s? Do you want to disable the timeout here?
Yeah 😅 changed it to RESET
In our pair programming last week, you mentioned that it's preferred to have a session variable to gate multiple active portals. Just wanted to confirm if we want to replace the cluster setting with the session var, or have them both.
I think both would be good, but also it seems like we're discouraging introduction of new cluster settings when session variables would suffice, so maybe just the session var is the right approach. I'll defer to Rafi on this one.
Reviewed 8 of 8 files at r4, 6 of 6 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 443 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Not really -- we close the result at the end of each execution at `connExecutor.execCmd()`:

`res.Close(ctx, stateToTxnStatusIndicator(ex.machine.CurState()))` (cockroach/pkg/sql/conn_executor.go, line 2367 in 3f113fc)
Ok, I see what's going on now. We can hit the panic that the result has been released when `Sync` is reached in an implicit transaction - at that point we have called `res.Close` (because the execution of the previous pgwire command was done) and we haven't created a new result:

`ev, payload = ex.handleAutoCommit(ctx, &tree.CommitTransaction{})` (cockroach/pkg/sql/conn_executor.go, line 2248 in 3f113fc)

We're violating the `CommandResultClose.Close` contract, which says that no new calls after `Close` on the result are allowed, but I think it's ok to make an exception here. It'd be good to explain this in the comment on `limitedCommandResult.Err`.
pkg/sql/conn_executor_exec.go
line 448 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Good idea, but I'm not sure why we can't use the error in the wrapped `commandResult` directly?
Yeah, you're right, we can just use `commandResult.err` directly. We also then don't need to add the `limitedCommandResult.SetError` method since we're only calling `Err` after `Close`.
pkg/sql/conn_executor_exec.go
line 1621 at r6 (raw file):
fName: "populate query level stats and regions", f: func() { populateQueryLevelStatsAndRegions(ctx, &curPlanner, ex.server.cfg, &stats, &ex.cpuStatsCollector)
Here and below, `stats` is captured incorrectly - we need to use `ppInfo.queryStats`.
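As a generic illustration of the capture issue being pointed out (this is not the cockroach code, and `pauseInfo`/`queryStats` here are simplified stand-ins): a deferred cleanup closure that captures a per-execution local keeps seeing the first execution's value, while reading through a field persisted across re-executions sees the latest one.

```go
package main

import "fmt"

type pauseInfo struct{ queryStats int }

func main() {
	pp := &pauseInfo{}
	var cleanup []func()

	for exec := 1; exec <= 3; exec++ {
		stats := exec * 10 // per-execution local, like `stats` above
		pp.queryStats = stats
		if exec == 1 {
			// The cleanup closure is only registered on the first execution.
			cleanup = append(cleanup, func() {
				fmt.Println("captured local:", stats)        // prints 10
				fmt.Println("via pauseInfo:", pp.queryStats) // prints 30
			})
		}
	}
	for _, f := range cleanup {
		f()
	}
}
```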
pkg/sql/conn_executor_exec.go
line 1624 at r6 (raw file):
```go
stmtFingerprintID = ex.recordStatementSummary(
	ctx, &curPlanner, int(ex.state.mu.autoRetryCounter),
	ppInfo.rowsAffected, res.Err(), stats,
```
This `res` is the result that was created on the first execution, which is incorrect. We should store the "current" result in `pauseInfo`, update it on each re-execution, and use the latest result here.
pkg/sql/pgwire/command_result.go
line 433 at r6 (raw file):
```go
// Postgres protocol.
//
// This design is known to be flawed. It only supports a specific subset of the
```
nit: this comment is now out-of-date.
pkg/sql/pgwire/command_result.go
line 671 at r6 (raw file):
```go
}

// Err is part of the sql.RestrictedCommandResult interface.
```
nit: let's mention in the comment why we need this method. In particular, we need to allow `Err` calls after `Close` has been called in implicit txns.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)
-- commits
line 48 at r15:
I will make a separate PR to add a test for this. I'm thinking about having a field in `ExecutorTestingKnobs` that forces an error when resuming a portal, and ensuring that we don't increment the number of executed stmts.
pkg/sql/conn_executor_exec.go
line 401 at r10 (raw file):
"increment executed stmt cnt", func() { if retErr == nil && !payloadHasError(retPayload) {
Addressed all comments.
@yuzefovich Inspired by your comment on retaining the "current" `res` in `pauseInfo`, I think this was wrong, so I made another commit, `sql: store retErr and retPayload for pausable portals`. The `retErr` here should be the one evaluated when this cleanup is executed, rather than when it is created.
Reviewed 9 of 9 files at r11, 6 of 6 files at r12, 2 of 2 files at r13, 5 of 5 files at r14, 1 of 1 files at r15, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 401 at r10 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Addressed all comments.
@yuzefovich Inspired by your comment on retaining the "current" `res` in `pauseInfo`, I think this was wrong, so I made another commit, `sql: store retErr and retPayload for pausable portals`. The `retErr` here should be the one evaluated when this cleanup is executed, rather than when it is created.
Good catch, we need to be careful with all of these deferred cleanup functions.
pkg/sql/conn_executor_exec.go
line 437 at r12 (raw file):
```go
	cancelQueryFunc = portal.pauseInfo.cancelQueryFunc
}
var resToPushErr RestrictedCommandResult
```
nit: you could combine these two lines into one: `resToPushErr := res`.
pkg/sql/conn_executor_exec.go
line 487 at r13 (raw file):
```go
retPayload = eventNonRetriableErrPayload{err: errToPush}
if isPausablePortal {
	portal.pauseInfo.retPayload = retPayload
```
Shouldn't the `defer` at the top of the method handle this for us? The `increment executed stmt cnt` deferred cleanup function should be executed after we update `pauseInfo` with the latest `retErr` and `retPayload`. I think that defer is the only place where we need to modify `pauseInfo` for this.
pkg/sql/pgwire/command_result.go
line 671 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: let's mention in the comment why we need this method. In particular, we need to allow `Err` calls after `Close` has been called in implicit txns.
Let's add a comment on `Err` for future readers.
pkg/sql/pgwire/command_result.go
line 649 at r12 (raw file):
```go
// SetError is part of the sql.RestrictedCommandResult interface.
func (r *limitedCommandResult) SetError(err error) {
```
nit: we shouldn't need this method, right?
I chatted with Rafi today, and we agreed that session var is better. Will add a commit for that shortly.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)
pkg/sql/conn_executor_exec.go
line 487 at r13 (raw file):
> The `increment executed stmt cnt` deferred cleanup function should be executed after we update `pauseInfo` with the latest `retErr` and `retPayload`.

Right, but this part of the cleanup is also called after we update `pauseInfo` with the latest `retErr` and `retPayload`.
Note that this part is in the closure of `processCleanupFunc`, so it's part of the `cancel query` cleanup function. We trigger the cleanup from the first defer in `execStmtInOpenState()`, which happens after `pauseInfo.retPayload` is updated.
In summary, the execution order is (see the toy defer sketch after this list):
- update pauseInfo with the latest retErr and retPayload
- cleanup stack is triggered
- cancel query cleanup func
- increment executed stmt cnt
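The ordering above hinges on Go's LIFO defer semantics; as a toy illustration (not the cockroach code), the defer registered first fires last:

```go
package main

import "fmt"

func execStmtInOpenStateSketch() {
	// Registered first, so it runs last: triggers the cleanup stack, which in
	// turn runs "cancel query" and then "increment executed stmt cnt".
	defer fmt.Println("2. cleanup stack is triggered (cancel query -> increment executed stmt cnt)")
	// Registered second, so it runs first: records the latest retErr/retPayload.
	defer fmt.Println("1. update pauseInfo with the latest retErr and retPayload")
}

func main() { execStmtInOpenStateSketch() }
```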
pkg/sql/pgwire/command_result.go
line 671 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Let's add a comment on `Err` for future readers.
Ah, sorry for the confusion, it seems a fixup was missed -- added now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r18, 1 of 1 files at r19, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 487 at r13 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
> The `increment executed stmt cnt` deferred cleanup function should be executed after we update `pauseInfo` with the latest `retErr` and `retPayload`.

Right, but this part of the cleanup is also called after we update `pauseInfo` with the latest `retErr` and `retPayload`. Note that this part is in the closure of `processCleanupFunc`, so it's part of the `cancel query` cleanup function. We trigger the cleanup from the first defer in `execStmtInOpenState()`, which happens after `pauseInfo.retPayload` is updated. In summary, the execution order is:
Hm, I see, this seems very fragile. Perhaps it might be worth introducing a couple of helper methods like
```go
updateRetPayload := func(payload fsm.EventPayload) {
	retPayload = payload
	if isPausablePortal {
		portal.pauseInfo.retPayload = payload
	}
}
```
and the same for `retErr`, to make it explicit that all updates to `retPayload` and `retErr` must also update the pause info.
As the code is written currently, it seems very easy to introduce another place where `retPayload` or `retErr` is updated while forgetting to update the pause info.
pkg/sql/pgwire/command_result.go
line 657 at r19 (raw file):
```go
// Err is part of the sql.RestrictedCommandResult interface.
// Unlike commandResult.Err(), we don't assert the result is not release here.
```
nit: `s/release/released/g`.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)
pkg/sql/conn_executor_exec.go
line 487 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, I see, this seems very fragile. Perhaps it might be worth introducing a couple of helper methods like

```go
updateRetPayload := func(payload fsm.EventPayload) {
	retPayload = payload
	if isPausablePortal {
		portal.pauseInfo.retPayload = payload
	}
}
```

and the same for `retErr`, to make it explicit that all updates to `retPayload` and `retErr` must also update the pause info. As the code is written currently, it seems very easy to introduce another place where `retPayload` or `retErr` is updated while forgetting to update the pause info.
Yeah, it is confusing ... Essentially this is for the case where an error happens in a cleanup step, and we need to ensure all cleanup steps after it are able to see this error.
To do this, I have `updateRetErrAndPayload(retErr, retPayload)` always called at the end of each cleanup step (I will add a test for this in another PR). Does this make sense to you?
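For concreteness, a hedged sketch of the helper shape being discussed, with stand-in types (`eventPayload`, `pauseInfo`) since the real `fsm.EventPayload` and `portalPauseInfo` fields may differ:

```go
package main

import "fmt"

// Stand-ins for fsm.EventPayload and the relevant portalPauseInfo fields.
type eventPayload interface{}

type pauseInfo struct {
	retErr     error
	retPayload eventPayload
}

func main() {
	var (
		retErr           error
		retPayload       eventPayload
		isPausablePortal = true
	)
	ppInfo := &pauseInfo{}

	// Every update to retErr/retPayload goes through this helper so that the
	// pause info always mirrors the latest values; per the comment above, it
	// is also called at the end of each cleanup step.
	updateRetErrAndPayload := func(err error, payload eventPayload) {
		retErr, retPayload = err, payload
		if isPausablePortal {
			ppInfo.retErr, ppInfo.retPayload = err, payload
		}
	}

	updateRetErrAndPayload(fmt.Errorf("error from a cleanup step"), "some payload")
	fmt.Println(retErr, retPayload, ppInfo.retErr, ppInfo.retPayload)
}
```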
Failed CI is because of a […]
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich and @ZhouXing19)
pkg/sql/vars.go
line 2667 at r60 (raw file):
```go
	return formatBoolAsPostgresSetting(evalCtx.SessionData().MultipleActivePortalsEnabled), nil
},
GlobalDefault: globalFalse,
```
are you still planning to make this default value be a metamorphic constant? (can be in a later PR too)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rafiss and @yuzefovich)
pkg/sql/vars.go
line 2667 at r60 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
are you still planning to make this default value be a metamorphic constant? (can be in a later PR too)
Updated it to be a metamorphic constant
thanks! very impressive work
i've just left two nits which can be addressed easily
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich and @ZhouXing19)
pkg/sql/distsql_running.go
line 1648 at r60 (raw file):
```go
if p := planCtx.getPortalPauseInfo(); p != nil {
	if buildutil.CrdbTestBuild && p.resumableFlow.flow == nil {
```
is this CrdbTestBuild check intentional?
pkg/sql/prepared_stmt.go
line 189 at r60 (raw file):
```go
// TODO(sql-session): address the compatibility of pausable portals and
// prepared_statements_cache_size.
```
it's fine to leave as a todo - but what is the behavior right now? the cache is ignored? (we can put these details into #99959 since the issue is already linked here)
@yuzefovich @rafiss thanks for reviewing! Following suggestions from Rafi, I made the session var a metamorphic constant, and this revealed quite a few bugs in CI :) To make it easier to review, I made a separate PR #100851 for the fix, and after getting approval I'll close #100851 and include the changes in this PR. Could you give another round of review on it? Thanks!!
With the introduction of pausable portals, the comment for `limitedCommandResult` needs to be updated to reflect the current behavior.

Release note: None
This change introduces a new session variable for a preview feature. When set to `true`, all non-internal portals with read-only [`SELECT`](../v23.1/selection-queries.html) queries without sub-queries or post-queries can be paused and resumed in an interleaving manner, but are executed with a local plan.

Release note (SQL change): Added the session variable `multiple_active_portals_enabled`. This setting is only for a preview feature. When set to `true`, it allows multiple portals to be open at the same time, with their execution interleaved with each other. In other words, these portals can be paused. The underlying statement for a pausable portal must be a read-only `SELECT` query without sub-queries or post-queries, and such a portal is always executed with a local plan.
…e persistence

This commit is part of the implementation of multiple active portals. In order to execute portals interleavingly, certain resources need to be persisted and their clean-up must be delayed until the portal is closed. Additionally, these resources don't need to be re-setup when the portal is re-executed.

To achieve this, we store the cleanup steps in the `cleanup` function stacks in `portalPauseInfo`, and they are called when any of the following events occur:

1. The SQL transaction is committed or rolled back
2. The connection executor is closed
3. An error is encountered when executing the portal
4. The portal is explicitly closed by the user

The cleanup functions should be called in the original order of a normal portal. Since a portal's execution follows the `execPortal() -> execStmtInOpenState() -> dispatchToExecutionEngine() -> flow.Run()` function flow, we categorize the cleanup functions into 4 "layers", which are stored accordingly in `PreparedPortal.pauseInfo`. The cleanup is always LIFO, following the

- resumableFlow.cleanup
- dispatchToExecutionEngine.cleanup
- execStmtInOpenState.cleanup
- exhaustPortal.cleanup

order. Additionally, if an error occurs in any layer, we clean up the current and proceeding layers. For example, if an error occurs in `execStmtInOpenState()`, we perform `resumableFlow.cleanup` and `dispatchToExecutionEngine.cleanup` (proceeding) and then `execStmtInOpenState.cleanup` (current) before returning the error to `execPortal()`, where `exhaustPortal.cleanup` will eventually be called. This is to maintain the previous clean-up process for portals as much as possible.

We also pass the `PreparedPortal` as a reference to the planner in `execStmtInOpenState()`, so that the portal's flow can be set and reused.

Release note: None
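As a minimal, hypothetical sketch of the LIFO cleanup-function stack described in this commit message (the real `portalPauseInfo` keeps one such stack per layer plus extra bookkeeping, so this shows only the core idea):

```go
package main

import "fmt"

// cleanupStack is a toy stand-in for one of the per-layer cleanup stacks.
type cleanupStack struct {
	fns []func()
}

func (s *cleanupStack) push(name string, f func()) {
	s.fns = append(s.fns, func() {
		fmt.Println("cleanup:", name)
		f()
	})
}

// run executes the registered cleanup functions in LIFO order and clears the stack.
func (s *cleanupStack) run() {
	for i := len(s.fns) - 1; i >= 0; i-- {
		s.fns[i]()
	}
	s.fns = nil
}

func main() {
	var s cleanupStack
	// Pushed in the order the layers are entered...
	s.push("exhaustPortal", func() {})
	s.push("execStmtInOpenState", func() {})
	s.push("dispatchToExecutionEngine", func() {})
	s.push("resumableFlow", func() {})
	// ...and run in reverse, matching the order listed above:
	// resumableFlow -> dispatchToExecutionEngine -> execStmtInOpenState -> exhaustPortal.
	s.run()
}
```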
When executing or cleaning up a pausable portal, we may encounter an error and need to run the corresponding clean-up steps, where we need to check the latest `retErr` and `retPayload` rather than the ones evaluated when creating the cleanup functions.

To address this, we use `portal.pauseInfo.retErr` and `.retPayload` to record the latest error and payload. They need to be updated for each execution. Specifically,

1. If the error happens during portal execution, we ensure `portal.pauseInfo` records the error by adding the following code to the main body of `execStmtInOpenState()`:

```go
defer func() {
	updateRetErrAndPayload(retErr, retPayload)
}()
```

Note this defer should always happen _after_ the defer of running the cleanups.

2. If the error occurs during a certain cleanup step for the pausable portal, we ensure that cleanup steps after it can see the error by always having `updateRetErrAndPayload(retErr, retPayload)` run at the end of each cleanup step.

Release note: None
This commit adds several restrictions to pausable portals to ensure that they work properly with the current changes to the consumer-receiver model. Specifically, pausable portals must meet the following criteria:

1. Not be internal queries
2. Be read-only queries
3. Not contain sub-queries or post-queries
4. Only use local plans

These restrictions are necessary because the current changes to the consumer-receiver model only consider the local push-based case.

Release note: None
When resuming a portal, we always reset the planner. However, we still need the planner to respect the outer txn's situation, as we did in cockroachdb#98120.

Release note: None
Release note: None
We now only support multiple active portals with local plans, so explicitly disable them for this test for now.

Release note: None
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rafiss and @yuzefovich)
pkg/sql/distsql_running.go
line 1648 at r60 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
is this CrdbTestBuild check intentional?
Yes, it was suggested in #96358 (review)
pkg/sql/prepared_stmt.go
line 189 at r60 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
it's fine to leave as a todo - but what is the behavior right now? the cache is ignored? (we can put these details into #99959 since the issue is already linked here)
I just realized that this is not an issue -- in postgres, deallocating a prepared statement doesn't impact the execution of a portal, and we're doing the same already. I added the subtest `deallocating_prepared_stmt_should_not_interrupt_portal_execution` for this as well.
I removed this comment and will close #99959.
I have merged the fixes from #100851.
Build succeeded.
blathers backport 23.1
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from 26da6a7 to blathers/backport-release-23.1-99663: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This PR replaces #96358 and is part of the initial implementation of multiple active portals.
This PR is to add limited support for multiple active portals. Now portals satisfying all of the following restrictions can be paused and resumed (i.e., with other queries interleaving their execution):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

And such a portal will only have the statement executed with a non-distributed plan.
This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set to `true`, all portals that satisfy the restrictions above will automatically become "pausable" when being created via the pgwire `Bind` stmt.

The core idea of this implementation is:

- A `switchToAnotherPortal` status added to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return the control to the connExecutor. (sql: add `switchToAnotherPortal` signal for result consumer #99052)
- The `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal are persisted and reused when we re-execute the portal. This is to ensure we continue the fetching rather than starting all over. (sql: enable resumption of a flow for pausable portals #99173)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality only to a small set of statements. Eventually some of them should be replaced (e.g. the limitedCommandResult's lifecycle) with the new code.
Also, we don't support distributed plans yet, as that involves much more complicated changes. See the "Start with an entirely local plan" section in the design doc. Support for this will come as a follow-up.

Epic: CRDB-17622
Release note (sql change): initial support for multiple active portals. Now with the session variable `multiple_active_portals_enabled` set to `true`, portals satisfying all of the following restrictions can be executed in an interleaving manner: 1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. And such a portal will only have the statement executed with an entirely local plan.