-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: disable pausable portals for all statements with mutations #108398
sql: disable pausable portals for all statements with mutations #108398
Conversation
This is an alternative approach to #107998 based on review feedback that isn't quite as heavy handed. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373 and @ZhouXing19)
-- commits
line 14 at r1:
Can these statements be PREPAREd? I'd really like to get rid of IsAllowedToPause
if possible in order to check / revoke pausability in less places.
pkg/sql/conn_executor_exec.go
line 2009 at r1 (raw file):
// un-pausable (normal) portal. // With pauseInfo is nil, no cleanup function will be added to the stack // and all clean-up steps will be performed as for normal portals.
I think planner.curPlan.close
will not be called this way because we already entered the branch for the pausable portal case. What do you think about lifting this "revoke" block outside of makeExecPlan
, one level up the stack? Then it'd be more clear what's going on, and it'll be easier to revoke pausability correctly.
pkg/sql/conn_executor_exec.go
line 2010 at r1 (raw file):
// With pauseInfo is nil, no cleanup function will be added to the stack // and all clean-up steps will be performed as for normal portals. planner.pausablePortal.pauseInfo = nil
nit: should we also increment a telemetry counter (perhaps introduce a new one)?
aac62fc
to
bd5c0a6
Compare
bd5c0a6
to
fde1a75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFAL, thanks for the reviews so far!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich and @ZhouXing19)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Can these statements be PREPAREd? I'd really like to get rid of
IsAllowedToPause
if possible in order to check / revoke pausability in less places.
According to the docs (https://www.postgresql.org/docs/current/protocol-flow.html), any operator that doesn't return rows (like TRUNCATE
) will execute to completion, so we don't necessarily have to prevent them from executing in a pausable portal, though @ZhouXing19 had some concerns about this.
pkg/sql/conn_executor_exec.go
line 2009 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think
planner.curPlan.close
will not be called this way because we already entered the branch for the pausable portal case. What do you think about lifting this "revoke" block outside ofmakeExecPlan
, one level up the stack? Then it'd be more clear what's going on, and it'll be easier to revoke pausability correctly.
Good catch. Hmm... if we move this block before the call to makeOptimizerPlan
we can't use the flag, since the flag is set during that call. I did a small refactor to make sure we close correctly, but it isn't pretty.
pkg/sql/conn_executor_exec.go
line 2010 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: should we also increment a telemetry counter (perhaps introduce a new one)?
We increment sqltelemetry.NotReadOnlyStmtsTriedWithPausablePortals in this block. That seems like the appropriate counter for statements containing mutations.
pkg/sql/sem/tree/stmt.go
line 152 at r2 (raw file):
// Now we only allow SELECT, but is it too strict? And how to filter out // SELECT with data writes / schema changes? func IsAllowedToPause(stmt Statement) bool {
@ZhouXing19 by removing IsAllowedToPause
we're revoking pausable portals for operators in IsMutationOp
and IsDDLOp
in pkg/sql/opt/operator.og.go
instead of the original in operators from CanModifySchema
and CanWriteData
below. I think this should be ok, because the operators that are not included do not return any rows, and should therefore be able to be executed in a pausable portal (it won't pause). Could you please sanity check?
It should be ok to allow statements not returning rows for pausable portals, since with The concern is that in PG for statements of this kind, after executing the statement, the portal will be destroyed and won't be allowed for re-execution: (You can always run the pgtests against pg with
However, our current implementation will keep the portal and the underlying flow (also concerned about if there's any hazard for keeping the flow unreleased for a long time, maybe @yuzefovich knows more about this?) :
I think the easiest way to accommodate this group of statements could be to modify the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also concerned about if there's any hazard for keeping the flow unreleased for a long time, maybe @yuzefovich knows more about this?
Indeed, that can be a concern in a sense that we could be leaking (more precisely, not releasing) some resources (like RAM usage) until the flow is cleaned up.
I think the easiest way to accommodate this group of statements could be to modify the
CreateStatementResult()
to ignore the limit clause and create a normalcommandResult
for them.
I like this idea - if we can determine at PREPARE time that the stmt is such that it won't produce any rows, we should not use the pausable portal execution model for it.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373 and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 2009 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Good catch. Hmm... if we move this block before the call to
makeOptimizerPlan
we can't use the flag, since the flag is set during that call. I did a small refactor to make sure we close correctly, but it isn't pretty.
I didn't mean it to move this code block to be before makeOptimizerPlan
- I was thinking it would be right after makeExecPlan
is called in the pausable portal branch, something like this (I'll post it separate;y because reviewable messes up the formatting).
pkg/sql/conn_executor_exec.go
line 2010 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
We increment sqltelemetry.NotReadOnlyStmtsTriedWithPausablePortals in this block. That seems like the appropriate counter for statements containing mutations.
Indeed, I missed it, nvm.
pkg/sql/conn_executor_exec.go
line 1761 at r2 (raw file):
// We don't allow sub / post queries for pausable portal. Set it back to an // un-pausable (normal) portal. // With pauseInfo is nil, no cleanup function will be added to the stack
I think we might have an existing bug here. Namely, we added planner.curPlan.close
as the cleanup step, but now we're overriding pauseInfo
to nil
, so that deferred cleanup step won't be executed. Perhaps leave a TODO to investigated it.
pkg/sql/conn_executor_exec.go
line 2173 at r2 (raw file):
// paused either. if err := res.RevokePortalPausability(); err != nil { res.SetError(err)
I think we should be just returning the error as usual. In dispatchToExecutionEngine
we don't return the error and only write it in res
because the returned errors result in the connection shutdown, makeExecPlan
doesn't have such a contract.
I was thinking something along these lines:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the improvement! should this be considered a "bug fix" in release notes? (if only master
is affected, then no need)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373 and @ZhouXing19)
fde1a75
to
814b494
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only fixes master (my original motivation was disabling pausable portals for UDFs with mutations, which will be supported in 23.2) and does a lot of cleanup/refactoring.
I implemented Jane's suggestion to override the limit when creating statement results, PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 2009 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I didn't mean it to move this code block to be before
makeOptimizerPlan
- I was thinking it would be right aftermakeExecPlan
is called in the pausable portal branch, something like this (I'll post it separate;y because reviewable messes up the formatting).
Thanks for the suggestion, that's much cleaner.
pkg/sql/conn_executor_exec.go
line 1761 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think we might have an existing bug here. Namely, we added
planner.curPlan.close
as the cleanup step, but now we're overridingpauseInfo
tonil
, so that deferred cleanup step won't be executed. Perhaps leave a TODO to investigated it.
Done.
pkg/sql/conn_executor_exec.go
line 2173 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think we should be just returning the error as usual. In
dispatchToExecutionEngine
we don't return the error and only write it inres
because the returned errors result in the connection shutdown,makeExecPlan
doesn't have such a contract.
Fixed and moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373 and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 1641 at r3 (raw file):
// an un-pausable (normal) portal. planner.pausablePortal.pauseInfo = nil if err := res.RevokePortalPausability(); err != nil {
We now have the revoke call twice. Let's remove this one (that contains return
inside of it) - we don't want to return right away to follow how we're processing the error in non-pausable portal branch (we let the code continue and do error handling some lines below).
pkg/sql/conn_executor_exec.go
line 2154 at r3 (raw file):
// should check if it still exists when this function returns. func (ex *connExecutor) makeExecPlan( ctx context.Context, planner *planner, res RestrictedCommandResult,
nit: changes to the comment and the signature are no longer needed.
814b494
to
0c3d1d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 1641 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We now have the revoke call twice. Let's remove this one (that contains
return
inside of it) - we don't want to return right away to follow how we're processing the error in non-pausable portal branch (we let the code continue and do error handling some lines below).
Copy paste error. Thanks for the catch.
pkg/sql/conn_executor_exec.go
line 2154 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: changes to the comment and the signature are no longer needed.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through this!
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 2154 at r3 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Done.
nit: the callsites need to be reverted accordingly too.
0c3d1d4
to
652f2a3
Compare
Not for this PR, but should we therefore keep a cache for pausable portals, and clean up portals when the |
|
This makes sense, but I was thinking more about auto-cleanup the whole portal and its execution dependency (in contrast to cleaning up just the underlying flow but keeping the portal object) to prevent the resource leaking case that you mentioned. But this might not be urgent at this moment. |
Previously we examined the AST to determine whether a statement could be executed in a pausable portal or not. However, this was insufficient to identify volatile UDFs that could also contain mutations. We also revoked pausable portals for operators for whom the portal type didn't matter, since they would be executed to completion regardless. This PR revokes a portal's pausability if the statement's plan contains a mutation or DDL. That is, if the opt builder determines that any operator is a mutation operator (see `IsMutationOp` and `IsDDLOp`). Epic: None Fixes: cockroachdb#107130 Release note: None
652f2a3
to
1a193fc
Compare
TFTRs!
@ZhouXing19 Could you open an issue for this to describe the improvements you envision? I don't think adding a TODO in the code here would do it justice. |
bors r+ |
Build succeeded: |
Previously we examined the AST to determine whether a statement could be executed in a pausable portal or not. However, this was insufficient to identify volatile UDFs that could also contain mutations.
This PR revokes a portal's pausability if the statement's plan contains a mutation. That is, if the opt builder determines that any operator is a mutation operator (see
IsMutationOp
).Although there is some overlap between this PR and the existing
IsAllowedToPause
, this PR leaves the latter in place, since it restricts some statements that are not considered mutation operators, e.g., import operators.Epic: None
Fixes: #107130
Release note: None