sql: support foreign key checks in udfs #104553

rharding6373 · 2023-06-07T21:08:05Z

This change removes restrictions around and adds support for running postquery checks in routines, which allows postquery checks like foreign key constraint checks in UDFs.

Epic: CRDB-25388
Informs: #87289

Release note: None

cockroach-teamcity · 2023-06-07T21:08:14Z

This change is

rharding6373

I will be testing/supporting cascades in a subsequent PR.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)

pkg/sql/apply_join.go line 353 at r1 (raw file):

			factoryEvalCtx := serialEvalCtx
			if usedConcurrently {
				factoryEvalCtx = params.p.ExtendedEvalContextCopy()

@yuzefovich I manually tested that a UDF with multiple FK checks could do them in parallel, and it works, but I'm not sure that making copies of the extended eval context is the right way to go about supporting that. Could you please review this part? Thanks!

yuzefovich

Reviewed 3 of 13 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)

pkg/sql/apply_join.go line 309 at r1 (raw file):

			ctx,
			params.p,
			params.extendedEvalCtx.copy,

I wonder whether we should be using the same factory method that - in addition to copying the extended eval ctx - also updates a few fields. I'm not sure about placeholders and annotations, but session ID definitely seems worth copying over. Perhaps we should just use the same method.

pkg/sql/apply_join.go line 345 at r1 (raw file):

	defer cleanup()
	var evalCtxFactory func(usedConcurrently bool) *extendedEvalContext
	if len(plan.subqueryPlans) != 0 ||

nit: subqueries are already handled above, so no need to check them here.

pkg/sql/apply_join.go line 349 at r1 (raw file):

		len(plan.checkPlans) != 0 {
		serialEvalCtx := params.p.ExtendedEvalContextCopy()
		evalCtxFactory = func(usedConcurrently bool) *extendedEvalContext {

nit: maybe make it a method on the planner to avoid code duplication with conn_executor_exec.go?

pkg/sql/apply_join.go line 353 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

@yuzefovich I manually tested that a UDF with multiple FK checks could do them in parallel, and it works, but I'm not sure that making copies of the extended eval context is the right way to go about supporting that. Could you please review this part? Thanks!

This looks good to me - I can't think of anything else that would need special mutex protection for parallel checks.

pkg/sql/distsql_running.go line 1928 at r1 (raw file):

	evalCtxFactory func(usedConcurrently bool) *extendedEvalContext,
) {
	dsp.PlanAndRun(ctx, evalCtx, planCtx, planner.Txn(), plan, recv, finishedSetupFn)

We need to check whether any error has been set on recv and short-circuit if it has (similar to what we do in PlanAndRunAll). I would probably not introduce this PlanAndRunPostquery method and just inline it since it seems pretty simple.

rharding6373

TFTR!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)

pkg/sql/apply_join.go line 309 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wonder whether we should be using the same factory method that - in addition to copying the extended eval ctx - also updates a few fields. I'm not sure about placeholders and annotations, but session ID definitely seems worth copying over. Perhaps we should just use the same method.

Done.

pkg/sql/apply_join.go line 345 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: subqueries are already handled above, so no need to check them here.

Done.

pkg/sql/apply_join.go line 349 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: maybe make it a method on the planner to avoid code duplication with conn_executor_exec.go?

Do the new changes reflect what you have in mind? There are a couple differences from the old implementation that I want to highlight, but after looking at the code I think are ok. 1) We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change. 2) For subqueries, we copy the planner's extendedEvalContext instead of the runParams extendedEvalContext. I think this is ok, since in the three places we create runParams, its extendedEvalContext is the same as its planner (maybe this is an opportunity to clean up the code, since it doesn't seem necessary to have the extra reference to the extendedEvalContext).

pkg/sql/apply_join.go line 353 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This looks good to me - I can't think of anything else that would need special mutex protection for parallel checks.

👍

pkg/sql/distsql_running.go line 1928 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We need to check whether any error has been set on recv and short-circuit if it has (similar to what we do in PlanAndRunAll). I would probably not introduce this PlanAndRunPostquery method and just inline it since it seems pretty simple.

Done.

yuzefovich

Reviewed 1 of 13 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)

pkg/sql/apply_join.go line 349 at r1 (raw file):

We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change.

True, the session ID and semantics aren't expected to change, but the subquery could modify some other fields in the eval context, and we probably don't want to have those changes be propagated to consequent subqueries. It's possible that this won't break anything, but it's not a no-op change, and I don't see the reason for why we'd do it in this PR.

pkg/sql/apply_join.go line 310 at r2 (raw file):

			ctx,
			params.p,
			func() *extendedEvalContext { return evalContextSubquery },

I think we don't want to have a single evalContextSubquery for all subqueries - we should be creating a fresh context for each invocation of this anonymous function.

pkg/sql/apply_join.go line 360 at r2 (raw file):

		ctx, evalCtx, planCtx, plannerCopy.Txn(), plan.main, recv, finishedSetupFn,
	)
	execCfg.DistSQLPlanner.PlanAndRunCascadesAndChecks(

We should check the error before running the cascades and checks, similar to what we do in PlanAndRunAll.

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

					ex.initEvalCtx(ctx, factoryEvalCtx, planner)
				}
				ex.resetEvalCtx(factoryEvalCtx, planner.txn, planner.ExtendedEvalContext().StmtTimestamp)

resetEvalCtx internally unsets Placeholders and Annotations fields, and I don't think we want that. Perhaps "copy and reset" is not the right new helper method. The way I see it is that it'd be good to extract "binding" the three fields (Placeholders, Annotations, and SessionID) of the eval context of sub- and post-queries to the current stmt / session.

pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

SELECT f_fk_c_seq_first(103,2);

statement ok

nit: should we explicitly check the value here rather than only that this statement didn't fail?

mgartner · 2023-06-13T18:00:27Z

-- commits line 6 at r2:
nit: Some more detail on how this works vs. how it worked previously would be helpful.

mgartner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)

pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

			}
			if len(eb.checks) > 0 {
				return expectedLazyRoutineError("check")

I'm confused how this is working if we are ignoring checks built in this execbuilder. How are those checks executed?

mgartner · 2023-06-13T20:58:28Z

pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I'm confused how this is working if we are ignoring checks built in this execbuilder. How are those checks executed?

Nevermind, I see now that runPlanInsidePlan is running these checks.

This approach is much simpler than what I had in mind. I tried doing something like this for running subqueries inside routines but wasn't able to get it to work, but don't remember why. Are there any downsides you see with this approach?

mgartner · 2023-06-15T16:02:16Z

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

resetEvalCtx internally unsets Placeholders and Annotations fields, and I don't think we want that. Perhaps "copy and reset" is not the right new helper method. The way I see it is that it'd be good to extract "binding" the three fields (Placeholders, Annotations, and SessionID) of the eval context of sub- and post-queries to the current stmt / session.

Statements within a UDF there should be no placeholders after planning. So as long as factoryEvalCtx is a copy, unsetting Placeholders should be ok. I'm less familiar with Annotations, maybe they are needed?

rharding6373

TFTRs! I think we're getting closer on the extendedEvalContext refactor...

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)

-- commits line 6 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: Some more detail on how this works vs. how it worked previously would be helpful.

Done. PTAL.

pkg/sql/apply_join.go line 349 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change.

True, the session ID and semantics aren't expected to change, but the subquery could modify some other fields in the eval context, and we probably don't want to have those changes be propagated to consequent subqueries. It's possible that this won't break anything, but it's not a no-op change, and I don't see the reason for why we'd do it in this PR.

We now reset the serial eval ctx before returning it. PTAL.

pkg/sql/apply_join.go line 310 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think we don't want to have a single evalContextSubquery for all subqueries - we should be creating a fresh context for each invocation of this anonymous function.

Oh right this was a copy function before. I think I've fixed it now, ptal.

pkg/sql/apply_join.go line 360 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We should check the error before running the cascades and checks, similar to what we do in PlanAndRunAll.

Done.

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Statements within a UDF there should be no placeholders after planning. So as long as factoryEvalCtx is a copy, unsetting Placeholders should be ok. I'm less familiar with Annotations, maybe they are needed?

I've made a separate helper function to reset that we can call here. PTAL.

pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Nevermind, I see now that runPlanInsidePlan is running these checks.

This approach is much simpler than what I had in mind. I tried doing something like this for running subqueries inside routines but wasn't able to get it to work, but don't remember why. Are there any downsides you see with this approach?

We'll see if it works with cascades (doesn't seem to be a reason why it shouldn't). One downside is that we may be duplicating work that may be hard to optimize using this approach. For example:

CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT NOT NULL REFERENCES parent(p));
CREATE FUNCTION f_fk_c(k INT, r INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k,r) RETURNING *;
$$ LANGUAGE SQL;
CREATE FUNCTION f_fk_c2(k1 INT, r1 INT, k2 INT, r2 INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k1,r1) RETURNING *;
  INSERT INTO child VALUES (k2,r2) RETURNING *;
$$ LANGUAGE SQL;
INSERT INTO parent VALUES (1),(2),(3);
-- Q1
SELECT f_fk_c(100,3), f_fk_c(101,3);
-- Q2
SELECT f_fk_c2(102, 2, 103, 2)

In both Q1 and Q2 we are doing an FK check for the same values twice. If the first FK check passes and there are no modifications to the parent before the second, the second is guaranteed to pass. I'm not sure if this would be easier to optimize using another approach or not.

pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we explicitly check the value here rather than only that this statement didn't fail?

Done.

yuzefovich

Reviewed 3 of 4 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)

pkg/sql/apply_join.go line 15 at r3 (raw file):

import (
	"context"
	"github.com/cockroachdb/cockroach/pkg/util/buildutil"

nit: out of order import.

pkg/sql/apply_join.go line 302 at r3 (raw file):

			params.p.curPlan.subqueryPlans = oldSubqueries
		}()
		// Create a separate memory account for the results of the subqueries.

nit: move this comment to be right above creating subqueryResultMemAcc.

pkg/sql/apply_join.go line 365 at r3 (raw file):

		ctx, evalCtx, planCtx, plannerCopy.Txn(), plan.main, recv, finishedSetupFn,
	)
	if p := planCtx.getPortalPauseInfo(); p != nil {

nit: I'd probably not include the part about pausable portals here since we currently don't support pausable portals for stmts that write data. That said, we should consider explicitly disabling the pausable portals feature for UDFs with mutations. Take a look at connExecutor.makePreparedPortal.

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I've made a separate helper function to reset that we can call here. PTAL.

I think the order of things is still not the right one.

Here is what the code was doing previously:

create and initialize eval context
call resetEvalCtx on it (which internally unsets some fields, including Placeholders and Annotations)
update Placeholders, Annotations, and SessionID fields based on the planner.

What the code is doing now:

create and initialize eval context as well as "reset" it (which in this case means setting Placeholders, Annotations, and SessionID fields based on the planner).
call resetEvalCtx on the eval context, which unsets Placeholders and Annotations fields.

Now we have that Placeholders and Annotations fields are left nil, and that's not what we want.

mgartner

modulo Yahor's comments.

Reviewed 11 of 13 files at r1, 2 of 4 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)

pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

We'll see if it works with cascades (doesn't seem to be a reason why it shouldn't). One downside is that we may be duplicating work that may be hard to optimize using this approach. For example:
CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT NOT NULL REFERENCES parent(p));
CREATE FUNCTION f_fk_c(k INT, r INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k,r) RETURNING *;
$$ LANGUAGE SQL;
CREATE FUNCTION f_fk_c2(k1 INT, r1 INT, k2 INT, r2 INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k1,r1) RETURNING *;
  INSERT INTO child VALUES (k2,r2) RETURNING *;
$$ LANGUAGE SQL;
INSERT INTO parent VALUES (1),(2),(3);
-- Q1
SELECT f_fk_c(100,3), f_fk_c(101,3);
-- Q2
SELECT f_fk_c2(102, 2, 103, 2)
In both Q1 and Q2 we are doing an FK check for the same values twice. If the first FK check passes and there are no modifications to the parent before the second, the second is guaranteed to pass. I'm not sure if this would be easier to optimize using another approach or not.

I think this is fine. A DEFERRED constraint would probably be the best way to get around this, but we don't support that yet. In this case we do want a check after each individual INSERT. Optimizing away checks across multiple statements seems error prone and probably not worth the trouble.

pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we explicitly check the value here rather than only that this statement didn't fail?

+1

DrewKimball

Reviewed 10 of 13 files at r1, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)

pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):


query TT
SELECT f_fk_p(3), f_fk_c(102, 3);

The fact this succeeds seems like it might be problematic - we don't make any guarantees about the order in which these expression are evaluated, do we?

mgartner · 2023-06-23T13:00:28Z

pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

The fact this succeeds seems like it might be problematic - we don't make any guarantees about the order in which these expression are evaluated, do we?

Good catch - we don't guarantee evaluation order here.

rharding6373

TFTRs, and apologies for the delay turning this around. PTAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @yuzefovich)

pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I'd probably not include the part about pausable portals here since we currently don't support pausable portals for stmts that write data. That said, we should consider explicitly disabling the pausable portals feature for UDFs with mutations. Take a look at connExecutor.makePreparedPortal.

I don't know enough about pausable portals yet to say whether we should allow them for UDF statements that are read-only, but the UDF body contains a write. I have omitted this as you suggested (but left the nil assignment above).

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think the order of things is still not the right one.

Here is what the code was doing previously:

create and initialize eval context

call resetEvalCtx on it (which internally unsets some fields, including Placeholders and Annotations)

update Placeholders, Annotations, and SessionID fields based on the planner.

What the code is doing now:

create and initialize eval context as well as "reset" it (which in this case means setting Placeholders, Annotations, and SessionID fields based on the planner).

call resetEvalCtx on the eval context, which unsets Placeholders and Annotations fields.

Now we have that Placeholders and Annotations fields are left nil, and that's not what we want.

Right below this we call planner.ExtendedEvalContextReset(factoryEvalCtx) which sets those fields based on the planner. Does that line up with your current understanding of the order of setting/unsetting values?

pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good catch - we don't guarantee evaluation order here.

Thank you for catching this. I removed these tests as they're not useful.

DrewKimball

Reviewed 14 of 14 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

yuzefovich

Reviewed 14 of 14 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @rharding6373)

pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I don't know enough about pausable portals yet to say whether we should allow them for UDF statements that are read-only, but the UDF body contains a write. I have omitted this as you suggested (but left the nil assignment above).

Let's spend a bit time on this here. I don't think that the code that we copy-pasted from PlanAndRunAll that is related to pausable portals will work as expected, so I wouldn't blindly introduce it here - it also adds unnecessary complexity to this runPlanInsidePlan method.

I believe things can actually be quite simple - we don't need to support "pausable portal execution model" in this method at all. What matters for pausable portals is the fact that we can "paginate" the output of the main "outer" query, but here we're in the "inner" query context, and we actually want to run it fully to completion (before we produce any "outer" rows). In fact, we already ensure this by nilling out plannerCopy.pausablePortal field above. So I think we can (and should) just remove this whole if block as unnecessary.

However, I'm a bit concerned that we have an existing bug with pausable portals of stmts that have apply join with the inner plan having a subquery. Subqueries are disallowed for pausable portals, but in that case the outer plan doesn't have any subqueries, so it'd pass the existing check. I think we want for subqueries evaluation to also use plannerCopy with unset pausablePortal field. I'd apply a diff like this:

diff --git a/pkg/sql/apply\_join.go b/pkg/sql/apply\_join.go  
index d22e84a0e37..4a3aa41d0d8 100644  
\--- a/pkg/sql/apply\_join.go  
+++ b/pkg/sql/apply\_join.go  
@@ -277,6 +277,12 @@ func runPlanInsidePlan(  
        )  
        defer recv.Release()  
   
\+       plannerCopy := \*params.p  
\+       // If we reach this part when re-executing a pausable portal, we won't want to  
\+       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\+       // for its flow.  
\+       plannerCopy.pausablePortal = nil  
+  
        if len(plan.subqueryPlans) != 0 {  
                // We currently don't support cases when both the "inner" and the  
                // "outer" plans have subqueries due to limitations of how we're  
@@ -291,13 +297,8 @@ func runPlanInsidePlan(  
                // plan (and we know there are none given the check above). If parts of  
                // the "inner" plan refer to the subqueries, we know that they must  
                // refer to the "inner" subqueries. To allow for that to happen we have  
\-               // to manually replace the subqueries on the planner's curPlan and  
\-               // restore the original state before exiting.  
\-               oldSubqueries := params.p.curPlan.subqueryPlans  
\-               params.p.curPlan.subqueryPlans = plan.subqueryPlans  
\-               defer func() {  
\-                       params.p.curPlan.subqueryPlans = oldSubqueries  
\-               }()  
\+               // to manually replace the subqueries on the planner's curPlan.  
\+               plannerCopy.curPlan.subqueryPlans = plan.subqueryPlans  
                // Create a separate memory account for the results of the subqueries.  
                // Note that we intentionally defer the closure of the account until we  
                // return from this method (after the main query is executed).  
@@ -305,7 +306,7 @@ func runPlanInsidePlan(  
                defer subqueryResultMemAcc.Close(ctx)  
                if !execCfg.DistSQLPlanner.PlanAndRunSubqueries(  
                        ctx,  
\-                       params.p,  
\+                       &plannerCopy,  
                        params.extendedEvalCtx.copy,  
                        plan.subqueryPlans,  
                        recv,  
@@ -319,11 +320,6 @@ func runPlanInsidePlan(  
   
        // Make a copy of the EvalContext so it can be safely modified.  
        evalCtx := params.p.ExtendedEvalContextCopy()  
\-       plannerCopy := \*params.p  
\-       // If we reach this part when re-executing a pausable portal, we won't want to  
\-       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\-       // for its flow.  
\-       plannerCopy.pausablePortal = nil  
        distributePlan := getPlanDistribution(  
                ctx, plannerCopy.Descriptors().HasUncommittedTypes(),  
                plannerCopy.SessionData().DistSQLMode, plan.main,

(apologies, reviewable for some reason inserted back slashes in some places)

It might be worth to come up with a query with an apply join with an "inner" subquery and see what happens when this stmt is executed in pausable portal way. I'm guessing it wouldn't work as expected (it might get stuck indefinitely), so it might be good to extract this diff into a separate commit that is also backported to 23.1.

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Right below this we call planner.ExtendedEvalContextReset(factoryEvalCtx) which sets those fields based on the planner. Does that line up with your current understanding of the order of setting/unsetting values?

Oh yeah, I missed that, nvm.

rharding6373

Reviving this again now that I think the pausable portal issues with volatile UDFs is being resolved. Thanks for your patience, RFAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @yuzefovich)

pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Let's spend a bit time on this here. I don't think that the code that we copy-pasted from PlanAndRunAll that is related to pausable portals will work as expected, so I wouldn't blindly introduce it here - it also adds unnecessary complexity to this runPlanInsidePlan method.

I believe things can actually be quite simple - we don't need to support "pausable portal execution model" in this method at all. What matters for pausable portals is the fact that we can "paginate" the output of the main "outer" query, but here we're in the "inner" query context, and we actually want to run it fully to completion (before we produce any "outer" rows). In fact, we already ensure this by nilling out plannerCopy.pausablePortal field above. So I think we can (and should) just remove this whole if block as unnecessary.

However, I'm a bit concerned that we have an existing bug with pausable portals of stmts that have apply join with the inner plan having a subquery. Subqueries are disallowed for pausable portals, but in that case the outer plan doesn't have any subqueries, so it'd pass the existing check. I think we want for subqueries evaluation to also use plannerCopy with unset pausablePortal field. I'd apply a diff like this:
diff --git a/pkg/sql/apply\_join.go b/pkg/sql/apply\_join.go  
index d22e84a0e37..4a3aa41d0d8 100644  
\--- a/pkg/sql/apply\_join.go  
+++ b/pkg/sql/apply\_join.go  
@@ -277,6 +277,12 @@ func runPlanInsidePlan(  
        )  
        defer recv.Release()  
   
\+       plannerCopy := \*params.p  
\+       // If we reach this part when re-executing a pausable portal, we won't want to  
\+       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\+       // for its flow.  
\+       plannerCopy.pausablePortal = nil  
+  
        if len(plan.subqueryPlans) != 0 {  
                // We currently don't support cases when both the "inner" and the  
                // "outer" plans have subqueries due to limitations of how we're  
@@ -291,13 +297,8 @@ func runPlanInsidePlan(  
                // plan (and we know there are none given the check above). If parts of  
                // the "inner" plan refer to the subqueries, we know that they must  
                // refer to the "inner" subqueries. To allow for that to happen we have  
\-               // to manually replace the subqueries on the planner's curPlan and  
\-               // restore the original state before exiting.  
\-               oldSubqueries := params.p.curPlan.subqueryPlans  
\-               params.p.curPlan.subqueryPlans = plan.subqueryPlans  
\-               defer func() {  
\-                       params.p.curPlan.subqueryPlans = oldSubqueries  
\-               }()  
\+               // to manually replace the subqueries on the planner's curPlan.  
\+               plannerCopy.curPlan.subqueryPlans = plan.subqueryPlans  
                // Create a separate memory account for the results of the subqueries.  
                // Note that we intentionally defer the closure of the account until we  
                // return from this method (after the main query is executed).  
@@ -305,7 +306,7 @@ func runPlanInsidePlan(  
                defer subqueryResultMemAcc.Close(ctx)  
                if !execCfg.DistSQLPlanner.PlanAndRunSubqueries(  
                        ctx,  
\-                       params.p,  
\+                       &plannerCopy,  
                        params.extendedEvalCtx.copy,  
                        plan.subqueryPlans,  
                        recv,  
@@ -319,11 +320,6 @@ func runPlanInsidePlan(  
   
        // Make a copy of the EvalContext so it can be safely modified.  
        evalCtx := params.p.ExtendedEvalContextCopy()  
\-       plannerCopy := \*params.p  
\-       // If we reach this part when re-executing a pausable portal, we won't want to  
\-       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\-       // for its flow.  
\-       plannerCopy.pausablePortal = nil  
        distributePlan := getPlanDistribution(  
                ctx, plannerCopy.Descriptors().HasUncommittedTypes(),  
                plannerCopy.SessionData().DistSQLMode, plan.main,
(apologies, reviewable for some reason inserted back slashes in some places)

It might be worth to come up with a query with an apply join with an "inner" subquery and see what happens when this stmt is executed in pausable portal way. I'm guessing it wouldn't work as expected (it might get stuck indefinitely), so it might be good to extract this diff into a separate commit that is also backported to 23.1.

This has been fixed by #106659, I think, so plannerCopy.pausablePortal is null above. I also have a PR open to disable pausable portals for queries with any function call: #107998.

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Oh yeah, I missed that, nvm.

Done.

yuzefovich

Thanks for pushing on this,

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball and @rharding6373)

pkg/sql/apply_join.go line 348 at r5 (raw file):

		return evalCtxFactory()
	}

nit: we should also check resultWriter.Err() here - no point in running cascades and checks if we already hit an error.

pkg/sql/pgwire/testdata/pgtest/multiple_active_portals line 1310 at r5 (raw file):

subtest end

subtest mytest

Did you mean to check this change in?

Before this change, we would return an error if a UDF attempted to make a foreign key check because checks were not supported in routines. This change adds support for running postquery checks, like FK checks, in routines. It leverages the existing DistSQL postquery planner to run checks built for a routine's statement after the statement has been planned and run. This also allows UDFs to take advantage of the parallel FK check capabilities. Note that foreign key checks are run after each statement in the UDF body, as well as after the main query if required. This change also refactors how extendedEvalContexts are copied for parallel checks. Support for FK cascades will come in a later PR. Epic: CRDB-25388 Informs: cockroachdb#87289 Release note: None

rharding6373

Thanks for all the reviews!

Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @DrewKimball and @yuzefovich)

pkg/sql/apply_join.go line 348 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we should also check resultWriter.Err() here - no point in running cascades and checks if we already hit an error.

Good point. Looks like recv.commErr contains the errors from resultWriter, so I've used it here, too.

pkg/sql/pgwire/testdata/pgtest/multiple_active_portals line 1310 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Did you mean to check this change in?

No...Done.

rharding6373 · 2023-08-02T15:45:37Z

bors r+

craig · 2023-08-02T16:39:43Z

Build succeeded:

Bazel Essential CI (Cockroach)

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale)

pkg/sql/apply_join.go line 348 at r5 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Good point. Looks like recv.commErr contains the errors from resultWriter, so I've used it here, too.

Note that it is possible that resultWriter.Err() is set while recv.commErr is nil, and I think we want to check both for short-circuiting. (To be more precise, if we have commErr, then it's also set in resultWriter, but the opposite is not true, so we either need to check resultWriter.Err() only, or both, but not just recv.commErr.)

The reason for separation of errors is that "communication errors" are stronger in a sense that they result in the shutdown of the connection (on the main execution path) whereas "benign" query errors are written into resultWriter. In this apply join code path the notion of "communication errors" doesn't seem to really apply though.

In cockroachdb#104553, we modified `runPlanInsidePlan` to plan and run checks after planning and running the inside plan. Although we check for comm errors after the initial call to `PlanAndRun` to see if we should exit early, we neglected to also check the result writer for errors. Since not all result writer errors may be copied into the comm errors, we fix that here. Epic: None Informs: cockroachdb#87289 Release note: None

108126: sql: check for resultWriter error in runPlanInsidePlan r=rharding6373 a=rharding6373 In #104553, we modified `runPlanInsidePlan` to plan and run checks after planning and running the inside plan. Although we check for comm errors after the initial call to `PlanAndRun` to see if we should exit early, we neglected to also check the result writer for errors. Since not all result writer errors may be copied into the comm errors, we fix that here. Epic: None Informs: #87289 Release note: None Co-authored-by: rharding6373 <[email protected]>

rharding6373 requested review from mgartner, yuzefovich and DrewKimball June 7, 2023 21:08

rharding6373 requested a review from a team as a code owner June 7, 2023 21:08

rharding6373 commented Jun 7, 2023

View reviewed changes

yuzefovich reviewed Jun 8, 2023

View reviewed changes

rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 956b786 to 1e44351 Compare June 9, 2023 20:06

rharding6373 commented Jun 9, 2023

View reviewed changes

yuzefovich reviewed Jun 10, 2023

View reviewed changes

mgartner reviewed Jun 13, 2023

View reviewed changes

rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 1e44351 to a368bb3 Compare June 16, 2023 00:01

rharding6373 commented Jun 16, 2023

View reviewed changes

yuzefovich reviewed Jun 20, 2023

View reviewed changes

mgartner approved these changes Jun 22, 2023

View reviewed changes

DrewKimball reviewed Jun 22, 2023

View reviewed changes

rharding6373 force-pushed the 20230601_udf_fk_87289 branch from a368bb3 to e76c4a9 Compare July 11, 2023 17:47

rharding6373 commented Jul 11, 2023

View reviewed changes

DrewKimball approved these changes Jul 11, 2023

View reviewed changes

yuzefovich reviewed Jul 12, 2023

View reviewed changes

chengxiong-ruan mentioned this pull request Jul 25, 2023

sql: undefined behavior using pausable portals with volatile udfs #107127

Closed

rharding6373 force-pushed the 20230601_udf_fk_87289 branch from e76c4a9 to 4fca6b2 Compare August 2, 2023 00:02

rharding6373 requested review from a team as code owners August 2, 2023 00:02

rharding6373 commented Aug 2, 2023

View reviewed changes

rharding6373 requested a review from yuzefovich August 2, 2023 00:09

yuzefovich approved these changes Aug 2, 2023

View reviewed changes

rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 4fca6b2 to 54c3e26 Compare August 2, 2023 13:42

rharding6373 commented Aug 2, 2023

View reviewed changes

craig bot merged commit 05ad3c4 into cockroachdb:master Aug 2, 2023

yuzefovich reviewed Aug 2, 2023

View reviewed changes

rharding6373 mentioned this pull request Aug 3, 2023

sql: check for resultWriter error in runPlanInsidePlan #108126

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: support foreign key checks in udfs #104553

sql: support foreign key checks in udfs #104553

rharding6373 commented Jun 7, 2023

cockroach-teamcity commented Jun 7, 2023

rharding6373 left a comment

yuzefovich left a comment

rharding6373 left a comment

yuzefovich left a comment

mgartner commented Jun 13, 2023

mgartner left a comment

mgartner commented Jun 13, 2023

mgartner commented Jun 15, 2023

rharding6373 left a comment

yuzefovich left a comment

mgartner left a comment

DrewKimball left a comment

mgartner commented Jun 23, 2023

rharding6373 left a comment

DrewKimball left a comment

yuzefovich left a comment

rharding6373 left a comment

yuzefovich left a comment

rharding6373 left a comment

rharding6373 commented Aug 2, 2023

craig bot commented Aug 2, 2023

yuzefovich left a comment

sql: support foreign key checks in udfs #104553

sql: support foreign key checks in udfs #104553

Conversation

rharding6373 commented Jun 7, 2023

cockroach-teamcity commented Jun 7, 2023

rharding6373 left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

rharding6373 left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

mgartner commented Jun 13, 2023

mgartner left a comment

Choose a reason for hiding this comment

mgartner commented Jun 13, 2023

mgartner commented Jun 15, 2023

rharding6373 left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

DrewKimball left a comment

Choose a reason for hiding this comment

mgartner commented Jun 23, 2023

rharding6373 left a comment

Choose a reason for hiding this comment

DrewKimball left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

rharding6373 left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

rharding6373 left a comment

Choose a reason for hiding this comment

rharding6373 commented Aug 2, 2023

craig bot commented Aug 2, 2023

yuzefovich left a comment

Choose a reason for hiding this comment