Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support foreign key checks in udfs #104553

Merged
merged 1 commit into from
Aug 2, 2023

Conversation

rharding6373
Copy link
Collaborator

This change removes restrictions around and adds support for running postquery checks in routines, which allows postquery checks like foreign key constraint checks in UDFs.

Epic: CRDB-25388
Informs: #87289

Release note: None

@rharding6373 rharding6373 requested a review from a team as a code owner June 7, 2023 21:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be testing/supporting cascades in a subsequent PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/apply_join.go line 353 at r1 (raw file):

			factoryEvalCtx := serialEvalCtx
			if usedConcurrently {
				factoryEvalCtx = params.p.ExtendedEvalContextCopy()

@yuzefovich I manually tested that a UDF with multiple FK checks could do them in parallel, and it works, but I'm not sure that making copies of the extended eval context is the right way to go about supporting that. Could you please review this part? Thanks!

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 13 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)


pkg/sql/apply_join.go line 309 at r1 (raw file):

			ctx,
			params.p,
			params.extendedEvalCtx.copy,

I wonder whether we should be using the same factory method that - in addition to copying the extended eval ctx - also updates a few fields. I'm not sure about placeholders and annotations, but session ID definitely seems worth copying over. Perhaps we should just use the same method.


pkg/sql/apply_join.go line 345 at r1 (raw file):

	defer cleanup()
	var evalCtxFactory func(usedConcurrently bool) *extendedEvalContext
	if len(plan.subqueryPlans) != 0 ||

nit: subqueries are already handled above, so no need to check them here.


pkg/sql/apply_join.go line 349 at r1 (raw file):

		len(plan.checkPlans) != 0 {
		serialEvalCtx := params.p.ExtendedEvalContextCopy()
		evalCtxFactory = func(usedConcurrently bool) *extendedEvalContext {

nit: maybe make it a method on the planner to avoid code duplication with conn_executor_exec.go?


pkg/sql/apply_join.go line 353 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

@yuzefovich I manually tested that a UDF with multiple FK checks could do them in parallel, and it works, but I'm not sure that making copies of the extended eval context is the right way to go about supporting that. Could you please review this part? Thanks!

This looks good to me - I can't think of anything else that would need special mutex protection for parallel checks.


pkg/sql/distsql_running.go line 1928 at r1 (raw file):

	evalCtxFactory func(usedConcurrently bool) *extendedEvalContext,
) {
	dsp.PlanAndRun(ctx, evalCtx, planCtx, planner.Txn(), plan, recv, finishedSetupFn)

We need to check whether any error has been set on recv and short-circuit if it has (similar to what we do in PlanAndRunAll). I would probably not introduce this PlanAndRunPostquery method and just inline it since it seems pretty simple.

@rharding6373 rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 956b786 to 1e44351 Compare June 9, 2023 20:06
Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/apply_join.go line 309 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wonder whether we should be using the same factory method that - in addition to copying the extended eval ctx - also updates a few fields. I'm not sure about placeholders and annotations, but session ID definitely seems worth copying over. Perhaps we should just use the same method.

Done.


pkg/sql/apply_join.go line 345 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: subqueries are already handled above, so no need to check them here.

Done.


pkg/sql/apply_join.go line 349 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: maybe make it a method on the planner to avoid code duplication with conn_executor_exec.go?

Do the new changes reflect what you have in mind? There are a couple differences from the old implementation that I want to highlight, but after looking at the code I think are ok. 1) We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change. 2) For subqueries, we copy the planner's extendedEvalContext instead of the runParams extendedEvalContext. I think this is ok, since in the three places we create runParams, its extendedEvalContext is the same as its planner (maybe this is an opportunity to clean up the code, since it doesn't seem necessary to have the extra reference to the extendedEvalContext).


pkg/sql/apply_join.go line 353 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This looks good to me - I can't think of anything else that would need special mutex protection for parallel checks.

👍


pkg/sql/distsql_running.go line 1928 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We need to check whether any error has been set on recv and short-circuit if it has (similar to what we do in PlanAndRunAll). I would probably not introduce this PlanAndRunPostquery method and just inline it since it seems pretty simple.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 13 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)


pkg/sql/apply_join.go line 349 at r1 (raw file):

  1. We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change.

True, the session ID and semantics aren't expected to change, but the subquery could modify some other fields in the eval context, and we probably don't want to have those changes be propagated to consequent subqueries. It's possible that this won't break anything, but it's not a no-op change, and I don't see the reason for why we'd do it in this PR.


pkg/sql/apply_join.go line 310 at r2 (raw file):

			ctx,
			params.p,
			func() *extendedEvalContext { return evalContextSubquery },

I think we don't want to have a single evalContextSubquery for all subqueries - we should be creating a fresh context for each invocation of this anonymous function.


pkg/sql/apply_join.go line 360 at r2 (raw file):

		ctx, evalCtx, planCtx, plannerCopy.Txn(), plan.main, recv, finishedSetupFn,
	)
	execCfg.DistSQLPlanner.PlanAndRunCascadesAndChecks(

We should check the error before running the cascades and checks, similar to what we do in PlanAndRunAll.


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

					ex.initEvalCtx(ctx, factoryEvalCtx, planner)
				}
				ex.resetEvalCtx(factoryEvalCtx, planner.txn, planner.ExtendedEvalContext().StmtTimestamp)

resetEvalCtx internally unsets Placeholders and Annotations fields, and I don't think we want that. Perhaps "copy and reset" is not the right new helper method. The way I see it is that it'd be good to extract "binding" the three fields (Placeholders, Annotations, and SessionID) of the eval context of sub- and post-queries to the current stmt / session.


pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

SELECT f_fk_c_seq_first(103,2);

statement ok

nit: should we explicitly check the value here rather than only that this statement didn't fail?

@mgartner
Copy link
Collaborator

-- commits line 6 at r2:
nit: Some more detail on how this works vs. how it worked previously would be helpful.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)


pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

			}
			if len(eb.checks) > 0 {
				return expectedLazyRoutineError("check")

I'm confused how this is working if we are ignoring checks built in this execbuilder. How are those checks executed?

@mgartner
Copy link
Collaborator

pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I'm confused how this is working if we are ignoring checks built in this execbuilder. How are those checks executed?

Nevermind, I see now that runPlanInsidePlan is running these checks.

This approach is much simpler than what I had in mind. I tried doing something like this for running subqueries inside routines but wasn't able to get it to work, but don't remember why. Are there any downsides you see with this approach?

@mgartner
Copy link
Collaborator

pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

resetEvalCtx internally unsets Placeholders and Annotations fields, and I don't think we want that. Perhaps "copy and reset" is not the right new helper method. The way I see it is that it'd be good to extract "binding" the three fields (Placeholders, Annotations, and SessionID) of the eval context of sub- and post-queries to the current stmt / session.

Statements within a UDF there should be no placeholders after planning. So as long as factoryEvalCtx is a copy, unsetting Placeholders should be ok. I'm less familiar with Annotations, maybe they are needed?

@rharding6373 rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 1e44351 to a368bb3 Compare June 16, 2023 00:01
Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! I think we're getting closer on the extendedEvalContext refactor...

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)


-- commits line 6 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: Some more detail on how this works vs. how it worked previously would be helpful.

Done. PTAL.


pkg/sql/apply_join.go line 349 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…
  1. We only set the factoryEvalCtx fields once after the extendedEvalContext is copied and no longer every time we reuse the context for the serial use case. I think this is ok, since the fields are for session ID and semantics, which I don't expect to change.

True, the session ID and semantics aren't expected to change, but the subquery could modify some other fields in the eval context, and we probably don't want to have those changes be propagated to consequent subqueries. It's possible that this won't break anything, but it's not a no-op change, and I don't see the reason for why we'd do it in this PR.

We now reset the serial eval ctx before returning it. PTAL.


pkg/sql/apply_join.go line 310 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think we don't want to have a single evalContextSubquery for all subqueries - we should be creating a fresh context for each invocation of this anonymous function.

Oh right this was a copy function before. I think I've fixed it now, ptal.


pkg/sql/apply_join.go line 360 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We should check the error before running the cascades and checks, similar to what we do in PlanAndRunAll.

Done.


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Statements within a UDF there should be no placeholders after planning. So as long as factoryEvalCtx is a copy, unsetting Placeholders should be ok. I'm less familiar with Annotations, maybe they are needed?

I've made a separate helper function to reset that we can call here. PTAL.


pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Nevermind, I see now that runPlanInsidePlan is running these checks.

This approach is much simpler than what I had in mind. I tried doing something like this for running subqueries inside routines but wasn't able to get it to work, but don't remember why. Are there any downsides you see with this approach?

We'll see if it works with cascades (doesn't seem to be a reason why it shouldn't). One downside is that we may be duplicating work that may be hard to optimize using this approach. For example:

CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT NOT NULL REFERENCES parent(p));
CREATE FUNCTION f_fk_c(k INT, r INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k,r) RETURNING *;
$$ LANGUAGE SQL;
CREATE FUNCTION f_fk_c2(k1 INT, r1 INT, k2 INT, r2 INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k1,r1) RETURNING *;
  INSERT INTO child VALUES (k2,r2) RETURNING *;
$$ LANGUAGE SQL;
INSERT INTO parent VALUES (1),(2),(3);
-- Q1
SELECT f_fk_c(100,3), f_fk_c(101,3);
-- Q2
SELECT f_fk_c2(102, 2, 103, 2)

In both Q1 and Q2 we are doing an FK check for the same values twice. If the first FK check passes and there are no modifications to the parent before the second, the second is guaranteed to pass. I'm not sure if this would be easier to optimize using another approach or not.


pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we explicitly check the value here rather than only that this statement didn't fail?

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rharding6373)


pkg/sql/apply_join.go line 15 at r3 (raw file):

import (
	"context"
	"github.com/cockroachdb/cockroach/pkg/util/buildutil"

nit: out of order import.


pkg/sql/apply_join.go line 302 at r3 (raw file):

			params.p.curPlan.subqueryPlans = oldSubqueries
		}()
		// Create a separate memory account for the results of the subqueries.

nit: move this comment to be right above creating subqueryResultMemAcc.


pkg/sql/apply_join.go line 365 at r3 (raw file):

		ctx, evalCtx, planCtx, plannerCopy.Txn(), plan.main, recv, finishedSetupFn,
	)
	if p := planCtx.getPortalPauseInfo(); p != nil {

nit: I'd probably not include the part about pausable portals here since we currently don't support pausable portals for stmts that write data. That said, we should consider explicitly disabling the pausable portals feature for UDFs with mutations. Take a look at connExecutor.makePreparedPortal.


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I've made a separate helper function to reset that we can call here. PTAL.

I think the order of things is still not the right one.

Here is what the code was doing previously:

  1. create and initialize eval context
  2. call resetEvalCtx on it (which internally unsets some fields, including Placeholders and Annotations)
  3. update Placeholders, Annotations, and SessionID fields based on the planner.

What the code is doing now:

  1. create and initialize eval context as well as "reset" it (which in this case means setting Placeholders, Annotations, and SessionID fields based on the planner).
  2. call resetEvalCtx on the eval context, which unsets Placeholders and Annotations fields.

Now we have that Placeholders and Annotations fields are left nil, and that's not what we want.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo Yahor's comments.

Reviewed 11 of 13 files at r1, 2 of 4 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)


pkg/sql/opt/exec/execbuilder/scalar.go line 1115 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

We'll see if it works with cascades (doesn't seem to be a reason why it shouldn't). One downside is that we may be duplicating work that may be hard to optimize using this approach. For example:

CREATE TABLE parent (p INT PRIMARY KEY);
CREATE TABLE child (c INT PRIMARY KEY, p INT NOT NULL REFERENCES parent(p));
CREATE FUNCTION f_fk_c(k INT, r INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k,r) RETURNING *;
$$ LANGUAGE SQL;
CREATE FUNCTION f_fk_c2(k1 INT, r1 INT, k2 INT, r2 INT) RETURNS RECORD AS $$
  INSERT INTO child VALUES (k1,r1) RETURNING *;
  INSERT INTO child VALUES (k2,r2) RETURNING *;
$$ LANGUAGE SQL;
INSERT INTO parent VALUES (1),(2),(3);
-- Q1
SELECT f_fk_c(100,3), f_fk_c(101,3);
-- Q2
SELECT f_fk_c2(102, 2, 103, 2)

In both Q1 and Q2 we are doing an FK check for the same values twice. If the first FK check passes and there are no modifications to the parent before the second, the second is guaranteed to pass. I'm not sure if this would be easier to optimize using another approach or not.

I think this is fine. A DEFERRED constraint would probably be the best way to get around this, but we don't support that yet. In this case we do want a check after each individual INSERT. Optimizing away checks across multiple statements seems error prone and probably not worth the trouble.


pkg/sql/logictest/testdata/logic_test/udf_fk line 116 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we explicitly check the value here rather than only that this statement didn't fail?

+1

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 13 files at r1, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):


query TT
SELECT f_fk_p(3), f_fk_c(102, 3);

The fact this succeeds seems like it might be problematic - we don't make any guarantees about the order in which these expression are evaluated, do we?

@mgartner
Copy link
Collaborator

pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

The fact this succeeds seems like it might be problematic - we don't make any guarantees about the order in which these expression are evaluated, do we?

Good catch - we don't guarantee evaluation order here.

@rharding6373 rharding6373 force-pushed the 20230601_udf_fk_87289 branch from a368bb3 to e76c4a9 Compare July 11, 2023 17:47
Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs, and apologies for the delay turning this around. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I'd probably not include the part about pausable portals here since we currently don't support pausable portals for stmts that write data. That said, we should consider explicitly disabling the pausable portals feature for UDFs with mutations. Take a look at connExecutor.makePreparedPortal.

I don't know enough about pausable portals yet to say whether we should allow them for UDF statements that are read-only, but the UDF body contains a write. I have omitted this as you suggested (but left the nil assignment above).


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think the order of things is still not the right one.

Here is what the code was doing previously:

  1. create and initialize eval context
  2. call resetEvalCtx on it (which internally unsets some fields, including Placeholders and Annotations)
  3. update Placeholders, Annotations, and SessionID fields based on the planner.

What the code is doing now:

  1. create and initialize eval context as well as "reset" it (which in this case means setting Placeholders, Annotations, and SessionID fields based on the planner).
  2. call resetEvalCtx on the eval context, which unsets Placeholders and Annotations fields.

Now we have that Placeholders and Annotations fields are left nil, and that's not what we want.

Right below this we call planner.ExtendedEvalContextReset(factoryEvalCtx) which sets those fields based on the planner. Does that line up with your current understanding of the order of setting/unsetting values?


pkg/sql/logictest/testdata/logic_test/udf_fk line 62 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good catch - we don't guarantee evaluation order here.

Thank you for catching this. I removed these tests as they're not useful.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 14 of 14 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @rharding6373)


pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I don't know enough about pausable portals yet to say whether we should allow them for UDF statements that are read-only, but the UDF body contains a write. I have omitted this as you suggested (but left the nil assignment above).

Let's spend a bit time on this here. I don't think that the code that we copy-pasted from PlanAndRunAll that is related to pausable portals will work as expected, so I wouldn't blindly introduce it here - it also adds unnecessary complexity to this runPlanInsidePlan method.

I believe things can actually be quite simple - we don't need to support "pausable portal execution model" in this method at all. What matters for pausable portals is the fact that we can "paginate" the output of the main "outer" query, but here we're in the "inner" query context, and we actually want to run it fully to completion (before we produce any "outer" rows). In fact, we already ensure this by nilling out plannerCopy.pausablePortal field above. So I think we can (and should) just remove this whole if block as unnecessary.


However, I'm a bit concerned that we have an existing bug with pausable portals of stmts that have apply join with the inner plan having a subquery. Subqueries are disallowed for pausable portals, but in that case the outer plan doesn't have any subqueries, so it'd pass the existing check. I think we want for subqueries evaluation to also use plannerCopy with unset pausablePortal field. I'd apply a diff like this:

diff --git a/pkg/sql/apply\_join.go b/pkg/sql/apply\_join.go  
index d22e84a0e37..4a3aa41d0d8 100644  
\--- a/pkg/sql/apply\_join.go  
+++ b/pkg/sql/apply\_join.go  
@@ -277,6 +277,12 @@ func runPlanInsidePlan(  
        )  
        defer recv.Release()  
   
\+       plannerCopy := \*params.p  
\+       // If we reach this part when re-executing a pausable portal, we won't want to  
\+       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\+       // for its flow.  
\+       plannerCopy.pausablePortal = nil  
+  
        if len(plan.subqueryPlans) != 0 {  
                // We currently don't support cases when both the "inner" and the  
                // "outer" plans have subqueries due to limitations of how we're  
@@ -291,13 +297,8 @@ func runPlanInsidePlan(  
                // plan (and we know there are none given the check above). If parts of  
                // the "inner" plan refer to the subqueries, we know that they must  
                // refer to the "inner" subqueries. To allow for that to happen we have  
\-               // to manually replace the subqueries on the planner's curPlan and  
\-               // restore the original state before exiting.  
\-               oldSubqueries := params.p.curPlan.subqueryPlans  
\-               params.p.curPlan.subqueryPlans = plan.subqueryPlans  
\-               defer func() {  
\-                       params.p.curPlan.subqueryPlans = oldSubqueries  
\-               }()  
\+               // to manually replace the subqueries on the planner's curPlan.  
\+               plannerCopy.curPlan.subqueryPlans = plan.subqueryPlans  
                // Create a separate memory account for the results of the subqueries.  
                // Note that we intentionally defer the closure of the account until we  
                // return from this method (after the main query is executed).  
@@ -305,7 +306,7 @@ func runPlanInsidePlan(  
                defer subqueryResultMemAcc.Close(ctx)  
                if !execCfg.DistSQLPlanner.PlanAndRunSubqueries(  
                        ctx,  
\-                       params.p,  
\+                       &plannerCopy,  
                        params.extendedEvalCtx.copy,  
                        plan.subqueryPlans,  
                        recv,  
@@ -319,11 +320,6 @@ func runPlanInsidePlan(  
   
        // Make a copy of the EvalContext so it can be safely modified.  
        evalCtx := params.p.ExtendedEvalContextCopy()  
\-       plannerCopy := \*params.p  
\-       // If we reach this part when re-executing a pausable portal, we won't want to  
\-       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\-       // for its flow.  
\-       plannerCopy.pausablePortal = nil  
        distributePlan := getPlanDistribution(  
                ctx, plannerCopy.Descriptors().HasUncommittedTypes(),  
                plannerCopy.SessionData().DistSQLMode, plan.main,

(apologies, reviewable for some reason inserted back slashes in some places)


It might be worth to come up with a query with an apply join with an "inner" subquery and see what happens when this stmt is executed in pausable portal way. I'm guessing it wouldn't work as expected (it might get stuck indefinitely), so it might be good to extract this diff into a separate commit that is also backported to 23.1.


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Right below this we call planner.ExtendedEvalContextReset(factoryEvalCtx) which sets those fields based on the planner. Does that line up with your current understanding of the order of setting/unsetting values?

Oh yeah, I missed that, nvm.

Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviving this again now that I think the pausable portal issues with volatile UDFs is being resolved. Thanks for your patience, RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/apply_join.go line 365 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Let's spend a bit time on this here. I don't think that the code that we copy-pasted from PlanAndRunAll that is related to pausable portals will work as expected, so I wouldn't blindly introduce it here - it also adds unnecessary complexity to this runPlanInsidePlan method.

I believe things can actually be quite simple - we don't need to support "pausable portal execution model" in this method at all. What matters for pausable portals is the fact that we can "paginate" the output of the main "outer" query, but here we're in the "inner" query context, and we actually want to run it fully to completion (before we produce any "outer" rows). In fact, we already ensure this by nilling out plannerCopy.pausablePortal field above. So I think we can (and should) just remove this whole if block as unnecessary.


However, I'm a bit concerned that we have an existing bug with pausable portals of stmts that have apply join with the inner plan having a subquery. Subqueries are disallowed for pausable portals, but in that case the outer plan doesn't have any subqueries, so it'd pass the existing check. I think we want for subqueries evaluation to also use plannerCopy with unset pausablePortal field. I'd apply a diff like this:

diff --git a/pkg/sql/apply\_join.go b/pkg/sql/apply\_join.go  
index d22e84a0e37..4a3aa41d0d8 100644  
\--- a/pkg/sql/apply\_join.go  
+++ b/pkg/sql/apply\_join.go  
@@ -277,6 +277,12 @@ func runPlanInsidePlan(  
        )  
        defer recv.Release()  
   
\+       plannerCopy := \*params.p  
\+       // If we reach this part when re-executing a pausable portal, we won't want to  
\+       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\+       // for its flow.  
\+       plannerCopy.pausablePortal = nil  
+  
        if len(plan.subqueryPlans) != 0 {  
                // We currently don't support cases when both the "inner" and the  
                // "outer" plans have subqueries due to limitations of how we're  
@@ -291,13 +297,8 @@ func runPlanInsidePlan(  
                // plan (and we know there are none given the check above). If parts of  
                // the "inner" plan refer to the subqueries, we know that they must  
                // refer to the "inner" subqueries. To allow for that to happen we have  
\-               // to manually replace the subqueries on the planner's curPlan and  
\-               // restore the original state before exiting.  
\-               oldSubqueries := params.p.curPlan.subqueryPlans  
\-               params.p.curPlan.subqueryPlans = plan.subqueryPlans  
\-               defer func() {  
\-                       params.p.curPlan.subqueryPlans = oldSubqueries  
\-               }()  
\+               // to manually replace the subqueries on the planner's curPlan.  
\+               plannerCopy.curPlan.subqueryPlans = plan.subqueryPlans  
                // Create a separate memory account for the results of the subqueries.  
                // Note that we intentionally defer the closure of the account until we  
                // return from this method (after the main query is executed).  
@@ -305,7 +306,7 @@ func runPlanInsidePlan(  
                defer subqueryResultMemAcc.Close(ctx)  
                if !execCfg.DistSQLPlanner.PlanAndRunSubqueries(  
                        ctx,  
\-                       params.p,  
\+                       &plannerCopy,  
                        params.extendedEvalCtx.copy,  
                        plan.subqueryPlans,  
                        recv,  
@@ -319,11 +320,6 @@ func runPlanInsidePlan(  
   
        // Make a copy of the EvalContext so it can be safely modified.  
        evalCtx := params.p.ExtendedEvalContextCopy()  
\-       plannerCopy := \*params.p  
\-       // If we reach this part when re-executing a pausable portal, we won't want to  
\-       // resume the flow bound to it. The inner-plan should have its own lifecycle  
\-       // for its flow.  
\-       plannerCopy.pausablePortal = nil  
        distributePlan := getPlanDistribution(  
                ctx, plannerCopy.Descriptors().HasUncommittedTypes(),  
                plannerCopy.SessionData().DistSQLMode, plan.main,

(apologies, reviewable for some reason inserted back slashes in some places)


It might be worth to come up with a query with an apply join with an "inner" subquery and see what happens when this stmt is executed in pausable portal way. I'm guessing it wouldn't work as expected (it might get stuck indefinitely), so it might be good to extract this diff into a separate commit that is also backported to 23.1.

This has been fixed by #106659, I think, so plannerCopy.pausablePortal is null above. I also have a PR open to disable pausable portals for queries with any function call: #107998.


pkg/sql/conn_executor_exec.go line 2090 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Oh yeah, I missed that, nvm.

Done.

@rharding6373 rharding6373 requested a review from yuzefovich August 2, 2023 00:09
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing on this, :lgtm:

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball and @rharding6373)


pkg/sql/apply_join.go line 348 at r5 (raw file):

		return evalCtxFactory()
	}

nit: we should also check resultWriter.Err() here - no point in running cascades and checks if we already hit an error.


pkg/sql/pgwire/testdata/pgtest/multiple_active_portals line 1310 at r5 (raw file):

subtest end

subtest mytest

Did you mean to check this change in?

Before this change, we would return an error if a UDF attempted to make
a foreign key check because checks were not supported in routines.

This change adds support for running postquery checks, like FK checks,
in routines. It leverages the existing DistSQL postquery planner to run
checks built for a routine's statement after the statement has been
planned and run. This also allows UDFs to take advantage of the parallel
FK check capabilities. Note that foreign key checks are run after each
statement in the UDF body, as well as after the main query if required.

This change also refactors how extendedEvalContexts are copied for
parallel checks.

Support for FK cascades will come in a later PR.

Epic: CRDB-25388
Informs: cockroachdb#87289

Release note: None
@rharding6373 rharding6373 force-pushed the 20230601_udf_fk_87289 branch from 4fca6b2 to 54c3e26 Compare August 2, 2023 13:42
Copy link
Collaborator Author

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the reviews!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @DrewKimball and @yuzefovich)


pkg/sql/apply_join.go line 348 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we should also check resultWriter.Err() here - no point in running cascades and checks if we already hit an error.

Good point. Looks like recv.commErr contains the errors from resultWriter, so I've used it here, too.


pkg/sql/pgwire/testdata/pgtest/multiple_active_portals line 1310 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Did you mean to check this change in?

No...Done.

@rharding6373
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 2, 2023

Build succeeded:

@craig craig bot merged commit 05ad3c4 into cockroachdb:master Aug 2, 2023
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale)


pkg/sql/apply_join.go line 348 at r5 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Good point. Looks like recv.commErr contains the errors from resultWriter, so I've used it here, too.

Note that it is possible that resultWriter.Err() is set while recv.commErr is nil, and I think we want to check both for short-circuiting. (To be more precise, if we have commErr, then it's also set in resultWriter, but the opposite is not true, so we either need to check resultWriter.Err() only, or both, but not just recv.commErr.)

The reason for separation of errors is that "communication errors" are stronger in a sense that they result in the shutdown of the connection (on the main execution path) whereas "benign" query errors are written into resultWriter. In this apply join code path the notion of "communication errors" doesn't seem to really apply though.

rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Aug 3, 2023
In cockroachdb#104553, we modified
`runPlanInsidePlan` to plan and run checks after planning and running
the inside plan. Although we check for comm errors after the initial
call to `PlanAndRun` to see if we should exit early, we neglected to
also check the result writer for errors. Since not all result writer
errors may be copied into the comm errors, we fix that here.

Epic: None
Informs: cockroachdb#87289

Release note: None
craig bot pushed a commit that referenced this pull request Aug 4, 2023
108126: sql: check for resultWriter error in runPlanInsidePlan r=rharding6373 a=rharding6373

In #104553, we modified `runPlanInsidePlan` to plan and run checks after planning and running the inside plan. Although we check for comm errors after the initial call to `PlanAndRun` to see if we should exit early, we neglected to also check the result writer for errors. Since not all result writer errors may be copied into the comm errors, we fix that here.

Epic: None
Informs: #87289

Release note: None

Co-authored-by: rharding6373 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants