Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support OUT parameters in PLpgSQL UDFs #119493

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Feb 22, 2024

This commit (mostly) adds the support for INOUT / OUT parameters in PLpgSQL UDFs. OUT parameters are not included into the signature (for overload resolution), but they result in uninitialized variables. Any parameter can be left unnamed but can still be referenced via $i notation which is unsupported so far. Unlike for SQL UDFs, in PLpgSQL input and output parameters share the same namespace for duplication check.

Additionally, this commit fixes a few issues:

  • for UDFs only input parameters need to be added to the scope of the body
  • for procedures, all parameters (including OUT) are used for overload resolution.
  • UDF returning VOID with empty body previously would hit an index out of bounds internal error.

Addresses: #100405.
Fixes: #119208.
Epic: CRDB-30611

Fixes: #119841.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the out-params branch 6 times, most recently from 618ac3d to 640b81e Compare February 27, 2024 01:51
Copy link

blathers-crl bot commented Feb 27, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@yuzefovich yuzefovich force-pushed the out-params branch 9 times, most recently from 3068ed6 to af3f480 Compare February 29, 2024 04:02
@yuzefovich yuzefovich changed the title optbuilder: only add IN parameters to body scope sql: support OUT parameters in PLpgSQL UDFs Feb 29, 2024
@yuzefovich yuzefovich force-pushed the out-params branch 2 times, most recently from f00a7bc to 11cf09c Compare February 29, 2024 04:37
@yuzefovich yuzefovich marked this pull request as ready for review February 29, 2024 04:53
@yuzefovich yuzefovich requested review from a team as code owners February 29, 2024 04:53
@yuzefovich yuzefovich requested review from herkolategan and DarrylWong and removed request for a team and herkolategan February 29, 2024 04:53
@yuzefovich yuzefovich removed the request for review from DarrylWong February 29, 2024 04:54
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to also add OUT parameters to pkg/ccl/logictestccl/testdata/logic_test/udf_rewrite and pkg/sql/logictest/testdata/logic_test/udf_rewrite, just to make sure we aren't skipping them during AST rewrites.

Reviewed 5 of 5 files at r1, 3 of 6 files at r2, 20 of 20 files at r5, 19 of 19 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/create_function.go line 144 at r6 (raw file):

	signatureTypes := make([]*types.T, 0, len(udfDesc.Params))
	for _, param := range udfDesc.Params {
		if udfDesc.IsProcedure() || tree.IsInParamClass(funcdesc.ToTreeRoutineParamClass(param.Class)) {

[nit] Would it be worth adding a helper for these checks?


pkg/sql/opt/optbuilder/create_function.go line 343 at r6 (raw file):

			lastStmt := stmt.AST.Body[len(stmt.AST.Body)-1]
			if _, hasReturn := lastStmt.(*ast.Return); hasReturn && len(outParamTypes) > 0 {
				panic(returnWithOUTParameterErr)

We should check this in plpgsql.go when building RETURN statements, since otherwise we can only catch when it's the last statement. We're going to have to modify the parser to allow RETURN with no expression (should be pretty simple).


pkg/sql/opt/optbuilder/srfs.go line 212 at r5 (raw file):

			// If we have at least two OUT parameters, they specify an implicit
			// alias for the RECORD return type.
			if numOutParams < 2 {

If we have one out parameter, shouldn't the error be different? Is this line ever reachable with numOutParams == 1?


pkg/sql/sem/tree/overload.go line 271 at r5 (raw file):

	// This is currently either SQL or PL/pgSQL.
	Language RoutineLanguage
	// RoutineParams contains all parameter information of the routine.

[nit] Maybe talk about how this is distinguished from the Types field, e.g. Types is used for matching a routine signature, while RoutineParams contains the class, type, and name of each parameter for use in building the routine's body and return type.


pkg/ccl/logictestccl/testdata/logic_test/udf_params line 76 at r6 (raw file):


statement ok
CREATE FUNCTION f(INOUT param1 INT, OUT param2 INT) AS $$ BEGIN SELECT 3 INTO param1; END $$ LANGUAGE PLpgSQL;

Could we have a few cases use assignment := instead of SELECT INTO? Could also be good to verify that OUT parameters can be referenced in other statements, like RAISE.

This allows us to remove some of the recently introduced fields and this
information will be used by the follow up work.

Epic: None

Release note: None
@yuzefovich yuzefovich force-pushed the out-params branch 2 times, most recently from fd0a762 to 3e28aaf Compare March 4, 2024 03:11
@yuzefovich yuzefovich requested a review from DrewKimball March 4, 2024 03:13
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointers - didn't know about udf_rewrite tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/create_function.go line 144 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Would it be worth adding a helper for these checks?

Good point, done.


pkg/sql/opt/optbuilder/create_function.go line 343 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

We should check this in plpgsql.go when building RETURN statements, since otherwise we can only catch when it's the last statement. We're going to have to modify the parser to allow RETURN with no expression (should be pretty simple).

Good point, done.

I did try adjusting the parser to support empty RETURN with this rule:

| RETURN ';'
  {
    $$.val = &plpgsqltree.Return{}
  }

but it ran into a shift/reduce conflict, so I left a TODO instead.


pkg/sql/opt/optbuilder/srfs.go line 212 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

If we have one out parameter, shouldn't the error be different? Is this line ever reachable with numOutParams == 1?

No, this line is not reachable with one OUT parameter - the function would have ReturnsRecordType == false.


pkg/sql/sem/tree/overload.go line 271 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Maybe talk about how this is distinguished from the Types field, e.g. Types is used for matching a routine signature, while RoutineParams contains the class, type, and name of each parameter for use in building the routine's body and return type.

Done.


pkg/ccl/logictestccl/testdata/logic_test/udf_params line 76 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Could we have a few cases use assignment := instead of SELECT INTO? Could also be good to verify that OUT parameters can be referenced in other statements, like RAISE.

Good point, added a few cases.

Adding a RAISE stmt actually exposed a problem when the function is used as a data source:

        (XX000) internal error: invalid datum type given: RECORD, expected INT8
        encoded_datum.go:195: in DatumToEncDatum()
        DETAIL: stack trace:
        github.com/cockroachdb/cockroach/pkg/sql/rowenc/encoded_datum.go:195: DatumToEncDatum()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:333: toEncDatum()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:240: nextGeneratorValues()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:312: Next()
        github.com/cockroachdb/cockroach/pkg/sql/colexec/columnarizer.go:239: Next()
...

I'm not sure why it happens - any ideas? Builder.finishBuildLastStmt seems to handle this code when I go into debugger, which adds a projection of two columns from the tuple, yet the project set processor still produces a tuple.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewed 18 of 18 files at r8, 21 of 21 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/opt/optbuilder/create_function.go line 343 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Good point, done.

I did try adjusting the parser to support empty RETURN with this rule:

| RETURN ';'
  {
    $$.val = &plpgsqltree.Return{}
  }

but it ran into a shift/reduce conflict, so I left a TODO instead.

SGTM


pkg/ccl/logictestccl/testdata/logic_test/udf_params line 76 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Good point, added a few cases.

Adding a RAISE stmt actually exposed a problem when the function is used as a data source:

        (XX000) internal error: invalid datum type given: RECORD, expected INT8
        encoded_datum.go:195: in DatumToEncDatum()
        DETAIL: stack trace:
        github.com/cockroachdb/cockroach/pkg/sql/rowenc/encoded_datum.go:195: DatumToEncDatum()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:333: toEncDatum()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:240: nextGeneratorValues()
        github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:312: Next()
        github.com/cockroachdb/cockroach/pkg/sql/colexec/columnarizer.go:239: Next()
...

I'm not sure why it happens - any ideas? Builder.finishBuildLastStmt seems to handle this code when I go into debugger, which adds a projection of two columns from the tuple, yet the project set processor still produces a tuple.

Hm, that feels like something that would be solved by #119616. Can you show the optimizer plan with the TYPES option?

This commit (mostly) adds the support for INOUT / OUT parameters in
PLpgSQL UDFs. OUT parameters are not included into the signature (for
overload resolution), but they result in uninitialized variables. Any
parameter can be left unnamed but can still be referenced via `$i`
notation which is unsupported so far. Unlike for SQL UDFs, in PLpgSQL
input and output parameters share the same namespace for duplication
check.

Additionally, this commit fixes a few issues:
- for UDFs only input parameters need to be added to the scope of the
body
- for procedures, all parameters (including OUT) are used for overload
resolution
- UDF returning VOID with empty body previously would hit an index out
of bounds internal error.

Release note: None
@yuzefovich yuzefovich requested a review from DrewKimball March 4, 2024 18:16
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/ccl/logictestccl/testdata/logic_test/udf_params line 76 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Hm, that feels like something that would be solved by #119616. Can you show the optimizer plan with the TYPES option?

  • project set
  │ columns: (param1 int, param2 int)
  │ estimated row count: 1
  │ render 0: (f((3)[int]))[tuple{int AS param1, int AS param2}]
  │
  └── • emptyrow
        columns: ()

I rebased this PR on top of #119616, and the problem still reproduces.

It's interesting that it seems somewhat non-deterministic. For now, I commented out the couple of cases that trigger this internal error.

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2024

Build succeeded:

@craig craig bot merged commit 1d79dd8 into cockroachdb:master Mar 4, 2024
17 of 28 checks passed
@yuzefovich yuzefovich deleted the out-params branch March 4, 2024 20:13
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Reviewed 2 of 6 files at r2, 18 of 18 files at r8, 20 of 21 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/logictestccl/testdata/logic_test/udf_params line 333 at r10 (raw file):
TIL:

Equal (=) can be used instead of PL/SQL-compliant :=.

https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-ASSIGNMENT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants