Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add json{,b}_to_record{,set}; recordtype srfs #82435

Merged
merged 1 commit into from
Aug 27, 2022

Conversation

jordanlewis
Copy link
Member

@jordanlewis jordanlewis commented Jun 3, 2022

Closes #70037

This commit adds support for the family of four json{,b}_to_record{,set}
functions, which deconstruct JSON input (either an object or an array)
and return a row or set of rows with well-typed elements from the JSON,
governed by the table alias that the function is invoked with.

For example, the query below deconstructs the input JSON, returning each
of the keys "requested" by the table alias definition. Missing keys are
replaced with NULL.

SELECT * FROM json_to_record('{"a": "b", "c": true}') AS t(a TEXT, z INT)
----
b  NULL

The logic that governs the type casting from JSON to SQL mimics the
logic in the similar json_populate_record family of functions, and
should be identical to Postgres's such logic. It's permissible to use
the virtual table type to deconstruct a sub-object into a record field
within the top level JSON, like this:

CREATE TABLE mytable (a INT, b TEXT)

SELECT * FROM json_to_record('{"foo": {"a": 3, "b": "bar"}}') AS t(foo mytable)
----
(3,bar)

Functions like these ones that return record types in PostgreSQL must be
aliased to a named and typed tuple (like AS t(a INT, b INT)) to be
usable. As a result, this commit:

  1. adds parser support for this form of alias, with types
  2. pushes the alias information for an aliased expression down the
    optimizer's call stack so that the aliased expression can access the
    alias information. Previously, the aliased expression was blind to
    any aliases that were applied to it.
  3. teaches the generator function factories to get the alias information
    plumbed in as well.

Release note (sql change): add the json{,b}_to_record{,set} builtin
functions, which transform JSON into structured SQL records.

Release justification: low risk, high reward change to existing functionality

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis jordanlewis force-pushed the json-to-record branch 2 times, most recently from 89b0b3d to 218cdf5 Compare June 8, 2022 00:00
@jordanlewis jordanlewis changed the title [wip] sql: add support for record-returning srfs sql: add json{,b}_to_record{,set}; recordtype srfs Jun 8, 2022
@jordanlewis jordanlewis force-pushed the json-to-record branch 3 times, most recently from e5dbc3f to 895bc57 Compare June 10, 2022 21:56
@jordanlewis jordanlewis marked this pull request as ready for review June 10, 2022 21:56
@jordanlewis jordanlewis requested review from a team as code owners June 10, 2022 21:56
@jordanlewis jordanlewis requested a review from a team June 10, 2022 21:56
@jordanlewis jordanlewis requested a review from a team as a code owner June 10, 2022 21:56
@jordanlewis jordanlewis requested a review from rafiss June 10, 2022 21:57
@jordanlewis
Copy link
Member Author

Hi @mgartner, would you please take a look at the optimizer changes required for this? The crux of the difficulty was that we needed to pass in the alias information (e.g. AS t(x INT, y INT)) into the set-generating function that is being aliased, which is of course backward. So it required some sneaky behavior.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I just looked at the optimizer files for now

Reviewed 2 of 41 files at r1, 4 of 22 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rafiss)


pkg/sql/opt/optbuilder/select.go line 49 at r3 (raw file):

	}(inScope.atRoot)
	inScope.atRoot = false
	if b.lastAlias != nil {

Did you try using the scope for this field instead of Builder? Intuitively I would think scope should be the right place for it.


pkg/sql/opt/optbuilder/srfs.go line 146 at r3 (raw file):

// (SRF) such as generate_series() or unnest(). It synthesizes new columns in
// outScope for each of the SRF's output columns.
func (b *Builder) finishBuildGeneratorFunction(

You should add some tests for these functions in opt/optbuilder/testdata/srfs that exercise this logic


pkg/sql/opt/optgen/cmd/optgen/metadata.go line 210 at r3 (raw file):

		"WindowFrame":         {fullName: "memo.WindowFrame", passByVal: true},
		"FKCascades":          {fullName: "memo.FKCascades", passByVal: true},
		"ColumnDefList":       {fullName: "tree.ColumnDefList", passByVal: true},

Why do you need to add this here? Doesn't look like you've updated any opt files

@jordanlewis jordanlewis force-pushed the json-to-record branch 2 times, most recently from aaab2a1 to 8f7d7db Compare July 29, 2022 20:25
@jordanlewis jordanlewis requested a review from rytaft July 29, 2022 20:25
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rafiss, and @rytaft)


pkg/sql/opt/optbuilder/select.go line 49 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Did you try using the scope for this field instead of Builder? Intuitively I would think scope should be the right place for it.

This is a good point. I've moved the lastAlias field into scope.


pkg/sql/opt/optbuilder/srfs.go line 146 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

You should add some tests for these functions in opt/optbuilder/testdata/srfs that exercise this logic

Done, added some tests that try to break the scoping logic.


pkg/sql/opt/optgen/cmd/optgen/metadata.go line 210 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Why do you need to add this here? Doesn't look like you've updated any opt files

Good point, I think this was leftover from a different way I was trying to solve this problem.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 13 files at r4, 3 of 5 files at r5, 1 of 7 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rafiss)


pkg/sql/opt/optbuilder/scope.go line 639 at r6 (raw file):

// statement, or nil if there was none.
func (s *scope) getLastAlias() *tree.AliasClause {
	for ; s != nil; s = s.parent {

When would we want to get the last alias from the parent scope?


pkg/sql/opt/optbuilder/select.go line 55 at r6 (raw file):

			defer func() { b.lastAlias = nil }()
		}
	*/

Looks like this is dead code now


pkg/sql/opt/optbuilder/select.go line 72 at r6 (raw file):

		outScope = b.buildDataSource(source.Expr, indexFlags, locking, inScope)

		inScope.lastAlias = previousLastAlias

This is fine, but one reason I suggested using scope was that I thought you might be able to allocate a new scope for building the data source and only use the lastAlias from the current scope (i.e., not look in the parent). Then you wouldn't need to save and restore the previous alias. Maybe that doesn't work, though?

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, PTAL. It would definitely be nice to land this but I know it's a big change.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rafiss, and @rytaft)


pkg/sql/opt/optbuilder/scope.go line 639 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

When would we want to get the last alias from the parent scope?

Done.


pkg/sql/opt/optbuilder/select.go line 55 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Looks like this is dead code now

Done.


pkg/sql/opt/optbuilder/select.go line 72 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This is fine, but one reason I suggested using scope was that I thought you might be able to allocate a new scope for building the data source and only use the lastAlias from the current scope (i.e., not look in the parent). Then you wouldn't need to save and restore the previous alias. Maybe that doesn't work, though?

Good point, please take another look.

@rafiss
Copy link
Collaborator

rafiss commented Aug 23, 2022

For what it's worth, the Hasura team recently reached out to us again saying that this change would improve CockroachDB support, though we haven't yet identified the priority.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizer stuff :lgtm:
@rafiss will you be able to review the rest? Otherwise I can look in more detail at the rest...

Reviewed 1 of 22 files at r3, 6 of 21 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rafiss)

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i reviewed the non-opt parts, and they look good to me with one nit inline, and one question -- do you know if the syntax changes will work OK in a mixed version cluster?

as for merging, my feeling is that as long as the above is fine, this seems like a purely additive change and it's known to be highly beneficial for tools we want to support, so perhaps we should make an exception and merge it during the stability period?

Reviewed 1 of 41 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)


pkg/sql/parser/sql.y line 365 at r7 (raw file):

    return u.val.(*tree.ColumnTableDef)
}
func (u *sqlSymUnion) colDef() tree.ColumnDef {

should the return values for the two new functions here be pointers?

@jordanlewis
Copy link
Member Author

do you know if the syntax changes will work OK in a mixed version cluster?

The new generator functions will fail at runtime with a somewhat cryptic error "unexpected mismatched types/labels list in json record generator %v %v". I can fix this if you like, I don't think it's really important because it's net new functionality.

we should make an exception and merge it during the stability period?
+1

should the return values for the two new functions here be pointers?

I don't see a strong reason, do you? Lots of similar methods in the parser return values, and the thing being returned here is quite small a struct.

@jordanlewis
Copy link
Member Author

@rafiss is it expected that this PR would edit function OIDs? I'm not familiar with the latest here. Example failure:

        expected:
            upper  upper  upper  829

        but found (query options: "") :
            upper  upper  upper  833

I assume this is because this PR is adding 4 new functions. I'll just amend the PR with the updates, but hoping for confirmation this is all good.

@rafiss
Copy link
Collaborator

rafiss commented Aug 25, 2022

@rafiss
Copy link
Collaborator

rafiss commented Aug 26, 2022

do you know if the syntax changes will work OK in a mixed version cluster?

The new generator functions will fail at runtime with a somewhat cryptic error "unexpected mismatched types/labels list in json record generator %v %v". I can fix this if you like, I don't think it's really important because it's net new functionality.

I also wanted to make sure the syntax changes didn't affect any other existing commands

This commit adds support for the family of four json{,b}_to_record{,set}
functions, which deconstruct JSON input (either an object or an array)
and return a row or set of rows with well-typed elements from the JSON,
governed by the *table alias* that the function is invoked with.

For example, the query below deconstructs the input JSON, returning each
of the keys "requested" by the table alias definition. Missing keys are
replaced with `NULL`.

```
SELECT * FROM json_to_record('{"a": "b", "c": true}') AS t(a TEXT, z INT)
----
b  NULL
```

The logic that governs the type casting from JSON to SQL mimics the
logic in the similar `json_populate_record` family of functions, and
should be identical to Postgres's such logic. It's permissible to use
the virtual table type to deconstruct a sub-object into a record field
within the top level JSON, like this:

```
CREATE TABLE mytable (a INT, b TEXT)

SELECT * FROM json_to_record('{"foo": {"a": 3, "b": "bar"}}') AS t(foo mytable)
----
(3,bar)

```

Functions like these ones that return record types in PostgreSQL must be
aliased to a named and typed tuple (like `AS t(a INT, b INT)`) to be
usable. As a result, this commit:

1. adds parser support for this form of alias, with types
2. pushes the alias information for an aliased expression down the
   optimizer's call stack so that the aliased expression can access the
   alias information. Previously, the aliased expression was blind to
   any aliases that were applied to it.
3. teaches the generator function factories to get the alias information
   plumbed in as well.

Release note (sql change): add the json{,b}_to_record{,set} builtin
functions, which transform JSON into structured SQL records.
@jordanlewis
Copy link
Member Author

I also wanted to make sure the syntax changes didn't affect any other existing commands

They do not.

@jordanlewis
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 27, 2022

Build succeeded:

@craig craig bot merged commit 92b550b into cockroachdb:master Aug 27, 2022
@jordanlewis jordanlewis deleted the json-to-record branch November 1, 2022 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: support json_populate_record, json_populate_recordset, json_to_record, json_to_recordset
4 participants