Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: parser, test catalog, query support for virtual columns #57622

Merged
merged 4 commits into from
Dec 8, 2020

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Dec 7, 2020

This set of changes add parser, testcat, and query support for virtual columns. I have come full circle and realized that even though user-defined virtual columns don't help with the optimizer-side work for expression-based indexes, they help with integrating with the existing sql code. In particular, we can carve out pieces of work and write more targeted tests related to virtual columns, without having to define them through an expression-based index.

Informs #57608.

sql: support virtual columns in parser

Support VIRTUAL keyword in the parser. We still error out if it is
ever used.

Release note: None

opt: clean up hack in test catalog

To ensure that PK columns are non-nullable, the test catalog
reinitializes them to override the nullable flag. We replace this
hacky mechanism with figuring out the PK columns beforehand and
marking them as non-nullable in the definition before creating the
table columns.

Release note: None

opt: test catalog support for virtual columns

Release note: None

opt: optbuilder support for scanning virtual columns

This change adds support for scanning tables with virtual columns. The
virtual columns are projected on top of the Scan.

Release note: None

@RaduBerinde RaduBerinde requested a review from mgartner December 7, 2020 01:28
@RaduBerinde RaduBerinde requested a review from a team as a code owner December 7, 2020 01:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde RaduBerinde changed the title opt: optbuilder support for scanning virtual columns opt: parser, test catalog, query support for virtual columns Dec 7, 2020
Support VIRTUAL keyword in the parser. We still error out if it is
ever used.

Release note: None
To ensure that PK columns are non-nullable, the test catalog
reinitializes them to override the nullable flag. We replace this
hacky mechanism with figuring out the PK columns beforehand and
marking them as non-nullable in the definition before creating the
table columns.

Release note: None
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! :lgtm:

Reviewed 13 of 13 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

	b.addCheckConstraintsForTable(tabMeta)
	b.addComputedColsForTable(tabMeta)

b.addComputedColsForTable assumes that all ordinary columns are in mb.outScope. I think this is a safe assumption because mb.outScope has all ordinary columns except when building scans for foreign key checks and foreign keys on virtual columns will not be allowed. Does that sound right to you?

We were bit by this for partial indexes (see the long comment below) so I'm hyper-sensitive. Might be worth adding a comment here to explain why this is safe.


pkg/sql/opt/testutils/testcat/create_table.go, line 537 at r3 (raw file):

			false, /* hidden */
			defaultExpr,
			computedExpr,

[nit] make InitNonVirtual accept string arguments for computedExpr and defaultExpr rather than *string to make it consistent with InitVirtualComputed.


pkg/sql/opt/testutils/testcat/testdata/index, line 134 at r3 (raw file):

 │    └── x int not null
 ├── INDEX idx2
 │    ├── idx_expr_1 string as (lower(z)) virtual [hidden] [virtual-computed]

Is it redundant to have virtual and [virtual-computed]?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

b.addComputedColsForTable assumes that all ordinary columns are in mb.outScope. I think this is a safe assumption because mb.outScope has all ordinary columns except when building scans for foreign key checks and foreign keys on virtual columns will not be allowed. Does that sound right to you?

We were bit by this for partial indexes (see the long comment below) so I'm hyper-sensitive. Might be worth adding a comment here to explain why this is safe.

I think at first we won't allow computed/default/check expressions to refer to virtual columns (I have it on my checklist in #57608. to actually check for these).

I am not sure I understand the issue with addComputedColsForTable.. it looks like it creates its own scope that contains all ordinary columns? It doesn't even have access to outScope..


pkg/sql/opt/testutils/testcat/create_table.go, line 537 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

[nit] make InitNonVirtual accept string arguments for computedExpr and defaultExpr rather than *string to make it consistent with InitVirtualComputed.

They are optional in the non-virtual case, but computedExpr is not optional for InitVirtualComputed.


pkg/sql/opt/testutils/testcat/testdata/index, line 134 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is it redundant to have virtual and [virtual-computed]?

Sure looks like it :) Fixed.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, RaduBerinde wrote…

I think at first we won't allow computed/default/check expressions to refer to virtual columns (I have it on my checklist in #57608. to actually check for these).

I am not sure I understand the issue with addComputedColsForTable.. it looks like it creates its own scope that contains all ordinary columns? It doesn't even have access to outScope..

It will created its own scope with all ordinary columns, but the scan only output tabColIDs which are a subset of the table columns defined by the ordinals argument. So these expressions could have outer columns that can't be resolved.

@RaduBerinde
Copy link
Member Author


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It will created its own scope with all ordinary columns, but the scan only output tabColIDs which are a subset of the table columns defined by the ordinals argument. So these expressions could have outer columns that can't be resolved.

Sure.. but this is not the concern of the code here. We are just building these expressions here "generically" to avoid building them in multiple places. They are a property of the table, not really of the scan. It's up to the code which uses them to make sure it uses them in a way that makes sense.

@RaduBerinde
Copy link
Member Author


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, RaduBerinde wrote…

Sure.. but this is not the concern of the code here. We are just building these expressions here "generically" to avoid building them in multiple places. They are a property of the table, not really of the scan. It's up to the code which uses them to make sure it uses them in a way that makes sense.

Perhaps if we don't scan all the ordinary columns, that is an indication that we won't / shouldn't need these and we shouldn't build them? But it's more error-prone than saying that every tabMeta should have them.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, RaduBerinde wrote…

Perhaps if we don't scan all the ordinary columns, that is an indication that we won't / shouldn't need these and we shouldn't build them? But it's more error-prone than saying that every tabMeta should have them.

Hmm. Then maybe the fix for the similar issue with partial indexes is not the right fix? See 7ca00cc

I don't feel strongly how we do it, but we have to prevent projecting a virtual computed column expression that references columns that aren't output by the child Scan. Are you suggesting that it's the responsibility of the caller of buildScan not to pass bad ordinals?


pkg/sql/opt/testutils/testcat/create_table.go, line 537 at r3 (raw file):

Previously, RaduBerinde wrote…

They are optional in the non-virtual case, but computedExpr is not optional for InitVirtualComputed.

Oops. Nevermind then!

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Hmm. Then maybe the fix for the similar issue with partial indexes is not the right fix? See 7ca00cc

I don't feel strongly how we do it, but we have to prevent projecting a virtual computed column expression that references columns that aren't output by the child Scan. Are you suggesting that it's the responsibility of the caller of buildScan not to pass bad ordinals?

Hm I see.. But unlike addComputedColsForTable, addPartialIndexPredicatesForTable uses the outScope so the situation is different.

Now that I'm getting more reacquainted with this code, can't addPartialIndexPredicatesForTable also create its own scope with all the ordinary table columns? We may be relying on not adding them to tabMeta in certain cases, but if that's the case the check should happen where we would use them (by consulting the outer cols).

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, RaduBerinde wrote…

Hm I see.. But unlike addComputedColsForTable, addPartialIndexPredicatesForTable uses the outScope so the situation is different.

Now that I'm getting more reacquainted with this code, can't addPartialIndexPredicatesForTable also create its own scope with all the ordinary table columns? We may be relying on not adding them to tabMeta in certain cases, but if that's the case the check should happen where we would use them (by consulting the outer cols).

Talking about it in terms of API, ComputedCols is a member of TableMeta defined as containing ScalarExprs for computed column expressions which refer to table columns using the table column IDs for this meta entry. The buildScan code just initializes it that way. After that, you do what you want with it. If you don't remap the column IDs or use it in a Project on top of an expression that doesn't produce all the necessary IDs, that's on you - it has nothing to do with the context of why the TableMeta was initialized in the first place.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, RaduBerinde wrote…

Talking about it in terms of API, ComputedCols is a member of TableMeta defined as containing ScalarExprs for computed column expressions which refer to table columns using the table column IDs for this meta entry. The buildScan code just initializes it that way. After that, you do what you want with it. If you don't remap the column IDs or use it in a Project on top of an expression that doesn't produce all the necessary IDs, that's on you - it has nothing to do with the context of why the TableMeta was initialized in the first place.

Ahh I see. I like this definition. I'll work on making that change for addPartialIndexPredicatesForTable and checking the outer cols where it's used. I think the only place is in scan_index_iter.go. I think that will clean things up and be less error prone.

So nothing needs to change for addComputedColsForTable, but just below here tabMeta.ComputedCols is accessed, so I guess that's the place we'd check the outer cols, if it's necessary. And I don't think its necessary unless FKs can reference virtual columns, correct? I guess that's originally what I was trying to call out, but explained my thoughts poorly.

@RaduBerinde RaduBerinde force-pushed the virtual-columns branch 2 times, most recently from 34374de to 5c6d900 Compare December 7, 2020 21:37
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/optbuilder/select.go, line 529 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Ahh I see. I like this definition. I'll work on making that change for addPartialIndexPredicatesForTable and checking the outer cols where it's used. I think the only place is in scan_index_iter.go. I think that will clean things up and be less error prone.

So nothing needs to change for addComputedColsForTable, but just below here tabMeta.ComputedCols is accessed, so I guess that's the place we'd check the outer cols, if it's necessary. And I don't think its necessary unless FKs can reference virtual columns, correct? I guess that's originally what I was trying to call out, but explained my thoughts poorly.

Ohh, yeah, good point. I added a comment to the function and added an assertion (and constructed the ProjectionsItem properly while I'm at it).

I'll think about how to improve this API. I think maybe the cases where we scan specific ordinals should use a separate simplified path.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 13 of 13 files at r1, 1 of 1 files at r2, 2 of 5 files at r3, 5 of 5 files at r5, 2 of 2 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/select.go, line 544 at r6 (raw file):

		virtualColIDs.ForEach(func(col opt.ColumnID) {
			item := b.factory.ConstructProjectionsItem(tabMeta.ComputedCols[col], col)
			if !item.ScalarProps().OuterCols.SubsetOf(tabColIDs) {

Is it not possible for one virtual column to depend on another? (Maybe you discussed this above, but wouldn't hurt to add an explicit comment or TODO)

This change adds support for scanning tables with virtual columns. The
virtual columns are projected on top of the Scan.

Release note: None
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)


pkg/sql/opt/optbuilder/select.go, line 544 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is it not possible for one virtual column to depend on another? (Maybe you discussed this above, but wouldn't hurt to add an explicit comment or TODO)

Added a TODO. It will be an internal error for now. The checklist in #57608 tracks either supporting this case, or disallowing the creation of such columns in the first place.

@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 8, 2020

Build succeeded:

@craig craig bot merged commit 09aa7c8 into cockroachdb:master Dec 8, 2020
mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 8, 2020
Previously, partial index predicates were only built and added to
`TableMeta.PartialIndexPredicates` if the scope of the scan in
`Builder.buildScan` included all ordinary table columns. This behavior
prevented errors when building partial index predicate expressions for
foriegn key check scans. These scans do not include all columns, so when
building a predicate expression that referenced a table column not
included, optbuilder would err with "column does not exist".

This commit changes this behavior so that partial index predicates are
always built and added to `TableMeta.PartialIndexPredicates`. In order
to do this, `Builder.addPartialIndexPredicatesForTable` creates its own
table scope. It also constructs a new scan rather than relying on the
already-built scan. Constructing a scan is required for considering
logical properties while normalizing partial index predicates.

See discussion at cockroachdb#57622
for more context on this change.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 10, 2020
Previously, partial index predicates were only built and added to
`TableMeta.PartialIndexPredicates` if the scope of the scan in
`Builder.buildScan` included all ordinary table columns. This behavior
prevented errors when building partial index predicate expressions for
foriegn key check scans. These scans do not include all columns, so when
building a predicate expression that referenced a table column not
included, optbuilder would err with "column does not exist".

This commit changes this behavior so that partial index predicates are
always built and added to `TableMeta.PartialIndexPredicates`. In order
to do this, `Builder.addPartialIndexPredicatesForTable` creates its own
table scope. It also constructs a new scan if the provided scan does not
output all column in the table scope. A scan on the table and its
logical properties are required in order to fully normalize partial
index predicates.

See discussion at cockroachdb#57622
for more context on this change.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 11, 2020
57707: opt: build all partial index predicate expressions in TableMeta r=mgartner a=mgartner

#### opt: build all partial index predicate expressions in TableMeta

Previously, partial index predicates were only built and added to
`TableMeta.PartialIndexPredicates` if the scope of the scan in
`Builder.buildScan` included all ordinary table columns. This behavior
prevented errors when building partial index predicate expressions for
foriegn key check scans. These scans do not include all columns, so when
building a predicate expression that referenced a table column not
included, optbuilder would err with "column does not exist".

This commit changes this behavior so that partial index predicates are
always built and added to `TableMeta.PartialIndexPredicates`. In order
to do this, `Builder.addPartialIndexPredicatesForTable` creates its own
table scope. It also constructs a new scan rather than relying on the
already-built scan. Constructing a scan is required for considering
logical properties while normalizing partial index predicates.

See discussion at #57622
for more context on this change.

Release note: None

#### optbuilder: reduce redundant building of arbiter filter expressions

Previously, the arbiter predicate provided in the `WHERE` clause of an
`INSERT ... ON CONFLICT` statement was rebuilt as a `memo.FiltersExpr`
repetitively while the optbuilder determined arbiter indexes. Now, the
expression is built only once, reducing work.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
@RaduBerinde RaduBerinde deleted the virtual-columns branch December 16, 2020 22:58
mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 28, 2020
Previously, partial index predicates were only built and added to
`TableMeta.PartialIndexPredicates` if the scope of the scan in
`Builder.buildScan` included all ordinary table columns. This behavior
prevented errors when building partial index predicate expressions for
foriegn key check scans. These scans do not include all columns, so when
building a predicate expression that referenced a table column not
included, optbuilder would err with "column does not exist".

This commit changes this behavior so that partial index predicates are
always built and added to `TableMeta.PartialIndexPredicates`. In order
to do this, `Builder.addPartialIndexPredicatesForTable` creates its own
table scope. It also constructs a new scan if the provided scan does not
output all column in the table scope. A scan on the table and its
logical properties are required in order to fully normalize partial
index predicates.

See discussion at cockroachdb#57622
for more context on this change.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants