Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt, cat: cleaner API for mutation columns #51400

Merged
merged 3 commits into from
Jul 15, 2020

Conversation

RaduBerinde
Copy link
Member

NOTE I used AllColumnCount so that the compiler forces me to look at all callsites, and left it like this so they are also apparent in the diff. I intend to rename it to ColumnCount before merging.

The catalog currently relies on a particular ordering to separate
different kinds of columns (ordinary, write-only, delete-only). This
is unfortunate because it restricts the underlying implementation; in
particular, it makes it tricky to add a new kind of column (like the
mvcc_timestamp system column) without potentially breaking existing
code in subtle ways.

This change reworks the cat API to have a single ColumnCount method
and a separate ColumnKind method used to figure out the column kind.
This forces all code that cares about the kind to check it explicitly
rather than relying on a potentially faulty column count.

Fixes #51323.

Release note: None

@RaduBerinde RaduBerinde requested a review from andy-kimball July 14, 2020 00:19
@RaduBerinde RaduBerinde requested a review from a team as a code owner July 14, 2020 00:19
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some nits.

@@ -122,15 +122,15 @@ func ResolveTableIndex(
}

// ConvertColumnIDsToOrdinals converts a list of ColumnIDs (such as from a
// tree.TableRef), to a list of ordinal positions of columns within the given
// table. See tree.Table for more information on column ordinals.
// tree.TableRef), to a list of ordinal positions of non-mutation columns within
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an API bug that this only looks at non-mutation cols. Why not let it handle mutation cols as well, since the API is generic? I don't think any existing caller calls it with mutation cols, so there should be no panics.

@@ -147,8 +147,8 @@ func ConvertColumnIDsToOrdinals(tab Table, columns []tree.ColumnID) (ordinals []
// FindTableColumnByName returns the ordinal of the non-mutation column having
// the given name, if one exists in the given table. Otherwise, it returns -1.
func FindTableColumnByName(tab Table, name tree.Name) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this return mutation cols as well and require caller to test for mutation/system cols? The more col types we add, the more confusing it is when some generic-sounding APIs filter cols and others don't.

for i, n := 0, tab.ColumnCount(); i < n; i++ {
if private.IsColumnOutput(i) {
for i, n := 0, tab.AllColumnCount(); i < n; i++ {
if !cat.IsMutationColumn(tab, i) && private.IsColumnOutput(i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we can just get rid of the comment and the check for IsMutationColumn. private.IsColumnOutput should always return false for mutation columns, or else there's a bug somewhere else in the code. We always set private.ReturnCols entries to zero for mutation cols.

@@ -1289,7 +1289,10 @@ func (b *logicalPropsBuilder) buildMutationProps(mutation RelExpr, rel *props.Re
// null or the corresponding insert and fetch/update columns are not null. In
// other words, if either the source or destination column is not null, then
// the column must be not null.
for i, n := 0, tab.ColumnCount(); i < n; i++ {
for i, n := 0, tab.AllColumnCount(); i < n; i++ {
if cat.IsMutationColumn(tab, i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OutputCols check should already take care of this case, since mutation cols will never be output cols by the check above. I think it's good to let the mutation builder decide which cols are output or not, and let this props builder code just blindly process the cols without checking again.

for i, cnt := 0, baseTable.ColumnCount(); i < cnt; i++ {
unfilteredCols.Add(t.Table.ColumnID(i))
for i, cnt := 0, baseTable.AllColumnCount(); i < cnt; i++ {
if !cat.IsMutationColumn(baseTable, i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any harm in adding mutation cols here. I think Drew just used ColumnCount b/c it seemed like the right thing. But really we want all columns.

@@ -99,7 +99,7 @@ func TestGetStatsFromConstraint(t *testing.T) {
t.Helper()

var cols opt.ColSet
for i := 0; i < tab.ColumnCount(); i++ {
for i := 0; i < tab.AllColumnCount(); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how mutation cols play with stats. A good example of a case where using ColumnCount can cause us to not consider a case that is potentially interesting to think about, or at least document.

@@ -1054,9 +1057,9 @@ func (mb *mutationBuilder) mapColumnNamesToOrdinals(names tree.NameList) util.Fa
var ords util.FastIntSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the mapColumnNamesToOrdinals to make it clear that it skips mutation columns?

@@ -825,11 +829,14 @@ func (mb *mutationBuilder) makeMutationPrivate(needResults bool) *memo.MutationP
}

if needResults {
// Only non-mutation columns are output columns. ReturnCols needs to have
// DeletableColumnCount entries, but only the first ColumnCount entries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the comment to remove ref to DeletableColumnCount.

@@ -577,8 +578,8 @@ func (b *Builder) addCheckConstraintsForTable(tabMeta *opt.TableMeta) {

// Find the non-nullable table columns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add parenthetical to the comment, saying that mutation cols can be NULL during backfill, so they should be excluded.

@@ -626,9 +627,9 @@ func (b *Builder) addCheckConstraintsForTable(tabMeta *opt.TableMeta) {
func (b *Builder) addComputedColsForTable(tabMeta *opt.TableMeta) {
var tableScope *scope
tab := tabMeta.Table
for i, n := 0, tab.ColumnCount(); i < n; i++ {
for i, n := 0, tab.AllColumnCount(); i < n; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to explain why mutation cols should not be included here.

@rohany
Copy link
Contributor

rohany commented Jul 14, 2020

From an outside perspective, I like this a lot. It's much more explicit about what count to use/what columns are considered in an operation, and will also make it easier for me to catch all of the cases where we should or shouldn't include system columns.

The catalog currently relies on a particular ordering to separate
different kinds of columns (ordinary, write-only, delete-only). This
is unfortunate because it restricts the underlying implementation; in
particular, it makes it tricky to add a new kind of column (like the
`mvcc_timestamp` system column) without potentially breaking existing
code in subtle ways.

This change reworks the cat API to have a single `ColumnCount` method
and a separate `ColumnKind` method used to figure out the column kind.
This forces all code that cares about the kind to check it explicitly
rather than relying on a potentially faulty column count.

Fixes cockroachdb#51323.

Release note: None
Also add some tests that fail if we don't ignore mutation columns.

Release note: None
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, @RaduBerinde, @rohany, and @rytaft)


pkg/sql/opt/cat/utils.go, line 125 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

It seems like an API bug that this only looks at non-mutation cols. Why not let it handle mutation cols as well, since the API is generic? I don't think any existing caller calls it with mutation cols, so there should be no panics.

The caller does care - this is used when referencing a table by ID. Without the check, we allow a mutation column (which in practice leads to a more obscure error a bit later). I added some tests and I moved this into the optbuilder as it's specific to building numeric references.


pkg/sql/opt/cat/utils.go, line 149 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Should we make this return mutation cols as well and require caller to test for mutation/system cols? The more col types we add, the more confusing it is when some generic-sounding APIs filter cols and others don't.

Moved to optbuilder, renamed to findPublicTableColumnByName.


pkg/sql/opt/memo/logical_props_builder.go, line 1272 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Actually, I think we can just get rid of the comment and the check for IsMutationColumn. private.IsColumnOutput should always return false for mutation columns, or else there's a bug somewhere else in the code. We always set private.ReturnCols entries to zero for mutation cols.

Done.


pkg/sql/opt/memo/logical_props_builder.go, line 1293 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

The OutputCols check should already take care of this case, since mutation cols will never be output cols by the check above. I think it's good to let the mutation builder decide which cols are output or not, and let this props builder code just blindly process the cols without checking again.

Done.


pkg/sql/opt/memo/multiplicity_builder.go, line 122 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I don't see any harm in adding mutation cols here. I think Drew just used ColumnCount b/c it seemed like the right thing. But really we want all columns.

Done.


pkg/sql/opt/optbuilder/insert.go, line 1057 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Can you add to the mapColumnNamesToOrdinals to make it clear that it skips mutation columns?

Done.


pkg/sql/opt/optbuilder/mutation_builder.go, line 832 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Need to update the comment to remove ref to DeletableColumnCount.

Done.


pkg/sql/opt/optbuilder/select.go, line 579 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Add parenthetical to the comment, saying that mutation cols can be NULL during backfill, so they should be excluded.

Done.


pkg/sql/opt/optbuilder/select.go, line 630 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Add comment to explain why mutation cols should not be included here.

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I think we currently do something similar with the Index catalog, so it'd be great to apply similar changes there at some point.

Reviewed 10 of 20 files at r1, 4 of 8 files at r2, 3 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, @RaduBerinde, @rohany, and @rytaft)

@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 15, 2020

Build failed

@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 15, 2020

Build succeeded

@craig craig bot merged commit 8c01a02 into cockroachdb:master Jul 15, 2020
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Jul 15, 2020
This change renames AllColumnCount to ColumnCount. In cockroachdb#51400 I meant
to rename before merging but I forgot.

Release note: None
craig bot pushed a commit that referenced this pull request Jul 16, 2020
51462: storage: optimize MVCCGarbageCollect for large number of undeleted version r=itsbilal a=ajwerner

This is a better-tested take-2 of #51184.

* The first commit introduces a `SupportsPrev` method to the `Iterator` interface.
* The second commit adjusts the benchmark to use a regular batch.
* The third commit adjusts the logic in #51184 to fix the bugs there and adds testing.

The win is now only pronounced on pebble, as might be expected. 

51489: opt, cat: rename Table.AllColumnCount to Table.ColumnCount r=RaduBerinde a=RaduBerinde

This change renames AllColumnCount to ColumnCount. In #51400 I meant
to rename before merging but I forgot.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@RaduBerinde RaduBerinde deleted the special-cols branch July 18, 2020 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: investigate clean-up of special column handling
5 participants