Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: prune partial index columns and simplify partial index projections #58358

Merged
merged 6 commits into from
Jan 7, 2021

Conversation

mgartner
Copy link
Collaborator

Fixes #51623

opt: do not derive prune columns for Upsert, Update, Delete

We no longer derive output prune columns for Upsert, Update, and Delete
ops in DerivePruneCols. There are no PruneCols rules for these
operators, so deriving their prune columns was only performing
unnecessary work. There are other rules that prune the fetch and return
columns for these operators. These rules do not rely on
DerivePruneCols.

Release note: None

sql: remove logic to determine fetch cols in row updater

Previously, the row.MakeUpdater function had logic to determine the
fetch columns required for an update operation. This is not necessary
because the cost based optimizer already determines the necessary fetch
columns and plumbs them to MakeUpdater as the requestedCols
argument.

Release note: None

opt: safer access to partial index predicates in TableMeta

Previously, partial index predicate expressions in TableMeta were the
source-of-truth used within the optimizer to determine if an index is a
partial index. However, partial index predicates are not added to
TableMeta for all types of statements in optbuilder. Therefore, it was
not safe to assume this was a source-of-truth.

This commit unexports the map of partial index predicates in TableMeta.
Access to partial index predicates must now be done via
TableMeta.PartialIndexPredicate. This function checks the catalog to
determine if an index is a partial index, and panics if there is not a
corresponding predicate expression in the partial index predicate map.
This makes the function an actual a source-of-truth.

Release note: None

opt: move addPartialIndexPredicatesForTable to optbuilder/partial_index.go

Release note: None

opt: prune update/upsert fetch columns not needed for partial indexes

Indexed columns of partial indexes are now only fetched for UPDATE and
UPSERT operations when needed. They are pruned in cases where it is
guaranteed that they are not needed to build old or new index entries.
For example, consider the table and UPDATE:

CREATE TABLE t (
  a INT PRIMARY KEY,
  b INT,
  c INT,
  d INT,
  INDEX (b) WHERE c > 0,
  FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
)

UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because
neither its indexed columns not the columns referenced in its predicate
are mutating. Therefore, the existing values of b do not need to be
fetched to maintain the state of the partial index. Furthermore, the
primary index does require the existing values of b because no columns
in b's family are mutating. So, b can be pruned from the UPDATE's fetch
columns.

Release note (performance improvement): Previously, indexed columns of
partial indexes were always fetched for UPDATEs and UPSERTs. Now they
are only fetched if they are required for maintaining the state of the
index. If an UPDATE or UPSERT mutates columns that are neither indexed by a
partial index nor referenced in a partial index predicate, they will no
longer be fetched (assuming that they are not needed to maintain the
state of other indexes, including the primary index).

opt: normalize partial index PUT/DEL projections to false

The SimplifyPartialIndexProjections normalization rule has been added
that normalizes synthesized partial index PUT and DEL columns to False
when it is guaranteed that a mutation will not require changed to the
associated partial index. This normalization can lead to further
normalizations, such as pruning columns that the synthesized projections
relied on.

The motivation for this change is to allow fully disjoint updates to
different columns in the same row, when the columns are split across
different families. By pruning columns not needed to maintain a partial
index, we're not forced to scan all column families. This can ultimately
reduce contention during updates.

Release note (performance improvement): UPDATE and UPSERT operations on
tables with partial indexes no longer evaluate partial index predicate
expressions when it is guaranteed that the operation will not alter the
state of the partial index. In some cases, this can eliminate fetching
the existing value of columns that are referenced in partial index
predicates.

@mgartner mgartner requested a review from a team as a code owner December 30, 2020 01:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner mgartner changed the title Norm partial index prune opt: prune partial index columns and simplify partial index projections Dec 30, 2020
@mgartner mgartner force-pushed the norm-partial-index-prune branch from b5b578a to bde9ef1 Compare December 30, 2020 01:56
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

[nit] in the commit/PR message: "neither its indexed columns not the columns referenced" -> "neither its indexed columns nor the columns referenced"; "will not require changed to the associated partial index." -> "will not require changes to the associated partial index."

Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 9 of 9 files at r3, 2 of 2 files at r4, 7 of 7 files at r5, 5 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, and @RaduBerinde)


pkg/sql/opt/table_meta.go, line 285 at r3 (raw file):

// partial index expressions that have been built for a table when formatting
// opt expressions. Use PartialIndexPredicate in all other cases.
func (tm *TableMeta) PartialIndexPredicatesForFormattingOnly() map[cat.IndexOrdinal]ScalarExpr {

Nice name!


pkg/sql/opt/norm/mutation_funcs.go, line 37 at r6 (raw file):

			updateCols.Add(tabMeta.MetaID.ColumnID(ord))
		}
	}

Isn't there a ToSet function on OptionalColList that @RaduBerinde added recently to do this?


pkg/sql/opt/norm/mutation_funcs.go, line 68 at r6 (raw file):

update requires may update

^ garbled sentence


pkg/sql/opt/norm/mutation_funcs.go, line 95 at r6 (raw file):

		}

		ord++

should this ordinal be incremented for all partial indexes, or only those that are eligible to be simplified? this seems error prone...

would be good to add some test cases with multiple partial indexes, some of which are eligible for simplification and some of which are not


pkg/sql/opt/norm/prune_cols_funcs.go, line 150 at r5 (raw file):

			// update does not require changes to the index. Partial indexes may be
			// updated (even when a column in the index is not changing) when the
			// predicate reference columns that are being updated. For example, rows

[nit] reference -> references


pkg/sql/opt/norm/testdata/rules/mutation, line 62 at r6 (raw file):

# and columns referenced in predicates are not mutating.
norm expect=SimplifyPartialIndexProjections
UPSERT INTO t (k, a, b) VALUES (1, 2, 3)

can you add an upsert test like this but with one of the columns referenced in a predicate that has a default value that satisfies the predicate? (so therefore it would get added to the index on the insert case.)

@mgartner
Copy link
Collaborator Author


pkg/sql/opt/norm/testdata/rules/mutation, line 62 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you add an upsert test like this but with one of the columns referenced in a predicate that has a default value that satisfies the predicate? (so therefore it would get added to the index on the insert case.)

Hmm, great point. I added this case, but it's broken because the default column is an insert column, not an update column. I think this proves that this normalization isn't possible for Upserts—the Insert columns will always contain all target table columns. I need to think about whether the prune cols changes in the 5th commit are also restricted to Updates...

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rytaft)


pkg/sql/opt/table_meta.go, line 285 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Nice name!

😄


pkg/sql/opt/norm/mutation_funcs.go, line 37 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Isn't there a ToSet function on OptionalColList that @RaduBerinde added recently to do this?

We're collect the target table columns here, not the update column IDs. Notice how we are converting the ordinal ord to a column id, rather than collecting col. This is tricky, so I've expanded the comment to explain a bit more.


pkg/sql/opt/norm/mutation_funcs.go, line 68 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…
update requires may update

^ garbled sentence

Done.


pkg/sql/opt/norm/mutation_funcs.go, line 95 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this ordinal be incremented for all partial indexes, or only those that are eligible to be simplified? this seems error prone...

would be good to add some test cases with multiple partial indexes, some of which are eligible for simplification and some of which are not

Great catch! I've expanded the tests to catch this. Added some non-partial indexes into the mix too for the hell of it. I've moved the incrementing of ord to directly after the partial index check, which hopefully makes this less error-prone to work with in the future.


pkg/sql/opt/norm/prune_cols_funcs.go, line 150 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] reference -> references

Done.


pkg/sql/opt/norm/testdata/rules/mutation, line 62 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Hmm, great point. I added this case, but it's broken because the default column is an insert column, not an update column. I think this proves that this normalization isn't possible for Upserts—the Insert columns will always contain all target table columns. I need to think about whether the prune cols changes in the 5th commit are also restricted to Updates...

I've updated this rule so it no longer matches Upserts—these projections cannot be simplified because if the Upsert results in an "insert", then the evaluated predicate expressions are required.

The prune cols changes in the 5th commit look to be fine.

@mgartner mgartner force-pushed the norm-partial-index-prune branch 2 times, most recently from 2ff0060 to 952a1b9 Compare January 5, 2021 19:41
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, and @RaduBerinde)


pkg/sql/opt/norm/testdata/rules/mutation, line 69 at r8 (raw file):

           └── 2 [as=a_new:21]

# Simplify UPDATE partial index put/del column to false of second partial index

of -> for ?

@mgartner mgartner force-pushed the norm-partial-index-prune branch from 952a1b9 to a70d7aa Compare January 6, 2021 18:14
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @RaduBerinde, and @rytaft)


pkg/sql/opt/norm/testdata/rules/mutation, line 69 at r8 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

of -> for ?

Done.

@mgartner
Copy link
Collaborator Author

mgartner commented Jan 6, 2021

TFTR!

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Jan 6, 2021

Build failed:

@mgartner mgartner force-pushed the norm-partial-index-prune branch from a70d7aa to e92a8a7 Compare January 6, 2021 19:50
@mgartner
Copy link
Collaborator Author

mgartner commented Jan 6, 2021

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Jan 6, 2021

Build failed:

We no longer derive output prune columns for Upsert, Update, and Delete
ops in `DerivePruneCols`. There are no PruneCols rules for these
operators, so deriving their prune columns was only performing
unnecessary work. There are other rules that prune the fetch and return
columns for these operators. These rules do not rely on
`DerivePruneCols`.

Release note: None
Previously, the `row.MakeUpdater` function had logic to determine the
fetch columns required for an update operation. This is not necessary
because the cost based optimizer already determines the necessary fetch
columns and plumbs them to `MakeUpdater` as the `requestedCols`
argument.

Release note: None
Previously, partial index predicate expressions in TableMeta were the
source-of-truth used within the optimizer to determine if an index is a
partial index. However, partial index predicates are not added to
TableMeta for all types of statements in optbuilder. Therefore, it was
not safe to assume this was a source-of-truth.

This commit unexports the map of partial index predicates in TableMeta.
Access to partial index predicates must now be done via
`TableMeta.PartialIndexPredicate`. This function checks the catalog to
determine if an index is a partial index, and panics if there is not a
corresponding predicate expression in the partial index predicate map.
This makes the function an actual a source-of-truth.

Release note: None
Indexed columns of partial indexes are now only fetched for UPDATE and
UPSERT operations when needed. They are pruned in cases where it is
guaranteed that they are not needed to build old or new index entries.
For example, consider the table and UPDATE:

    CREATE TABLE t (
      a INT PRIMARY KEY,
      b INT,
      c INT,
      d INT,
      INDEX (b) WHERE c > 0,
      FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
    )

    UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because
neither its indexed columns nor the columns referenced in its predicate
are mutating. Therefore, the existing values of b do not need to be
fetched to maintain the state of the partial index. Furthermore, the
primary index does require the existing values of b because no columns
in b's family are mutating. So, b can be pruned from the UPDATE's fetch
columns.

Release note (performance improvement): Previously, indexed columns of
partial indexes were always fetched for UPDATEs and UPSERTs. Now they
are only fetched if they are required for maintaining the state of the
index. If an UPDATE or UPSERT mutates columns that are neither indexed by a
partial index nor referenced in a partial index predicate, they will no
longer be fetched (assuming that they are not needed to maintain the
state of other indexes, including the primary index).
The `SimplifyPartialIndexProjections` normalization rule has been added
that normalizes synthesized partial index PUT and DEL columns to False
when it is guaranteed that a mutation will not require changes to the
associated partial index. This normalization can lead to further
normalizations, such as pruning columns that the synthesized projections
relied on.

The motivation for this change is to allow fully disjoint updates to
different columns in the same row, when the columns are split across
different families. By pruning columns not needed to maintain a partial
index, we're not forced to scan all column families. This can ultimately
reduce contention during updates.

Release note (performance improvement): UPDATE operations on tables with
partial indexes no longer evaluate partial index predicate expressions
when it is guaranteed that the operation will not alter the state of the
partial index. In some cases, this can eliminate fetching the existing
value of columns that are referenced in partial index predicates.
@mgartner mgartner force-pushed the norm-partial-index-prune branch from e92a8a7 to 039fb1b Compare January 7, 2021 16:53
@mgartner
Copy link
Collaborator Author

mgartner commented Jan 7, 2021

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Jan 7, 2021

Build succeeded:

@craig craig bot merged commit d8b5cb0 into cockroachdb:master Jan 7, 2021
@mgartner mgartner deleted the norm-partial-index-prune branch January 7, 2021 21:28
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late to the party, but :lgtm: Nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @mgartner, @RaduBerinde, and @rytaft)


pkg/sql/opt/optbuilder/partial_index.go, line 35 at r13 (raw file):

// normalize the partial index predicates.
func (b *Builder) addPartialIndexPredicatesForTable(
	tabMeta *opt.TableMeta, scan memo.RelExpr, includeDeletable bool,

Is there any harm to always including any deletable indexes? It would be one less thing to worry about (currently the code that calls this function, and the code that uses index predicates must "match" w.r.t this).

@mgartner
Copy link
Collaborator Author


pkg/sql/opt/optbuilder/partial_index.go, line 35 at r13 (raw file):

Previously, RaduBerinde wrote…

Is there any harm to always including any deletable indexes? It would be one less thing to worry about (currently the code that calls this function, and the code that uses index predicates must "match" w.r.t this).

Good point: #58714

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: prune columns that are guaranteed not to be needed for partial index mutations
4 participants