opt: derive constant computed columns for index selection #43450

Merged
merged 2 commits into from
Dec 26, 2019

andy-kimball

The optimizer uses explicitly specified filter constraints to qualify
available indexes during the exploration phase. It also uses implicit
filter constraints derived from table check constraints.

This commit adds new implicit filter constraints based on constant
computed columns. Constant computed columns are based on other columns in
the table that are constrained to be constant by other filters. For
example:

  CREATE TABLE hashed (
    k STRING,
    hash INT AS (fnv32(k) % 4) STORED,
    INDEX hash_index (hash, k)
  )

  SELECT * FROM hashed WHERE k = 'andy'

Here, the value of the hash column can be computed at query build time,
and therefore "hash_index" can be selected as the lowest-cost index. The
resulting plan would be:

  scan hashed@secondary
   ├── columns: k:1(string!null) hash:2(int)
   ├── constraint: /2/1/3: [/1/'andy' - /1/'andy']
   └── fd: ()-->(1)

This improved ability to select indexes is useful for implementing HASH
indexes, which scatter keys across N buckets (see Issue #39340).

Release note (sql change): The optimizer can now derive constant computed
columns during index selection. This enables more efficient HASH indexes.

Isolate method movements in this commit so that the subsequent
commit is easier to understand.

Release note: None
@andy-kimball andy-kimball requested a review from a team as a code owner December 22, 2019 02:20
@cockroach-teamcity

This change is Reviewable

@RaduBerinde RaduBerinde left a comment

Very cool functionality and the code is very clean! :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/constraint/span.go, line 76 at r2 (raw file):

// the start key is the same as the end key, and both boundaries are inclusive.
func (sp *Span) HasSingleKey(evalCtx *tree.EvalContext) bool {
	if sp.start.IsEmpty() || sp.end.IsEmpty() {

[nit] if we move the Length check first, we only need to check one of these


pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):

	}

	var replace func(e opt.Expr) opt.Expr

Calling Replace when we don't end up folding to a constant can create a bunch of garbage. We could do these checks first using the OuterCols of the scalar expression, and only then call Replace (when we know for sure that all outer cols are in constCols)

@RaduBerinde RaduBerinde left a comment

There's some interaction between this and #43405 (we need to handle these new expressions). I'm happy to update my PR once this goes in.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, and @rytaft)

@andy-kimball andy-kimball left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):

Previously, RaduBerinde wrote…

Calling Replace when we don't end up folding to a constant can create a bunch of garbage. We could do these checks first using the OuterCols of the scalar expression, and only then call Replace (when we know for sure that all outer cols are in constCols)

Checking outer columns is a bit of a pain in this context, because we don't have props for the expression; we'd need to walk the expression to gather them.

I'll point out that there's only unnecessary garbage in the case where we perform at least one variable replacement, but then can't fully fold the entire expression. I'd think that was an edge case, like if there are multiple variables, or if there is a side-effecting function. If we can fold at least one variable in the expression, then it's likely the entire expression can be folded (since vast majority of check expressions ref a single variable).

@RaduBerinde


pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Checking outer columns is a bit of a pain in this context, because we don't have props for the expression; we'd need to walk the expression to gather them.

I'll point out that there's only unnecessary garbage in the case where we perform at least one variable replacement, but then can't fully fold the entire expression. I'd think that was an edge case, like if there are multiple variables, or if there is a side-effecting function. If we can fold at least one variable in the expression, then it's likely the entire expression can be folded (since vast majority of check expressions ref a single variable).

It will try to construct existing expressions, right? Last time I looked at the factory code, it seemed like it would allocate in that case as well (but maybe I'm wrong?)

@andy-kimball andy-kimball left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):

Previously, RaduBerinde wrote…

It will try to construct existing expressions, right? Last time I looked at the factory code, it seemed like it would allocate in that case as well (but maybe I'm wrong?)

The CopyAndReplace methods will first copy, then replace. The Replace methods will only copy the parent if at least one child has been replaced.
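
To make the distinction concrete, here is a minimal Go sketch of the "copy the parent only if a child was replaced" behavior described for Replace. The Expr type and the replaceVars function are hypothetical simplifications for illustration; they are not the optimizer's opt.Expr interface or its generated Replace code.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified expression node. It is not the
// optimizer's opt.Expr interface.
type Expr struct {
	Op       string // "var", "const", or an operator such as "+"
	Col      int    // column ID, used when Op == "var"
	Val      int    // constant value, used when Op == "const"
	Children []*Expr
}

// replaceVars substitutes constants for variables found in constCols. A
// parent node is copied only if at least one child was replaced; untouched
// subtrees are returned as-is, so nothing is allocated for them.
func replaceVars(e *Expr, constCols map[int]int) *Expr {
	if e.Op == "var" {
		if v, ok := constCols[e.Col]; ok {
			return &Expr{Op: "const", Val: v}
		}
		return e
	}
	var children []*Expr // allocated lazily, on the first changed child
	for i, child := range e.Children {
		replaced := replaceVars(child, constCols)
		if replaced != child && children == nil {
			children = make([]*Expr, len(e.Children))
			copy(children, e.Children[:i])
		}
		if children != nil {
			children[i] = replaced
		}
	}
	if children == nil {
		return e
	}
	return &Expr{Op: e.Op, Col: e.Col, Val: e.Val, Children: children}
}

func main() {
	// (col1 + col2), where only col1 is held constant.
	left := &Expr{Op: "var", Col: 1}
	right := &Expr{Op: "var", Col: 2}
	sum := &Expr{Op: "+", Children: []*Expr{left, right}}

	out := replaceVars(sum, map[int]int{1: 7})
	fmt.Println(out != sum)                             // true: parent was copied
	fmt.Println(out.Children[1] == right)               // true: untouched child is shared
	fmt.Println(replaceVars(sum, map[int]int{}) == sum) // true: nothing replaced, nothing copied
}
```

In this sketch the only wasted allocation is the partially replaced parent, which corresponds to the edge case discussed earlier in the thread: at least one variable is replaced but the expression cannot be fully folded.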

@RaduBerinde


pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

The CopyAndReplace methods will first copy, then replace. The Replace methods will only copy the parent if at least one child has been replaced.

Ah, nice.

@andy-kimball

bors r+

craig bot pushed a commit that referenced this pull request Dec 26, 2019
43450: opt: derive constant computed columns for index selection r=andy-kimball a=andy-kimball

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
@craig

craig bot commented Dec 26, 2019

Build succeeded

@craig craig bot merged commit 2cfa5e8 into cockroachdb:master Dec 26, 2019
@andy-kimball andy-kimball deleted the computed branch December 26, 2019 15:46
@rytaft rytaft left a comment

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @justinj, and @rytaft)


pkg/sql/opt/xform/custom_funcs.go, line 558 at r3 (raw file):

// ID into a constant value, by evaluating it with respect to a set of other
// columns that are constant. If the computed column is constant, enter it into
// the constCols map and return false. Otherwise, return false.

[nit] I think you meant "enter it into the constCols map and return true"


pkg/sql/opt/xform/testdata/rules/computed, line 102 at r3 (raw file):

      └── (k_int = 2) OR (k_int = 3) [type=bool, outer=(1), constraints=(/1: [/2 - /2] [/3 - /3])]

# Don't constrain the index for a NULL value.

Why not?

mgartner added a commit to mgartner/cockroach that referenced this pull request Jul 1, 2022
The optimizer can generate constrained scans over indexes on computed
columns when columns referenced in the computed column expression are
held constant. Consider this example:

    CREATE TABLE t (a INT, v INT AS (a + 1) STORED, INDEX v_idx (v))
    SELECT * FROM t WHERE a = 1

A constrained scan can be generated over `v_idx` because `v` depends on
`a` and the query filter holds `a` constant.

This commit lifts a restriction that prevented this optimization when
columns referenced in the computed column expression were held constant
to the `NULL` value. As far as I can tell, this restriction is not
necessary. In fact, @rytaft had questioned its purpose originally, but
the question was never answered:

cockroachdb#43450 (review)

By lifting this restriction, the optimizer can explore constrained scans
over both indexed computed columns with `IS NULL` expressions and
expression indexes with `IS NULL` expressions.

Fixes cockroachdb#83390

Release note (performance improvement): The optimizer now explores more
efficient query plans when indexed computed columns and expressions have
`IS NULL` expressions.
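
The effect of lifting the restriction can be sketched in Go for the `v INT AS (a + 1)` example above. This is only an illustration under stated assumptions: a nullable INT is modeled as `*int64` with nil standing for SQL `NULL`, and the names `computeV` and `constraintOnV` are hypothetical, not optimizer functions.

```go
package main

import "fmt"

// computeV evaluates the computed column expression v = a + 1 for a nullable
// INT, modeled as *int64 where nil stands for SQL NULL. As in SQL, NULL + 1
// evaluates to NULL. Hypothetical helper, for illustration only.
func computeV(a *int64) *int64 {
	if a == nil {
		return nil
	}
	v := *a + 1
	return &v
}

// constraintOnV renders the filter on v that can be derived once the query
// holds a constant, including the case where a is held constant to NULL.
func constraintOnV(a *int64) string {
	if v := computeV(a); v != nil {
		return fmt.Sprintf("v = %d", *v)
	}
	return "v IS NULL"
}

func main() {
	one := int64(1)
	fmt.Println(constraintOnV(&one)) // for WHERE a = 1, prints: v = 2
	fmt.Println(constraintOnV(nil))  // for WHERE a IS NULL, prints: v IS NULL
}
```

Either derived filter can then be used to constrain a scan over `v_idx`; before this change, only the non-`NULL` case was considered.
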
craig bot pushed a commit that referenced this pull request Jul 11, 2022
83619: opt: constrain expression indexes with IS NULL expressions r=mgartner a=mgartner

84084: bazel: new versions of prebuilt `c-deps` r=srosenberg a=rickystewart

Rebuild these archives to pull in
`52a3a0aa8a707f9bb03802186da0c60b715ed9ce` (change to `jemalloc` to
build without `MADV_FREE`).

Release note: None

84088: ui: fix alignment on custom scale r=maryliag a=maryliag

The check for valid options, added with the
removal of some options in #83229, didn't take
the custom values into consideration.
This commit adds the option back, allowing alignment
on custom values.

Release note (bug fix): Custom time period selection is now aligned
between the Metrics and SQL Activity pages.

84155: sql/schemachanger/scbuild: minor cleanup r=ajwerner a=ajwerner

Improves the error handling a tad to make runtime errors an assertion failure.
Fixes a typo.

Release note: None

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>