-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: derive constant computed columns for index selection #43450
Conversation
Isolate method movements in this commit so it makes subsequent commit easier to understand. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool functionality and the code is very clean!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/constraint/span.go, line 76 at r2 (raw file):
// the start key is the same as the end key, and both boundaries are inclusive. func (sp *Span) HasSingleKey(evalCtx *tree.EvalContext) bool { if sp.start.IsEmpty() || sp.end.IsEmpty() {
[nit] if we move the Length check first, we only need to check one of these
pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):
} var replace func(e opt.Expr) opt.Expr
Calling Replace
when we don't end up folding to a constant can create a bunch of garbage. We could do these checks first using the OuterCols of the scalar expression, and only then call Replace
(when we know for sure that all outer cols are in constCols
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some interaction between this and #43405 (we need to handle these new expressions). I'm happy to update my PR once this goes in.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):
Previously, RaduBerinde wrote…
Calling
Replace
when we don't end up folding to a constant can create a bunch of garbage. We could do these checks first using the OuterCols of the scalar expression, and only then callReplace
(when we know for sure that all outer cols are inconstCols
)
Checking outer columns is a bit of a pain in this context, because we don't have props for the expression; we'd need to walk the expression to gather them.
I'll point out that there's only unnecessary garbage in the case where we perform at least one variable replacement, but then can't fully fold the entire expression. I'd think that was an edge case, like if there are multiple variables, or if there is a side-effecting function. If we can fold at least one variable in the expression, then it's likely the entire expression can be folded (since vast majority of check expressions ref a single variable).
bdc43a5
to
fc4670c
Compare
pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
It will try to construct existing expressions, right? Last time I looked at the factory code, it seemed like it would allocate in that case as well (but maybe I'm wrong?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file):
Previously, RaduBerinde wrote…
It will try to construct existing expressions, right? Last time I looked at the factory code, it seemed like it would allocate in that case as well (but maybe I'm wrong?)
The CopyAndReplace
methods will first copy, then replace. The Replace
methods will only copy the parent if at least one child has been replaced.
pkg/sql/opt/xform/custom_funcs.go, line 567 at r2 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
Ah, nice. |
The optimizer uses explicitly specified filter constraints to qualify available indexes during the exploration phase. It also uses implicit filter constraints derived from table check constraints. This commit adds new implicit filter constraints based on constant computed columns. Constant computed columns are based on other columns in the table that are constrained to be constant by other filters. For example: CREATE TABLE hashed ( k STRING, hash INT AS (fnv32(k) % 4) STORED, INDEX hash_index (hash, k) ) SELECT * FROM hashed WHERE k = 'andy' Here, the value of the hash column can be computed at query build time, and therefore "hash_index" selected as the lowest cost index. The resulting plan would be: scan hashed@secondary ├── columns: k:1(string!null) hash:2(int) ├── constraint: /2/1/3: [/1/'andy' - /1/'andy'] └── fd: ()-->(1) This improved ability to select indexes is useful for implementing HASH indexes, which scatter keys across N buckets (see Issue cockroachdb#39340). Release note (sql change): The optimizer can now derive constant computed columns during index selection. This enables more efficient HASH indexes.
fc4670c
to
2cfa5e8
Compare
bors r+ |
43450: opt: derive constant computed columns for index selection r=andy-kimball a=andy-kimball The optimizer uses explicitly specified filter constraints to qualify available indexes during the exploration phase. It also uses implicit filter constraints derived from table check constraints. This commit adds new implicit filter constraints based on constant computed columns. Constant computed columns are based on other columns in the table that are constrained to be constant by other filters. For example: CREATE TABLE hashed ( k STRING, hash INT AS (fnv32(k) % 4) STORED, INDEX hash_index (hash, k) ) SELECT * FROM hashed WHERE k = 'andy' Here, the value of the hash column can be computed at query build time, and therefore "hash_index" selected as the lowest cost index. The resulting plan would be: scan hashed@secondary ├── columns: k:1(string!null) hash:2(int) ├── constraint: /2/1/3: [/1/'andy' - /1/'andy'] └── fd: ()-->(1) This improved ability to select indexes is useful for implementing HASH indexes, which scatter keys across N buckets (see Issue #39340). Release note (sql change): The optimizer can now derive constant computed columns during index selection. This enables more efficient HASH indexes. Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @justinj, and @rytaft)
pkg/sql/opt/xform/custom_funcs.go, line 558 at r3 (raw file):
// ID into a constant value, by evaluating it with respect to a set of other // columns that are constant. If the computed column is constant, enter it into // the constCols map and return false. Otherwise, return false.
[nit] I think you meant "enter it into the constCols map and return true"
pkg/sql/opt/xform/testdata/rules/computed, line 102 at r3 (raw file):
└── (k_int = 2) OR (k_int = 3) [type=bool, outer=(1), constraints=(/1: [/2 - /2] [/3 - /3])] # Don't constrain the index for a NULL value.
Why not?
The optimizer can generate constrained scans over indexes on computed columns when columns referenced in the computed column expression are held constant. Consider this example: CREATE TABLE t (a INT, v INT AS (a + 1) STORED, INDEX v_idx (v)) SELECT * FROM t WHERE a = 1 A constrained scan can be generated over `v_idx` because `v` depends on `a` and the query filter holds `a` constant. This commit lifts a restriction that prevented this optimization when columns referenced in the computed column expression were held constant to the `NULL` value. As far as I can tell, this restriction is not necessary. In fact, @rytaft had questioned its purpose originally, but the question was never answered: cockroachdb#43450 (review) By lifting this restriction, the optimizer can explore constrained scans over both indexed computed columns with `IS NULL` expressions and expression indexes with `IS NULL` expressions. Fixes cockroachdb#83390 Release note (performance improvement): The optimizer now explores more efficient query plans when index computed columns and expressions have `IS NULL` expressions.
83619: opt: constrain expression indexes with IS NULL expressions r=mgartner a=mgartner The optimizer can generate constrained scans over indexes on computed columns when columns referenced in the computed column expression are held constant. Consider this example: CREATE TABLE t (a INT, v INT AS (a + 1) STORED, INDEX v_idx (v)) SELECT * FROM t WHERE a = 1 A constrained scan can be generated over `v_idx` because `v` depends on `a` and the query filter holds `a` constant. This commit lifts a restriction that prevented this optimization when columns referenced in the computed column expression were held constant to the `NULL` value. As far as I can tell, this restriction is not necessary. In fact, `@rytaft` had questioned its purpose originally, but the question was never answered: #43450 (review) By lifting this restriction, the optimizer can explore constrained scans over both indexed computed columns with `IS NULL` expressions and expression indexes with `IS NULL` expressions. Fixes #83390 Release note (performance improvement): The optimizer now explores more efficient query plans when index computed columns and expressions have `IS NULL` expressions. 84084: bazel: new versions of prebuilt `c-deps` r=srosenberg a=rickystewart Rebuild these archives to pull in `52a3a0aa8a707f9bb03802186da0c60b715ed9ce` (change to `jemalloc` to build without `MADV_FREE`). Release note: None 84088: ui: fix alignment on custom scale r=maryliag a=maryliag The check for valid options with the removal of some options on #83229 didn't took the custom values into consideration. This commit add the option back, allowing the alignment on custom values. Release note (bug fix): Custom time period selection is now aligning between Metrics and SQL Activity page. 84155: sql/schemachanger/scbuild: minor cleanup r=ajwerner a=ajwerner Improves the error handling a tad to make runtime errors and assertion failure. Fixes a typo. Release note: None Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com> Co-authored-by: Andrew Werner <awerner32@gmail.com>
The optimizer uses explicitly specified filter constraints to qualify
available indexes during the exploration phase. It also uses implicit
filter constraints derived from table check constraints.
This commit adds new implicit filter constraints based on constant
computed columns. Constant computed columns are based on other columns in
the table that are constrained to be constant by other filters. For
example:
CREATE TABLE hashed (
k STRING,
hash INT AS (fnv32(k) % 4) STORED,
INDEX hash_index (hash, k)
)
SELECT * FROM hashed WHERE k = 'andy'
Here, the value of the hash column can be computed at query build time,
and therefore "hash_index" selected as the lowest cost index. The resulting
plan would be:
scan hashed@secondary
├── columns: k:1(string!null) hash:2(int)
├── constraint: /2/1/3: [/1/'andy' - /1/'andy']
└── fd: ()-->(1)
This improved ability to select indexes is useful for implementing HASH
indexes, which scatter keys across N buckets (see Issue #39340).
Release note (sql change): The optimizer can now derive constant computed
columns during index selection. This enables more efficient HASH indexes.