-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: enable usage of inverted index with OR clauses #47340
Comments
Do you know what the SQL query would be that takes advantage of this feature on inverted indexes? Would it be:
|
Initially, this feature would support using inverted indexes when there is an explicit SELECT * FROM t WHERE t.int_arr @> ARRAY[1] OR t.int_arr @> ARRAY[2]
Ideally the new "InvertedExpr" (work in progress) can handle an arbitrary number of disjunctions, which should make this straight forward. If necessary, we could normalize |
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (@>) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs cockroachdb#47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns.
56732: sql,util,opt: add support for inverted filter with JSON and Array columns r=rytaft a=rytaft **sql,util: return `unique` boolean from `EncodeContainingInvertedIndexSpans`** This commit adds a result parameter for `EncodeContainingInvertedIndexSpans` called `unique`. The function returns `unique=true` if the spans are guaranteed not to produce duplicate primary keys. Otherwise, returns `unique=false`. The logic is based on the fact that spans for a given JSON or Array datum will not produce duplicate primary keys if there is exactly one JSON or Array path ending in a scalar (i.e., not an empty object or array). **sql,opt: add support for inverted filter with JSON and Array columns** This commit adds support for planning and executing an inverted index scan + invertedFilter when there is a contains (`@>`) predicate on a JSON or array column that has an inverted index. Similar to spatial columns, the optimizer may be able to plan an inverted index scan + invertedFilter for JSON and Array columns even if the filter condition is complex, with multiple predicates combined with AND and OR expressions. Informs #47340 Release note (performance improvement): The optimizer now supports using an inverted index on JSON or Array columns for a wider variety of filter predicates. Previously, inverted index usage was only supported for simple predicates (e.g., `WHERE a @> '{"foo": "bar"}'`), but now more complicated predicates are supported by combining simple contains (@>) expressions with AND and OR (e.g., `WHERE a @> '{"foo": "bar"}' OR a @> '{"foo": "baz"}'`). An inverted index will be used if it is available on the filtered column and the optimizer expects it to be more efficient than any alternative plan. This may result in performance improvements for queries involving JSON and Array columns. 57050: storage: delete mvccGetInternal, use mvccGet directly r=nvanbenschoten a=nvanbenschoten This commit deletes a major chunk of legacy code in MVCC mutation code. `mvccPutInternal` was avoiding using mvccGet because this used to result in a cgo call. This led to a large amount of duplicate logic. Now that `mvccGet` calls into the `pebbleIterator` in Go, using it is much less expensive, so we can remove the re-implementation. With a bit of tuning in follow-on commits, this results in a negligible performance hit which is well worth the elimination of duplicate code: ``` name old time/op new time/op delta MVCCConditionalPut_Pebble/Create/valueSize=10-16 7.33µs ± 8% 7.04µs ± 7% ~ (p=0.075 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=100-16 7.99µs ± 8% 7.89µs ± 8% ~ (p=0.631 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=1000-16 8.51µs ± 7% 8.47µs ± 4% ~ (p=0.529 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=10000-16 23.7µs ± 6% 23.8µs ± 5% ~ (p=0.897 n=10+8) MVCCConditionalPut_Pebble/Replace/valueSize=10-16 12.0µs ± 6% 12.0µs ± 7% ~ (p=0.704 n=10+9) MVCCConditionalPut_Pebble/Replace/valueSize=100-16 13.2µs ± 7% 13.0µs ± 9% ~ (p=0.604 n=9+10) MVCCConditionalPut_Pebble/Replace/valueSize=10000-16 28.8µs ± 7% 30.2µs ± 3% +4.91% (p=0.008 n=10+9) MVCCConditionalPut_Pebble/Replace/valueSize=1000-16 11.5µs ± 6% 12.1µs ± 6% +5.22% (p=0.007 n=10+10) name old speed new speed delta MVCCConditionalPut_Pebble/Create/valueSize=10-16 1.37MB/s ± 8% 1.42MB/s ± 8% ~ (p=0.085 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=100-16 12.5MB/s ± 7% 12.7MB/s ± 9% ~ (p=0.645 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=1000-16 118MB/s ± 8% 118MB/s ± 4% ~ (p=0.529 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=10000-16 423MB/s ± 6% 421MB/s ± 5% ~ (p=0.897 n=10+8) MVCCConditionalPut_Pebble/Replace/valueSize=10-16 835kB/s ± 5% 833kB/s ± 7% ~ (p=0.734 n=10+9) MVCCConditionalPut_Pebble/Replace/valueSize=100-16 7.56MB/s ± 7% 7.69MB/s ±10% ~ (p=0.619 n=9+10) MVCCConditionalPut_Pebble/Replace/valueSize=10000-16 347MB/s ± 7% 331MB/s ± 3% -4.80% (p=0.008 n=10+9) MVCCConditionalPut_Pebble/Replace/valueSize=1000-16 87.1MB/s ± 6% 82.8MB/s ± 6% -4.96% (p=0.007 n=10+10) name old alloc/op new alloc/op delta MVCCConditionalPut_Pebble/Create/valueSize=10-16 366B ± 3% 366B ± 2% ~ (p=1.000 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=100-16 909B ± 2% 905B ± 2% ~ (p=0.492 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=1000-16 6.42kB ± 2% 6.46kB ± 1% ~ (p=0.197 n=10+10) MVCCConditionalPut_Pebble/Create/valueSize=10000-16 79.8kB ± 8% 77.7kB ±10% ~ (p=0.190 n=10+10) MVCCConditionalPut_Pebble/Replace/valueSize=1000-16 8.37kB ± 6% 9.05kB ± 4% +8.20% (p=0.000 n=10+9) MVCCConditionalPut_Pebble/Replace/valueSize=10-16 390B ± 3% 435B ± 4% +11.41% (p=0.000 n=10+10) MVCCConditionalPut_Pebble/Replace/valueSize=10000-16 72.0kB ± 7% 81.0kB ± 3% +12.38% (p=0.000 n=9+10) MVCCConditionalPut_Pebble/Replace/valueSize=100-16 1.02kB ± 3% 1.16kB ± 2% +14.05% (p=0.000 n=10+10) name old allocs/op new allocs/op delta MVCCConditionalPut_Pebble/Create/valueSize=10-16 2.00 ± 0% 2.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Create/valueSize=100-16 2.00 ± 0% 2.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Create/valueSize=1000-16 2.00 ± 0% 2.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Create/valueSize=10000-16 3.00 ± 0% 3.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Replace/valueSize=10-16 4.00 ± 0% 4.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Replace/valueSize=100-16 4.00 ± 0% 4.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Replace/valueSize=1000-16 4.00 ± 0% 4.00 ± 0% ~ (all equal) MVCCConditionalPut_Pebble/Replace/valueSize=10000-16 6.00 ± 0% 6.00 ± 0% ~ (all equal) ``` If we want to continue pulling on this – 2 of the 4 remaining heap allocations here are due to the key copy in [`pebble.Iterator.SeekPrefixGE`](https://github.com/cockroachdb/pebble/blob/2da62788eedefce8dfbeef56a2324603606e18fb/iterator.go#L386). Caching this buffer across calls to `SeekPrefixGE` seems trivial. Caching the buffer across other method calls seems trickier due to the way we use the existence of the slice to indicate "prefix iteration". Maybe someone closer to this code will have good ideas about what to do about this. Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
GenerateInvertedIndexScans now generates InvertedConstraints from equality expressions with a JSON FetchVal operator on the left side and a constant on the right side, like `j->'a' = '1'`. Previously Constraints were built for these expressions with the `idxconstraint` package. By building InvertedConstraints from FetchVal operators, the optimizer now has the ability to generate inverted index scans with conjunctive and disjunctive filters containing FetchVal operators. For example, the optimizer will plan an inverted index scan with the query below. CREATE TABLE t (k INT PRIMARY KEY, j JSON, INVERTED INDEX (j)) SELECT k FROM t WHERE j->'a' = '1' AND j->'b' = 2 This change also brings us a step closer to cleaning up the `idxconstraint` by removing code related to inverted indexes. Informs cockroachdb#47340 Release note (performance improvement): The query optimizer now plans scans over inverted indexes on JSON columns for query filters that constrain the JSON column with equality and fetch value operators (`->`) inside conjunctions and disjunctions, like `j->'a' = '1' AND j->'b' = '2'`.
GenerateInvertedIndexScans now generates InvertedConstraints from equality expressions with a JSON FetchVal operator on the left side and a constant on the right side, like `j->'a' = '1'`. Previously Constraints were built for these expressions with the `idxconstraint` package. By building InvertedConstraints from FetchVal operators, the optimizer now has the ability to generate inverted index scans with conjunctive and disjunctive filters containing FetchVal operators. For example, the optimizer will plan an inverted index scan with the query below. CREATE TABLE t (k INT PRIMARY KEY, j JSON, INVERTED INDEX (j)) SELECT k FROM t WHERE j->'a' = '1' AND j->'b' = 2 This change also brings us a step closer to cleaning up the `idxconstraint` by removing code related to inverted indexes. Informs cockroachdb#47340 Release note (performance improvement): The query optimizer now plans scans over inverted indexes on JSON columns for query filters that constrain the JSON column with equality and fetch value operators (`->`) inside conjunctions and disjunctions, like `j->'a' = '1' AND j->'b' = '2'`.
59266: opt: build scan InvertedConstraints from JSON FetchVal operator r=mgartner a=mgartner #### opt: rename inverted constraint contains extraction function The function name now reflects that it specifically deals with the Contains expression, `@>`. Release note: None #### opt: add rule expectations for GenerateInvertedIndexScan tests Release note: None #### invertedidx: add convenience function for extracting index column variable Release note: None #### opt: build inverted index scan constraints from JSON fetch val operator GenerateInvertedIndexScans now generates InvertedConstraints from equality expressions with a JSON FetchVal operator on the left side and a constant on the right side, like `j->'a' = '1'`. Previously Constraints were built for these expressions with the `idxconstraint` package. By building InvertedConstraints from FetchVal operators, the optimizer now has the ability to generate inverted index scans with conjunctive and disjunctive filters containing FetchVal operators. For example, the optimizer will plan an inverted index scan with the query below. CREATE TABLE t (k INT PRIMARY KEY, j JSON, INVERTED INDEX (j)) SELECT k FROM t WHERE j->'a' = '1' AND j->'b' = 2 This change also brings us a step closer to cleaning up the `idxconstraint` by removing code related to inverted indexes. Informs #47340 Release note (performance improvement): The query optimizer now plans scans over inverted indexes on JSON columns for query filters that constrain the JSON column with equality and fetch value operators (`->`) inside conjunctions and disjunctions, like `j->'a' = '1' AND j->'b' = '2'`. #### opt: build InvertedConstraints for inverted index scans on computed columns GenerateInvertedIndexScans now builds InvertedConstraints over inverted indexes on computed columns when a filter constrains an expression that is equivalent to the computed column expression. Previously Constraints were built for these expressions with the `idxconstraint` package. This change provides no functional change, but brings us a step closer to cleaning up the `idxconstraint` by removing code related to inverted indexes. Release note: None #### opt: do not build Constraints in GenerateInvertedIndexScans InvertedConstraints can be built for inverted index scans in all cases that Constraints could be built. Therefore there is no longer a need to build Constraints in GenerateInvertedIndexScans. This commit removes the code related to this. Release note: None 59408: *:upgrade vendored pgconn to v1.8.0 r=gaoxk a=gaoxk The old version of pgconn asynchronously cancelled requests to the server. However, .connect synchronously closes connections in responses to errors. This causes race conditions for goroutine leaks. The newer pgconn synchronously closes connections to the server, ideally eliminating this race condition. See jackc/pgconn@340bfec Release note (general change): Upgrade to v1.8.0 `pgconn` to patch potential race condition errors. Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Kristy Gao <[email protected]>
There are a handful of improvement to inverted indexes that will be released in 21.1, including index accelerating disjunctions on the same inverted-indexed column, e.g. However, we do not currently index accelerate expressions of the form |
This is a new issue derived from comments made by @Bessonov on issue #2142.
EXPECTED: The
ix_perms
index should be used.ACTUAL: The primary index is used instead.
We could fix this in one of two ways:
The text was updated successfully, but these errors were encountered: