sql: better query planning for OR expressions #2142

petermattis · 2015-08-18T12:48:58Z

Consider a query like SELECT * FROM foo WHERE a > 1 OR b > 2. Even if there are appropriate indexes to satisfy both a > 1 and b > 2, the current query planner will perform a full table or index scan because it can't use both conditions at once. Instead, the query planner should check to see if 2 separate index scans can be used and their results merged using a "union" operation.

This is essentially a generalization of #2140.

The text was updated successfully, but these errors were encountered:

Working towards #2140 and #2142.

Index selection now creates multiple independent spans of the index that must be scanned. Multiple spans are created for expressions like "a IN (1, 2, 3)" and "a = 1 OR a = 2 or a = 3" (these are equivalent). Fixes #2140. See #2142.

petermattis · 2015-09-10T18:54:58Z

We really want UNION before implementing this. I'm going to punt it to 1.0 for now.

spencerkimball · 2017-03-28T18:55:03Z

@RaduBerinde @knz is this still an active issue? Should it be 1.0 if so? If not, please close.

knz · 2017-03-28T19:07:06Z

It's still active. We don't optimize OR. Probably won't make it for 1.0 though. Meta: what about we don't reclassify active but postponed issues blindly but instead ensure first there's a doc issue/PR to mark the thing as known limitation? -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

jseldess · 2017-05-10T02:34:48Z

Document this as known limitation with cockroachdb/docs#1381 (review). @knz, @petermattis, please review my language there and let me know if we need to add a workaround.

knz · 2017-06-07T17:31:40Z

Deassigning myself as I am not working on this before a suitable IR.
The item is now listed on #12418 though.

This commit adds a new exploration rule that can produce better query plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some Select + Scan expressions with a disjunction filter into a Union of two Select expressions, each with one side of the disjuction as a filter. This can result in faster query plans in cases where two indexes cover each side of the disjunction. This rule only applies for Scan expressions that contain a strict key. Fixes cockroachdb#2142 Release note (performance improvement): The query optimizer now produces faster query plans for some disjunctions (OR expressions) by utilizing multiple indexes.

46923: ui: Link from statement diagnostics to details page r=dhartunian a=koorosh Resolves: #46559 We have Statements diagnostics history page with list of all requested diagnostics. Before, statement fingerprint were represented as a simple text and now it is a links to statement details page. One notion, that it is possible to have diagnostics for statements which is already cleared. In this case statement is displayed as a text instead of link. Release note (admin ui change): Add links to statement details from Statement Diagnostics history page. Release justification: bug fixes and low-risk updates to new functionality ![out](https://user-images.githubusercontent.com/3106437/78255335-d824e300-74ff-11ea-9203-233ac8ba67fc.gif) 47094: opt: add GenerateUnionSelects exploration rule for disjunction r=mgartner a=mgartner #### opt: add GenerateUnionSelects exploration rule for disjunction This commit adds a new exploration rule that can produce better query plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some Select + Scan expressions with a disjunction filter into a Union of two Select expressions, each with one side of the disjuction as a filter. This can result in faster query plans in cases where two indexes cover each side of the disjunction. This rule only applies for Scan expressions that contain a strict key. Fixes #2142 Release note (performance improvement): The query optimizer now produces faster query plans for some disjunctions (OR expressions) by utilizing multiple indexes. #### sql: allow UNION with hidden and non-hidden columns This commit removes an assertion that required corresponding columns on each side of a UNION to be both hidden or non-hidden. Prior to this change, the following statements would yield the error: "UNION types cannot be matched". CREATE TABLE ab (a INT, b INT); SELECT a, b, rowid FROM ab UNION VALUES (1, 2, 3); With this commit, the above statements are executed without error. Release note (bug fix): Fixed an incorrect error the ocurred when executing UNION statements with hidden and non-hidden columns. Co-authored-by: Andrii Vorobiov <[email protected]> Co-authored-by: Marcus Gartner <[email protected]>

andy-kimball · 2020-04-09T22:34:16Z

2nd oldest open issue in the project finally closed!

Bessonov · 2020-04-09T22:50:44Z

@andy-kimball cool! But it depends on the interpretation. The issue mention the use case of SELECT * FROM foo WHERE a > 1 OR b > 2, but the fix is about SELECT * FROM foo WHERE a = 1 OR b = 2 only. Well it's better, but not for the majority of real use cases.

Our use case is the JSONB comparison: 'scope'->'name' @> '[{"type":"column"}]' OR 'scope'->'name2' @> '[{"type":"column"}]'.

andy-kimball · 2020-04-09T23:55:52Z

This change fixes the inequality case as well, as long as the optimizer estimates that there are fewer rows to scan in the two index case than the one index case.

As for the JSON case, can you give a complete example, including schema, some rows of sample data, and full SQL query? We can see if the new code handles that case.

andy-kimball · 2020-04-10T00:00:33Z

Here's an inequality example:

root@:26257/defaultdb> create table t (x int primary key, y int, z int, index(y), index(z));
CREATE TABLE

root@:26257/defaultdb> insert into t select a, a, a from generate_series(1,100000) temp(a);
INSERT 100000

root@:26257/defaultdb> explain (opt) select * from t where y>90000 OR z>95000;
                      text
+----------------------------------------------+
  union
   ├── index-join t
   │    └── scan t@t_y_idx
   │         └── constraint: /2/1: [/90001 - ]
   └── index-join t
        └── scan t@t_z_idx
             └── constraint: /7/5: [/95001 - ]

In this case, the optimizer estimates that the first index scan will return ~10K rows, and then second index scan will return ~5K rows. It decides that the UNION of these two is better than scanning all 100K rows of the primary index, and so uses the new plan enabled by this work.

Bessonov · 2020-04-10T12:37:37Z

Great, I've looked at the description and the tests and couldn't see the inequality case.

It's not a complete query (I can provide the complete query, but it has another problems with query planer: full table scan on left join, where look up should be more cheaper), but I think it's a good starting point and makes a huge difference:

select version();
                                          version                                           
+------------------------------------------------------------------------------------------+
  CockroachDB CCL v19.2.5 (x86_64-unknown-linux-gnu, built 2020/03/16 18:27:12, go1.12.12)


CREATE TABLE t (
	id UUID NOT NULL DEFAULT gen_random_uuid(),
	perms JSONB NOT NULL,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	INVERTED INDEX ix_perms (perms)
);

insert into t (perms) select jsonb_build_object('scope', jsonb_build_object('xx' || a::string, jsonb_build_array(jsonb_build_object('type', 'column')))) from generate_series(1,100000) temp(a);

> explain (opt) select * from t where perms->'scope'->'xx2000' @> '[{"type":"column"}]' or perms->'scope'->'xx4000' @> '[{"type":"column"}]';
                                                               text                                                               
+--------------------------------------------------------------------------------------------------------------------------------+
  select                                                                                                                          
   ├── scan t                                                                                                                     
   └── filters                                                                                                                    
        └── (perms @> '{"scope": {"xx2000": [{"type": "column"}]}}') OR (perms @> '{"scope": {"xx4000": [{"type": "column"}]}}')

> select * from t where perms->'scope'->'xx2000' @> '[{"type":"column"}]' or perms->'scope'->'xx4000' @> '[{"type":"column"}]';
                   id                  |                    perms                     
+--------------------------------------+---------------------------------------------+
  6815b87d-4af4-4a58-aee0-de00dc4375f4 | {"scope": {"xx4000": [{"type": "column"}]}}  
  9102f2a6-a26a-4139-b085-7b33948b4652 | {"scope": {"xx2000": [{"type": "column"}]}}  
(2 rows)

Time: 668.315636ms

> select * from t where perms->'scope'->'xx9000' @> '[{"type":"column"}]';
                   id                  |                    perms                     
+--------------------------------------+---------------------------------------------+
  93025901-3c48-4d79-94ba-ee32bf4f0a2a | {"scope": {"xx9000": [{"type": "column"}]}}  
(1 row)

Time: 1.05896ms

This one could be fixed in 20.1 (not tested yet):

> select * from t where perms->'scope'->'xx20000' @> '[{"type":"column"}]' union select * from t where perms->'scope'->'xx41000' @> '[{"type":"col
umn"}]';
pq: unable to encode table key: *tree.DJSON

andy-kimball · 2020-04-10T16:30:36Z

It appears that 20.1 fixes #35260 and #46709, but not #35706.

Also, the newly merged UNION rule (which will go into 20.2 since it missed cutoff for 20.1) does not handle this JSON case you give, because it only splits OR clauses that operate over different columns from the input table. I opened an issue to track the inverted index case: #47340.

petermattis added the SQL label Aug 18, 2015

petermattis added this to the Beta milestone Aug 18, 2015

petermattis self-assigned this Aug 18, 2015

jess-edwards mentioned this issue Aug 20, 2015

Product Roadmap #2132

Closed

78 tasks

petermattis mentioned this issue Aug 27, 2015

Replace qvalueRange and qvalueRangeMap with analyzeExpr. #2285

Merged

petermattis added a commit that referenced this issue Aug 28, 2015

Replace qvalueRange and qvalueRangeMap with analyzeExpr.

cc7731b

Working towards #2140 and #2142.

petermattis mentioned this issue Aug 28, 2015

Add support for scanning multiple spans for a single index. #2293

Merged

petermattis modified the milestones: 1.0, Beta Sep 10, 2015

petermattis removed their assignment Oct 30, 2015

petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed SQL labels Feb 13, 2016

tamird assigned knz Jun 30, 2016

spencerkimball removed this from the 1.0 milestone Mar 28, 2017

spencerkimball added the docs-known-limitation label Mar 28, 2017

dianasaur323 added this to the 1.1 milestone Apr 20, 2017

jseldess added the docs-done label May 10, 2017

knz added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Jun 7, 2017

knz mentioned this issue Jun 7, 2017

sql: tracking issue, proposed optimizations #12418

Closed

14 tasks

knz removed their assignment Jun 7, 2017

mgartner mentioned this issue Apr 6, 2020

opt: add GenerateUnionSelects exploration rule for disjunction #47094

Merged

craig bot closed this as completed in 06c1077 Apr 9, 2020

andy-kimball mentioned this issue Apr 10, 2020

opt: enable usage of inverted index with OR clauses #47340

Closed

mgartner mentioned this issue Sep 8, 2020

Partial indexes cockroachdb/docs#8242

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: better query planning for OR expressions #2142

sql: better query planning for OR expressions #2142

petermattis commented Aug 18, 2015

petermattis commented Sep 10, 2015

spencerkimball commented Mar 28, 2017

knz commented Mar 28, 2017 via email

jseldess commented May 10, 2017

knz commented Jun 7, 2017

andy-kimball commented Apr 9, 2020

Bessonov commented Apr 9, 2020

andy-kimball commented Apr 9, 2020

andy-kimball commented Apr 10, 2020 •

edited

Loading

Bessonov commented Apr 10, 2020 •

edited

Loading

andy-kimball commented Apr 10, 2020 •

edited

Loading

sql: better query planning for OR expressions #2142

sql: better query planning for OR expressions #2142

Comments

petermattis commented Aug 18, 2015

petermattis commented Sep 10, 2015

spencerkimball commented Mar 28, 2017

knz commented Mar 28, 2017 via email

jseldess commented May 10, 2017

knz commented Jun 7, 2017

andy-kimball commented Apr 9, 2020

Bessonov commented Apr 9, 2020

andy-kimball commented Apr 9, 2020

andy-kimball commented Apr 10, 2020 • edited Loading

Bessonov commented Apr 10, 2020 • edited Loading

andy-kimball commented Apr 10, 2020 • edited Loading

andy-kimball commented Apr 10, 2020 •

edited

Loading

Bessonov commented Apr 10, 2020 •

edited

Loading

andy-kimball commented Apr 10, 2020 •

edited

Loading