Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: better query planning for OR expressions #2142

Closed
petermattis opened this issue Aug 18, 2015 · 13 comments · Fixed by #47094
Closed

sql: better query planning for OR expressions #2142

petermattis opened this issue Aug 18, 2015 · 13 comments · Fixed by #47094
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation E-starter Might be suitable for a starter project for new employees or team members. good first issue

Comments

@petermattis
Copy link
Collaborator

Consider a query like SELECT * FROM foo WHERE a > 1 OR b > 2. Even if there are appropriate indexes to satisfy both a > 1 and b > 2, the current query planner will perform a full table or index scan because it can't use both conditions at once. Instead, the query planner should check to see if 2 separate index scans can be used and their results merged using a "union" operation.

This is essentially a generalization of #2140.

@petermattis petermattis added this to the Beta milestone Aug 18, 2015
@petermattis petermattis self-assigned this Aug 18, 2015
@jess-edwards jess-edwards mentioned this issue Aug 20, 2015
78 tasks
petermattis added a commit that referenced this issue Aug 28, 2015
petermattis added a commit that referenced this issue Aug 28, 2015
Index selection now creates multiple independent spans of the index that
must be scanned. Multiple spans are created for expressions like "a
IN (1, 2, 3)" and "a = 1 OR a = 2 or a = 3" (these are equivalent).

Fixes #2140.
See #2142.
@petermattis
Copy link
Collaborator Author

We really want UNION before implementing this. I'm going to punt it to 1.0 for now.

@petermattis petermattis modified the milestones: 1.0, Beta Sep 10, 2015
@petermattis petermattis removed their assignment Oct 30, 2015
@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed SQL labels Feb 13, 2016
@spencerkimball
Copy link
Member

@RaduBerinde @knz is this still an active issue? Should it be 1.0 if so? If not, please close.

@knz
Copy link
Contributor

knz commented Mar 28, 2017 via email

@spencerkimball spencerkimball removed this from the 1.0 milestone Mar 28, 2017
@dianasaur323 dianasaur323 added this to the 1.1 milestone Apr 20, 2017
@jseldess
Copy link
Contributor

Document this as known limitation with cockroachdb/docs#1381 (review). @knz, @petermattis, please review my language there and let me know if we need to add a workaround.

@knz knz added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Jun 7, 2017
@knz
Copy link
Contributor

knz commented Jun 7, 2017

Deassigning myself as I am not working on this before a suitable IR.
The item is now listed on #12418 though.

@knz knz removed their assignment Jun 7, 2017
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 2, 2020
This commit adds a new exploration rule that can produce better query
plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some
Select + Scan expressions with a disjunction filter into a Union of two
Select expressions, each with one side of the disjuction as a filter.
This can result in faster query plans in cases where two indexes cover
each side of the disjunction.

This rule only applies for Scan expressions that contain a strict key.

Fixes cockroachdb#2142

Release note (performance improvement): The query optimizer now produces
faster query plans for some disjunctions (OR expressions) by utilizing
multiple indexes.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 4, 2020
This commit adds a new exploration rule that can produce better query
plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some
Select + Scan expressions with a disjunction filter into a Union of two
Select expressions, each with one side of the disjuction as a filter.
This can result in faster query plans in cases where two indexes cover
each side of the disjunction.

This rule only applies for Scan expressions that contain a strict key.

Fixes cockroachdb#2142

Release note (performance improvement): The query optimizer now produces
faster query plans for some disjunctions (OR expressions) by utilizing
multiple indexes.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 7, 2020
This commit adds a new exploration rule that can produce better query
plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some
Select + Scan expressions with a disjunction filter into a Union of two
Select expressions, each with one side of the disjuction as a filter.
This can result in faster query plans in cases where two indexes cover
each side of the disjunction.

This rule only applies for Scan expressions that contain a strict key.

Fixes cockroachdb#2142

Release note (performance improvement): The query optimizer now produces
faster query plans for some disjunctions (OR expressions) by utilizing
multiple indexes.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 8, 2020
This commit adds a new exploration rule that can produce better query
plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some
Select + Scan expressions with a disjunction filter into a Union of two
Select expressions, each with one side of the disjuction as a filter.
This can result in faster query plans in cases where two indexes cover
each side of the disjunction.

This rule only applies for Scan expressions that contain a strict key.

Fixes cockroachdb#2142

Release note (performance improvement): The query optimizer now produces
faster query plans for some disjunctions (OR expressions) by utilizing
multiple indexes.
craig bot pushed a commit that referenced this issue Apr 9, 2020
46923: ui: Link from statement diagnostics to details page r=dhartunian a=koorosh

Resolves: #46559

We have Statements diagnostics history page with list
of all requested diagnostics.
Before, statement fingerprint were represented as a
simple text and now it is a links to statement details
page.

One notion, that it is possible to have diagnostics for
statements which is already cleared. In this case
statement is displayed as a text instead of link.

Release note (admin ui change): Add links to statement
details from Statement Diagnostics history page.

Release justification: bug fixes and low-risk updates to new functionality

![out](https://user-images.githubusercontent.com/3106437/78255335-d824e300-74ff-11ea-9203-233ac8ba67fc.gif)


47094: opt: add GenerateUnionSelects exploration rule for disjunction r=mgartner a=mgartner

#### opt: add GenerateUnionSelects exploration rule for disjunction

This commit adds a new exploration rule that can produce better query
plans for disjunctions (e.g. a = 1 OR b = 2). The rule transforms some
Select + Scan expressions with a disjunction filter into a Union of two
Select expressions, each with one side of the disjuction as a filter.
This can result in faster query plans in cases where two indexes cover
each side of the disjunction.

This rule only applies for Scan expressions that contain a strict key.

Fixes #2142

Release note (performance improvement): The query optimizer now produces
faster query plans for some disjunctions (OR expressions) by utilizing
multiple indexes.

#### sql: allow UNION with hidden and non-hidden columns

This commit removes an assertion that required corresponding columns on
each side of a UNION to be both hidden or non-hidden.

Prior to this change, the following statements would yield the error:
"UNION types cannot be matched".

  CREATE TABLE ab (a INT, b INT);
  SELECT a, b, rowid FROM ab UNION VALUES (1, 2, 3);

With this commit, the above statements are executed without error.

Release note (bug fix): Fixed an incorrect error the ocurred when
executing UNION statements with hidden and non-hidden columns.


Co-authored-by: Andrii Vorobiov <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@craig craig bot closed this as completed in 06c1077 Apr 9, 2020
@andy-kimball
Copy link
Contributor

2nd oldest open issue in the project finally closed!

@Bessonov
Copy link

Bessonov commented Apr 9, 2020

@andy-kimball cool! But it depends on the interpretation. The issue mention the use case of SELECT * FROM foo WHERE a > 1 OR b > 2, but the fix is about SELECT * FROM foo WHERE a = 1 OR b = 2 only. Well it's better, but not for the majority of real use cases.

Our use case is the JSONB comparison: 'scope'->'name' @> '[{"type":"column"}]' OR 'scope'->'name2' @> '[{"type":"column"}]'.

@andy-kimball
Copy link
Contributor

This change fixes the inequality case as well, as long as the optimizer estimates that there are fewer rows to scan in the two index case than the one index case.

As for the JSON case, can you give a complete example, including schema, some rows of sample data, and full SQL query? We can see if the new code handles that case.

@andy-kimball
Copy link
Contributor

andy-kimball commented Apr 10, 2020

Here's an inequality example:

root@:26257/defaultdb> create table t (x int primary key, y int, z int, index(y), index(z));
CREATE TABLE

root@:26257/defaultdb> insert into t select a, a, a from generate_series(1,100000) temp(a);
INSERT 100000

root@:26257/defaultdb> explain (opt) select * from t where y>90000 OR z>95000;
                      text
+----------------------------------------------+
  union
   ├── index-join t
   │    └── scan t@t_y_idx
   │         └── constraint: /2/1: [/90001 - ]
   └── index-join t
        └── scan t@t_z_idx
             └── constraint: /7/5: [/95001 - ]

In this case, the optimizer estimates that the first index scan will return ~10K rows, and then second index scan will return ~5K rows. It decides that the UNION of these two is better than scanning all 100K rows of the primary index, and so uses the new plan enabled by this work.

@Bessonov
Copy link

Bessonov commented Apr 10, 2020

Great, I've looked at the description and the tests and couldn't see the inequality case.

It's not a complete query (I can provide the complete query, but it has another problems with query planer: full table scan on left join, where look up should be more cheaper), but I think it's a good starting point and makes a huge difference:

select version();
                                          version                                           
+------------------------------------------------------------------------------------------+
  CockroachDB CCL v19.2.5 (x86_64-unknown-linux-gnu, built 2020/03/16 18:27:12, go1.12.12)


CREATE TABLE t (
	id UUID NOT NULL DEFAULT gen_random_uuid(),
	perms JSONB NOT NULL,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	INVERTED INDEX ix_perms (perms)
);

insert into t (perms) select jsonb_build_object('scope', jsonb_build_object('xx' || a::string, jsonb_build_array(jsonb_build_object('type', 'column')))) from generate_series(1,100000) temp(a);

> explain (opt) select * from t where perms->'scope'->'xx2000' @> '[{"type":"column"}]' or perms->'scope'->'xx4000' @> '[{"type":"column"}]';
                                                               text                                                               
+--------------------------------------------------------------------------------------------------------------------------------+
  select                                                                                                                          
   ├── scan t                                                                                                                     
   └── filters                                                                                                                    
        └── (perms @> '{"scope": {"xx2000": [{"type": "column"}]}}') OR (perms @> '{"scope": {"xx4000": [{"type": "column"}]}}')

> select * from t where perms->'scope'->'xx2000' @> '[{"type":"column"}]' or perms->'scope'->'xx4000' @> '[{"type":"column"}]';
                   id                  |                    perms                     
+--------------------------------------+---------------------------------------------+
  6815b87d-4af4-4a58-aee0-de00dc4375f4 | {"scope": {"xx4000": [{"type": "column"}]}}  
  9102f2a6-a26a-4139-b085-7b33948b4652 | {"scope": {"xx2000": [{"type": "column"}]}}  
(2 rows)

Time: 668.315636ms

> select * from t where perms->'scope'->'xx9000' @> '[{"type":"column"}]';
                   id                  |                    perms                     
+--------------------------------------+---------------------------------------------+
  93025901-3c48-4d79-94ba-ee32bf4f0a2a | {"scope": {"xx9000": [{"type": "column"}]}}  
(1 row)

Time: 1.05896ms

This one could be fixed in 20.1 (not tested yet):

> select * from t where perms->'scope'->'xx20000' @> '[{"type":"column"}]' union select * from t where perms->'scope'->'xx41000' @> '[{"type":"col
umn"}]';
pq: unable to encode table key: *tree.DJSON

@andy-kimball
Copy link
Contributor

andy-kimball commented Apr 10, 2020

It appears that 20.1 fixes #35260 and #46709, but not #35706.

Also, the newly merged UNION rule (which will go into 20.2 since it missed cutoff for 20.1) does not handle this JSON case you give, because it only splits OR clauses that operate over different columns from the input table. I opened an issue to track the inverted index case: #47340.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation E-starter Might be suitable for a starter project for new employees or team members. good first issue
Projects
None yet
9 participants