Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

constant is not propagated into subquery. #15082

Closed
coocood opened this issue Mar 3, 2020 · 11 comments · Fixed by #46544
Closed

constant is not propagated into subquery. #15082

coocood opened this issue Mar 3, 2020 · 11 comments · Fixed by #46544
Labels
challenge-program help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@coocood
Copy link
Member

coocood commented Mar 3, 2020

Description

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
create table t (a int, b int, c int, primary key(a, b));
insert t values (1, 1, 1), (2, 2, 2), (3, 3, 3);
create table s (a int, b int, c int, primary key(a, b));
insert s values (1, 1, 1), (2, 2, 2), (3, 3, 3);
explain select t.a, t.b, (select sum(c) from s where s.a = t.a) from t where t.a = 2;
  1. What did you expect to see?
    The explain result should be the same as
explain select t.a, t.b, (select sum(c) from s where s.a = 2) from t where t.a = 2;

+---------------------------+-------+-----------+------------------------------------------------------------------+
| id                        | count | task      | operator info                                                    |
+---------------------------+-------+-----------+------------------------------------------------------------------+
| Projection_41             | 0.00  | root      | test.t.a, test.t.b, 2->Column#20                                 |
| └─IndexReader_43          | 0.00  | root      | index:IndexRangeScan_42                                          |
|   └─IndexRangeScan_42     | 0.00  | cop[tikv] | table:t, index:a, b, range:[2,2], keep order:false, stats:pseudo |
+---------------------------+-------+-----------+------------------------------------------------------------------+
  1. What did you see instead?
| StreamAgg_15                           | 1.00  | root      | group by:Column#26, Column#27, funcs:firstrow(Column#23)->test.t.a, funcs:firstrow(Column#24)->test.t.b, funcs:sum(Column#25)->Column#9 |
| └─Projection_56                        | 0.00  | root      | test.t.a, test.t.b, cast(test.s.c, decimal(65,0) BINARY)->Column#25, test.t.a, test.t.b                                                 |
|   └─IndexMergeJoin_51                  | 0.00  | root      | left outer join, inner:Projection_49, outer key:test.t.a, inner key:test.s.a                                                            |
|     ├─IndexReader_55(Build)            | 0.00  | root      | index:IndexRangeScan_54                                                                                                                 |
|     │ └─IndexRangeScan_54              | 0.00  | cop[tikv] | table:t, index:a, b, range:[2,2], keep order:true, stats:pseudo                                                                         |
|     └─Projection_49(Probe)             | 1.25  | root      | test.s.a, test.s.c                                                                                                                      |
|       └─IndexLookUp_48                 | 1.25  | root      |                                                                                                                                         |
|         ├─IndexRangeScan_46(Build)     | 1.25  | cop[tikv] | table:s, index:a, b, range: decided by [eq(test.s.a, test.t.a)], keep order:true, stats:pseudo                                          |
|         └─TableRowIDScan_47(Probe)     | 1.25  | cop[tikv] | table:s, keep order:false, stats:pseudo
  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

SIG slack channel

#sig-planner

Score

  • 300

Mentor

@coocood coocood added type/bug The issue is confirmed as a bug. sig/planner SIG: Planner labels Mar 3, 2020
@eurekaka
Copy link
Contributor

eurekaka commented Mar 3, 2020

The reason is that we don't have optimization rule for predicate pull-up now. Constant propagation is imposed during the process of predicate pushdown, while for this query, t.a = 2 is built below the Apply operator from the scratch. After changing the query to be like:

explain select tmp.a, tmp.b, tmp.sum from (select t.a as a, t.b as b, (select sum(c) from s where s.a = t.a) as sum from t) tmp where tmp.a = 2;

then the plan would be:

+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------+
| id                               | estRows | task      | operator info                                                                                                                           |
+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------+
| HashAgg_14                       | 1.00    | root      | group by:Column#15, Column#16, funcs:firstrow(Column#12)->test.t.a, funcs:firstrow(Column#13)->test.t.b, funcs:sum(Column#14)->Column#9 |
| └─Projection_27                  | 0.00    | root      | test.t.a, test.t.b, cast(test.s.c, decimal(65,0) BINARY)->Column#14, test.t.a, test.t.b                                                 |
|   └─HashLeftJoin_17              | 0.00    | root      | CARTESIAN left outer join, inner:TableReader_23                                                                                         |
|     ├─TableReader_23(Build)      | 0.00    | root      | data:Selection_22                                                                                                                       |
|     │ └─Selection_22             | 0.00    | cop[tikv] | eq(2, test.s.a)                                                                                                                         |
|     │   └─TableFullScan_21       | 3.00    | cop[tikv] | table:s, keep order:false, stats:pseudo                                                                                                 |
|     └─IndexReader_20(Probe)      | 0.00    | root      | index:IndexRangeScan_19                                                                                                                 |
|       └─IndexRangeScan_19        | 0.00    | cop[tikv] | table:t, index:a, b, range:[2,2], keep order:false, stats:pseudo                                                                        |
+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------+

it is still not as optimized as what you expect though, that is because the evaluation of uncorrelated subquery is in plan building phase, and predicate pushdown is not applied at that time.

It is pretty hard to optimize this query in current planner framework, but Cascades is good at solving this kind of cases. @francis0407 PTAL.

@francis0407
Copy link
Member

One solution is to support pushing predicates which contains the correlated columns into the inner side of Apply.

@lzmhhh123 lzmhhh123 added challenge-program help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 30, 2020
@XuHuaiyu XuHuaiyu added type/feature-request Categorizes issue or PR as related to a new feature. and removed severity/moderate type/bug The issue is confirmed as a bug. labels Dec 16, 2020
@liusf1993
Copy link

Is there a fix schedule for the bug? There are some performane issue in my project becuase of this case (currently is in test for tidb), I'll have to modify my business code if too long.

@chrysan
Copy link
Contributor

chrysan commented Oct 27, 2022

A similar case:

create table pt (a int, b int) partition by range (a) (partition p0 values less than (100), partition p1 values less than (200));
create table t1 (a int, b int);
set tidb_partition_prune_mode=static;
explain select t2.b from t2 join (select t1.a,t1.b from t1 where t1.a=99) t on t.a=t2.a and t.b=t2.b;

Partition is not pruned with propagated t2.a=99:

+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                                         |
+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
| HashJoin_13                  | 12.49    | root      |               | inner join, equal:[eq(test.t1.a, test.t2.a) eq(test.t1.b, test.t2.b)] |
| ├─TableReader_16(Build)      | 9.99     | root      |               | data:Selection_15                                                     |
| │ └─Selection_15             | 9.99     | cop[tikv] |               | eq(test.t1.a, 99), not(isnull(test.t1.b))                             |
| │   └─TableFullScan_14       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                        |
| └─TableReader_19(Probe)      | 9980.01  | root      |               | data:Selection_18                                                     |
|   └─Selection_18             | 9980.01  | cop[tikv] |               | not(isnull(test.t2.a)), not(isnull(test.t2.b))                        |
|     └─TableFullScan_17       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                        |
+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+

@elsa0520
Copy link
Contributor

elsa0520 commented Aug 2, 2023

@ghazalfamilyusa Is this a kind of optimizer rule of Functional Dependency ? Is there a relationship between constant propagated and FD ?

@winoros
Copy link
Member

winoros commented Aug 2, 2023

@ghazalfamilyusa Is this a kind of optimizer rule of Functional Dependency ? Is there a relationship between constant propagated and FD ?

If the subquery cannot be de-correlated, we must need fd to pass the const property.

@ghazalfamilyusa
Copy link
Contributor

Correct. We need a general rule for constant propagation as a logical rewrite for all cases (within and across sub-blocks). I the literature, this is called predicate move around.

@elsa0520
Copy link
Contributor

@ghazalfamilyusa I found a problem about the order of constant propagation and subquery unnest .
In our arch, we unnest the subquery in the select list firstly. So, it appears a left join in the plan tree.
Secondly, we optimize query by constant propagation rule.
So the plan can not change to the user wants. Because if we unnest the subquery firstly, the left join will not be removed in the subsequent optimizer process.

So how could we get the expect plan? It seems that the order of optimization needs to be adjusted to achieve it.
How are other systems implemented? Is there a more correct implementation method for reference ? (It would be better if relevant authoritative documents, or other system)

@ghazalfamilyusa
Copy link
Contributor

@elsa0520 : I think the constant propagation and push down should work on nested or unnested subquery. Is the outer join a limitation we have? In the literature there are two concepts: predicate derivation (constant propagation is one case of that) and predicate move around (push down is one special case of that). Both should work on different query shapes. Commerical DBMS do not publish much about it but I think IMB DB2 and Oracle may have good into about it. Teradata docs talks about that.

@elsa0520
Copy link
Contributor

elsa0520 commented Aug 15, 2023

constant propagation and push down should work on nested or unnested subquery.

Yes, I agree with you. We currently only support constant propagated that does not cross query blocks, and it does not appear in logic optimization as a separate logical rule.
Instead, it is used as an auxiliary optimization of predicate pushdown (the auxiliary here refers to the optimization function that calls the constant transfer in the predicate pushdown optimization).
My idea is that if we want to support constant propagated, we need to add a independency rule of constant propagated.

@ghazalfamilyusa
Copy link
Contributor

Agree and tha tis the right way to do it. Teradata have this and we called SAT-TC which is satisfiability & transtive closure. The rule checks if the predicates are unsatisfiable (derive FALSE) and if is not then it produce derived predicates through equality and non-equality of columns. See https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Request-and-Transaction-Processing/Query-Rewrite-Statistics-and-Optimization/Examples-of-Query-Rewrites/Predicate-Simplification?section=xcg1472241575102__application_of_transitive_closure_section

elsa0520 added a commit to elsa0520/tidb that referenced this issue Sep 4, 2023
This new logical rule of constant propagation is meanly used for
`subquery` in From List
For example: select * from t, (select * from s where s.id >1) tmp where t.id = s.id

Fixed pingcap#15082
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
challenge-program help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.
Projects
Status: Finished
Development

Successfully merging a pull request may close this issue.