Please track the first design draft and discussion in #1752.
TL;DR
Current behavior:
def Optimize:
    for node in PlanTree: # Traverse the Logical Plan Tree
        for rule in rules: # Enumerate rules
            tryApplyRule()
New behavior:
def Optimize:
    for rule in rules: # Enumerate rules
        for node in PlanTree: # Traverse the Logical Plan Tree
            tryApplyRule()
No new features, all tests pass, nothing changed for the end-user.
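The difference between the two loop orders can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the plan is a flat list of node names, rules are plain strings, and the helper names `node_first` and `rule_first` are hypothetical, not part of the code base. The functions only record the order in which (rule, node) pairs would be attempted.

```python
from typing import List, Tuple


def node_first(plan: List[str], rules: List[str]) -> List[Tuple[str, str]]:
    """Current behavior: visit each node and try every rule on it."""
    return [(rule, node) for node in plan for rule in rules]


def rule_first(plan: List[str], rules: List[str]) -> List[Tuple[str, str]]:
    """New behavior: take each rule in order and try it on every node."""
    return [(rule, node) for rule in rules for node in plan]


# Hypothetical two-node plan (top to bottom) and two rules in their defined order.
plan = ["LogicalLimit", "LogicalSort"]
rules = ["PUSH_DOWN_SORT", "PUSH_DOWN_LIMIT"]

old_order = node_first(plan, rules)
new_order = rule_first(plan, rules)
```

With the node-first order, `PUSH_DOWN_LIMIT` is attempted on `LogicalLimit` before `PUSH_DOWN_SORT` ever sees `LogicalSort`; with the rule-first order, every attempt of an earlier rule precedes every attempt of a later one.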
Background
Currently each storage engine adds its own logical operator as a concrete implementation of the TableScanOperator abstraction. Typically each data source needs to add 2 logical operators for table scan, with and without aggregation. Take OpenSearch for example: there are OpenSearchLogicalIndexScan and OpenSearchLogicalIndexAgg, and a bunch of pushdown optimization rules for each accordingly.
class LogicalPlanOptimizer:
  /**
   * OpenSearch rules include:
   *   PUSH_DOWN_PAGE_SIZE
   *   PUSH_DOWN_FILTER
   *   PUSH_DOWN_AGGREGATION
   *   PUSH_DOWN_SORT
   *   PUSH_DOWN_HIGHLIGHT
   *   PUSH_DOWN_NESTED
   *   PUSH_DOWN_PROJECT
   *   PUSH_DOWN_LIMIT
   * that return OpenSearchLogicalIndexAgg
   * or OpenSearchLogicalIndexScan finally
   */
  val rules: List<Rule>

  def optimize(plan: LogicalPlan):
    for rule in rules: # Enumerate rules
      for node in plan: # Traverse the Logical Plan Tree
        tryApplyRule()
Optimization Protocol
There are optimization guidelines which should be strictly followed to ensure that the search query built completely matches the user request.
1.
Optimizer should apply rules in the strict order they are defined. For example, PUSH_DOWN_LIMIT should be applied last.
Violating this order causes bugs, for example #1764, #1774, #1788.
Sample queries
LIMIT
TODO ❗ expected behavior once LIMIT is supported in pagination.
stateDiagram-v2
state "Before" as Before {
state "LogicalPaginate" as Paginate
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalRelation" as Relation
Paginate --> ProjectB
ProjectB --> LimitB
LimitB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> TableScanBuilder
}
NESTED with LIMIT
SELECT nested(message.*) from nested-type limit 4
TODO ❗ expected behavior once #1764 fixed - need to reorder tree nodes
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalNested" as NestedB
state "LogicalLimit" as LimitB
state "LogicalRelation" as Relation
ProjectB --> NestedB
NestedB --> LimitB
LimitB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalNested" as NestedA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> NestedA
NestedA --> TableScanBuilder
}
2.
Optimizer should be able to apply a rule matching Something even when the plan tree has other nodes between LogicalSomething and TableScanBuilder, unless an exception is specified. TableScanBuilder could be wrapped by another tree node, for example in join implementation (see #1623).
Sample queries
PPL with SORT then LIMIT
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort
Sort --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
PPL with LIMIT then SORT
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalSort" as Sort
state "LogicalLimit" as Limit
state "LogicalRelation" as Relation
ProjectB --> Sort
Sort --> Limit
Limit --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
SQL query; plan trees of SQL queries are likely always in the same order
select * from calcs where int0 > 0 order by int2 limit 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort
Sort --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
3.
A query might have multiple highlights backed by LogicalHighlight (and filters and sorts) - all of them should be pushed down.
A corresponding rule should be attempted multiple times.
Sample queries
Multiple highlights
SELECT highlight(Title), highlight(Body, pre_tags='<mark style="background-color: green;">', post_tags='</mark>') FROM beer.stackexchange WHERE multi_match([Title, Body], 'IPA') ORDER BY Id LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalHighlight" as Highlight1
state "LogicalHighlight" as Highlight2
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Highlight1
Highlight1 --> Highlight2
Highlight2 --> Limit
Limit --> Sort
Sort --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
Multiple filters
source=account | where age > 30 | where age < 35 | fields age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalFilter" as Filter2
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Filter2
Filter2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
Multiple sorts
source=account | sort age | sort lastname | head 20;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort1
state "LogicalSort" as Sort2
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort1
Sort1 --> Sort2
Sort2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
4.
Unlike 3, most of the rules should be applied only once.
TODO ❗ optimization doesn't work correctly in those cases, #917
Sample queries
PPL
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as LimitB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> LimitB
LimitB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> Project2A
Project2A --> TableScanBuilder
}
5.
PUSH_DOWN_PROJECT should not happen if there is a LogicalEval between LogicalProject and TableScanBuilder.
Sample queries
source=bank | eval f = abs(age) | fields f;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalEval" as EvalB
state "LogicalRelation" as Relation
ProjectB --> EvalB
EvalB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalEval" as EvalA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> EvalA
EvalA --> TableScanBuilder
}
6.
Similar to 5, LogicalWindow in the plan tree between LogicalProject and TableScanBuilder should block PUSH_DOWN_PROJECT operation.
Sample queries
SELECT avg(date0) OVER(PARTITION BY datetime1) from calcs;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalWindow" as WindowB
state "LogicalSort" as Sort
state "LogicalRelation" as Relation
ProjectB --> WindowB
WindowB --> Sort
Sort --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalWindow" as WindowA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> WindowA
WindowA --> TableScanBuilder
}
7.
Some push down operations could be rejected (e.g. pushDownWhatever returns false), so the corresponding LogicalSomething node remains in the tree. Avoid retrying the rule for that node indefinitely.
Sample queries
OpenSearchIndexScanAggregationBuilder rejects pushDownFilter
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
OpenSearchIndexScanAggregationBuilder rejects pushDownSort and pushDownLimit
SELECT COUNT(*) FROM account GROUP BY age ORDER BY COUNT(*) LIMIT 5;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalSort" as SortB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> SortB
SortB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> SortA
SortA --> TableScanBuilder
}
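One way to avoid retrying a rejected push down indefinitely is to remember which nodes were already rejected. The following is a minimal sketch, not the actual implementation: `Node`, `ScanBuilder`, `push_down` and `apply_rule` are hypothetical names, and the plan is simplified to a flat list of nodes above the scan builder.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Node:
    """Hypothetical plan node; only its kind matters for this sketch."""
    kind: str


@dataclass
class ScanBuilder:
    """Hypothetical scan builder that accepts only some push-down kinds."""
    accepted_kinds: Set[str]
    pushed: List[str] = field(default_factory=list)

    def push_down(self, kind: str) -> bool:
        if kind in self.accepted_kinds:
            self.pushed.append(kind)
            return True
        return False


def apply_rule(plan: List[Node], kind: str, builder: ScanBuilder) -> List[Node]:
    """Push down nodes of `kind` until none is left that was not yet rejected."""
    rejected: Set[int] = set()  # ids of nodes the builder refused
    while True:
        candidate = next(
            (n for n in plan if n.kind == kind and id(n) not in rejected), None
        )
        if candidate is None:
            return plan  # nothing left to try, so the loop always terminates
        if builder.push_down(kind):
            plan = [n for n in plan if n is not candidate]  # absorbed by the builder
        else:
            rejected.add(id(candidate))  # keep the node, never retry it


# A builder that, like the aggregation builder above, rejects sort and limit.
builder = ScanBuilder(accepted_kinds={"LogicalProject"})
plan = [Node("LogicalProject"), Node("LogicalLimit"), Node("LogicalSort")]
result = apply_rule(plan, "LogicalSort", builder)
```

Here the rejected `LogicalSort` stays in the tree and the rule stops after a single attempt on it.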
8.
PUSH_DOWN_LIMIT should be blocked if there is a LogicalSort or LogicalFilter between LogicalLimit and TableScanBuilder.
Sample queries
PUSH_DOWN_SORT can't be performed due to implementation restrictions (#1471), so PUSH_DOWN_LIMIT shouldn't be performed either.
SELECT CAST(balance AS FLOAT) AS jdbc_float_alias FROM account ORDER BY jdbc_float_alias LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalSort" as SortB
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> SortB
SortB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> SortA
SortA --> TableScanBuilder
}
OpenSearchIndexScanAggregationBuilder rejects pushDownFilter, so LogicalFilter remains in the tree and PUSH_DOWN_LIMIT shouldn't be performed.
SELECT gender from account GROUP BY gender HAVING count(*) > 5 LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> FilterA
FilterA --> TableScanBuilder
}
9.
PUSH_DOWN_SORT and PUSH_DOWN_FILTER should be applied after PUSH_DOWN_AGGREGATION if LogicalSort or LogicalFilter is on top of LogicalAggregation; see rules in OpenSearchIndexScanAggregationBuilder.
Sample queries
SQL: LogicalFilter is on top of LogicalAggregation
SELECT gender from account GROUP BY gender HAVING count(*) > 500;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
PPL: LogicalFilter is on top of LogicalAggregation
source=account | stats sum(balance) as a by state | where a > 780000;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
10.
PUSH_DOWN_FILTER should be applied before PUSH_DOWN_AGGREGATION if LogicalFilter is under LogicalAggregation.
Sample queries
LogicalFilter is under LogicalAggregation
SELECT gender from account WHERE age > 20 GROUP BY gender;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalAggregation" as Aggregation
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Aggregation
Aggregation --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
11.
As a combination of 9 and 10, PUSH_DOWN_FILTER should be attempted both before and after PUSH_DOWN_AGGREGATION.
Sample queries
LogicalAggregation surrounded by two LogicalFilters
SELECT gender from account WHERE age > 20 GROUP BY gender HAVING count(*) > 80;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1B
state "LogicalAggregation" as Aggregation
state "LogicalFilter" as Filter2B
state "LogicalRelation" as Relation
ProjectB --> Filter1B
Filter1B --> Aggregation
Aggregation --> Filter2B
Filter2B --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
12.
Subqueries: don’t apply PUSH_DOWN_AGGREGATION, PUSH_DOWN_LIMIT, PUSH_DOWN_FILTER, PUSH_DOWN_SORT to the outer query, only to the inner one (under the bottommost LogicalProject).
Sample queries
Aggregation
SELECT COUNT(*) FILTER(WHERE age > 35) FROM (SELECT * FROM bank) as a;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalAggregation" as AggergationB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> AggergationB
AggergationB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalAggregation" as AggergationA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> AggergationA
AggergationA --> Project2A
Project2A --> TableScanBuilder
}
Filter
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f WHERE f.price > 100;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalFilter" as FilterB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> FilterB
FilterB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalFilter" as FilterA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> FilterA
FilterA --> Project2A
Project2A --> TableScanBuilder
}
Sort
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f ORDER BY f.price;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalSort" as SortB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> SortB
SortB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalSort" as SortA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> SortA
SortA --> Project2A
Project2A --> TableScanBuilder
}
Sort and Aggregation
SELECT Origin, MIN(AvgTicketPrice) FROM (SELECT * FROM flights) AS flights GROUP BY Origin ORDER BY MAX(AvgTicketPrice)
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalSort" as SortB
state "LogicalAggregation" as AggergationB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> SortB
SortB --> AggergationB
AggergationB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalSort" as SortA
state "LogicalAggregation" as AggergationA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> SortA
SortA --> AggergationA
AggergationA --> Project2A
Project2A --> TableScanBuilder
}
Sort and Limit
SELECT price FROM (SELECT AvgTicketPrice AS price FROM flights LIMIT 10) AS flights ORDER BY price LIMIT 5, 5;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as Limit1B
state "LogicalSort" as SortB
state "LogicalProject" as Project2B
state "LogicalLimit" as Limit2B
state "LogicalRelation" as Relation
Project1B --> Limit1B
Limit1B --> SortB
SortB --> Project2B
Project2B --> Limit2B
Limit2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> SortA
SortA --> Project2A
Project2A --> TableScanBuilder
}
❗ TODO add IT
❗ TODO nested, pagination, highlight, window
13.
Push down absolutely identical tree nodes (LogicalSort, LogicalFilter).
Sample queries
Sort
SELECT FlightDelayMin, AvgTicketPrice, STDDEV_SAMP(AvgTicketPrice) OVER (ORDER BY FlightDelayMin) AS num FROM flights ORDER BY FlightDelayMin;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalWindow" as WindowB
state "LogicalSort" as Sort1
state "LogicalSort" as Sort2
state "LogicalRelation" as Relation
ProjectB --> WindowB
WindowB --> Sort1
Sort1 --> Sort2
Sort2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalWindow" as WindowA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> WindowA
WindowA --> TableScanBuilder
}
Filter
source=account | where age > 38 | where age > 38 | fields firstname, age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalFilter" as Filter2
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Filter2
Filter2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
❗ TBD merge them (remove duplicates) or push down (current behavior - push down)?
❗ TODO add ITs (include highlight)
14.
Optimize subqueries for SQL and complex queries in PPL. ❗ TODO not implemented.
Sample queries
Example from 4 - not optimized:
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as LimitB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> LimitB
LimitB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> Project2A
Project2A --> TableScanBuilder
}
Relevance search with subquery:
SELECT *, highlight(origin), _score
FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights WHERE AvgTicketPrice > 100) AS f
WHERE score(origin = match_query('Base'));
This query fails because the LogicalFilter from the outer query wasn’t pushed down, so V2 tried to apply relevance search in memory.
UnsupportedOperationException: OpenSearch defined function [match_query] is only supported in WHERE and HAVING clause.
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalHighlight" as Highlight
state "LogicalFilter" as Filter1B
state "LogicalProject" as Project2B
state "LogicalFilter" as Filter2B
state "LogicalRelation" as Relation
Project1B --> Highlight
Highlight --> Filter1B
Filter1B --> Project2B
Project2B --> Filter2B
Filter2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalFilter" as FilterA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> FilterA
FilterA --> Project2A
Project2A --> TableScanBuilder
}
Complex PPL query returns incorrect results
source=account | where age > 30 | head 1000 | sort +balance | where age < 40 | head 100 | sort -balance | where balance > 10000 | fields age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalSort" as Sort1
state "LogicalLimit" as Limit1
state "LogicalFilter" as Filter2
state "LogicalSort" as Sort2
state "LogicalLimit" as Limit2
state "LogicalFilter" as Filter3
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Sort1
Sort1 --> Limit1
Limit1 --> Filter2
Filter2 --> Sort2
Sort2 --> Limit2
Limit2 --> Filter3
Filter3 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> TableScanBuilder
}
Optimizer rule update
To satisfy the requirement listed in 2, a new rule format was created. See the example for PUSH_DOWN_FILTER below; ... matches any number of Logical Plan Tree nodes of any type, except LogicalFilter.
stateDiagram-v2
state "Old PUSH_DOWN_FILTER implementation" as OldFilter {
state "LogicalFilter" as FilterOld
state "ScanBuilder" as RelationOld
FilterOld --> RelationOld
}
state "New PUSH_DOWN_FILTER implementation" as NewFilter {
state "LogicalFilter" as FilterNew
state "..." as dots
state "ScanBuilder" as RelationNew
FilterNew --> dots
dots --> RelationNew
}
This new format was applied to PUSH_DOWN_PAGE_SIZE, PUSH_DOWN_FILTER, PUSH_DOWN_AGGREGATION, PUSH_DOWN_FILTER, PUSH_DOWN_SORT, PUSH_DOWN_HIGHLIGHT, PUSH_DOWN_NESTED, PUSH_DOWN_PROJECT and PUSH_DOWN_LIMIT. CreateTableScanBuilder, CreateTableWriteBuilder and Prometheus-related rules are not changed.
stateDiagram-v2
state "CreateTableScanBuilder" as Builder {
state "LogicalRelation" as Scan
}
The PushDownRule class is used to build PUSH_DOWN_* rules. The class architecture follows:
The following rule configurations are created:
| Rule | Matched node | Can be applied multiple times | Push-down method | Exceptions |
| --- | --- | --- | --- | --- |
| PUSH_DOWN_FILTER | LogicalFilter | true | pushDownFilter | LogicalAggregation, LogicalProject |
| PUSH_DOWN_AGGREGATION | LogicalAggregation | false | pushDownAggregation | LogicalProject |
| PUSH_DOWN_SORT | LogicalSort | true | pushDownSort | LogicalProject |
| PUSH_DOWN_LIMIT | LogicalLimit | false | pushDownLimit | LogicalSort, LogicalFilter, LogicalProject |
| PUSH_DOWN_PROJECT | LogicalProject | false | pushDownProject | LogicalEval, LogicalWindow |
| PUSH_DOWN_HIGHLIGHT | LogicalHighlight | true | pushDownHighlight | |
| PUSH_DOWN_NESTED | LogicalNested | false | pushDownNested | |
| PUSH_DOWN_PAGE_SIZE | LogicalPaginate | false | pushDownPageSize | |
Note that default exception is applied to all rules.
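How such configurations could drive the optimizer can be sketched as follows. This is an illustration only, not the PoC code: `RuleConfig`, `ScanBuilder`, `find_candidate` and `optimize` are hypothetical names, the plan is simplified to a flat list of node type names above the scan builder (top to bottom), only a subset of the rules is modeled, and the rule order is assumed from the enumeration in this document. The builder is assumed to reject filter, sort and limit once an aggregation has been pushed down, mirroring the OpenSearchIndexScanAggregationBuilder behavior described in 7 and 8.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RuleConfig:
    name: str
    node: str          # logical node type the rule matches
    repeatable: bool   # whether the rule may fire more than once
    exceptions: FrozenSet[str] = frozenset()  # nodes that block the rule


FILTER = RuleConfig("PUSH_DOWN_FILTER", "LogicalFilter", True,
                    frozenset({"LogicalAggregation", "LogicalProject"}))
AGGREGATION = RuleConfig("PUSH_DOWN_AGGREGATION", "LogicalAggregation", False,
                         frozenset({"LogicalProject"}))
SORT = RuleConfig("PUSH_DOWN_SORT", "LogicalSort", True,
                  frozenset({"LogicalProject"}))
LIMIT = RuleConfig("PUSH_DOWN_LIMIT", "LogicalLimit", False,
                   frozenset({"LogicalSort", "LogicalFilter", "LogicalProject"}))

# Assumed order; PUSH_DOWN_FILTER is listed twice (requirement 11).
RULES = [FILTER, AGGREGATION, FILTER, SORT, LIMIT]


@dataclass
class ScanBuilder:
    """Hypothetical builder: after aggregation it rejects filter, sort, limit."""
    pushed: List[str] = field(default_factory=list)
    aggregated: bool = False

    def push_down(self, node: str) -> bool:
        if self.aggregated and node in {"LogicalFilter", "LogicalSort", "LogicalLimit"}:
            return False
        self.pushed.append(node)
        if node == "LogicalAggregation":
            self.aggregated = True
        return True


def find_candidate(plan: List[str], rule: RuleConfig) -> Optional[int]:
    """Walk up from the scan builder; stop at the target or at a blocking node."""
    for index in range(len(plan) - 1, -1, -1):
        if plan[index] == rule.node:
            return index
        if plan[index] in rule.exceptions:
            return None
    return None


def optimize(plan: List[str], builder: ScanBuilder) -> List[str]:
    plan = list(plan)
    for rule in RULES:  # rules strictly in their defined order
        while True:
            index = find_candidate(plan, rule)
            if index is None or not builder.push_down(rule.node):
                break  # no match, or rejected: do not retry this rule
            del plan[index]  # node absorbed by the scan builder
            if not rule.repeatable:
                break
    return plan


# Requirement 11: WHERE filter below the aggregation, HAVING filter above it.
builder = ScanBuilder()
remaining = optimize(
    ["LogicalProject", "LogicalFilter", "LogicalAggregation", "LogicalFilter"],
    builder,
)
```

In this run the lower filter and the aggregation are pushed down, while the upper filter is rejected by the builder and stays in the tree under `LogicalProject`, as in the diagram for 11.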
Rules are defined, checked and applied in the following order:
To satisfy requirement 11, PUSH_DOWN_FILTER is listed twice.
Code
Please see the MVP PoC code in the Bit-Quill:dev-optimizer-rework branch: https://github.com/opensearch-project/sql/compare/main..Bit-Quill:opensearch-project-sql:dev-optimizer-rework?expand=1
This code passes all ITs and contains only the Optimizer rework. There are no new features, so things marked as not implemented are still not implemented.
References
Current optimizer doc: https://github.com/opensearch-project/sql/blob/main/docs/dev/query-optimizer-improvement.md
Last optimizer update was done in #1091
Purpose
Proposed changes will unblock other fixes and features:
LIMIT in pagination. #1752
LIMIT works incorrectly with NESTED #1764
LIMIT #1774