From 78bdf018ebb4b5b7401d28bca54a299c5b929c08 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Thu, 27 Apr 2023 18:24:37 -0700 Subject: [PATCH 1/8] Add newer docs for pagination. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 84 +++++++++ docs/dev/index.md | 2 +- docs/dev/query-optimizer-improvement.md | 212 +++++++++++++++++++++++ docs/dev/query-optimizier-improvement.md | 106 ------------ 4 files changed, 297 insertions(+), 107 deletions(-) create mode 100644 docs/dev/query-optimizer-improvement.md delete mode 100644 docs/dev/query-optimizier-improvement.md diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index 6e2f3f36d8..0b33ab2d40 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -285,3 +285,87 @@ OpenSearchExecutionEngine->>+ProjectOperator: getTotalHits ResourceMonitorPlan-->>-ProjectOperator: value ProjectOperator-->>-OpenSearchExecutionEngine: value ``` + +### Plan Tree changes + +#### Abstract Plan tree + +Changes: +1. New Plan node -- `Paginate` -- added into the tree. +2. `QueryPlan` got new optional field: page size. When it is set, `Paginate` is being added. It is converted to `LogicalPaginate` later. For non-paging requests the tree remains unchanged. + +```mermaid +classDiagram + direction LR + class QueryPlan { + <> + -int pageSize + } + class Paginate { + <> + } + class UnresolvedPlanTree { + <> + } + QueryPlan --* Paginate + Paginate --* UnresolvedPlanTree +``` +#### Logical Plan tree + +Changes: +1. `LogicalPaginate` is added to the top of the tree. It has a private field `pageSize` being pushed down in the `Optimizer`. + +```mermaid +classDiagram + direction LR + class LogicalPaginate { + <> + -int pageSize + } + class LogicalPlanTree { + <> + } + class LogicalRelation { + <> + } + LogicalPaginate --* LogicalPlanTree + LogicalPlanTree --* LogicalRelation +``` + +#### Optimized Plan tree + +Changes: +1. There is a `OpenSearchPagedIndexScanBuilder` instead of `OpenSearchIndexScanQueryBuilder` on the bottom of the tree for paging requests. Non-paging requests are not affected. Both are instances of `TableScanBuilder` which extends `PhysicalPlan` interface. +2. `LogicalPaginate` is removed from the tree during push down operation in `Optimizer`. + +See [article about `TableScanBuilder`](query-optimizer-improvement.md#TableScanBuilder) for more details. + +```mermaid +classDiagram + class LogicalProject { + <> + } + class OpenSearchPagedIndexScanBuilder { + <> + } + + LogicalProject --* OpenSearchPagedIndexScanBuilder +``` + +#### Physical Plan tree + +Changes: +1. `OpenSearchPagedIndexScanBuilder` is converted to `OpenSearchPagedIndexScan` by `Implementor`. + +```mermaid +classDiagram + direction LR + class ProjectOperator { + <> + } + class OpenSearchPagedIndexScan { + <> + } + + ProjectOperator --* OpenSearchPagedIndexScan +``` diff --git a/docs/dev/index.md b/docs/dev/index.md index 96248ecf48..bff1d9c7bb 100644 --- a/docs/dev/index.md +++ b/docs/dev/index.md @@ -43,7 +43,7 @@ + [Semantic Analysis](query-semantic-analysis.md): performs semantic analysis to ensure semantic correctness + [Type Conversion](query-type-conversion.md): implement implicit data type conversion + **Query Planning** - + [Logical Optimization](query-optimizier-improvement.md): improvement on logical optimizer and physical implementer + + [Logical Optimization](query-optimizer-improvement.md): improvement on logical optimizer and physical implementer + **Query Execution** + [Query Manager](query-manager.md): query management + **Query Acceleration** diff --git a/docs/dev/query-optimizer-improvement.md b/docs/dev/query-optimizer-improvement.md new file mode 100644 index 0000000000..4c59cdc378 --- /dev/null +++ b/docs/dev/query-optimizer-improvement.md @@ -0,0 +1,212 @@ +### Background + +This section introduces the current architecture of logical optimizer and physical transformation. + +#### Logical-to-Logical Optimization + +Currently each storage engine adds its own logical operator as concrete implementation for `TableScanOperator` abstraction. Typically each data source needs to add 2 logical operators for table scan with and without aggregation. Take OpenSearch for example, there are `OpenSearchLogicalIndexScan` and `OpenSearchLogicalIndexAgg` and a bunch of pushdown optimization rules for each accordingly. + +``` +class LogicalPlanOptimizer: + /* + * OpenSearch rules include: + * MergeFilterAndRelation + * MergeAggAndIndexScan + * MergeAggAndRelation + * MergeSortAndRelation + * MergeSortAndIndexScan + * MergeSortAndIndexAgg + * MergeSortAndIndexScan + * MergeLimitAndRelation + * MergeLimitAndIndexScan + * PushProjectAndRelation + * PushProjectAndIndexScan + * + * that return *OpenSearchLogicalIndexAgg* + * or *OpenSearchLogicalIndexScan* finally + */ + val rules: List + + def optimize(plan: LogicalPlan): + for rule in rules: + if rule.match(plan): + plan = rules.apply(plan) + return plan.children().forEach(this::optimize) +``` + +#### Logical-to-Physical Transformation + +After logical transformation, planner will let the `Table` in `LogicalRelation` (identified before logical transformation above) transform the logical plan to physical plan. + +``` +class OpenSearchIndex: + + def implement(plan: LogicalPlan): + return plan.accept( + DefaultImplementor(): + def visitNode(node): + if node is OpenSearchLogicalIndexScan: + return OpenSearchIndexScan(...) + else if node is OpenSearchLogicalIndexAgg: + return OpenSearchIndexScan(...) +``` + +### Problem Statement + +The current planning architecture causes 2 serious problems: + +1. Each data source adds special logical operator and explode the optimizer rule space. For example, Prometheus also has `PrometheusLogicalMetricAgg` and `PrometheusLogicalMetricScan` accordingly. They have the exactly same pattern to match query plan tree as OpenSearch. +2. A bigger problem is the difficulty of transforming from logical to physical when there are 2 `Table`s in query plan. Because only 1 of them has the chance to do the `implement()`. This is a blocker for supporting `INSERT ... SELECT ...` statement or JOIN query. See code below. + +``` + public PhysicalPlan plan(LogicalPlan plan) { + Table table = findTable(plan); + if (table == null) { + return plan.accept(new DefaultImplementor<>(), null); + } + return table.implement( + table.optimize(optimize(plan))); + } +``` + +### Solution + +#### TableScanBuilder + +A new abstraction `TableScanBuilder` is added as a transition operator during logical planning and optimization. Each data source provides its implementation class by `Table` interface. The push down difference in non-aggregate and aggregate query is hidden inside specific scan builder, for example `OpenSearchIndexScanBuilder` rather than exposed to core module. + +```mermaid +classDiagram +%% Mermaid fails to render `LogicalPlanNodeVisitor~R, C~` https://github.com/mermaid-js/mermaid/issues/3188, using `<R, C>` as a workaround + class LogicalPlan { + -List~LogicalPlan~ childPlans + +LogicalPlan(List~LogicalPlan~) + +accept(LogicalPlanNodeVisitor<R, C>, C)* R + +replaceChildPlans(List~LogicalPlan~ childPlans) LogicalPlan + } + class TableScanBuilder { + +TableScanBuilder() + +build()* TableScanOperator + +pushDownFilter(LogicalFilter) boolean + +pushDownAggregation(LogicalAggregation) boolean + +pushDownSort(LogicalSort) boolean + +pushDownLimit(LogicalLimit) boolean + +pushDownProject(LogicalProject) boolean + +pushDownHighlight(LogicalHighlight) boolean + +pushDownNested(LogicalNested) boolean + +accept(LogicalPlanNodeVisitor<R, C>, C) R + } + class OpenSearchIndexScanQueryBuilder { + OpenSearchIndexScanQueryBuilder(OpenSearchIndexScan) + +build() TableScanOperator + +pushDownFilter(LogicalFilter) boolean + +pushDownAggregation(LogicalAggregation) boolean + +pushDownSort(LogicalSort) boolean + +pushDownLimit(LogicalLimit) boolean + +pushDownProject(LogicalProject) boolean + +pushDownHighlight(LogicalHighlight) boolean + +pushDownNested(LogicalNested) boolean + +findReferenceExpression(NamedExpression)$ List~ReferenceExpression~ + +findReferenceExpressions(List~NamedExpression~)$ Set~ReferenceExpression~ + } + class OpenSearchPagedIndexScanBuilder { + +OpenSearchPagedIndexScanBuilder(OpenSearchPagedIndexScan) + +build() TableScanOperator + } + class OpenSearchIndexScanBuilder { + -TableScanBuilder delegate + -boolean isLimitPushedDown + +OpenSearchIndexScanBuilder(OpenSearchIndexScan) + OpenSearchIndexScanBuilder(TableScanBuilder) + +build() TableScanOperator + +pushDownFilter(LogicalFilter) boolean + +pushDownAggregation(LogicalAggregation) boolean + +pushDownSort(LogicalSort) boolean + +pushDownLimit(LogicalLimit) boolean + +pushDownProject(LogicalProject) boolean + +pushDownHighlight(LogicalHighlight) boolean + +pushDownNested(LogicalNested) boolean + -sortByFieldsOnly(LogicalSort) boolean + } + + LogicalPlan <|-- TableScanBuilder + TableScanBuilder <|-- OpenSearchIndexScanQueryBuilder + TableScanBuilder <|-- OpenSearchPagedIndexScanBuilder + TableScanBuilder <|-- OpenSearchIndexScanBuilder + OpenSearchIndexScanBuilder *-- "1" TableScanBuilder : delegate + OpenSearchIndexScanBuilder <.. OpenSearchIndexScanQueryBuilder : creates +``` + +#### Table Push Down Rules + +In this way, `LogicalPlanOptimizer` in core module always have the same set of rule for all push down optimization. + +```mermaid +classDiagram + class LogicalPlanOptimizer { + +create()$ LogicalPlanOptimizer + +optimize(LogicalPlan) LogicalPlan + -internalOptimize(LogicalPlan) LogicalPlan + } + class CreateTableScanBuilder { + +apply(LogicalRelation, Captures) LogicalPlan + -pattern() Pattern~LogicalRelation~ + } + class CreatePagingTableScanBuilder { + +apply(LogicalPaginate, Captures) LogicalPlan + -pattern() Pattern~LogicalRelation~ + -findLogicalRelation(LogicalPaginate) boolean + } + class Table { + +TableScanBuilder createScanBuilder() + +TableScanBuilder createPagedScanBuilder(int) + } + class TableScanPushDown~T~ { + +Rule~T~ PUSH_DOWN_FILTER$ + +Rule~T~ PUSH_DOWN_AGGREGATION$ + +Rule~T~ PUSH_DOWN_SORT$ + +Rule~T~ PUSH_DOWN_LIMIT$ + +Rule~T~ PUSH_DOWN_PROJECT$ + +Rule~T~ PUSH_DOWN_HIGHLIGHT$ + +Rule~T~ PUSH_DOWN_NESTED$ + +apply(T, Captures) LogicalPlan + +pattern() Pattern~T~ + } + class TableScanBuilder { + +pushDownFilter(LogicalFilter) boolean + +pushDownAggregation(LogicalAggregation) boolean + +pushDownSort(LogicalSort) boolean + +pushDownLimit(LogicalLimit) boolean + +pushDownProject(LogicalProject) boolean + +pushDownHighlight(LogicalHighlight) boolean + +pushDownNested(LogicalNested) boolean + } + TableScanPushDown~T~ -- TableScanBuilder + LogicalPlanOptimizer ..> CreateTableScanBuilder : creates + LogicalPlanOptimizer ..> CreatePagingTableScanBuilder : creates + CreateTableScanBuilder ..> Table + CreatePagingTableScanBuilder ..> Table + LogicalPlanOptimizer ..* TableScanPushDown~T~ + Table ..> TableScanBuilder : creates +``` + +### Examples + +The following diagram illustrates how `TableScanBuilder` along with `TablePushDownRule` solve the problem aforementioned. + +![optimizer-Page-1](https://user-images.githubusercontent.com/46505291/203645359-3f2fff73-a210-4bc0-a582-951a27de684d.jpg) + + +Similarly, `TableWriteBuilder` will be added and work in the same way in separate PR: https://github.com/opensearch-project/sql/pull/1094 + +![optimizer-Page-2](https://user-images.githubusercontent.com/46505291/203645380-5155fd22-71b4-49ca-8ed7-9652b005f761.jpg) + +### TODO + +1. Refactor Prometheus optimize rule and enforce table scan builder +2. Figure out how to implement AD commands +4. Deprecate `optimize()` and `implement()` if item 1 and 2 complete +5. Introduce fixed point or maximum iteration limit for iterative optimization +6. Investigate if CBO should be part of current optimizer or distributed planner in future +7. Remove `pushdownHighlight` once it's moved to OpenSearch storage +8. Move `TableScanOperator` to the new `read` package (leave it in this PR to avoid even more file changed) diff --git a/docs/dev/query-optimizier-improvement.md b/docs/dev/query-optimizier-improvement.md deleted file mode 100644 index 753abcc844..0000000000 --- a/docs/dev/query-optimizier-improvement.md +++ /dev/null @@ -1,106 +0,0 @@ -### Background - -This section introduces the current architecture of logical optimizer and physical transformation. - -#### Logical-to-Logical Optimization - -Currently each storage engine adds its own logical operator as concrete implementation for `TableScanOperator` abstraction. Typically each data source needs to add 2 logical operators for table scan with and without aggregation. Take OpenSearch for example, there are `OpenSearchLogicalIndexScan` and `OpenSearchLogicalIndexAgg` and a bunch of pushdown optimization rules for each accordingly. - -``` -class LogicalPlanOptimizer: - /* - * OpenSearch rules include: - * MergeFilterAndRelation - * MergeAggAndIndexScan - * MergeAggAndRelation - * MergeSortAndRelation - * MergeSortAndIndexScan - * MergeSortAndIndexAgg - * MergeSortAndIndexScan - * MergeLimitAndRelation - * MergeLimitAndIndexScan - * PushProjectAndRelation - * PushProjectAndIndexScan - * - * that return *OpenSearchLogicalIndexAgg* - * or *OpenSearchLogicalIndexScan* finally - */ - val rules: List - - def optimize(plan: LogicalPlan): - for rule in rules: - if rule.match(plan): - plan = rules.apply(plan) - return plan.children().forEach(this::optimize) -``` - -#### Logical-to-Physical Transformation - -After logical transformation, planner will let the `Table` in `LogicalRelation` (identified before logical transformation above) transform the logical plan to physical plan. - -``` -class OpenSearchIndex: - - def implement(plan: LogicalPlan): - return plan.accept( - DefaultImplementor(): - def visitNode(node): - if node is OpenSearchLogicalIndexScan: - return OpenSearchIndexScan(...) - else if node is OpenSearchLogicalIndexAgg: - return OpenSearchIndexScan(...) -``` - -### Problem Statement - -The current planning architecture causes 2 serious problems: - -1. Each data source adds special logical operator and explode the optimizer rule space. For example, Prometheus also has `PrometheusLogicalMetricAgg` and `PrometheusLogicalMetricScan` accordingly. They have the exactly same pattern to match query plan tree as OpenSearch. -2. A bigger problem is the difficulty of transforming from logical to physical when there are 2 `Table`s in query plan. Because only 1 of them has the chance to do the `implement()`. This is a blocker for supporting `INSERT ... SELECT ...` statement or JOIN query. See code below. - -``` - public PhysicalPlan plan(LogicalPlan plan) { - Table table = findTable(plan); - if (table == null) { - return plan.accept(new DefaultImplementor<>(), null); - } - return table.implement( - table.optimize(optimize(plan))); - } -``` - -### Solution - -#### TableScanBuilder - -A new abstraction `TableScanBuilder` is added as a transition operator during logical planning and optimization. Each data source provides its implementation class by `Table` interface. The push down difference in non-aggregate and aggregate query is hidden inside specific scan builder, for example `OpenSearchIndexScanBuilder` rather than exposed to core module. - -![TableScanBuilder](https://user-images.githubusercontent.com/46505291/204355538-e54f7679-3585-423e-97d5-5832b2038cc1.png) - -#### TablePushDownRules - -In this way, `LogicalOptimizier` in core module always have the same set of rule for all push down optimization. - -![LogicalPlanOptimizer](https://user-images.githubusercontent.com/46505291/203142195-9b38f1e9-1116-469d-9709-3cbf893ec522.png) - - -### Examples - -The following diagram illustrates how `TableScanBuilder` along with `TablePushDownRule` solve the problem aforementioned. - -![optimizer-Page-1](https://user-images.githubusercontent.com/46505291/203645359-3f2fff73-a210-4bc0-a582-951a27de684d.jpg) - - -Similarly, `TableWriteBuilder` will be added and work in the same way in separate PR: https://github.com/opensearch-project/sql/pull/1094 - -![optimizer-Page-2](https://user-images.githubusercontent.com/46505291/203645380-5155fd22-71b4-49ca-8ed7-9652b005f761.jpg) - -### TODO - -1. Refactor Prometheus optimize rule and enforce table scan builder -2. Figure out how to implement AD commands -4. Deprecate `optimize()` and `implement()` if item 1 and 2 complete -5. Introduce fixed point or maximum iteration limit for iterative optimization -6. Investigate if CBO should be part of current optimizer or distributed planner in future -7. Remove `pushdownHighlight` once it's moved to OpenSearch storage -8. Move `TableScanOperator` to the new `read` package (leave it in this PR to avoid even more file changed) \ No newline at end of file From b522c5135da9b873f0b7b670d9bcb77e6b090c83 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Fri, 28 Apr 2023 13:20:03 -0700 Subject: [PATCH 2/8] Address PR feedback. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 72 ++++++++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 19 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index 0b33ab2d40..422384b706 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -6,7 +6,6 @@ A cursor is a SQL abstraction for pagination. A client can open a cursor, retrie Currently, SQL plugin does not provide SQL cursor syntax. However, the SQL REST endpoint can return result a page at a time. This feature is used by JDBC and ODBC drivers. - # Scope Currenty, V2 engine supports pagination only for simple `SELECT * FROM ` queries without any other clauses like `WHERE` or `ORDER BY`. @@ -15,6 +14,9 @@ https://user-images.githubusercontent.com/88679692/224208630-8d38d833-abf8-4035- # REST API ## Initial Query Request + +Initial query request contains the search request and page size. It can't be changed later while scrolling through pages issued by this request. Search query to OpenSearch engine is built during processing the initial request. + ```json POST /_plugins/_sql { @@ -26,12 +28,12 @@ POST /_plugins/_sql Response: ```json { - "cursor": /* cursor_id */, + "cursor": "", "datarows": [ - // ... + ... ], "schema" : [ - // ... + ... ] } ``` @@ -39,11 +41,14 @@ Response: If `query` is a DML statement then pagination does not apply, the `fetch_size` parameter is ignored and a cursor is not created. This is existing behaviour in v1 engine. -The client receives an (error response](#error-response) if: +The client receives an [error response](#error-response) if: - `fetch_size` is not a positive integer, or - evaluating `query` results in a server-side error. ## Next Page Request + +Subsequent page request contains a cursor only. + ```json POST /_plugins/_sql { @@ -54,29 +59,35 @@ Similarly to v1 engine, the response object is the same as initial response if t `cursor_id` will be different with each request. -If this is the last page, the `cursor` property is ommitted. The cursor is closed automatically. +## End of scrolling/paging + +When scrolling is finished, SQL plugin still returns a cursor. This cursor leads to the final page, which as no cursor and no data. Receiving that page means all data was properly queried and returned to user. Cursor is closed automatically on processing that page. The client will receive an [error response](#error-response) if executing this request results in an OpenSearch or SQL plug-in error. ## Cursor Keep Alive Timeout -Each cursor has a keep alive timer associated with it. When the timer runs out, the cursor is closed by OpenSearch. + +Each cursor has a keep alive timer associated with it. When the timer runs out, the cursor is automatically closed by OpenSearch. This timer is reset every time a page is retrieved. The client will receive an [error response](#error-response) if it sends a cursor request for an expired cursor. +Keep alive timeout is [configurable](../user/admin/settings.rst#plugins.sql.cursor.keep_alive) by setting `plugins.sql.cursor.keep_alive` and has default value 1 minute. + ## Error Response + The client will receive an error response if any of the above REST calls result in an server-side error. The response object has the following format: ```json { "error": { - "details": , - "reason": , - "type": + "details": "", + "reason": "", + "type": "" }, - "status": + "status": "" } ``` @@ -103,7 +114,7 @@ V2 SQL engine supports *sql node load balancing* -- a cursor request can be rout ## Design Diagrams New code workflows are highlighted. -### First page +### Initial Query Request ```mermaid sequenceDiagram participant SQLService @@ -237,6 +248,7 @@ sequenceDiagram participant ContinuePageRequest Note over PlanSerializer: Unzip +Note over PlanSerializer: Validate cursor integrity PlanSerializer->>+Deserialization Stream: deserialize Deserialization Stream->>+ProjectOperator: create new Note over ProjectOperator: load private fields @@ -288,9 +300,11 @@ OpenSearchExecutionEngine->>+ProjectOperator: getTotalHits ### Plan Tree changes +There are different plan trees are built during request processing. See more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. + #### Abstract Plan tree -Changes: +Changes to plan tree for Initial Query Request with pagination: 1. New Plan node -- `Paginate` -- added into the tree. 2. `QueryPlan` got new optional field: page size. When it is set, `Paginate` is being added. It is converted to `LogicalPaginate` later. For non-paging requests the tree remains unchanged. @@ -299,7 +313,7 @@ classDiagram direction LR class QueryPlan { <> - -int pageSize + -Optional~int~ pageSize } class Paginate { <> @@ -310,17 +324,23 @@ classDiagram QueryPlan --* Paginate Paginate --* UnresolvedPlanTree ``` + +Non-paging requests have the same plan tree, but `pageSize` value in `QueryPlan` is unset. + +TODO: +Add graph for `ContinuePaginatedPlan`. + #### Logical Plan tree -Changes: -1. `LogicalPaginate` is added to the top of the tree. It has a private field `pageSize` being pushed down in the `Optimizer`. +Changes to plan tree for Initial Query Request with pagination: +1. `LogicalPaginate` is added to the top of the tree. It stores information about paging/scrolling should be done in a private field `pageSize` being pushed down in the `Optimizer`. ```mermaid classDiagram direction LR class LogicalPaginate { <> - -int pageSize + int pageSize } class LogicalPlanTree { <> @@ -332,10 +352,24 @@ classDiagram LogicalPlanTree --* LogicalRelation ``` +There are no changes for non-paging requests. + +```mermaid +classDiagram + direction LR + class LogicalPlanTree { + <> + } + class LogicalRelation { + <> + } + LogicalPlanTree --* LogicalRelation +``` + #### Optimized Plan tree Changes: -1. There is a `OpenSearchPagedIndexScanBuilder` instead of `OpenSearchIndexScanQueryBuilder` on the bottom of the tree for paging requests. Non-paging requests are not affected. Both are instances of `TableScanBuilder` which extends `PhysicalPlan` interface. +1. For pagination request, we push a `OpenSearchPagedIndexScanBuilder` instead of `OpenSearchIndexScanQueryBuilder` to the bottom of the tree. Both are instances of `TableScanBuilder` which extends `PhysicalPlan` interface. 2. `LogicalPaginate` is removed from the tree during push down operation in `Optimizer`. See [article about `TableScanBuilder`](query-optimizer-improvement.md#TableScanBuilder) for more details. @@ -364,7 +398,7 @@ classDiagram <> } class OpenSearchPagedIndexScan { - <> + <> } ProjectOperator --* OpenSearchPagedIndexScan From 31712d623bb2c05812bf41520fce0b81e0c92169 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Tue, 2 May 2023 13:28:08 -0700 Subject: [PATCH 3/8] Complete TODO and add some more info. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 53 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 4 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index 422384b706..d6ed556604 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -300,7 +300,33 @@ OpenSearchExecutionEngine->>+ProjectOperator: getTotalHits ### Plan Tree changes -There are different plan trees are built during request processing. See more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. +There are different plan trees are built during request processing. Read more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. + +Simplified workflow of plan trees is shown below. First (Initial) page request is processed the same way as non-paging request. + +```mermaid +classDiagram + direction LR + class Abstract Plan Tree + class Logical Plan Tree + class Optimized Logical Plan Tree + class Physical Plan Tree + + Abstract Plan Tree --> Logical Plan Tree : Planner + Logical Plan Tree --> Optimized Logical Plan Tree : Optimizer + Optimized Logical Plan Tree --> Physical Plan Tree : Implementor +``` + +New plan tree workflow was added for subsequent page requests. Since a final Physical Plan tree was already created for Initial request, subsequent requests should have the same tree. The tree is being serialized into `cursor` by `PlanSerializer` and then restored back. + +```mermaid +classDiagram + direction LR + class Abstract Plan Tree + class Physical Plan Tree + + Abstract Plan Tree --> Physical Plan Tree : deserialization +``` #### Abstract Plan tree @@ -327,8 +353,21 @@ classDiagram Non-paging requests have the same plan tree, but `pageSize` value in `QueryPlan` is unset. -TODO: -Add graph for `ContinuePaginatedPlan`. +Abstract plan tree for Subsequent Query Request (for second and further pages) contains only one node -- `ContinuePaginatedPlan`. + +```mermaid +classDiagram + direction LR + class ContinuePaginatedPlan { + <> + -String cursor + -PlanSerializer planSerializer + -QueryService queryService + -ResponseListener queryResponseListener + } +``` + +`ContinuePaginatedPlan` translated to entire Physical Plan Tree by `PlanSerializer` on cursor deserialization. It bypasses Logical Plan tree stage, `Planner`, `Optimizer` and `Implementor`. #### Logical Plan tree @@ -366,7 +405,9 @@ classDiagram LogicalPlanTree --* LogicalRelation ``` -#### Optimized Plan tree +Note: No Logical Plan tree created for Subsequent Query Request. + +#### Optimized Logical Plan tree Changes: 1. For pagination request, we push a `OpenSearchPagedIndexScanBuilder` instead of `OpenSearchIndexScanQueryBuilder` to the bottom of the tree. Both are instances of `TableScanBuilder` which extends `PhysicalPlan` interface. @@ -376,6 +417,7 @@ See [article about `TableScanBuilder`](query-optimizer-improvement.md#TableScanB ```mermaid classDiagram + direction LR class LogicalProject { <> } @@ -386,10 +428,13 @@ classDiagram LogicalProject --* OpenSearchPagedIndexScanBuilder ``` +Note: No Logical Plan tree created for Subsequent Query Request. + #### Physical Plan tree Changes: 1. `OpenSearchPagedIndexScanBuilder` is converted to `OpenSearchPagedIndexScan` by `Implementor`. +2. Entire Physical Plan tree is created by `PlanSerializer` for Subsequent Query requests. The deserialized tree has the same structure which Initial Query Request had. ```mermaid classDiagram From 60596e29881d31f60ed28564a8e8edf8e5e11601 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Mon, 8 May 2023 18:14:19 -0700 Subject: [PATCH 4/8] Address doc review comments. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 709 +++++++++++++++++------- docs/dev/index.md | 1 + docs/dev/query-optimizer-improvement.md | 6 +- 3 files changed, 509 insertions(+), 207 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index d6ed556604..e491ca03a1 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -111,228 +111,90 @@ The discussion below uses *under max_result_window* to refer to scenarios that c ## SQL Node Load Balancing V2 SQL engine supports *sql node load balancing* -- a cursor request can be routed to any SQL node in a cluster. This is achieved by encoding all data necessary to retrieve the next page in the `cursor_id`. -## Design Diagrams -New code workflows are highlighted. +## Feature Design -### Initial Query Request -```mermaid -sequenceDiagram - participant SQLService - participant QueryPlanFactory - participant CanPaginateVisitor - participant QueryService - participant Planner - participant CreatePagingTableScanBuilder - participant OpenSearchExecutionEngine - participant PlanSerializer - participant Physical Plan Tree +### Plan Tree changes -SQLService->>+QueryPlanFactory: execute - critical - QueryPlanFactory->>+CanPaginateVisitor: canConvertToCursor - CanPaginateVisitor-->>-QueryPlanFactory: true - end - QueryPlanFactory->>+QueryService: execute - QueryService->>+Planner: optimize - critical - Planner->>+CreatePagingTableScanBuilder: apply - CreatePagingTableScanBuilder-->>-Planner: paged index scan - end - Planner-->>-QueryService: Logical Plan Tree - QueryService->>+OpenSearchExecutionEngine: execute - Note over OpenSearchExecutionEngine: iterate result set - critical Serialization - OpenSearchExecutionEngine->>+PlanSerializer: convertToCursor - PlanSerializer-->>-OpenSearchExecutionEngine: cursor - end - critical - OpenSearchExecutionEngine->>+Physical Plan Tree: getTotalHits - Physical Plan Tree-->>-OpenSearchExecutionEngine: total hits - end - OpenSearchExecutionEngine-->>-QueryService: execution completed - QueryService-->>-QueryPlanFactory: execution completed - QueryPlanFactory-->>-SQLService: execution completed -``` +There are different plan trees are built during request processing. Read more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. -### Second page -```mermaid -sequenceDiagram - participant SQLService - participant QueryPlanFactory - participant QueryService - participant OpenSearchExecutionEngine - participant PlanSerializer - participant Physical Plan Tree +Simplified workflow of plan trees is shown below. Initial Page Request is processed the same way as non-paging request. -SQLService->>+QueryPlanFactory: execute - QueryPlanFactory->>+QueryService: execute - critical Deserialization - QueryService->>+PlanSerializer: convertToPlan - PlanSerializer-->>-QueryService: Physical plan tree - end - Note over QueryService: Planner, Optimizer and Implementor
are skipped - QueryService->>+OpenSearchExecutionEngine: execute - Note over OpenSearchExecutionEngine: iterate result set - critical Serialization - OpenSearchExecutionEngine->>+PlanSerializer: convertToCursor - PlanSerializer-->>-OpenSearchExecutionEngine: cursor - end - critical - OpenSearchExecutionEngine->>+Physical Plan Tree: getTotalHits - Physical Plan Tree-->>-OpenSearchExecutionEngine: total hits - end - OpenSearchExecutionEngine-->>-QueryService: execution completed - QueryService-->>-QueryPlanFactory: execution completed - QueryPlanFactory-->>-SQLService: execution completed -``` -### Legacy Engine Fallback -```mermaid -sequenceDiagram - participant RestSQLQueryAction - participant Legacy Engine - participant SQLService - participant QueryPlanFactory - participant CanPaginateVisitor - -RestSQLQueryAction->>+SQLService: prepareRequest - SQLService->>+QueryPlanFactory: execute - critical V2 support check - QueryPlanFactory->>+CanPaginateVisitor: canConvertToCursor - CanPaginateVisitor-->>-QueryPlanFactory: false - QueryPlanFactory-->>-RestSQLQueryAction: UnsupportedCursorRequestException - deactivate SQLService - end - RestSQLQueryAction->>Legacy Engine: accept - Note over Legacy Engine: Processing in Legacy engine - Legacy Engine-->>RestSQLQueryAction:complete -``` +#TODO add serialize to next request -### Serialization ```mermaid -sequenceDiagram - participant PlanSerializer - participant ProjectOperator - participant ResourceMonitorPlan - participant OpenSearchPagedIndexScan - participant OpenSearchScrollRequest - participant ContinuePageRequest - -PlanSerializer->>+ProjectOperator: getPlanForSerialization - ProjectOperator-->>-PlanSerializer: this -PlanSerializer->>+ProjectOperator: serialize - Note over ProjectOperator: dump private fields - ProjectOperator->>+ResourceMonitorPlan: getPlanForSerialization - ResourceMonitorPlan-->>-ProjectOperator: delegate - Note over ResourceMonitorPlan: ResourceMonitorPlan
is not serialized - ProjectOperator->>+OpenSearchPagedIndexScan: serialize - alt First page - OpenSearchPagedIndexScan->>+OpenSearchScrollRequest: toCursor - OpenSearchScrollRequest-->>-OpenSearchPagedIndexScan: scroll ID - else Subsequent page - OpenSearchPagedIndexScan->>+ContinuePageRequest: toCursor - ContinuePageRequest-->>-OpenSearchPagedIndexScan: scroll ID - end - Note over OpenSearchPagedIndexScan: dump private fields - OpenSearchPagedIndexScan-->>-ProjectOperator: serialized - ProjectOperator-->>-PlanSerializer: serialized -Note over PlanSerializer: Zip to reduce size +stateDiagram-v2 + state "Non Paged Request" as NonPaged { + direction LR + state "Parse Tree" as Parse + state "Unresolved Plan Tree" as Unresolved + state "Abstract Plan Tree" as Abstract + state "Logical Plan Tree" as Logical + state "Optimized Logical Plan Tree" as Optimized + state "Physical Plan Tree" as Physical + + [*] --> Parse : ANTLR + Parse --> Unresolved : AstBuilder + Unresolved --> Abstract : QueryPlanner + Abstract --> Logical : Planner + Logical --> Optimized : Optimizer + Optimized --> Physical : Implementor + } ``` - -### Deserialization ```mermaid -sequenceDiagram - participant PlanSerializer - participant Deserialization Stream - participant ProjectOperator - participant OpenSearchPagedIndexScan - participant ContinuePageRequest - -Note over PlanSerializer: Unzip -Note over PlanSerializer: Validate cursor integrity -PlanSerializer->>+Deserialization Stream: deserialize - Deserialization Stream->>+ProjectOperator: create new - Note over ProjectOperator: load private fields - ProjectOperator-->>Deserialization Stream: deserialize input - activate Deserialization Stream - Deserialization Stream->>+OpenSearchPagedIndexScan: create new - deactivate Deserialization Stream - OpenSearchPagedIndexScan-->>+Deserialization Stream: resolve engine - Deserialization Stream->>-OpenSearchPagedIndexScan: OpenSearchStorageEngine - Note over OpenSearchPagedIndexScan: load private fields - OpenSearchPagedIndexScan->>+ContinuePageRequest: create new - ContinuePageRequest-->>-OpenSearchPagedIndexScan: created - OpenSearchPagedIndexScan-->>-ProjectOperator: deserialized - ProjectOperator-->>-PlanSerializer: deserialized - deactivate Deserialization Stream +stateDiagram-v2 + state "Initial Page Request" as Paged { + direction LR + state "Parse Tree" as Parse + state "Unresolved Plan Tree" as Unresolved + state "Abstract Plan Tree" as Abstract + state "Logical Plan Tree" as Logical + state "Optimized Logical Plan Tree" as Optimized + state "Physical Plan Tree" as Physical + + [*] --> Parse : ANTLR + Parse --> Unresolved : AstBuilder + Unresolved --> Abstract : QueryPlanner + Abstract --> Logical : Planner + Logical --> Optimized : Optimizer + Optimized --> Physical : Implementor + } ``` - -### Total Hits - -Total Hits is the number of rows matching the search criteria; with `select *` queries it is equal to row (doc) number in the table (index). -Example: -Paging thru `SELECT * FROM calcs` (17 rows) with `fetch_size = 5` returns: - -* Page 1: total hits = 17, result size = 5, cursor -* Page 2: total hits = 17, result size = 5, cursor -* Page 3: total hits = 17, result size = 5, cursor -* Page 4: total hits = 17, result size = 2, cursor -* Page 5: total hits = 0, result size = 0 - -Default implementation of `getTotalHits` in a Physical Plan iterate child plans down the tree and gets the maximum value or 0. - ```mermaid -sequenceDiagram - participant OpenSearchExecutionEngine - participant ProjectOperator - participant ResourceMonitorPlan - participant OpenSearchPagedIndexScan - -OpenSearchExecutionEngine->>+ProjectOperator: getTotalHits - Note over ProjectOperator: default implementation - ProjectOperator->>+ResourceMonitorPlan: getTotalHits - Note over ResourceMonitorPlan: call to delegate - ResourceMonitorPlan->>+OpenSearchPagedIndexScan: getTotalHits - Note over OpenSearchPagedIndexScan: use stored value from the search response - OpenSearchPagedIndexScan-->>-ResourceMonitorPlan: value - ResourceMonitorPlan-->>-ProjectOperator: value - ProjectOperator-->>-OpenSearchExecutionEngine: value +stateDiagram-v2 + state "Subsequent Page Request" as Paged { + direction LR + state "Abstract Plan Tree" as Abstract + state "Physical Plan Tree" as Physical + + [*] --> Abstract : QueryPlanner + Abstract --> Physical : Deserializer + } ``` -### Plan Tree changes - -There are different plan trees are built during request processing. Read more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. +New plan tree workflow was added for Subsequent Page Requests. Since a final Physical Plan tree was already created for Initial request, subsequent requests should have the same tree. The tree is being serialized into `cursor` by `PlanSerializer` and then restored back. Query parsing and analysis is also not performed for cursor requests, so these steps are also not executed there. -Simplified workflow of plan trees is shown below. First (Initial) page request is processed the same way as non-paging request. - -```mermaid -classDiagram - direction LR - class Abstract Plan Tree - class Logical Plan Tree - class Optimized Logical Plan Tree - class Physical Plan Tree - - Abstract Plan Tree --> Logical Plan Tree : Planner - Logical Plan Tree --> Optimized Logical Plan Tree : Optimizer - Optimized Logical Plan Tree --> Physical Plan Tree : Implementor -``` +#### Abstract Plan tree -New plan tree workflow was added for subsequent page requests. Since a final Physical Plan tree was already created for Initial request, subsequent requests should have the same tree. The tree is being serialized into `cursor` by `PlanSerializer` and then restored back. +Abstract Plan Tree for non-paged requests remains unchanged by pagination feature. `QueryPlan` as a top entry got new optional field `pageSize`, which is unset for non-paged requests. ```mermaid classDiagram direction LR - class Abstract Plan Tree - class Physical Plan Tree - - Abstract Plan Tree --> Physical Plan Tree : deserialization + class QueryPlan { + <> + -Optional~int~ pageSize + -UnresolvedPlan plan + -QueryService queryService + } + class UnresolvedPlanTree { + <> + } + QueryPlan --* UnresolvedPlanTree ``` -#### Abstract Plan tree - -Changes to plan tree for Initial Query Request with pagination: +Abstract plan tree for Initial Query Request has following changes: 1. New Plan node -- `Paginate` -- added into the tree. -2. `QueryPlan` got new optional field: page size. When it is set, `Paginate` is being added. It is converted to `LogicalPaginate` later. For non-paging requests the tree remains unchanged. +2. `pageSize` parameter in `QueryPlan` is set, and `Paginate` is being added. It is converted to `LogicalPaginate` later. ```mermaid classDiagram @@ -340,9 +202,13 @@ classDiagram class QueryPlan { <> -Optional~int~ pageSize + -UnresolvedPlan plan + -QueryService queryService } class Paginate { <> + -int pageSize + -UnresolvedPlan child } class UnresolvedPlanTree { <> @@ -363,12 +229,51 @@ classDiagram -String cursor -PlanSerializer planSerializer -QueryService queryService - -ResponseListener queryResponseListener } ``` `ContinuePaginatedPlan` translated to entire Physical Plan Tree by `PlanSerializer` on cursor deserialization. It bypasses Logical Plan tree stage, `Planner`, `Optimizer` and `Implementor`. +The examples below show Abstract Plan Tree for the same query in different request types: + +```mermaid +stateDiagram-v2 + state "Non Paged Request" as NonPaged { + state "QueryPlan" as QueryPlanNP + state "Project" as ProjectNP + state "Limit" as LimitNP + state "Filter" as FilterNP + state "Aggregation" as AggregationNP + state "Relation" as RelationNP + + QueryPlanNP --> ProjectNP + ProjectNP --> LimitNP + LimitNP --> FilterNP + FilterNP --> AggregationNP + AggregationNP --> RelationNP + } + + state "Initial Query Request" as Paged { + state "QueryPlan" as QueryPlanIP + state "Project" as ProjectIP + state "Limit" as LimitIP + state "Filter" as FilterIP + state "Aggregation" as AggregationIP + state "Relation" as RelationIP + + Paginate --> QueryPlanIP + QueryPlanIP --> ProjectIP + ProjectIP --> LimitIP + LimitIP --> FilterIP + FilterIP --> AggregationIP + AggregationIP --> RelationIP + } + + state "Subsequent Query Request" As Sub { + ContinuePaginatedPlan + } +``` + #### Logical Plan tree Changes to plan tree for Initial Query Request with pagination: @@ -405,12 +310,44 @@ classDiagram LogicalPlanTree --* LogicalRelation ``` -Note: No Logical Plan tree created for Subsequent Query Request. +Note: This step is not executed for Subsequent Query Request. + +The examples below show Logical Plan Tree for the same query in different request types: + +```mermaid +stateDiagram-v2 + state "Non Paged Request" as NonPaged { + state "LogicalProject" as ProjectNP + state "LogicalLimit" as LimitNP + state "LogicalFilter" as FilterNP + state "LogicalAggregation" as AggregationNP + state "LogicalRelation" as RelationNP + + ProjectNP --> LimitNP + LimitNP --> FilterNP + FilterNP --> AggregationNP + AggregationNP --> RelationNP + } + + state "Initial Query Request" as Paged { + state "LogicalProject" as ProjectIP + state "LogicalLimit" as LimitIP + state "LogicalFilter" as FilterIP + state "LogicalAggregation" as AggregationIP + state "LogicalRelation" as RelationIP + + LogicalPaginate --> ProjectIP + ProjectIP --> LimitIP + LimitIP --> FilterIP + FilterIP --> AggregationIP + AggregationIP --> RelationIP + } +``` #### Optimized Logical Plan tree Changes: -1. For pagination request, we push a `OpenSearchPagedIndexScanBuilder` instead of `OpenSearchIndexScanQueryBuilder` to the bottom of the tree. Both are instances of `TableScanBuilder` which extends `PhysicalPlan` interface. +1. For pagination request, a `OpenSearchPagedIndexScanBuilder` is inserted to the bottom of the tree instead of `OpenSearchIndexScanQueryBuilder`. Both are instances of `TableScanBuilder` which extends `LogicalPlan` interface. 2. `LogicalPaginate` is removed from the tree during push down operation in `Optimizer`. See [article about `TableScanBuilder`](query-optimizer-improvement.md#TableScanBuilder) for more details. @@ -430,6 +367,33 @@ classDiagram Note: No Logical Plan tree created for Subsequent Query Request. +The examples below show optimized Logical Plan Tree for the same query in different request types: + +```mermaid +stateDiagram-v2 + state "Non Paged Request" as NonPaged { + state "LogicalProject" as ProjectNP + state "LogicalLimit" as LimitNP + state "LogicalSort" as SortNP + state "OpenSearchIndexScanQueryBuilder" as RelationNP + + ProjectNP --> LimitNP + LimitNP --> SortNP + SortNP --> RelationNP + } + + state "Initial Paged Request" as Paged { + state "LogicalProject" as ProjectIP + state "LogicalLimit" as LimitIP + state "LogicalSort" as SortIP + state "OpenSearchPagedIndexScanBuilder" as RelationIP + + ProjectIP --> LimitIP + LimitIP --> SortIP + SortIP --> RelationIP + } +``` + #### Physical Plan tree Changes: @@ -448,3 +412,340 @@ classDiagram ProjectOperator --* OpenSearchPagedIndexScan ``` + +The examples below show Physical Plan Tree for the same query in different request types: + +```mermaid +stateDiagram-v2 + state "Non Paged Request" as NonPaged { + state "ProjectOperator" as ProjectNP + state "LimitOperator" as LimitNP + state "SortOperator" as SortNP + state "OpenSearchIndexScan" as RelationNP + + ProjectNP --> LimitNP + LimitNP --> SortNP + SortNP --> RelationNP + } + + state "Initial Query Request" as Paged { + state "ProjectOperator" as ProjectIP + state "LimitOperator" as LimitIP + state "SortOperator" as SortIP + state "OpenSearchPagedIndexScan" as RelationIP + + ProjectIP --> LimitIP + LimitIP --> SortIP + SortIP --> RelationIP + } + + state "Subsequent Query Request" As Sub { + state "ProjectOperator" as ProjectSP + state "LimitOperator" as LimitSP + state "SortOperator" as SortSP + state "OpenSearchPagedIndexScan" as RelationSP + + ProjectSP --> LimitSP + LimitSP --> SortSP + SortSP --> RelationSP + } +``` + +### Architecture Diagrams + +New code workflows which added by Pagination feature are highlighted. + +#### Non Paging Query Request + +To simplify comparison and understanding, below represented processing a regular query request. This workflow wasn't changed by Pagination feature. + +```mermaid +sequenceDiagram + participant SQLService + participant QueryPlanFactory + participant QueryService + participant Planner + participant CreateTableScanBuilder + participant OpenSearchExecutionEngine + +SQLService ->>+ QueryPlanFactory: execute + QueryPlanFactory ->>+ QueryService: execute + QueryService ->>+ Planner: optimize + Planner ->>+ CreateTableScanBuilder: apply + CreateTableScanBuilder -->>- Planner: index scan + Planner -->>- QueryService: Logical Plan Tree + QueryService ->>+ OpenSearchExecutionEngine: execute + OpenSearchExecutionEngine -->>- QueryService: execution completed + QueryService -->>- QueryPlanFactory: execution completed + QueryPlanFactory -->>- SQLService: execution completed +``` + +#### Initial Query Request + +# **TODO add abstract** +# TODO add analyzer + +ANTLR creates ParseTree +AstBuilder creates UnresolvedPlan tree +QueryPlanFactory creates AbstractPlan tree +Analyzer creates LogicalPlan tree +Optimizer optimizes LogicalPlan tree and replaces LogicalRelation with a TableScanBuilder +Implementor creates a PhysicalPlan tree + +# pewpew + + +Processing of an Initial Query Request has few extra steps comparing versus processing a regular Query Request: +1. Query validation with `CanPaginateVisitor`. This is required to validate whether incoming query can be paged. This also activate legacy engine fallback mechanism. +2. Creating a paged index scan with `CreatePagingTableScanBuilder` `Optimizer` rule. A Regular Query Request triggers `CreateTableScanBuilder` rule instead. +3. `Serialization` is performed by `PlanSerializer` - it converts Physical Plan Tree into a cursor, which could be used query a next page. +4. Traversal of Physical Plan Tree to get total hits, which is required to properly fill response to a user. + +```mermaid +sequenceDiagram + participant SQLService + participant QueryPlanFactory + participant CanPaginateVisitor + participant QueryService + participant Planner + participant CreatePagingTableScanBuilder + participant OpenSearchExecutionEngine + participant PlanSerializer + +SQLService ->>+ QueryPlanFactory : execute + rect rgb(191, 223, 255) + QueryPlanFactory ->>+ CanPaginateVisitor : canConvertToCursor + CanPaginateVisitor -->>- QueryPlanFactory : true + end + QueryPlanFactory ->>+ QueryService : execute + QueryService ->>+ Planner : optimize + rect rgb(191, 223, 255) + Planner ->>+ CreatePagingTableScanBuilder : apply + CreatePagingTableScanBuilder -->>- Planner : paged index scan + end + Planner -->>- QueryService : Logical Plan Tree + QueryService ->>+ OpenSearchExecutionEngine : execute + rect rgb(191, 223, 255) + Note over OpenSearchExecutionEngine, PlanSerializer : Serialization + OpenSearchExecutionEngine ->>+ PlanSerializer : convertToCursor + PlanSerializer -->>- OpenSearchExecutionEngine : cursor + end + rect rgb(191, 223, 255) + Note over OpenSearchExecutionEngine : get total hits + end + OpenSearchExecutionEngine -->>- QueryService : execution completed + QueryService -->>- QueryPlanFactory : execution completed + QueryPlanFactory -->>- SQLService : execution completed +``` + +#### Subsequent Query Request + +Second and further pages processed by a completely new workflow. The key point there: +1. `Deserialization` is performed by `PlanSerializer` to restore entire Physical Plan Tree encoded into the cursor. +2. Since query already contains the Physical Plan Tree, all tree processing steps are skipped. +3. `Serialization` is performed by `PlanSerializer` - it converts Physical Plan Tree into a cursor, which could be used query a next page. +4. Traversal of Physical Plan Tree to get total hits, which is required to properly fill response to a user. + +```mermaid +sequenceDiagram + participant SQLService + participant QueryPlanFactory + participant QueryService + participant OpenSearchExecutionEngine + participant PlanSerializer + +SQLService ->>+ QueryPlanFactory : execute + QueryPlanFactory ->>+ QueryService : execute + rect rgb(191, 223, 255) + note over QueryService, PlanSerializer : Deserialization + QueryService ->>+ PlanSerializer: convertToPlan + PlanSerializer -->>- QueryService: Physical plan tree + end + Note over QueryService : Planner, Optimizer and Implementor
are skipped + QueryService ->>+ OpenSearchExecutionEngine : execute + rect rgb(191, 223, 255) + note over OpenSearchExecutionEngine, PlanSerializer : Serialization + OpenSearchExecutionEngine ->>+ PlanSerializer : convertToCursor + PlanSerializer -->>- OpenSearchExecutionEngine : cursor + end + rect rgb(191, 223, 255) + Note over OpenSearchExecutionEngine : get total hits + end + OpenSearchExecutionEngine -->>- QueryService: execution completed + QueryService -->>- QueryPlanFactory : execution completed + QueryPlanFactory -->>- SQLService : execution completed +``` + +#### Legacy Engine Fallback + +Since pagination in V2 engine supports less SQL features than pagination in legacy engine, a fallback mechanism is created to keep V1 engine features still available for the end user. Pagination fallback is backed by a new exception type which allows legacy engine to intersect execution of a request. + +```mermaid +sequenceDiagram + participant RestSQLQueryAction + participant Legacy Engine + participant SQLService + participant QueryPlanFactory + participant CanPaginateVisitor + +RestSQLQueryAction ->>+ SQLService : prepareRequest + SQLService ->>+ QueryPlanFactory : execute + rect rgb(191, 223, 255) + note over SQLService, CanPaginateVisitor : V2 support check + QueryPlanFactory ->>+ CanPaginateVisitor : canConvertToCursor + CanPaginateVisitor -->>- QueryPlanFactory : false + QueryPlanFactory -->>- RestSQLQueryAction : UnsupportedCursorRequestException + deactivate SQLService + end + RestSQLQueryAction ->> Legacy Engine: accept + Note over Legacy Engine : Processing in Legacy engine + Legacy Engine -->> RestSQLQueryAction : complete +``` + +#### Serialization and Deserialization round trip + +The SQL engine should be able to completely recover the Physical Plan tree to continue its execution to get the next page. Serialization mechanism is responsible of proper dumping the plan tree. `ResourceMonitorPlan` shouldn't be serialized, because a new object of this type would be created for the restored plan tree before execution. +Serialization and Deserialization are performed by Java object serialization API. + +## TODO describe ... + +```mermaid +stateDiagram-v2 + direction LR + state "Initial Query Request Plan Tree" as FirstPage + state FirstPage { + state "ProjectOperator" as logState1_1 + state "..." as logState1_2 + state "ResourceMonitorPlan" as logState1_3 + state "OpenSearchPagedIndexScan" as logState1_4 + state "OpenSearchScrollRequest" as logState1_5 + logState1_1 --> logState1_2 + logState1_2 --> logState1_3 + logState1_3 --> logState1_4 + logState1_4 --> logState1_5 + } + + state "Deserialized Plan Tree" as SecondPageTree + state SecondPageTree { + state "ProjectOperator" as logState2_1 + state "..." as logState2_2 + state "OpenSearchPagedIndexScan" as logState2_3 + state "ContinuePageRequest" as logState2_4 + logState2_1 --> logState2_2 + logState2_2 --> logState2_3 + logState2_3 --> logState2_4 + } + + state "Subsequent Query Request Plan Tree" as SecondPage + state SecondPage { + state "ProjectOperator" as logState3_1 + state "..." as logState3_2 + state "ResourceMonitorPlan" as logState3_3 + state "OpenSearchPagedIndexScan" as logState3_4 + state "ContinuePageRequest" as logState3_5 + logState3_1 --> logState3_2 + logState3_2 --> logState3_3 + logState3_3 --> logState3_4 + logState3_4 --> logState3_5 + } + + FirstPage --> SecondPageTree : Serialization and\nDeserialization + SecondPageTree --> SecondPage : Execution\nPreparation +``` + +#### Serialization + +All plan tree nodes which are supported by pagination should implement [`SerializablePlan`](https://github.com/opensearch-project/sql/blob/f40bb6d68241e76728737d88026e4c8b1e6b3b8b/core/src/main/java/org/opensearch/sql/planner/SerializablePlan.java) interface. `getPlanForSerialization` method of this interface allows serialization mechanism to skip a tree node from serialization. OpenSearch search request objects are not serialized, but search context provided by the OpenSearch cluster is extracted from them. + +```mermaid +sequenceDiagram + participant PlanSerializer + participant ProjectOperator + participant ResourceMonitorPlan + participant OpenSearchPagedIndexScan + participant OpenSearchScrollRequest + participant ContinuePageRequest + +PlanSerializer ->>+ ProjectOperator : getPlanForSerialization + ProjectOperator -->>- PlanSerializer : this +PlanSerializer ->>+ ProjectOperator : serialize + Note over ProjectOperator : dump private fields + ProjectOperator ->>+ ResourceMonitorPlan : getPlanForSerialization + ResourceMonitorPlan -->>- ProjectOperator : delegate + Note over ResourceMonitorPlan : ResourceMonitorPlan
is not serialized + ProjectOperator ->>+ OpenSearchPagedIndexScan : serialize + alt First page + OpenSearchPagedIndexScan ->>+ OpenSearchScrollRequest : toCursor + OpenSearchScrollRequest -->>- OpenSearchPagedIndexScan : serialized request + else Subsequent page + OpenSearchPagedIndexScan ->>+ ContinuePageRequest : toCursor + ContinuePageRequest -->>- OpenSearchPagedIndexScan : serialized request + end + Note over OpenSearchPagedIndexScan : dump private fields + OpenSearchPagedIndexScan -->>- ProjectOperator : serialized + ProjectOperator -->>- PlanSerializer : serialized +Note over PlanSerializer : Zip to reduce size +``` + +#### Deserialization + +Deserialization restores previously serialized Physical Plan tree. The recovered tree is ready to execute and should return the next page of the search response. To complete the tree restoration, SQL engine should build a new request to the OpenSearch node. This request doesn't contain a search query, but it contains a search context reference -- `scrollID`. To create a new `ContinuePageRequest` object it is require to access to the instance of `OpenSearchStorageEngine`. `OpenSearchStorageEngine` can't be serialized and it exists as a singletone in the SQL plugin engine. `PlanSerializer` creates a customized deserialization binary object stream -- `CursorDeserializationStream`; this stream provides an interface to access to the `OpenSearchStorageEngine` object. + +```mermaid +sequenceDiagram + participant PlanSerializer + participant CursorDeserializationStream + participant ProjectOperator + participant OpenSearchPagedIndexScan + participant ContinuePageRequest + +Note over PlanSerializer : Unzip +Note over PlanSerializer : Validate cursor integrity +PlanSerializer ->>+ CursorDeserializationStream : deserialize + CursorDeserializationStream ->>+ ProjectOperator : create new + Note over ProjectOperator: load private fields + ProjectOperator -->> CursorDeserializationStream : deserialize input + activate CursorDeserializationStream + CursorDeserializationStream ->>+ OpenSearchPagedIndexScan : create new + deactivate CursorDeserializationStream + OpenSearchPagedIndexScan -->>+ CursorDeserializationStream : resolve engine + CursorDeserializationStream ->>- OpenSearchPagedIndexScan : OpenSearchStorageEngine + Note over OpenSearchPagedIndexScan : load private fields + OpenSearchPagedIndexScan ->>+ ContinuePageRequest : create new + ContinuePageRequest -->>- OpenSearchPagedIndexScan : created + OpenSearchPagedIndexScan -->>- ProjectOperator : deserialized + ProjectOperator -->>- PlanSerializer : deserialized + deactivate CursorDeserializationStream +``` + +#### Total Hits + +Total Hits is the number of rows matching the search criteria; with `select *` queries it is equal to row (doc) number in the table (index). +Example: +Paging thru `SELECT * FROM calcs` (17 rows) with `fetch_size = 5` returns: + +* Page 1: total hits = 17, result size = 5, cursor +* Page 2: total hits = 17, result size = 5, cursor +* Page 3: total hits = 17, result size = 5, cursor +* Page 4: total hits = 17, result size = 2, cursor +* Page 5: total hits = 0, result size = 0 + +Default implementation of `getTotalHits` in a Physical Plan iterate child plans down the tree and gets the maximum value or 0. + +```mermaid +sequenceDiagram + participant OpenSearchExecutionEngine + participant ProjectOperator + participant ResourceMonitorPlan + participant OpenSearchPagedIndexScan + +OpenSearchExecutionEngine ->>+ ProjectOperator: getTotalHits + Note over ProjectOperator: default implementation + ProjectOperator ->>+ ResourceMonitorPlan: getTotalHits + Note over ResourceMonitorPlan: call to delegate + ResourceMonitorPlan ->>+ OpenSearchPagedIndexScan: getTotalHits + Note over OpenSearchPagedIndexScan: use stored value from the search response + OpenSearchPagedIndexScan -->>- ResourceMonitorPlan: value + ResourceMonitorPlan -->>- ProjectOperator: value + ProjectOperator -->>- OpenSearchExecutionEngine: value +``` diff --git a/docs/dev/index.md b/docs/dev/index.md index bff1d9c7bb..c69fa3c164 100644 --- a/docs/dev/index.md +++ b/docs/dev/index.md @@ -55,6 +55,7 @@ + [Relevancy Search](opensearch-relevancy-search.md): OpenSearch relevancy search functions + [Sub Queries](opensearch-nested-field-subquery.md): support sub queries on OpenSearch nested field + [Pagination](opensearch-pagination.md): pagination implementation by OpenSearch scroll API + + [Pagination in V2](Pagination-v2.md): pagination implementation in V2 engine + [Prometheus](datasource-prometheus.md): Prometheus query federation + **File System** + [Querying S3](datasource-query-s3.md): S3 query federation proposal diff --git a/docs/dev/query-optimizer-improvement.md b/docs/dev/query-optimizer-improvement.md index 4c59cdc378..8d4e0634cb 100644 --- a/docs/dev/query-optimizer-improvement.md +++ b/docs/dev/query-optimizer-improvement.md @@ -6,7 +6,7 @@ This section introduces the current architecture of logical optimizer and physic Currently each storage engine adds its own logical operator as concrete implementation for `TableScanOperator` abstraction. Typically each data source needs to add 2 logical operators for table scan with and without aggregation. Take OpenSearch for example, there are `OpenSearchLogicalIndexScan` and `OpenSearchLogicalIndexAgg` and a bunch of pushdown optimization rules for each accordingly. -``` +```py class LogicalPlanOptimizer: /* * OpenSearch rules include: @@ -38,7 +38,7 @@ class LogicalPlanOptimizer: After logical transformation, planner will let the `Table` in `LogicalRelation` (identified before logical transformation above) transform the logical plan to physical plan. -``` +```py class OpenSearchIndex: def implement(plan: LogicalPlan): @@ -58,7 +58,7 @@ The current planning architecture causes 2 serious problems: 1. Each data source adds special logical operator and explode the optimizer rule space. For example, Prometheus also has `PrometheusLogicalMetricAgg` and `PrometheusLogicalMetricScan` accordingly. They have the exactly same pattern to match query plan tree as OpenSearch. 2. A bigger problem is the difficulty of transforming from logical to physical when there are 2 `Table`s in query plan. Because only 1 of them has the chance to do the `implement()`. This is a blocker for supporting `INSERT ... SELECT ...` statement or JOIN query. See code below. -``` +```java public PhysicalPlan plan(LogicalPlan plan) { Table table = findTable(plan); if (table == null) { From 3a4f808c15f23d54200ce2a4003aed8c4d7cded5 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Mon, 8 May 2023 19:53:06 -0700 Subject: [PATCH 5/8] Clean up docs. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index e491ca03a1..ba80ffe317 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -15,7 +15,7 @@ https://user-images.githubusercontent.com/88679692/224208630-8d38d833-abf8-4035- # REST API ## Initial Query Request -Initial query request contains the search request and page size. It can't be changed later while scrolling through pages issued by this request. Search query to OpenSearch engine is built during processing the initial request. +Initial query request contains the search request and page size. Search query to OpenSearch is built during processing of this request. Neither the query nor page size can be change while scrolling through pages based on this request. ```json POST /_plugins/_sql @@ -73,7 +73,7 @@ This timer is reset every time a page is retrieved. The client will receive an [error response](#error-response) if it sends a cursor request for an expired cursor. -Keep alive timeout is [configurable](../user/admin/settings.rst#plugins.sql.cursor.keep_alive) by setting `plugins.sql.cursor.keep_alive` and has default value 1 minute. +Keep alive timeout is [configurable](../user/admin/settings.rst#plugins.sql.cursor.keep_alive) by setting `plugins.sql.cursor.keep_alive` and has default value of 1 minute. ## Error Response @@ -119,8 +119,6 @@ There are different plan trees are built during request processing. Read more ab Simplified workflow of plan trees is shown below. Initial Page Request is processed the same way as non-paging request. -#TODO add serialize to next request - ```mermaid stateDiagram-v2 state "Non Paged Request" as NonPaged { @@ -482,19 +480,6 @@ SQLService ->>+ QueryPlanFactory: execute #### Initial Query Request -# **TODO add abstract** -# TODO add analyzer - -ANTLR creates ParseTree -AstBuilder creates UnresolvedPlan tree -QueryPlanFactory creates AbstractPlan tree -Analyzer creates LogicalPlan tree -Optimizer optimizes LogicalPlan tree and replaces LogicalRelation with a TableScanBuilder -Implementor creates a PhysicalPlan tree - -# pewpew - - Processing of an Initial Query Request has few extra steps comparing versus processing a regular Query Request: 1. Query validation with `CanPaginateVisitor`. This is required to validate whether incoming query can be paged. This also activate legacy engine fallback mechanism. 2. Creating a paged index scan with `CreatePagingTableScanBuilder` `Optimizer` rule. A Regular Query Request triggers `CreateTableScanBuilder` rule instead. From cc5c63f8f7d8277eb2171328148b016735467534 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Thu, 11 May 2023 20:29:23 -0700 Subject: [PATCH 6/8] Apply suggestions from code review Co-authored-by: Andrew Carbonetto Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 14 +++++++------- docs/dev/query-optimizer-improvement.md | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index ba80ffe317..b75c408d8d 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -61,7 +61,7 @@ Similarly to v1 engine, the response object is the same as initial response if t ## End of scrolling/paging -When scrolling is finished, SQL plugin still returns a cursor. This cursor leads to the final page, which as no cursor and no data. Receiving that page means all data was properly queried and returned to user. Cursor is closed automatically on processing that page. +When scrolling is finished, SQL plugin returns a final cursor. This cursor leads to an empty page, which has no cursor and no data hits. Receiving that page means all data was properly queried, and the scrolling cursor has been closed. The client will receive an [error response](#error-response) if executing this request results in an OpenSearch or SQL plug-in error. @@ -115,9 +115,9 @@ V2 SQL engine supports *sql node load balancing* -- a cursor request can be rout ### Plan Tree changes -There are different plan trees are built during request processing. Read more about their purpose and stages [here](query-optimizer-improvement.md#Examples). Article below describes what changes are introduced in these trees by pagination feature. +Different plan trees are built during request processing. Read more about their purpose and stages [here](query-optimizer-improvement.md#Examples). The section below describes the changes being introduced to these trees by the pagination feature. -Simplified workflow of plan trees is shown below. Initial Page Request is processed the same way as non-paging request. +Simplified workflow of plan trees is shown below. Initial Page Request is processed the same way as a non-paging request. ```mermaid stateDiagram-v2 @@ -169,11 +169,11 @@ stateDiagram-v2 } ``` -New plan tree workflow was added for Subsequent Page Requests. Since a final Physical Plan tree was already created for Initial request, subsequent requests should have the same tree. The tree is being serialized into `cursor` by `PlanSerializer` and then restored back. Query parsing and analysis is also not performed for cursor requests, so these steps are also not executed there. +New plan tree workflow was added for Subsequent Page Requests. Since a final Physical Plan tree was already created for Initial request, subsequent requests should have the same tree. The tree is serialized into a `cursor` by `PlanSerializer` to be de-serialized on the subsequence Page Request. Query parsing and analysis is not performed for Subsequent Page Requests, since the Physical Plan tree is instead de-serialized from the `cursor`. #### Abstract Plan tree -Abstract Plan Tree for non-paged requests remains unchanged by pagination feature. `QueryPlan` as a top entry got new optional field `pageSize`, which is unset for non-paged requests. +Abstract Plan Tree for non-paged requests remains unchanged. The `QueryPlan`, as a top entry`, has a new optional field `pageSize`, which is not defined for non-paged requests. ```mermaid classDiagram @@ -396,7 +396,7 @@ stateDiagram-v2 Changes: 1. `OpenSearchPagedIndexScanBuilder` is converted to `OpenSearchPagedIndexScan` by `Implementor`. -2. Entire Physical Plan tree is created by `PlanSerializer` for Subsequent Query requests. The deserialized tree has the same structure which Initial Query Request had. +2. Entire Physical Plan tree is created by `PlanSerializer` for Subsequent Query requests. The deserialized tree has the same structure as the Initial Query Request. ```mermaid classDiagram @@ -455,7 +455,7 @@ New code workflows which added by Pagination feature are highlighted. #### Non Paging Query Request -To simplify comparison and understanding, below represented processing a regular query request. This workflow wasn't changed by Pagination feature. +A non-paging request sequence diagram is shown below for comparison: ```mermaid sequenceDiagram diff --git a/docs/dev/query-optimizer-improvement.md b/docs/dev/query-optimizer-improvement.md index 8d4e0634cb..7503087562 100644 --- a/docs/dev/query-optimizer-improvement.md +++ b/docs/dev/query-optimizer-improvement.md @@ -4,7 +4,7 @@ This section introduces the current architecture of logical optimizer and physic #### Logical-to-Logical Optimization -Currently each storage engine adds its own logical operator as concrete implementation for `TableScanOperator` abstraction. Typically each data source needs to add 2 logical operators for table scan with and without aggregation. Take OpenSearch for example, there are `OpenSearchLogicalIndexScan` and `OpenSearchLogicalIndexAgg` and a bunch of pushdown optimization rules for each accordingly. +Currently each storage engine adds its own logical operator as concrete implementation for `TableScanOperator` abstraction. Typically each data source needs to add 2 logical operators for table scan with without aggregation. Take OpenSearch for example, there are `OpenSearchLogicalIndexScan` and `OpenSearchLogicalIndexAgg` and a bunch of pushdown optimization rules for each accordingly. ```py class LogicalPlanOptimizer: From 8910bbe884903dc84df518ceb07c0b595345ef8e Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Thu, 11 May 2023 20:38:18 -0700 Subject: [PATCH 7/8] Apply suggestions from code review Co-authored-by: Andrew Carbonetto Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index b75c408d8d..420b279448 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -525,7 +525,7 @@ SQLService ->>+ QueryPlanFactory : execute #### Subsequent Query Request -Second and further pages processed by a completely new workflow. The key point there: +Subsequent pages are processed by a new workflow. The key point there: 1. `Deserialization` is performed by `PlanSerializer` to restore entire Physical Plan Tree encoded into the cursor. 2. Since query already contains the Physical Plan Tree, all tree processing steps are skipped. 3. `Serialization` is performed by `PlanSerializer` - it converts Physical Plan Tree into a cursor, which could be used query a next page. @@ -563,7 +563,7 @@ SQLService ->>+ QueryPlanFactory : execute #### Legacy Engine Fallback -Since pagination in V2 engine supports less SQL features than pagination in legacy engine, a fallback mechanism is created to keep V1 engine features still available for the end user. Pagination fallback is backed by a new exception type which allows legacy engine to intersect execution of a request. +Since pagination in V2 engine supports fewer SQL commands than pagination in legacy engine, a fallback mechanism is created to keep V1 engine features still available for the end user. Pagination fallback is backed by a new exception type which allows legacy engine to intersect execution of a request. ```mermaid sequenceDiagram @@ -589,7 +589,7 @@ RestSQLQueryAction ->>+ SQLService : prepareRequest #### Serialization and Deserialization round trip -The SQL engine should be able to completely recover the Physical Plan tree to continue its execution to get the next page. Serialization mechanism is responsible of proper dumping the plan tree. `ResourceMonitorPlan` shouldn't be serialized, because a new object of this type would be created for the restored plan tree before execution. +The SQL engine should be able to completely recover the Physical Plan tree to continue its execution to get the next page. Serialization mechanism is responsible for recovering the plan tree. note: `ResourceMonitorPlan` isn't serialized, because a new object of this type would be created for the restored plan tree before execution. Serialization and Deserialization are performed by Java object serialization API. ## TODO describe ... @@ -674,7 +674,7 @@ Note over PlanSerializer : Zip to reduce size #### Deserialization -Deserialization restores previously serialized Physical Plan tree. The recovered tree is ready to execute and should return the next page of the search response. To complete the tree restoration, SQL engine should build a new request to the OpenSearch node. This request doesn't contain a search query, but it contains a search context reference -- `scrollID`. To create a new `ContinuePageRequest` object it is require to access to the instance of `OpenSearchStorageEngine`. `OpenSearchStorageEngine` can't be serialized and it exists as a singletone in the SQL plugin engine. `PlanSerializer` creates a customized deserialization binary object stream -- `CursorDeserializationStream`; this stream provides an interface to access to the `OpenSearchStorageEngine` object. +Deserialization restores previously serialized Physical Plan tree. The recovered tree is ready to execute and should return the next page of the search response. To complete the tree restoration, SQL engine should build a new request to the OpenSearch node. This request doesn't contain a search query, but it contains a search context reference -- `scrollID`. To create a new `ContinuePageRequest` object it is require to access to the instance of `OpenSearchStorageEngine`. `OpenSearchStorageEngine` can't be serialized and it exists as a singleton in the SQL plugin engine. `PlanSerializer` creates a customized deserialization binary object stream -- `CursorDeserializationStream`. This stream provides an interface to access the `OpenSearchStorageEngine` object. ```mermaid sequenceDiagram From d3888ceafe126b2d949bf805312fb8b81ce92ad0 Mon Sep 17 00:00:00 2001 From: Yury-Fridlyand Date: Thu, 11 May 2023 21:03:37 -0700 Subject: [PATCH 8/8] Minor fixes. Signed-off-by: Yury-Fridlyand --- docs/dev/Pagination-v2.md | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/docs/dev/Pagination-v2.md b/docs/dev/Pagination-v2.md index 420b279448..77694c2510 100644 --- a/docs/dev/Pagination-v2.md +++ b/docs/dev/Pagination-v2.md @@ -16,6 +16,7 @@ https://user-images.githubusercontent.com/88679692/224208630-8d38d833-abf8-4035- ## Initial Query Request Initial query request contains the search request and page size. Search query to OpenSearch is built during processing of this request. Neither the query nor page size can be change while scrolling through pages based on this request. +The only difference between paged and non-paged requests is `fetch_size` parameter supplied in paged request. ```json POST /_plugins/_sql @@ -42,12 +43,13 @@ Response: If `query` is a DML statement then pagination does not apply, the `fetch_size` parameter is ignored and a cursor is not created. This is existing behaviour in v1 engine. The client receives an [error response](#error-response) if: -- `fetch_size` is not a positive integer, or -- evaluating `query` results in a server-side error. +- `fetch_size` is not a positive integer +- evaluating `query` results in a server-side error +- `fetch_size` is bigger than `max_window_size` cluster-wide parameter. -## Next Page Request +## Subsequent Query Request -Subsequent page request contains a cursor only. +Subsequent query request contains a cursor only. ```json POST /_plugins/_sql @@ -87,7 +89,7 @@ The response object has the following format: "reason": "", "type": "" }, - "status": "" + "status": } ``` @@ -109,6 +111,7 @@ Efficient implementation of pagination needs to be aware of retrival API used. E The discussion below uses *under max_result_window* to refer to scenarios that can be implemented with simple retrieval API and *over max_result_window* for scenarios that require scroll API to implement. ## SQL Node Load Balancing + V2 SQL engine supports *sql node load balancing* -- a cursor request can be routed to any SQL node in a cluster. This is achieved by encoding all data necessary to retrieve the next page in the `cursor_id`. ## Feature Design @@ -173,7 +176,7 @@ New plan tree workflow was added for Subsequent Page Requests. Since a final Phy #### Abstract Plan tree -Abstract Plan Tree for non-paged requests remains unchanged. The `QueryPlan`, as a top entry`, has a new optional field `pageSize`, which is not defined for non-paged requests. +Abstract Plan Tree for non-paged requests remains unchanged. The `QueryPlan`, as a top entry, has a new optional field `pageSize`, which is not defined for non-paged requests. ```mermaid classDiagram @@ -498,24 +501,24 @@ sequenceDiagram participant PlanSerializer SQLService ->>+ QueryPlanFactory : execute - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) QueryPlanFactory ->>+ CanPaginateVisitor : canConvertToCursor CanPaginateVisitor -->>- QueryPlanFactory : true end QueryPlanFactory ->>+ QueryService : execute QueryService ->>+ Planner : optimize - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) Planner ->>+ CreatePagingTableScanBuilder : apply CreatePagingTableScanBuilder -->>- Planner : paged index scan end Planner -->>- QueryService : Logical Plan Tree QueryService ->>+ OpenSearchExecutionEngine : execute - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) Note over OpenSearchExecutionEngine, PlanSerializer : Serialization OpenSearchExecutionEngine ->>+ PlanSerializer : convertToCursor PlanSerializer -->>- OpenSearchExecutionEngine : cursor end - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) Note over OpenSearchExecutionEngine : get total hits end OpenSearchExecutionEngine -->>- QueryService : execution completed @@ -541,19 +544,19 @@ sequenceDiagram SQLService ->>+ QueryPlanFactory : execute QueryPlanFactory ->>+ QueryService : execute - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) note over QueryService, PlanSerializer : Deserialization QueryService ->>+ PlanSerializer: convertToPlan PlanSerializer -->>- QueryService: Physical plan tree end Note over QueryService : Planner, Optimizer and Implementor
are skipped QueryService ->>+ OpenSearchExecutionEngine : execute - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) note over OpenSearchExecutionEngine, PlanSerializer : Serialization OpenSearchExecutionEngine ->>+ PlanSerializer : convertToCursor PlanSerializer -->>- OpenSearchExecutionEngine : cursor end - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) Note over OpenSearchExecutionEngine : get total hits end OpenSearchExecutionEngine -->>- QueryService: execution completed @@ -575,7 +578,7 @@ sequenceDiagram RestSQLQueryAction ->>+ SQLService : prepareRequest SQLService ->>+ QueryPlanFactory : execute - rect rgb(191, 223, 255) + rect rgb(91, 123, 155) note over SQLService, CanPaginateVisitor : V2 support check QueryPlanFactory ->>+ CanPaginateVisitor : canConvertToCursor CanPaginateVisitor -->>- QueryPlanFactory : false @@ -592,8 +595,6 @@ RestSQLQueryAction ->>+ SQLService : prepareRequest The SQL engine should be able to completely recover the Physical Plan tree to continue its execution to get the next page. Serialization mechanism is responsible for recovering the plan tree. note: `ResourceMonitorPlan` isn't serialized, because a new object of this type would be created for the restored plan tree before execution. Serialization and Deserialization are performed by Java object serialization API. -## TODO describe ... - ```mermaid stateDiagram-v2 direction LR