Update docs for pagination. #1592

Merged
merged 8 commits into from
May 17, 2023
Changes from 2 commits
142 changes: 130 additions & 12 deletions docs/dev/Pagination-v2.md
@@ -6,7 +6,6 @@ A cursor is a SQL abstraction for pagination. A client can open a cursor, retrie

Currently, the SQL plugin does not provide SQL cursor syntax. However, the SQL REST endpoint can return results one page at a time. This feature is used by the JDBC and ODBC drivers.


# Scope
Currently, the V2 engine supports pagination only for simple `SELECT * FROM <table>` queries without any other clauses such as `WHERE` or `ORDER BY`.

@@ -15,6 +14,9 @@ https://user-images.githubusercontent.com/88679692/224208630-8d38d833-abf8-4035-

# REST API
## Initial Query Request

The initial query request contains the search request and the page size. Neither can be changed later while scrolling through the pages produced by this request. The search query sent to the OpenSearch engine is built while the initial request is processed.

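As a rough illustration, a client could assemble the initial request body as sketched below. The field names `query` and `fetch_size` come from this document; the helper name `build_initial_request` and the index name `my_index` are hypothetical. The client-side check only mirrors the documented rule that `fetch_size` must be a positive integer; the server remains the authority.

```python
import json


def build_initial_request(query: str, fetch_size: int) -> str:
    """Return the JSON body for an initial paginated query request.

    Hypothetical helper, not part of the SQL plugin or its drivers.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(fetch_size, bool) or not isinstance(fetch_size, int) or fetch_size <= 0:
        raise ValueError("fetch_size must be a positive integer")
    return json.dumps({"query": query, "fetch_size": fetch_size})


# Body to POST to /_plugins/_sql; `my_index` is a made-up index name.
body = build_initial_request("SELECT * FROM my_index", 5)
```
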
```json
POST /_plugins/_sql
{
@@ -26,24 +28,27 @@ POST /_plugins/_sql
Response:
```json
{
"cursor": /* cursor_id */,
"cursor": "<cursor_id>",
"datarows": [
// ...
...
],
"schema" : [
// ...
...
]
}
```
`query` is a DQL statement. `fetch_size` is a positive integer indicating the number of rows to return in each page.

If `query` is a DML statement, pagination does not apply: the `fetch_size` parameter is ignored and no cursor is created. This is the existing behaviour of the v1 engine.

The client receives an (error response](#error-response) if:
The client receives an [error response](#error-response) if:
- `fetch_size` is not a positive integer, or
- evaluating `query` results in a server-side error.

## Next Page Request

A subsequent page request contains only a cursor.

```json
POST /_plugins/_sql
{
@@ -54,29 +59,35 @@ Similarly to v1 engine, the response object is the same as initial response if t

`cursor_id` will be different with each request.

If this is the last page, the `cursor` property is ommitted. The cursor is closed automatically.
## End of scrolling/paging

When scrolling is finished, the SQL plugin still returns a cursor. This cursor leads to the final page, which has no cursor and no data. Receiving that page means that all data was queried and returned to the user. The cursor is closed automatically when that page is processed.

The client will receive an [error response](#error-response) if executing this request results in an OpenSearch or SQL plug-in error.

## Cursor Keep Alive Timeout
Each cursor has a keep alive timer associated with it. When the timer runs out, the cursor is closed by OpenSearch.

Each cursor has a keep alive timer associated with it. When the timer runs out, the cursor is automatically closed by OpenSearch.

This timer is reset every time a page is retrieved.

The client will receive an [error response](#error-response) if it sends a cursor request for an expired cursor.

The keep alive timeout is [configurable](../user/admin/settings.rst#plugins.sql.cursor.keep_alive) through the `plugins.sql.cursor.keep_alive` setting and defaults to 1 minute.

## Error Response

The client will receive an error response if any of the above REST calls results in a server-side error.

The response object has the following format:
```json
{
"error": {
"details": <string>,
"reason": <string>,
"type": <string>
"details": "<string>",
"reason": "<string>",
"type": "<string>"
},
"status": <integer>
"status": "<integer>"
}
```
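
The request flow described in the sections above (initial request, cursor-only follow-ups, a final page with no cursor and no data, and the error response) can be sketched as a client loop. This is a minimal sketch, not driver code: `send` is a hypothetical transport that POSTs a JSON body to `/_plugins/_sql` and returns the decoded response, the `cursor` request field is assumed to mirror the `cursor` response property, and `make_fake_send` is an in-memory stand-in so the example runs without a cluster.

```python
from typing import Callable, Iterator


class SqlPluginError(Exception):
    """Raised when a response carries the documented `error` object."""


def fetch_all_rows(send: Callable[[dict], dict], query: str, fetch_size: int) -> Iterator[list]:
    """Yield every data row, following cursors until a page arrives without one."""
    request = {"query": query, "fetch_size": fetch_size}
    while True:
        page = send(request)
        if "error" in page:
            err = page["error"]
            raise SqlPluginError(f"{err.get('type')}: {err.get('reason')}")
        yield from page.get("datarows", [])
        cursor = page.get("cursor")
        if cursor is None:
            # Final page reached: no cursor is returned, so paging is complete.
            return
        # Follow-up requests carry only the cursor (assumed field name).
        request = {"cursor": cursor}


def make_fake_send() -> Callable[[dict], dict]:
    """In-memory stand-in for the REST endpoint, for demonstration only."""
    pages = {
        None: {"cursor": "c1", "datarows": [[1], [2]]},
        "c1": {"cursor": "c2", "datarows": [[3]]},
        "c2": {"datarows": []},  # final page: no cursor, no data
    }
    return lambda body: pages[body.get("cursor")]


rows = list(fetch_all_rows(make_fake_send(), "SELECT * FROM my_index", 2))
```

Because each response may carry a different `cursor_id`, the loop always sends the most recently received cursor rather than reusing the first one.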

@@ -103,7 +114,7 @@ V2 SQL engine supports *sql node load balancing* -- a cursor request can be rout
## Design Diagrams
New code workflows are highlighted.

### First page
### Initial Query Request
```mermaid
sequenceDiagram
participant SQLService
@@ -237,6 +248,7 @@ sequenceDiagram
participant ContinuePageRequest

Note over PlanSerializer: Unzip
Note over PlanSerializer: Validate cursor integrity
PlanSerializer->>+Deserialization Stream: deserialize
Deserialization Stream->>+ProjectOperator: create new
Note over ProjectOperator: load private fields
@@ -285,3 +297,109 @@ OpenSearchExecutionEngine->>+ProjectOperator: getTotalHits
ResourceMonitorPlan-->>-ProjectOperator: value
ProjectOperator-->>-OpenSearchExecutionEngine: value
```
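
The deserialization diagram above notes that `PlanSerializer` unzips the cursor and validates its integrity before deserializing the plan. The sketch below illustrates that order of steps only. The actual wire format and integrity scheme are not described in this document, so everything here (JSON payload, zlib, HMAC-SHA256, the `SECRET` key, and the function names) is an illustrative assumption, not the plugin's implementation.

```python
import base64
import hashlib
import hmac
import json
import zlib

SECRET = b"demo-only-secret"  # hypothetical key, for illustration only
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def encode_cursor(plan_state: dict) -> str:
    """Serialize, prepend an integrity digest, compress, and make text-safe."""
    body = json.dumps(plan_state, sort_keys=True).encode("utf-8")
    digest = hmac.new(SECRET, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(zlib.compress(digest + body)).decode("ascii")


def decode_cursor(cursor: str) -> dict:
    """Unzip, validate integrity, then deserialize (same order as the diagram)."""
    raw = zlib.decompress(base64.urlsafe_b64decode(cursor.encode("ascii")))
    digest, body = raw[:DIGEST_SIZE], raw[DIGEST_SIZE:]
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("cursor failed integrity check")
    return json.loads(body.decode("utf-8"))


state = {"index": "my_index", "page_size": 5}
restored = decode_cursor(encode_cursor(state))
```

Validating before deserializing means a modified cursor is rejected before any plan object is rebuilt from it.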

### Plan Tree changes

Different plan trees are built during request processing. See more about their purpose and stages [here](query-optimizer-improvement.md#Examples). The sections below describe the changes that the pagination feature introduces in these trees.

#### Abstract Plan tree

Changes to plan tree for Initial Query Request with pagination:
1. A new plan node, `Paginate`, is added to the tree.
2. `QueryPlan` has a new optional field: page size. When it is set, `Paginate` is added to the tree; it is later converted to `LogicalPaginate`. For non-paging requests the tree remains unchanged.

```mermaid
classDiagram
direction LR
class QueryPlan {
<<AbstractPlan>>
-Optional~int~ pageSize
}
class Paginate {
<<UnresolvedPlan>>
}
class UnresolvedPlanTree {
<<UnresolvedPlan>>
}
QueryPlan --* Paginate
Paginate --* UnresolvedPlanTree
```

Non-paging requests have the same plan tree, but the `pageSize` value in `QueryPlan` is unset.
Collaborator

Is it necessary to say this twice?

Collaborator Author

I can't find the second time, where is it?


TODO:
Add graph for `ContinuePaginatedPlan`.
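
The Abstract Plan change described above can be sketched as follows. The class names follow the diagram, but the Python shapes, the `page_size` field on `Paginate`, and the `make_query_plan` helper are assumptions made for illustration; the real classes are Java and may differ.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class UnresolvedPlanTree:
    """Stand-in for the unresolved plan built from the parsed query."""
    description: str


@dataclass
class Paginate:
    """New node placed on top of the unresolved plan for paged requests."""
    page_size: int
    child: UnresolvedPlanTree


@dataclass
class QueryPlan:
    plan: Union[Paginate, UnresolvedPlanTree]
    page_size: Optional[int] = None  # unset for non-paging requests


def make_query_plan(tree: UnresolvedPlanTree, page_size: Optional[int] = None) -> QueryPlan:
    """Wrap the tree in Paginate only when a page size was requested."""
    if page_size is None:
        return QueryPlan(plan=tree)
    return QueryPlan(plan=Paginate(page_size, tree), page_size=page_size)


tree = UnresolvedPlanTree("SELECT * FROM my_index")
paged = make_query_plan(tree, page_size=5)
plain = make_query_plan(tree)
```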

#### Logical Plan tree

Changes to plan tree for Initial Query Request with pagination:
1. `LogicalPaginate` is added to the top of the tree. It stores how paging/scrolling should be done in a private field `pageSize`, which is pushed down in the `Optimizer`.
Collaborator

What is the 1. for?

Collaborator Author

@Yury-Fridlyand Yury-Fridlyand May 12, 2023

There is a list of changes. Luckily, it has only one entry.


```mermaid
classDiagram
direction LR
class LogicalPaginate {
<<LogicalPlan>>
int pageSize
}
class LogicalPlanTree {
<<LogicalPlan>>
}
class LogicalRelation {
<<LogicalPlan>>
}
LogicalPaginate --* LogicalPlanTree
LogicalPlanTree --* LogicalRelation
```

There are no changes for non-paging requests.

```mermaid
classDiagram
direction LR
class LogicalPlanTree {
<<LogicalPlan>>
}
class LogicalRelation {
<<LogicalPlan>>
}
LogicalPlanTree --* LogicalRelation
```

#### Optimized Plan tree

Changes:
1. For a pagination request, an `OpenSearchPagedIndexScanBuilder` is pushed to the bottom of the tree instead of an `OpenSearchIndexScanQueryBuilder`. Both are instances of `TableScanBuilder`, which extends the `PhysicalPlan` interface.
2. `LogicalPaginate` is removed from the tree during the push down operation in the `Optimizer`.

See [article about `TableScanBuilder`](query-optimizer-improvement.md#TableScanBuilder) for more details.

```mermaid
classDiagram
class LogicalProject {
<<LogicalPlan>>
}
class OpenSearchPagedIndexScanBuilder {
<<TableScanBuilder>>
}

LogicalProject --* OpenSearchPagedIndexScanBuilder
```
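
The two Optimized Plan changes above can be illustrated with a toy push down pass. This is a simplified sketch under stated assumptions: the node classes are reduced Python stand-ins, `IndexScanBuilder` and `PagedIndexScanBuilder` stand in for `OpenSearchIndexScanQueryBuilder` and `OpenSearchPagedIndexScanBuilder`, and the recursive `optimize` function is hypothetical rather than the plugin's rule-based `Optimizer`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogicalRelation:
    table: str


@dataclass
class LogicalProject:
    child: object


@dataclass
class LogicalPaginate:
    page_size: int
    child: object


@dataclass
class IndexScanBuilder:  # stand-in for OpenSearchIndexScanQueryBuilder
    table: str


@dataclass
class PagedIndexScanBuilder:  # stand-in for OpenSearchPagedIndexScanBuilder
    table: str
    page_size: int


def optimize(plan: object, page_size: Optional[int] = None) -> object:
    """Push the page size down to the scan builder and drop LogicalPaginate."""
    if isinstance(plan, LogicalPaginate):
        # The paginate node disappears; its page size travels downwards.
        return optimize(plan.child, plan.page_size)
    if isinstance(plan, LogicalProject):
        return LogicalProject(optimize(plan.child, page_size))
    if isinstance(plan, LogicalRelation):
        if page_size is None:
            return IndexScanBuilder(plan.table)
        return PagedIndexScanBuilder(plan.table, page_size)
    raise TypeError(f"unsupported plan node: {type(plan).__name__}")


paged = optimize(LogicalPaginate(5, LogicalProject(LogicalRelation("my_index"))))
plain = optimize(LogicalProject(LogicalRelation("my_index")))
```

For the paged input the result is a `LogicalProject` over a paged scan builder, matching the diagram above; without `LogicalPaginate` the regular scan builder is used.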

#### Physical Plan tree

Changes:
1. `OpenSearchPagedIndexScanBuilder` is converted to `OpenSearchPagedIndexScan` by `Implementor`.

```mermaid
classDiagram
direction LR
class ProjectOperator {
<<PhysicalPlan>>
}
class OpenSearchPagedIndexScan {
<<TableScanOperator>>
}

ProjectOperator --* OpenSearchPagedIndexScan
```
2 changes: 1 addition & 1 deletion docs/dev/index.md
@@ -43,7 +43,7 @@
+ [Semantic Analysis](query-semantic-analysis.md): performs semantic analysis to ensure semantic correctness
+ [Type Conversion](query-type-conversion.md): implement implicit data type conversion
+ **Query Planning**
+ [Logical Optimization](query-optimizier-improvement.md): improvement on logical optimizer and physical implementer
+ [Logical Optimization](query-optimizer-improvement.md): improvement on logical optimizer and physical implementer
+ **Query Execution**
+ [Query Manager](query-manager.md): query management
+ **Query Acceleration**