SQL: replace the scroll with PIT for data batching #61873

Closed
bpintea opened this issue Sep 2, 2020 · 7 comments · Fixed by #83381
Labels
:Analytics/SQL SQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team

Comments

@bpintea
Contributor

bpintea commented Sep 2, 2020

Since the scroll functionality replacement is underway, the SQL plugin should move away from it, replacing it with the newly added point-in-time (PIT) API.
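For context, a minimal sketch of what PIT-based batching looks like at the request level. The helper names (`open_pit_request`, `next_page_body`, `close_pit_request`) and the index and sort field used in the usage note are hypothetical; only the endpoint paths and body fields follow the Elasticsearch point-in-time and `search_after` APIs. Nothing is sent over the network; the functions only build the requests.

```python
from typing import Any, Dict, List, Optional, Tuple


def open_pit_request(index: str, keep_alive: str = "1m") -> Tuple[str, str]:
    """Return (method, path) for opening a point in time on `index`."""
    return "POST", f"/{index}/_pit?keep_alive={keep_alive}"


def next_page_body(
    pit_id: str,
    sort: List[Dict[str, str]],
    size: int,
    search_after: Optional[List[Any]] = None,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    """Build the body of a search request that reads one batch from a PIT.

    The request targets /_search without an index, since the PIT already
    identifies the indices. `search_after` is omitted for the first batch
    and set to the sort values of the last hit for every later batch.
    """
    body: Dict[str, Any] = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def close_pit_request(pit_id: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, path, body) for closing the PIT once batching is done."""
    return "DELETE", "/_pit", {"id": pit_id}
```

For example, `next_page_body("some-pit-id", [{"@timestamp": "asc"}], 1000)` would describe the first batch, and passing `search_after` with the last hit's sort values would describe the next one. Unlike a scroll, the paging position lives in the request rather than in server-side state.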

@bpintea bpintea self-assigned this Sep 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Sep 2, 2020
@jimczi
Contributor

jimczi commented Sep 2, 2020

I wonder if this should be an option in the SQL request like we have for _search. I always disliked the fact that SQL uses a persistent scroll for pagination. That could be an opt-in but imposing a scroll or a PIT that you need to close explicitly is not consistent with _search. The question is different for SQL queries that use multiple requests internally (group_by) since they could benefit from using a PIT that they could close when the request is fulfilled.

@costin
Member

costin commented Sep 3, 2020

The main reason why SQL relies on scroll is that by default, the vast majority of queries require pagination since the result set is too large.
Having both a session-like (open/query/close) and a stateless (run query/consume results) mode is appealing, though we need to figure out the workflow in the latter case.
One option for stateless queries would be to limit them to return only one page (and thus impose a hard limit). The challenge though is the mechanism for asking for the next page; we could potentially use search_after but how will this parameter be returned to the user in the first place?

@bpintea
Contributor Author

bpintea commented Sep 3, 2020

could potentially use search_after but how will this parameter be returned to the user in the first place?

Generally with SQL, one simply can't server-side paginate outside a cursor-bound context (such as xDBC; or REST API-aware app, in our case) -- it's always just one result set.

Querying with or without a scroll/PIT is appealing imo due to the equivalence to SQL's transactional vs. non-transactional mode. However, maybe unlike typical RDBMSes, the "non-transactional" mode is more expensive for us (when paginating), so I guess we'd generally default to the "transactional" mode anyways? At least for our CLI/xDBC clients; which also automatically close the scroll, part of their internal cursor lifecycle.

@costin
Member

costin commented Sep 3, 2020

At least for our CLI/xDBC clients; which also automatically close the scroll, part of their internal cursor lifecycle.

Right, I envision the stateless request on the HTTP/REST front due to its request/response contract.

@jimczi
Contributor

jimczi commented Sep 3, 2020

The main reason why SQL relies on scroll is that by default, the vast majority of queries require pagination since the result set is too large.

But that's the same problem for normal search. I don't understand why there would be a difference in SQL. Paginating with a cursor is a feature in SQL too.

@bpintea
Contributor Author

bpintea commented Sep 7, 2020

The difference might stem from expectations, since RDBMSs are generally expected to return all results matching a query, or up to a given limit the user provides (like ... LIMIT 10001).
We could add an SQL API parameter to control the usage of a scroll/PIT and maybe simply reuse the existing cursor parameter to return the search_after values (with or without the scroll/PIT ID, depending on the mode). Pagination would then work in both scenarios and we could also have xDBC clients use a search context by default. Would this make sense?
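A rough sketch of that idea: one opaque cursor string carrying the `search_after` values and, only in the stateful mode, a PIT ID. The function names and the base64-over-JSON encoding are assumptions made for illustration; the thread does not describe the actual SQL cursor format.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def encode_cursor(search_after: List[Any], pit_id: Optional[str] = None) -> str:
    """Pack the paging state into an opaque string (hypothetical format).

    `pit_id` is included only when the query runs against a search context;
    a stateless query carries the `search_after` values alone.
    """
    payload: Dict[str, Any] = {"search_after": search_after}
    if pit_id is not None:
        payload["pit_id"] = pit_id
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Dict[str, Any]:
    """Recover the paging state from a cursor produced by `encode_cursor`."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw.decode("utf-8"))
```

With this shape the client sends back the same `cursor` field in both modes, and the server decides from the presence of `pit_id` whether there is a search context to reuse and eventually close.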

@costin costin assigned Luegg and unassigned bpintea Nov 23, 2021
elasticsearchmachine pushed a commit that referenced this issue Feb 15, 2022
Resolves #61873

The goal of this PR is to remove the use of the deprecated scroll
cursors in SQL. Functionality and APIs should remain the same with one
notable difference: The last page of a search hit query used to always
include a scroll cursor if it is non-empty. This is no longer the case:
if a result set is exhausted, the PIT will be closed and the last page
does not include a cursor.

Note that PIT can also be used for aggregation and PIVOT queries, but this is
not in the scope of this PR and will be implemented in a follow-up.

Additionally, this PR resolves #80523 because the total doc count is no
longer required.
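
The behavior change described above matters mostly to clients that page manually: they should stop when a response has no cursor rather than wait for an empty page. A minimal sketch of such a loop follows, run against an in-memory stand-in for the SQL endpoint. The `rows` and `cursor` field names match the SQL REST API; `fetch_all_rows`, `make_fake_sql_endpoint`, and the offset-as-cursor scheme are hypothetical and exist only to make the example runnable.

```python
from typing import Any, Callable, Dict, Iterator, List

Request = Dict[str, Any]
Response = Dict[str, Any]


def fetch_all_rows(send: Callable[[Request], Response], first_request: Request) -> Iterator[List[Any]]:
    """Yield every row of a paginated result, stopping when no cursor is returned."""
    response = send(first_request)
    yield from response["rows"]
    while "cursor" in response:
        response = send({"cursor": response["cursor"]})
        yield from response["rows"]


def make_fake_sql_endpoint(rows: List[List[Any]], page_size: int) -> Callable[[Request], Response]:
    """Build a stand-in endpoint that omits the cursor on the last page.

    The cursor here is just a stringified offset, which is an assumption
    for the sake of the example and not the real cursor format.
    """
    def send(request: Request) -> Response:
        offset = int(request.get("cursor", 0))
        response: Response = {"rows": rows[offset:offset + page_size]}
        next_offset = offset + page_size
        if next_offset < len(rows):
            response["cursor"] = str(next_offset)
        return response

    return send
```

A client written this way needs no explicit close call once the result set is exhausted, since the final page carries no cursor to close.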
5 participants