[FEATURE REQ] improve spring-data-cosmos paging for non-trivial datasets #24262

Blackbaud-MikeLueders · 2021-09-22T15:41:48Z

Is your feature request related to a problem? Please describe.
spring-data paging requires row counts in order to function. Row counts in cosmos are generally expensive. For datasets of non-trivial size (say, >100k), a count query which includes an order clause is orders of magnitude more expensive than the same query without the count clause. In the spring-data-cosmos library, every 'paged' query also issues a count query in order to conform to the spring-data page model. For collections of non-trivial size, this makes paged queries prohibitively expensive. For small datasets, this is not an issue and things work as you would expect. For larger datasets, this makes paging unusable.

Describe the solution you'd like
One option is to do away with the facade of paging in spring-data and lean into the continuation token for all operations. This would mean certain methods of org.springframework.data.domain.Page would simply fail - getTotalPages/getTotalElements/previous/etc - while hasNext would rely on the continuation token instead of total element calculation. There are other options like not failing but returning incorrect data, though I think failing would be better from a user perspective so as to not give the false illusion of functioning. In any case, the important part is the elimination of the count query.

This brings up two obvious issues. First, it breaks traditional paging for clients that are using spring-data-cosmos and do not have large datasets. Second, it is not backwards compatible with the existing implementation. To sidestep these issues, a configuration option could be introduced which would allow the client to select which paging strategy to use. The current functionality would remain the default but the user could opt into the continuation token strategy which would not require count queries on paged requests.

Thoughts? Am I missing something wrt to the count queries being extremely inefficient? Are there other approaches which should be considered? I'd like to get consensus on an approach before I begin work on a PR.

Additional context
To be clear, I realize I could use CosmosTemplate:sliceQuery to execute paged queries since no count query is executed as part of the slice method. However, this requires manually constructing the query and using the CosmosTemplate directly, as opposed to using dynamic query derivation via repository methods which accept a Pageable input. Using the CosmosTemplate for paged queries and the repository for all other queries is far from ideal.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

Description Added
Expected solution specified

joshfree · 2021-09-22T17:27:19Z

@stliu @kushagraThapar could you please take a look

kushagraThapar · 2021-09-23T06:35:10Z

@Blackbaud-MikeLueders - let me discuss this issue within my team - Here is some general information that I found - https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-pagination

@j82w - tagging you here, in case you have any ideas on how to implement pagination in our SDKs with cosmos queries in general.

kushagraThapar · 2021-09-29T17:35:54Z

@Blackbaud-MikeLueders - We are still discussing this feature with our query team to make sure we cover every aspect of this.

Also, another limitation on continuation tokens is that they don't work for DISTINCT / GROUP BY queries, as mentioned here - https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-pagination#continuation-tokens

Blackbaud-MikeLueders · 2021-09-29T22:08:18Z

@kushagraThapar - No worries, thanks for the update. I think that's a reasonable limitation, especially since DISTINCT does work if you add an ORDER BY. Depending on the route we go, we're likely going to need to include documentation on how paging works with spring-data, so I think this will be fine as long as we clearly document the limitations and include workarounds for the DISTINCT/GROUP BY queries (e.g. using sliceQuery from CosmosTemplate).

The documentation specifically says "you cannot use continuation tokens" - do you know specifically what that means? Will a continuation token not be returned as a reply header on the initial query? So is it the case that the library would need to detect use of DISTINCT (without ORDER BY) and GROUP BY and disallow those queries if in 'continuation token' mode? Otherwise, it would 'work' in the sense that the first page of results would be returned but it would appear as there were no more results. Or is there another way to detect this?

kushagraThapar · 2021-09-30T17:05:31Z

@Blackbaud-MikeLueders DISTINCT works with ORDER BY, it is mentioned in the above documentation that I linked.
Yes, regarding documentation, we should be able to well document our new pagination feature.

Regarding the continuation token, it will be returned from the initial page for the GROUP BY / DISTINCT query, however, SDK won't resume from that continuation token returned in the first page. It throws the error -

azure-sdk-for-java/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/GroupByDocumentQueryExecutionContext.java

Line 32 in 3f31d68

    
           public static final String CONTINUATION_TOKEN_NOT_SUPPORTED_WITH_GROUP_BY = "Continuation token is not supported " +

j82w · 2021-09-30T19:27:50Z

One option is to do away with the facade of paging in spring-data and lean into the continuation token for all operations. This would mean certain methods of org.springframework.data.domain.Page would simply fail - getTotalPages/getTotalElements/previous/etc - while hasNext would rely on the continuation token instead of total element calculation. There are other options like not failing but returning incorrect data, though I think failing would be better from a user perspective so as to not give the false illusion of functioning. In any case, the important part is the elimination of the count query.

@Blackbaud-MikeLueders what if the getTotalPages/getTotalElements/etc is made to be lazy while making the hasNext depend on the the continuation token? This way it avoids the count query without causing a breaking change so most users will get the benefit without needing to opt-in via some option.

Blackbaud-MikeLueders · 2021-10-03T20:56:30Z

@j82w yeah, i think that's probably a better approach.

We should still consider a configuration option for this - defaulting to lazy but configurable as eager. The reason being, some clients (such as my company) do not use the built-in autoscale mechanism of cosmos but instead rely on a proxy to the repository object which can be used to intercept method requests and scale up only after retry. We've found that the vast majority of throttle events do not require a scale up but just a single retry to succeed. If we were relying on the count, having the count operation part of the repository method call would be desirable since the retry/autoscale mechanism would kick in - executing the count lazily would require detecting the Page response and wrapping that in a separate proxy.

That said, I'm not going to fight for the configuration option since I can't think of a scenario I would use it (since I don't use paging in the first place) but I wanted to present an actual usage scenario for consideration, since there might be clients who rely on an eager count operation.

j82w · 2021-10-07T17:03:30Z

I agree with the configuration option.

Blackbaud-MikeLueders · 2021-10-07T21:54:23Z

@kushagraThapar @j82w are we aligned on an approach? Should I begin working on a PR or are y'all still discussing?

kushagraThapar · 2021-10-07T22:35:06Z

@Blackbaud-MikeLueders We are aligned, would be great to start the work and then we can continue the discussion on the PR.

mlueders · 2022-03-21T19:06:46Z

I've closed the associated PR as I no longer believe the overall approach makes sense. Rather than use org.springframework.data.domain.Page, use org.springframework.data.domain.Slice which exposes no page number and is a much better fit for the cosmos model.

joshfree added azure-spring All azure-spring related issues azure-spring-cosmos Spring cosmos related issues. Client This issue points to a problem in the data-plane of the library. labels Sep 22, 2021

ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Sep 22, 2021

joshfree added feature-request This issue requires a new behavior in the product in order be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Sep 22, 2021

ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Sep 22, 2021

chenrujun assigned kushagraThapar Sep 23, 2021

saragluna assigned mbhaskar Sep 23, 2021

kushagraThapar unassigned mbhaskar Sep 23, 2021

kushagraThapar mentioned this issue Oct 5, 2021

[FEATURE REQ] spring-data-cosmos paging without incurring cost of count query for custom query using paginationQuery method. #23484

Closed

2 tasks

Blackbaud-MikeLueders mentioned this issue Oct 12, 2021

add option to lazily invoke count query on paged requests #24747

Closed

chenrujun added this to Spring Cloud Azure Nov 5, 2021

chenrujun moved this to Todo in Spring Cloud Azure Nov 5, 2021

chenrujun added this to the Backlog milestone Nov 22, 2021

Blackbaud-MikeLueders closed this as completed Mar 24, 2022

Repository owner moved this from Todo to Done in Spring Cloud Azure Mar 24, 2022

github-actions bot locked and limited conversation to collaborators Apr 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE REQ] improve spring-data-cosmos paging for non-trivial datasets #24262

[FEATURE REQ] improve spring-data-cosmos paging for non-trivial datasets #24262

Blackbaud-MikeLueders commented Sep 22, 2021

joshfree commented Sep 22, 2021

kushagraThapar commented Sep 23, 2021

kushagraThapar commented Sep 29, 2021

Blackbaud-MikeLueders commented Sep 29, 2021

kushagraThapar commented Sep 30, 2021

j82w commented Sep 30, 2021

Blackbaud-MikeLueders commented Oct 3, 2021

j82w commented Oct 7, 2021

Blackbaud-MikeLueders commented Oct 7, 2021

kushagraThapar commented Oct 7, 2021

mlueders commented Mar 21, 2022

[FEATURE REQ] improve spring-data-cosmos paging for non-trivial datasets #24262

[FEATURE REQ] improve spring-data-cosmos paging for non-trivial datasets #24262

Comments

Blackbaud-MikeLueders commented Sep 22, 2021

joshfree commented Sep 22, 2021

kushagraThapar commented Sep 23, 2021

kushagraThapar commented Sep 29, 2021

Blackbaud-MikeLueders commented Sep 29, 2021

kushagraThapar commented Sep 30, 2021

j82w commented Sep 30, 2021

Blackbaud-MikeLueders commented Oct 3, 2021

j82w commented Oct 7, 2021

Blackbaud-MikeLueders commented Oct 7, 2021

kushagraThapar commented Oct 7, 2021

mlueders commented Mar 21, 2022