[Paginating ClusterManager Read APIs] Paginate _cat/shards API. #14257
gargharsh3134 added the enhancement and untriaged labels on Jun 13, 2024
This was referenced Jul 2, 2024
It makes sense not to have |
rwali-aws added the v2.17.0 label and removed the v2.16.0 label on Jul 22, 2024
github-project-automation (bot) moved this to 2.17 (First RC 09/03, Release 09/17) in OpenSearch Project Roadmap on Aug 30, 2024
github-project-automation (bot) moved this from Now(This Quarter) to ✅ Done in Cluster Manager Project Board on Oct 7, 2024
This was referenced Oct 22, 2024
Is your feature request related to a problem? Please describe
As the number of shards grows in an OpenSearch cluster, the response size and latency of the default _cat/shards API increase. This not only makes such large responses difficult for the client to consume, but also stresses the cluster by making it accumulate stats across all the shards.
Pagination will therefore help limit the response size of a single query and also prevent the cluster from accumulating shard stats for all the shards. This issue tracks the approaches that can be used to paginate the response.
Describe the solution you'd like
Drawing inspiration and extending the approach called out in: #14258
A new _list/shards API would be introduced. Paginating the response requires a pagination key for which a deterministic order is maintained, or can be generated, in the cluster. A deterministic order is required so that a new response page can start from the point where the last page left off. Index creation timestamps will thus be used as pagination keys.
Overview
Each index has a creation timestamp stored in IndexMetadata, which is part of the Metadata object of ClusterState. These creation timestamps can act as the sort/pagination key: a list of indices, sorted by their respective creation timestamps, can be generated from them. The sorted list can then be used to prepare the list of shards to be sent in the response, as per the page size.
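As a rough illustration of the ordering described above, the following Python sketch sorts indices by creation timestamp and expands them into an ordered list of shard IDs. It is not the OpenSearch (Java) implementation: the IndexMeta class, its field names, and the tie-break on index name for equal timestamps are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical stand-in for the parts of IndexMetadata used here."""
    name: str
    creation_date: int  # epoch millis
    num_shards: int


def sorted_indices(indices, descending=False):
    # Sort on creation timestamp; the name tie-break is an assumption that
    # keeps the order deterministic when two timestamps are equal.
    return sorted(indices, key=lambda i: (i.creation_date, i.name), reverse=descending)


def shard_ids(ordered_indices):
    # Expand each index, in order, into its (index name, shard number) pairs.
    return [(idx.name, shard) for idx in ordered_indices for shard in range(idx.num_shards)]


indices = [IndexMeta("b", 200, 2), IndexMeta("a", 100, 1), IndexMeta("c", 200, 1)]
ordered = sorted_indices(indices)
all_shards = shard_ids(ordered)
```

Because the order depends only on values stored in cluster metadata, two requests that see the same set of indices produce the same sequence, which is what lets a later page resume where the previous one stopped.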
Proposed User Experience
New API Path/URL:
curl "localhost:9200/_list/shards?___
curl "localhost:9200/_list/shards/{indices}?___
where {indices} is a comma-separated list of indices.
New Query Parameters:
Illegal_Argument_Exception
Sample Query ->
curl "localhost:9200/_list/shards?next_token=<nextToken>&size=20000&sort=asc"
New Response Parameters:
Note: The next_token would be Base64 encoded.
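The issue does not spell out what the token contains, only that it is Base64 encoded. The sketch below assumes, purely for illustration, a payload of the form creation_date|index_name|shard_number; the helper names are hypothetical.

```python
import base64


def encode_token(creation_date: int, index_name: str, shard: int) -> str:
    # Assumed payload layout: "<creation_date>|<index_name>|<shard>".
    raw = f"{creation_date}|{index_name}|{shard}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_token(token: str):
    # validate=True rejects characters outside the Base64 alphabet.
    raw = base64.b64decode(token.encode("ascii"), validate=True).decode("utf-8")
    creation_date, rest = raw.split("|", 1)
    index_name, shard = rest.rsplit("|", 1)
    return int(creation_date), index_name, int(shard)


token = encode_token(1718236800000, "logs-2024", 3)
decoded = decode_token(token)
```

Encoding the position this way keeps the token opaque to clients, so they only need to pass it back unchanged in the next request.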
New Response Formats:
format=JSON: next_token and shards will be new keys of the JSON response object.
Plain text format (or table format): next_token will be the last row of the table.
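A client could consume the JSON format with a loop along these lines. This is a sketch under assumptions: fetch_page is a hypothetical callable that performs the HTTP request and returns the parsed JSON object, and the last page is assumed to carry an empty or missing next_token, which the issue does not state.

```python
def fetch_all_shards(fetch_page, size=20000, sort="asc"):
    """Collect shards from every page using the next_token and shards keys."""
    shards, token = [], None
    while True:
        params = {"size": size, "sort": sort}
        if token is not None:
            params["next_token"] = token
        page = fetch_page(params)  # hypothetical transport function
        shards.extend(page["shards"])
        token = page.get("next_token")
        if not token:  # assumption: no token means this was the last page
            return shards


# In-memory stand-in for the server, used only to exercise the loop.
_pages = {
    None: {"shards": ["s0", "s1"], "next_token": "t1"},
    "t1": {"shards": ["s2"], "next_token": None},
}
collected = fetch_all_shards(lambda params: _pages[params.get("next_token")])
```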
Proposed Pagination Behaviour
Note: Indices that get created while the paginated queries are being executed will be referred to as newly created indices in the following points.
The number of shards in a page will always be less than or equal to the user-provided size query parameter iff the user-provided value is greater than the default value (10k), i.e.
page_size = max(userProvidedMaxPageSize, defaultMaxPageSize)
Given that the shardRoutings for a shardID do NOT have any unique identifiers, it is difficult to define a strategy for starting the next page from the point where the last page left off in case the shards corresponding to a shardID get split across pages. So, the shards for a shardID should NOT be split across pages and need to be displayed in a single page. This limitation inherently imposes a restriction on pageSize, i.e. the minimum pageSize should always be greater than the maximum number of replicas across all the indices in the cluster.
min(max_page_size) = max(#replicasOfIndex1, #replicasOfIndex2, #replicasOfIndex3, ....)
With this restriction on max_page_size, it is proposed to set a high default max_page_size and to use it in case the user-provided value is less than it.
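The two rules above (the effective page size, and never splitting the copies of one shardID across pages) can be sketched as follows. This is an illustrative Python sketch, not the proposed ShardPaginationStrategy; the function names and the input shape, an ordered list of (shard ID, list of copies), are assumptions.

```python
DEFAULT_MAX_PAGE_SIZE = 10_000  # the default value (10k) mentioned above


def effective_page_size(user_size=None):
    # page_size = max(userProvidedMaxPageSize, defaultMaxPageSize)
    if user_size is None:
        return DEFAULT_MAX_PAGE_SIZE
    return max(user_size, DEFAULT_MAX_PAGE_SIZE)


def paginate(groups, page_size):
    """Pack ordered (shard_id, copies) groups into pages without splitting a group."""
    pages, current = [], []
    for shard_id, copies in groups:
        if len(copies) > page_size:
            raise ValueError(f"page_size {page_size} cannot hold all copies of {shard_id}")
        # Start a new page if this group would overflow the current one.
        if current and len(current) + len(copies) > page_size:
            pages.append(current)
            current = []
        current.extend((shard_id, copy) for copy in copies)
    if current:
        pages.append(current)
    return pages


groups = [("a[0]", ["p", "r"]), ("a[1]", ["p", "r"]), ("b[0]", ["p", "r", "r"])]
pages = paginate(groups, page_size=5)
```

A page may therefore hold fewer rows than page_size when the next group would not fit, which is the price of keeping every copy of a shardID together.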
Displaying shards for newly created indices will depend on the requested sort type.
If sort is specified as "asc", the shards of newly created indices will be shown at the end of the response pages. If sort is specified as "desc", however, subsequent pages will only contain shards that are older than the ones already displayed, so the shards of newly created indices will be filtered out.
Any shard of an index that is yet to be displayed will NOT be part of the response if it is deleted.
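The behaviour for newly created and deleted indices follows from filtering on the creation timestamp of the last index already displayed. The Python sketch below illustrates that idea only; the remaining_indices helper and its arguments are hypothetical, and it ignores ties between equal timestamps for simplicity.

```python
def remaining_indices(indices, last_seen, sort="asc"):
    """indices maps index name -> creation timestamp as seen by the current request.

    last_seen is the creation timestamp of the last index already displayed.
    """
    if sort == "asc":
        rest = [(ts, name) for name, ts in indices.items() if ts > last_seen]
        return [name for ts, name in sorted(rest)]
    rest = [(ts, name) for name, ts in indices.items() if ts < last_seen]
    return [name for ts, name in sorted(rest, reverse=True)]


indices = {"old": 100, "mid": 200, "late": 300}
before_asc = remaining_indices(indices, last_seen=200, sort="asc")

indices["new"] = 400  # created while pagination is in progress
after_asc = remaining_indices(indices, last_seen=200, sort="asc")
after_desc = remaining_indices(indices, last_seen=200, sort="desc")

del indices["old"]  # deleted before it was displayed
after_delete_desc = remaining_indices(indices, last_seen=200, sort="desc")
```

With "asc" the new index has the largest timestamp and lands on a later page; with "desc" it is newer than everything already shown and never matches the filter. A deleted index is simply absent from the metadata the next request reads.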
Implementation Details:
Extending and implementing the classes and interfaces called out under #14258, and introducing a new ShardPaginationStrategy class which would encompass the core logic to generate pages of shards.
Related component
Cluster Manager
Describe alternatives you've considered
Pagination key is NodeID.
The idea here will be to respond with all the shards on a set of nodes in a single page.
The concern with using NodeID as the pagination key is that it is not agnostic to index creations or cluster rebalancing activities, which could happen while the queries are being executed.
Additional context
No response