[RFC] Paginating _cat APIs #15013
Comments
The URL path can be curl "localhost:9200/_cat/indices/paginate" so that usage is explicit. |
@gargharsh3134 Please expand on this with examples around _cat/segments, _cat/recovery, _cat/snapshots, etc. Would filtering be mandatory? How are the thresholds set up? |
@shwetathareja Since this new path will require a new RestAction anyway, I was thinking of keeping it more generic. As we are aligned on deprecating the existing ones and making pagination the default behaviour, having pagination-related keywords in the URL seemed a bit restrictive. That said, I'm open to changing the path. Please let me know if you still feel a pagination-related keyword in the path is required. |
I don't like the idea of calling this a V2 API and deprecating the default V1 in future. The url path can be: A feature flag ( The In this way, a user will not have to worry about which variant of an API to use, and this (should ideally?) come with no learning curve for the user. Also, the user will have the option to force a particular behavior by setting pagination to true/false. My point is that a user should just set the large cluster behavior once and then let OpenSearch take care of the underlying logic that we use for all APIs. |
The problem with feature flags that toggle default behavior is that there's no way to know for an instance of OpenSearch which flavor will be enabled. Thus your client has to be aware of what options the server has, and building a single client that works against both flavors is now impossible. I recommend adding the flag, defaulting it to false, then flipping it in the next major version. |
Another thought, if |
@dblock Thanks for taking a look. Please find my responses for the 2 queries below:
The proposal is to keep the flag disabled by default, and the existing queries can work as is. Users can enable it if required, if done so, each API can honour its pre-defined fail-fast mechanism (for e.g,
The scenarios around using query params as an identifier were evaluated (say user explicitly passing |
I understand, but please see my reasoning of having this flag at all. Clients will not be able to adapt easily to an API that behaves differently behind a feature-flag. |
@dblock In large cluster mode, when there are too many nodes and shards, the admin APIs may just continue to time out and stress the nodes in the cluster. This |
@shwetathareja maybe we miscommunicated, I am referring to the following:
Here fail fast is not what's proposed, it's a different API behavior. |
Got it @dblock, yeah not going ahead with different API behavior based on feature flag. |
If we are talking about V2 _cat API, is it possible to include security in a way that resources like indices ( |
@sharp-pixel We decided to introduce new I'll update the RFC with latest developments. |
Are the new _cat APIs being introduced in 2.18? |
@Pallavi-AWS Yes, they will be part of the 2.18 release. The documentation PR (opensearch-project/documentation-website#8594) is still under review though, if you are looking for behavioural details. |
Is your feature request related to a problem? Please describe
The _cat APIs in OpenSearch (such as _cat/indices, _cat/shards ...), which are primarily used for monitoring and operational purposes, are both CPU and memory intensive and thereby consume significant resources. As the cluster size increases (number of nodes, shards and indices), the usage of _cat APIs starts to adversely impact the cluster. For large clusters, these APIs not only put the cluster's availability at risk, but their non-paginated responses also make it difficult for clients to consume the correspondingly larger responses, which come with increased latencies.
The proposal to overcome such issues is to paginate these APIs, which would help in limiting both the response size and resource consumption (by not aggregating stats or information of all the queried elements at once).
Describe the solution you'd like
The proposal is to implement token-based pagination.
Requirements:
The _cat/indices API will always use point-in-time information from the current cluster state and build the response accordingly.
The _cat/indices API has a list of indices ordered according to their creation time; users should be able to define the order, be it ascending or descending.
Approach:
After evaluating multiple options (discussed in #14258), the conclusion is to introduce new _list APIs whenever pagination support for any existing or new API is to be provided (reference _list/indices & _list/shards).
Meta Issue: #15014
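To make the requirements above concrete, here is a minimal Python sketch of token-based pagination over indices sorted by creation time. It is an illustration only, not the OpenSearch implementation: the `Index` type, the `list_indices` function, the token encoding (base64 of the last returned creation time and name), and the parameter names `size`, `sort` and `next_token` are all assumptions made for this example.

```python
import base64
import json
from typing import List, NamedTuple, Optional, Tuple


class Index(NamedTuple):
    name: str
    creation_time: int  # epoch millis (hypothetical field for this sketch)


def encode_token(creation_time: int, name: str) -> str:
    # Hypothetical opaque token: position of the last element returned.
    raw = json.dumps([creation_time, name]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_token(token: str) -> Tuple[int, str]:
    creation_time, name = json.loads(base64.urlsafe_b64decode(token))
    return creation_time, name


def list_indices(
    cluster_state: List[Index],
    size: int,
    sort: str = "asc",
    next_token: Optional[str] = None,
) -> Tuple[List[Index], Optional[str]]:
    """Return one page built from the *current* cluster state, plus a token."""
    descending = sort == "desc"
    ordered = sorted(
        cluster_state,
        key=lambda i: (i.creation_time, i.name),
        reverse=descending,
    )
    if next_token is not None:
        last = decode_token(next_token)
        # Resume strictly after the last element of the previous page.
        if descending:
            ordered = [i for i in ordered if (i.creation_time, i.name) < last]
        else:
            ordered = [i for i in ordered if (i.creation_time, i.name) > last]
    page = ordered[:size]
    has_more = len(ordered) > size
    token = encode_token(page[-1].creation_time, page[-1].name) if has_more else None
    return page, token


# Usage: the cluster state changes between the two requests.
state = [Index("logs-1", 100), Index("logs-2", 200), Index("logs-3", 300)]
page1, token1 = list_indices(state, size=2)

# "logs-1" is deleted and "logs-4" is created before the next page is fetched.
state = [Index("logs-2", 200), Index("logs-3", 300), Index("logs-4", 400)]
page2, token2 = list_indices(state, size=2, next_token=token1)
```

Because the token records a position in the creation-time ordering rather than an offset, the second page neither repeats nor skips an index when the cluster state changes between requests.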
Related component
Cluster Manager
Describe alternatives you've considered
Introducing new V2 APIs for which the default behaviour is paginated responses.
As supporting pagination in existing APIs as a default behaviour is a breaking change, the proposal is to instead introduce new V2 APIs. The existing APIs can then be deprecated (say in OpenSearch version 3.x).
For example:
curl "localhost:9200/_cat/indices/V2"
Introducing a new feature flag (say largeClusterModeEnabled) which, if set to true, makes the existing APIs fail fast with a validation error stating that non-paginated queries are not supported for large clusters.
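As a rough illustration of the feature-flag alternative (which was considered, not adopted), the fail-fast check could look like the Python sketch below. The `ValidationError` class, the `validate_cat_request` function and the `paginated` argument are hypothetical names for this example; only the flag name largeClusterModeEnabled comes from the text above.

```python
class ValidationError(Exception):
    """Raised when a request is rejected before any work is done."""


def validate_cat_request(large_cluster_mode_enabled: bool, paginated: bool) -> None:
    # Fail fast: reject non-paginated requests up front when the flag is on,
    # instead of aggregating the full response and risking a timeout.
    if large_cluster_mode_enabled and not paginated:
        raise ValidationError(
            "non-paginated queries are not supported for large clusters"
        )


# With the flag disabled (the proposed default), existing queries work as is.
validate_cat_request(large_cluster_mode_enabled=False, paginated=False)

# With the flag enabled, a non-paginated request is rejected immediately.
try:
    validate_cat_request(large_cluster_mode_enabled=True, paginated=False)
    rejected = False
except ValidationError:
    rejected = True
```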
Additional context
No response