[RFC] API for decommissioning/recommissioning zone and weighted zonal search request routing policy #3639
Comments
@imRishN Could we please add req(/res) for obtaining current status of Recommission/Decommission as well? |
@saikaranam-amazon updated the req/res for both update/get calls for recommission/decommission |
Thanks @imRishN
|
@saikaranam-amazon could you help elaborate on why that might be needed? |
As we have two APIs, one to update the weights of the search traffic and one to decommission an entire zone (without any additional checks or wait time), the latter operation might cause instability for some of the in-flight operations.
|
sure @Bukhtawar |
Thanks @imRishN . should we have If users want to know the status of |
When we decommission the zone, it is possible that traffic hasn't been drained, in which case the operation might take longer and calls might time out. The |
@Bukhtawar: This makes sense. But do you think we can start without it for now and iterate on it later based on the need? In the case where traffic hasn't been drained, we would return from the API call with the reasons for that. A user can call the APIs with a lower timeout and see the reasons why the operation is stalled. As and when we add more checks around snapshots and shard relocation, those will automatically get added to the reasons as well. |
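A minimal sketch of the "return with reasons instead of blocking" behaviour proposed in the comment above. Everything here is a hypothetical illustration: `try_decommission`, the `active_requests` probe, the `DECOMMISSIONED` status name, and the reason text are assumptions, not the OpenSearch implementation or its API contract. Only the DRAINING status name comes from the RFC.

```python
import time
from typing import Callable, Dict


def try_decommission(zone: str,
                     active_requests: Callable[[str], int],
                     timeout_s: float,
                     poll_interval_s: float = 0.01) -> Dict[str, object]:
    """Hypothetical helper: wait up to timeout_s for the zone to drain.

    active_requests is an assumed probe returning the number of in-flight
    requests still routed to the zone.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        in_flight = active_requests(zone)
        if in_flight == 0:
            # Traffic is drained, so the decommission can be executed.
            return {"zone": zone, "status": "DECOMMISSIONED", "reasons": []}
        if time.monotonic() >= deadline:
            # Do not block past the timeout: report why the call is stalled.
            reason = f"{in_flight} in-flight requests still routed to {zone}"
            return {"zone": zone, "status": "DRAINING", "reasons": [reason]}
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # A zone that never drains: the call returns at the timeout with a reason.
    print(try_decommission("zoneA", lambda zone: 3, timeout_s=0.05))
```

Further checks (snapshots, shard relocation) could append to the same `reasons` list without changing the shape of the response.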
Can we add labels for "roadmap" and the version of OpenSearch this is targeting? I can add it to the overall project roadmap in the right column once that is done. |
Regarding the
How are we responding to failures in executing the call? - (Maybe let's track |
Regarding the
Can we have the list of values under |
Updated the API contract above, including
This makes sense to have a list under the same key. Updated the details |
Should we change |
Updated the API structures above |
Thanks for summarizing the API design @imRishN, I personally see a large disconnect between the existing routing awareness and the suggested decommissioning / recommissioning API.
Logically, it looks to me that The weights could be modeled in a similar fashion using I have nothing against introducing dedicated APIs but it is going to be difficult and confusing to maintain the API/settings split. Also, one important thing to keep in mind is that cluster settings could be Does it make sense, or have I completely derailed the conversation? |
@reta, thanks for taking a look at the RFC. The cluster settings that you mentioned above are more of a shard allocation strategy based on the awareness attribute set on the cluster. As part of decommissioning an awareness attribute, we intend to remove the nodes from the cluster during zonal outages, as the zone might be operating in a degraded manner and impacting the overall cluster's availability. Today, any write request requires a response from all the shard copies before the request is acknowledged. During zonal outages, this model can impact writes to the cluster, as any slow copy or impairment will slow down the writes significantly. The API design gives the user the flexibility to remove the nodes in an impacted zone from the cluster and mark the shards there as unavailable.
We don't need to remove the zone from the force zone values, as that might trigger a storm of shard recoveries, impacting latencies due to additional CPU and network consumption. We will let the shards stay in the UNASSIGNED state after decommissioning the zone. During recovery, the user can decide to recommission the zone. More details on recommissioning and decommissioning a zone can be found in #3402 |
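The write-path argument above can be illustrated with a toy model: if a write is acknowledged only after every shard copy responds, its latency is the maximum across copies, so one impaired zone dominates. The function name, zone names, and latency numbers below are invented for illustration and do not come from OpenSearch.

```python
from typing import Dict, FrozenSet


def write_ack_latency_ms(copy_latency_ms: Dict[str, float],
                         decommissioned: FrozenSet[str] = frozenset()) -> float:
    """Toy model: the write is acknowledged once the slowest remaining copy responds.

    copy_latency_ms maps a zone name to the response time of the shard copy
    hosted there; copies in decommissioned zones are not waited on.
    """
    remaining = [latency for zone, latency in copy_latency_ms.items()
                 if zone not in decommissioned]
    if not remaining:
        raise ValueError("no shard copies left to acknowledge the write")
    return max(remaining)


if __name__ == "__main__":
    # Invented numbers: zoneC is impaired and responds slowly.
    latencies = {"zoneA": 12.0, "zoneB": 15.0, "zoneC": 900.0}
    print(write_ack_latency_ms(latencies))
    print(write_ack_latency_ms(latencies, frozenset({"zoneC"})))
```

In this model the shard copies in the decommissioned zone simply stop counting toward the acknowledgement, which mirrors leaving them UNASSIGNED rather than relocating them.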
@imRishN aha, I see, thanks for clarification, I think I have even more questions, this time regarding the API:
The
And in case of
Regarding
The terminology we settled upon is just
An even better (arguably) approach is to follow decommission/decommission and design something like this:
WDYT? |
That's correct. The API will take in the awareness attribute set to the cluster by the setting |
@reta, thanks for the suggestions. I have updated the API path for weights as well. Let me know if this looks good to you? |
Thanks @imRishN , it looks concise to me (minor typo with missed slash, |
@reta, thanks for pointing out. Updated above. |
The default looks like {"msg":"Weights are not set"}. Shouldn't we just return an empty object {}? |
@imRishN This is what I think it should be:
|
@sachinpkale this is another way to look at it, but I believe we aim to decommission an awareness attribute value ( |
Will make the response an empty object |
@sachinpkale Although the attribute key value is a node property, awareness in general is a cluster property. This is how we set the awareness attribute to the cluster - Also, I feel, |
Closing this issue as ALL the API PRs are merged now |
I tried to build a test following our documentation to exercise these APIs and failed. It would be great if someone could either pick up adding/completing the specs for this in the API specification, or at least show me how to get a simple cluster into a state where these APIs can be called, starting with the code in the description of opensearch-project/opensearch-api-specification#524. |
Is your feature request related to a problem? Please describe.
#3402 aims to build support for decommissioning and recommissioning a zone based on the value assigned to the zone attribute. Similarly, #2859 aims to build support for weighted zonal search requests using a weighted round robin mechanism. We need to have a consistent and precise API structure finalised for both.
The scope of this issue is limited to finalising the API structure for zonal decommission/recommission and weighted zonal search requests.
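For readers unfamiliar with weighted round robin, here is a minimal sketch of how zone weights could translate into search request routing. It uses the "smooth" weighted round robin formulation, which is one common variant and not necessarily what #2859 implements. The zone names and weights are made up.

```python
from itertools import islice
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield zone names in proportion to their weights (smooth WRR).

    A weight of 0 means the zone never receives traffic, i.e. traffic is
    weighed away from it.
    """
    active = {zone: w for zone, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one zone needs a positive weight")
    total = sum(active.values())
    current = {zone: 0 for zone in active}
    while True:
        # Each round, every zone earns credit equal to its weight.
        for zone, w in active.items():
            current[zone] += w
        # The zone with the most credit is picked and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


if __name__ == "__main__":
    # zoneC has weight 0, so no search request is routed to it.
    picks = list(islice(weighted_round_robin({"zoneA": 2, "zoneB": 1, "zoneC": 0}), 6))
    print(picks)
```

Over any window of `total` picks, each zone is chosen exactly as many times as its weight, and the picks are interleaved rather than sent in bursts to one zone.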
Describe the solution you'd like
Below are the API structures that we can use for zonal decommission/recommission and weighted zonal search requests.
Zone Decommission
Zone Recommission
Get Zone Decommission Status
Weighted Round Robin for search request
Get weight for a local node
Get Weight
The PUT /_cluster/decommission/awareness/<zone>/<zoneA> call would modify the weights to weigh traffic away from the zone attribute and would also check that there is no incoming HTTP or search traffic to the weighed away zone. If there is traffic, it moves the status to DRAINING; once incoming HTTP and search traffic is drained, the decommission is executed.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.