Broadcast responses always return 200 - OK #29169
Pinging @elastic/es-core-infra
Good issue to discuss, @javanna. I think we are doing the right thing here. A 200 OK from the coordinating node means: I got your request, it checked out okay, I didn't blow up while dutifully forwarding the request to the other node(s) in the cluster or while parsing their response(s), and here is a response back. I think we should reserve a 400 reply from the coordinating node for a bad request sent to the coordinating node (e.g., an unrecognized request parameter), and a 500 reply for the coordinating node blowing up while coordinating the request (as opposed to failing while executing one of the sub-requests, if the coordinating node holds any relevant shards). I do think this is conceptually the right thing, and I also think that there are problems with the alternatives:
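A minimal Python sketch of the rule proposed above: the top-level status reflects only the coordinating node's own work, while shard-level failures are reported in the response body. ShardFailure, BroadcastResult and coordinating_status are hypothetical names, not Elasticsearch classes, and the sketch assumes at most one failure per shard when counting successful shards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShardFailure:
    """Hypothetical record of one failed shard-level sub-request."""
    index: str
    shard: int
    status: int  # HTTP-like status the shard-level failure maps to
    reason: str


@dataclass
class BroadcastResult:
    """Hypothetical response: a top-level status plus a body listing shard failures."""
    status: int
    total_shards: int
    successful_shards: int
    failures: List[ShardFailure] = field(default_factory=list)


def coordinating_status(request_is_valid: bool,
                        coordination_error: Optional[Exception],
                        total_shards: int,
                        failures: List[ShardFailure]) -> BroadcastResult:
    """Status reflects coordination only; sub-request outcomes go in the body."""
    if not request_is_valid:
        # Bad request sent to the coordinating node, e.g. an unrecognized parameter.
        return BroadcastResult(400, total_shards, 0)
    if coordination_error is not None:
        # The coordinating node itself failed while fanning out or merging responses.
        return BroadcastResult(500, total_shards, 0)
    # Coordination succeeded: return 200 and leave per-shard failures to the body.
    # Assumes at most one failure per shard (hypothetical simplification).
    return BroadcastResult(200, total_shards,
                           total_shards - len(failures), list(failures))


result = coordinating_status(
    True, None, 5, [ShardFailure("logs", 2, 503, "shard not available")])
print(result.status, result.successful_shards, len(result.failures))
```

With one failed shard out of five, the sketch still reports 200 at the top level, and a client has to inspect the failures list (and the successful count) to notice the partial outcome. That is exactly the trade-off being debated in this thread.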
In fact, I think the refresh behavior is wrong here; this should be considered a bug and fixed.
Every once in a while we run into the fact that there is no status code for "partial results". As @jasontedor noted, we had similar discussions for the bulk API. I'm +1 on Jason's suggestion: the status of "bundled" requests should reflect the result of coordination. For the individual sub-requests, people should look at the body.
See 39e7c30 for the commit that introduced returning an error if there are shard failures for the refresh API. I'm not sure what the implications of changing this back would be. Seems like the …

When it comes to APIs like refresh, flush and force merge, which change the state of the index, I struggle to see this as "partial results". I do get the point of looking at the response body, which holds the failures. Should the behaviour change as well, then, if we ever change the default behaviour for the search API?
We discussed this some more and came up with these guidelines:
@russcam based on your proposal in #60442 for using …
This is covered by #60442, so this can be looked at there.
Besides RefreshResponse, all our BroadcastResponses always return 200, even though they may contain shard failures. Should we change that so the status from the response is no longer ignored? Here is a list of the APIs that are affected:
Only the refresh response calls the following method, which already isn't great as it picks the status code from the first shard failure, but at least it doesn't always return 200: