Broadcast responses always return 200 - OK #29169

javanna · 2018-03-20T15:21:38Z

Apart from RefreshResponse, all our BroadcastResponses always return 200, even though they may contain shard failures. Should we change that so the status from the response is no longer ignored? Here is a list of the APIs that are affected:

upgrade
upgrade status
indices segments
clear cache
flush
validate query
recovery
indices stats
force merge

Only the refresh response calls the following method, which already isn't great as it picks the code from the first shard failure, but at least it doesn't always return 200:

    /**
     * The REST status that should be used for the response
     */
    public RestStatus getStatus() {
        if (failedShards > 0) {
            return shardFailures[0].status();
        } else {
            return RestStatus.OK;
        }
    }
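
To make the "first shard failure" concern concrete, here is a minimal, self-contained sketch. It is not Elasticsearch code: the class name FirstFailureStatusSketch is hypothetical, and plain HTTP status ints stand in for RestStatus and the shard failure objects. It only mirrors the branching of the method above, to show that the reported status depends on the order in which failures happen to be listed.

```java
import java.util.List;

public class FirstFailureStatusSketch {

    // Mirrors the branching of getStatus() above. Assumption: each shard
    // failure is reduced to its HTTP status code as a plain int.
    static int status(List<Integer> shardFailureStatuses) {
        if (!shardFailureStatuses.isEmpty()) {
            // Only the first failure decides the status; the rest are ignored.
            return shardFailureStatuses.get(0);
        }
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(status(List.of()));         // 200
        System.out.println(status(List.of(503, 400))); // 503
        System.out.println(status(List.of(400, 503))); // 400
    }
}
```

The last two calls carry the same set of failures and still produce different status codes, which is the weakness described above.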


elasticmachine · 2018-03-20T15:21:43Z

Pinging @elastic/es-core-infra

jasontedor · 2018-03-20T22:30:45Z

Relates #23059, relates #28522

jasontedor · 2018-03-20T22:40:23Z

Good issue to discuss @javanna.

I think that we are doing the right thing here; a 200 OK from the coordinating node means: I got your request, it checked out okay, and I didn't blow up dutifully forwarding the request to other node(s) in the cluster nor in parsing their response(s), and here is a response back. I think we should reserve a 400 reply from the coordinating node for a bad request that was sent to the coordinating node (e.g., an unrecognized request parameter) and a 500 reply from the coordinating node for the coordinating node blowing up coordinating the request (as opposed to executing one of the sub-requests if the coordinating node holds any relevant shards).

I do conceptually think this is the right thing, and I also think that there are problems with alternatives:

if there are multiple failures, there is no good choice of which status code to use
a 400 reply for a bad request to the coordinating node would be indistinguishable from a 400 reply due to a bad sub-request (e.g., a non-existent index on one of the sub-requests in a bulk indexing request); the client will still have to parse the errors and find out what happened
a 500 reply for the coordinating node blowing up while coordinating (a stupid NPE) would be indistinguishable from a 500 reply due to one of the shards blowing up handling a sub-request; the client will still have to parse the errors and find out what happened

In fact, I think the refresh behavior is wrong here; this should be considered a bug and fixed.

bleskes · 2018-03-21T13:50:54Z

Every once in a while we bump into the fact that there is no status code for "partial results". As @jasontedor noted, we had similar discussions for the bulk API. I'm +1 on Jason's suggestion: the status of a "bundled" request should reflect the result of coordination. For individual sub-requests, people should look at the body.

javanna · 2018-03-21T14:57:10Z

See 39e7c30 for the commit that introduced returning an error when there are shard failures for the refresh API. Not sure what the implications of changing this back could be. It seems like the getStatus method should have been added to RefreshResponse rather than BroadcastResponse?

When it comes to APIs like refresh, flush and force merge that change the state of the index, I struggle to see this as "partial results". I do get the point of looking at the response body, which holds the failures. Should the behaviour change here as well, then, if we ever change the default behaviour for the search API?

bleskes · 2018-03-23T13:46:42Z

We discussed this some more and came up with these guidelines:

_bulk, _search and _msearch are special and have their own semantics.
Broadcast requests that should do something on behalf of the user (like flush and refresh) should return a non-2xx code if that thing was not successfully done on any of the targets.
Other broadcast requests (like node stats) should use the body to communicate a failure of one of the targets.
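
One way to read these guidelines as code is sketched below. This is only an illustration under stated assumptions: BroadcastGuidelineSketch, Kind and failureBelongsInStatus are hypothetical names, not Elasticsearch types. The sketch also assumes that a single failed target is enough to make an "action" request non-2xx, which the wording above does not settle, and it deliberately does not pick which non-2xx code to use.

```java
public class BroadcastGuidelineSketch {

    // Hypothetical classification of requests; not an Elasticsearch type.
    enum Kind {
        SPECIAL,       // _bulk, _search, _msearch: their own semantics
        ACTION,        // does something on behalf of the user, e.g. flush, refresh
        INFORMATIONAL  // reports state, e.g. node stats
    }

    // Returns true when a target failure should surface in the HTTP status
    // (non-2xx) rather than only in the response body.
    static boolean failureBelongsInStatus(Kind kind, int failedTargets) {
        if (kind == Kind.SPECIAL) {
            throw new IllegalArgumentException("special APIs define their own semantics");
        }
        // Assumption: one failed target is enough for an ACTION request.
        return kind == Kind.ACTION && failedTargets > 0;
    }

    public static void main(String[] args) {
        System.out.println(failureBelongsInStatus(Kind.ACTION, 1));        // true
        System.out.println(failureBelongsInStatus(Kind.INFORMATIONAL, 1)); // false
        System.out.println(failureBelongsInStatus(Kind.ACTION, 0));        // false
    }
}
```

Returning a boolean instead of a status code keeps the sketch within what the guidelines actually say; choosing a specific code when several targets fail differently remains the open problem raised earlier in the thread.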

jaymode · 2020-12-14T17:52:36Z

Broadcast requests that should do something on behalf of the user (like flush and refresh) should return a non-2xx code if that thing was not successfully done on any of the targets.

Other broadcast requests (like node stats) should use the body to communicate a failure of one of the targets.

@russcam based on your proposal in #60442 for using 207 as the response code when there are varying results, would you expect that to apply in these situations as well?

thecoop · 2024-11-01T16:52:34Z

This is covered by #60442, so this can be looked at there.

javanna added the discuss and :Core/Infra/REST API labels Mar 20, 2018

javanna added help wanted adoptme >enhancement and removed discuss labels Mar 26, 2018

jimczi mentioned this issue Apr 16, 2019

allow_partial_search_results setting must be passed as query-string parameter #41223

Closed

danielsnider mentioned this issue Apr 23, 2019

200 response with even though response body is error #41434

Open

DaveCTurner mentioned this issue Mar 3, 2020

Indices with index.blocks.read_only set to true return OK (HTTP-200) on _bulk requests #53013

Closed

rjernst added the Team:Core/Infra label May 4, 2020

rjernst added the needs:triage label Dec 3, 2020

jaymode removed the needs:triage label Dec 14, 2020

thecoop closed this as not planned Nov 1, 2024
