Should GET / return 503 in case of discovery.zen.no_master_block: write? #8902
Comments
@Mpdreamz Agreed, the main REST endpoint should return 200 when there is no elected master. It doesn't report on anything related to cluster state, just the configured cluster_name and a couple of node-related stats.
should it always return |
True, we should return 503 when all operations are blocked, but in the case
On 16 January 2015 at 12:23, Luca Cavanna wrote:
Kind regards, Martijn van Groningen
+1. When the master is lost, we go into a new master election, which takes 3s (by default). During those 3s the node has a configured block; if it allows reads, we should indeed return 200. This is likely a transient state that will be resolved before we start rejecting indexing requests (remember, they wait up to 1m for the situation to be resolved).
I'm not sure that this is the right thing to do. Imagine you're using sniffing. You try to perform a write and get back a 503 so you sniff and get back a 200, then you try the write again, get back a 503, etc etc. That said, the above would work for reads. I know the python client aborts after 3 attempts, while the Perl client keeps going until it gets back a 503 on sniffing. Perhaps we should only sniff once before giving up. |
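To make the loop described above concrete, here is a minimal sketch of a client that re-sniffs a bounded number of times before giving up. Everything in it is hypothetical (`write_with_bounded_sniff`, `send_write`, `sniff`, `max_sniffs`); it is not the API of the Python or Perl client, just an illustration of the "only sniff once" idea.

```python
from typing import Callable, List


def write_with_bounded_sniff(
    send_write: Callable[[str], int],
    sniff: Callable[[], List[str]],
    nodes: List[str],
    max_sniffs: int = 1,
) -> int:
    """Try a write on every known node; re-sniff at most max_sniffs times.

    Hypothetical helper: send_write(node) returns an HTTP status code and
    sniff() returns a fresh list of node addresses.
    """
    sniffs_done = 0
    while True:
        for node in nodes:
            status = send_write(node)
            if status != 503:
                # Either success or an error that is not "retry elsewhere".
                return status
        if sniffs_done >= max_sniffs:
            # Give up instead of cycling write -> 503 -> sniff -> 200 forever.
            return 503
        # Sniffing may succeed even while writes are blocked.
        nodes = sniff()
        sniffs_done += 1
```

With `max_sniffs=1` and a cluster that keeps answering 503 to writes, this performs one write attempt per node, one sniff, one more round of attempts, and then returns 503 to the caller.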
@clintongormley how are other transient errors that are not reflected by Another aspect to consider here - on master loss (since 1.4), all data nodes will have a master block for 3s. If you hit |
We have said that a 503 response code should mean "retry on another node".
Not in the Perl client, but not sure about the others. I think 429 should probably not retry but instead backoff.
That means retry on another node.... This one is debatable. If you've sent the request that has triggered the circuit breaker, you could then replicate that bad behaviour across all nodes in the cluster by retrying. |
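As a rough illustration of the status-code conventions discussed in this comment, the mapping could look like the sketch below. The `Action` enum and `action_for_status` function are made-up names, and the 429 handling reflects the opinion stated above rather than any client's confirmed behaviour.

```python
from enum import Enum


class Action(Enum):
    RETURN_TO_CALLER = "return_to_caller"
    RETRY_ON_ANOTHER_NODE = "retry_on_another_node"
    BACK_OFF = "back_off"


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to a client-side action (hypothetical policy)."""
    if status == 503:
        # Convention stated in this thread: 503 means "retry on another node".
        return Action.RETRY_ON_ANOTHER_NODE
    if status == 429:
        # Suggested above: do not retry immediately, back off instead.
        return Action.BACK_OFF
    # Everything else is handed back to the caller unchanged.
    return Action.RETURN_TO_CALLER
```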
A 503 is completely broken behavior here. A REST status is a response for the given request. A 503 means "I am overloaded right now, I can not handle your request." That is completely out of alignment with a |
I opened #29045.
This PR updates the readiness probe to check only the `/` endpoint instead of `/_cluster/health?timeout=0s` when Elasticsearch is already running. This reverts to the initial config, which was changed in elastic/helm-charts#380, with the exception that a 503 HTTP code is accepted for 6.x (see elastic/elasticsearch#8902 for more details about why 503 is OK on Elasticsearch 6.x).
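The decision described in that PR text can be sketched roughly as follows. This is only the acceptance logic, written in Python for illustration; the function name `root_probe_passes` and its parameters are assumptions, and the actual chart's probe script is not reproduced here.

```python
def root_probe_passes(status_code: int, es_major_version: int) -> bool:
    """Decide whether a response from `/` counts as ready (hypothetical helper).

    200 is always accepted; 503 is additionally accepted on 6.x, where `/`
    can return 503 while the node has no elected master.
    """
    if status_code == 200:
        return True
    return status_code == 503 and es_major_version == 6
```

For example, a 503 from a 6.x node would pass this check, while the same response from a 7.x node would not.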
Given we have two nodes, one (A) with:

and (B) being a vanilla master node.

When we stop node (B), (A) is still allowed to service read requests.

However, when calling

GET http://(A):9200/ HTTP/1.1

it currently returns:

HTTP/1.1 503 Service Unavailable

But is the service really unavailable in this case? Since we now explicitly allow you to configure for this state, IMO it should return 200 OK with a possible boolean in the response signalling it's in read-only mode.

A call to _search in this state also results in a 200 and not 503.