Elasticsearch does not indicate retryability when flood stage is exceeded #49393
Pinging @elastic/es-distributed (:Distributed/CRUD)
Hi @jasontedor, I'm interested in this issue. Should we also return a 429 status code when the cluster block is set manually, rather than automatically when the flood stage is exceeded?
@gaobinlong I think it's fine to treat them the same. I wish we had an easy way to distinguish when it's set automatically versus manually, but we don't really, so let's proceed to treat them as the same.
@jasontedor OK, got it.
Hi @jasontedor, I have opened a PR for this issue. Could you help review the code change?
We consider index-level read_only_allow_delete blocks temporary, since the DiskThresholdMonitor can automatically release them once an index is no longer allocated on nodes above the high threshold. The REST status has therefore been changed to 429 when this index block is encountered, to signal retryability to clients. Related to #49393
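To make the status mapping concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch's code (which is Java): `BLOCK_STATUS`, `status_for_blocks`, and the rule that any non-temporary block turns the response into a 403 are assumptions for illustration. Per the discussion above, a manually set read_only_allow_delete block is treated the same as one applied by the monitor, so the table is keyed only by block name.

```python
from enum import IntEnum


class RestStatus(IntEnum):
    """The two HTTP status codes discussed in this issue."""
    FORBIDDEN = 403
    TOO_MANY_REQUESTS = 429


# Hypothetical table of the status each index block reports after the change.
# read_only_allow_delete is treated as temporary because the disk threshold
# monitor can lift it, so it maps to 429; a read_only block stays at 403.
BLOCK_STATUS = {
    "read_only_allow_delete": RestStatus.TOO_MANY_REQUESTS,
    "read_only": RestStatus.FORBIDDEN,
}


def status_for_blocks(block_names):
    """Return the status for a write rejected by the given index blocks.

    Assumption: the request is reported as retryable (429) only if every
    block it hit is temporary; any other block makes it a 403.
    """
    if not block_names:
        raise ValueError("a rejected request must have hit at least one block")
    statuses = [BLOCK_STATUS[name] for name in block_names]
    if all(s == RestStatus.TOO_MANY_REQUESTS for s in statuses):
        return RestStatus.TOO_MANY_REQUESTS
    return RestStatus.FORBIDDEN
```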
It has been brought to my attention that this PR applies from 7.7 onwards.
Closed by #50166.
Today, if a node exceeds the disk flood stage watermark, the disk threshold monitor applies a special read-only index block to every index that has a shard allocated to that node. This block carries a forbidden status code, so if an attempt is made to index into such an index, the client receives an HTTP 403 status code.
Clients assume that a 403 status code is not retryable and they drop data.
This situation is retryable though, as once the disk threshold monitor observes the free disk space go above the appropriate threshold, the index block is automatically removed.
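The apply-and-release cycle can be sketched as a simplified Python model. This is not the DiskThresholdMonitor itself: the function name `next_blocked_indices`, the per-node "fraction of disk used" input, and the strict `>` comparisons are assumptions. The 95% and 90% values are the default flood-stage and high watermarks, and the release condition (no shard left on a node above the high watermark) follows the commit message in this thread.

```python
# Default watermarks, expressed as the fraction of disk used.
FLOOD_STAGE = 0.95
HIGH = 0.90


def next_blocked_indices(disk_used, shard_nodes, blocked):
    """Compute which indices carry read_only_allow_delete after one check.

    disk_used:   node name -> fraction of disk used (0.0 to 1.0)
    shard_nodes: index name -> set of nodes holding a shard of that index
    blocked:     indices that currently carry the block
    """
    flooded = {node for node, used in disk_used.items() if used > FLOOD_STAGE}
    above_high = {node for node, used in disk_used.items() if used > HIGH}
    result = set()
    for index, nodes in shard_nodes.items():
        if nodes & flooded:
            # A shard sits on a node past flood stage: apply the block.
            result.add(index)
        elif index in blocked and nodes & above_high:
            # Still allocated on a node above the high watermark: keep it.
            result.add(index)
        # Otherwise the block is released (or was never applied).
    return result
```

Because release waits for the high watermark rather than the flood stage, in this model the block does not flap on and off while usage hovers around 95%.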
Rather than expecting our clients to all account for this situation (by inspecting the specifics of the exception that led to the 403 status code), we should indicate retryability by using HTTP status code 429. While 429 is often translated as "too many requests", the HTTP specification is liberal about what this means:
By making this change, all of our clients can start retrying when faced with an index that was marked read-only due to a flood stage watermark exceeded event.
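On the client side, this means a generic retry policy keyed on the status code is enough, with no need to inspect the exception behind a 403. A minimal sketch, assuming a hypothetical `send` callable that performs one indexing request and returns its HTTP status; the attempt count and backoff delays are arbitrary choices, not values from any Elasticsearch client.

```python
import time
from typing import Callable

RETRYABLE_STATUSES = frozenset({429})


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable that performs one indexing request
    and returns its HTTP status code.
    """
    delay = base_delay
    status = send()
    for _ in range(max_attempts - 1):
        if status not in RETRYABLE_STATUSES:
            break  # success, or a non-retryable error such as 403
        sleep(delay)  # back off before the next attempt
        delay *= 2  # exponential backoff
        status = send()
    return status
```

Injecting `sleep` keeps the policy testable without real delays.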
Similarly, the status codes of other cluster blocks should be reexamined in this context.