Cap max RetryableAction wait time/timeout. #74940
RetryableAction uses randomized exponential backoff. If unlucky, the randomization could produce a series of very short waits while the bound still doubled on every retry, risking a subsequent very long wait. The delay is now randomized within the half-open range [bound/2, bound). Closes elastic#70996
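A minimal Java sketch of the randomization described above. `BackoffDelaySketch`, its method names and the millisecond units are hypothetical and are not the actual `RetryableAction` code. For contrast, it assumes the earlier scheme could draw a delay anywhere in [0, bound), which is what makes a run of near-zero waits possible.

```java
import java.util.Random;

// Hypothetical sketch for illustration; not the actual RetryableAction implementation.
final class BackoffDelaySketch {

    // Draws a delay from the half-open range [bound/2, bound). Assumes bound >= 1.
    static long randomizedDelay(long bound, Random random) {
        long lower = bound / 2;
        // floorMod keeps the offset in [0, bound - lower) even when nextLong() is negative.
        return lower + Math.floorMod(random.nextLong(), bound - lower);
    }

    // Smallest possible total time waited over `retries` retries when the bound starts at
    // initialBound and doubles after every failure. If halfLowerBound is false, the delay is
    // assumed to be drawable from [0, bound), so the minimum wait per retry is 0.
    static long minTotalWait(long initialBound, int retries, boolean halfLowerBound) {
        long total = 0;
        long bound = initialBound;
        for (int i = 0; i < retries; i++) {
            total += halfLowerBound ? bound / 2 : 0;
            bound *= 2; // the bound doubles after every failed attempt
        }
        return total;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        long bound = 10;
        for (int retry = 1; retry <= 5; retry++) {
            System.out.println("retry " + retry + ": bound=" + bound + ", delay=" + randomizedDelay(bound, random));
            bound *= 2;
        }
        // After 5 retries the next bound is 320 in both schemes.
        System.out.println(minTotalWait(10, 5, false)); // 0
        System.out.println(minTotalWait(10, 5, true));  // 155
    }
}
```

With a lower bound of 0, five unlucky retries may have waited almost nothing while the next bound has already grown to 320. With the [bound/2, bound) range, at least 155 has elapsed by then, so the next single wait is only slightly more than twice the time already spent.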
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-distributed (Team:Distributed)
ML changes LGTM
Consider renaming the `calculateDelay(long)` function to `calculateMaxDelay(long)` or `calculateMaxDelayBound(long)`, as the logic that determines the actual delay passed to `threadPool.schedule` is in the `onFailure` method.
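A rough sketch of the split the reviewer describes: one method only computes the upper bound, and `onFailure` chooses the actual delay. The class, its fields and the `LongConsumer` standing in for `threadPool.schedule` are hypothetical illustrations, not Elasticsearch code.

```java
import java.util.Random;
import java.util.function.LongConsumer;

// Hypothetical sketch; names are illustrative, not the Elasticsearch implementation.
final class RetryDelaySketch {
    private final long initialBoundMillis;
    private final Random random;
    private final LongConsumer schedule; // stands in for threadPool.schedule(...)
    private int attempt = 0;

    RetryDelaySketch(long initialBoundMillis, Random random, LongConsumer schedule) {
        this.initialBoundMillis = initialBoundMillis;
        this.random = random;
        this.schedule = schedule;
    }

    // Returns only the upper bound for the given retry: the initial bound doubled per retry.
    // The shift is limited to 30; assumes initialBoundMillis is small enough not to overflow.
    long calculateMaxDelayBound(int retry) {
        return initialBoundMillis << Math.min(retry, 30);
    }

    // Decides the actual delay, within [bound/2, bound), and hands it to the scheduler.
    void onFailure() {
        long bound = calculateMaxDelayBound(attempt++);
        long lower = bound / 2;
        long delay = lower + Math.floorMod(random.nextLong(), bound - lower);
        schedule.accept(delay);
    }
}
```

Keeping the bound computation free of randomness makes it easy to test on its own: with a 10 ms initial bound, `calculateMaxDelayBound(3)` returns 80.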
...n/ml/src/main/java/org/elasticsearch/xpack/ml/utils/persistence/ResultsPersisterService.java
Will this PR fix timeouts like this one? https://gradle-enterprise.elastic.co/s/qvkqnuh5ocffw
I am afraid not; the symptoms of the failing
LGTM
…s/persistence/ResultsPersisterService.java Co-authored-by: David Kyle <[email protected]>
💔 Backport failed
To backport manually, run:
This PR fixes `testAckedIndexing`, which failed because it ran into a final, very long wait for one of the retries. I am not necessarily set on this solution, though it seems like a step in the right direction. An alternative could be to cap the final wait time so that it still meets the timeout, but that makes the behavior more deterministic (less random), and the solution here should be good enough.
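For comparison, a small sketch of the alternative mentioned above, which this PR does not implement: clamp each randomized delay so that the retry still fires before the overall timeout. `TimeoutCapSketch`, its parameters and the use of -1 as a "give up" signal are hypothetical choices for illustration.

```java
// Hypothetical sketch of the capping alternative; not what this PR implements.
final class TimeoutCapSketch {

    // Returns the delay to use, or -1 if the timeout has already elapsed and the caller
    // should fail instead of scheduling another retry.
    static long capToTimeout(long randomizedDelayMillis, long elapsedMillis, long timeoutMillis) {
        long remaining = timeoutMillis - elapsedMillis;
        if (remaining <= 0) {
            return -1;
        }
        return Math.min(randomizedDelayMillis, remaining);
    }

    public static void main(String[] args) {
        System.out.println(capToTimeout(300, 100, 1000)); // 300: well within the timeout
        System.out.println(capToTimeout(900, 400, 1000)); // 600: clamped to the remaining time
        System.out.println(capToTimeout(50, 1000, 1000)); // -1: timeout already reached
    }
}
```

Whenever the clamp kicks in, the final delay is fixed to the remaining time rather than random, which is the less random behavior weighed against this option.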