Remove support for maxRetryTimeout from low-level REST client #38085
Conversation
We have had various reports of problems caused by the maxRetryTimeout setting in the low-level REST client. This setting was initially added in an attempt to prevent requests from going through retries if the request had already taken longer than the provided timeout. The implementation was problematic though, as the timeout could also expire on the first request attempt (see elastic#31834), would leave the request executing after expiration causing memory leaks (see elastic#33342), and would not take into account the http client's internal queuing (see elastic#25951). Given all these issues, my conclusion is that this custom timeout mechanism gives little benefit while causing a lot of harm. We should rather rely on the connect and socket timeouts exposed by the underlying http client and accept that a request can overall take longer than the configured timeout, which is the case even with a single retry anyway. This commit removes the maxRetryTimeout setting from RestClient and RestClientBuilder and all of its usages.
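With maxRetryTimeout gone, request timeouts are governed entirely by the underlying Apache HTTP client. As a minimal sketch of what this looks like in practice (host and timeout values are illustrative, not recommendations), those timeouts can be configured when building the client:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Sketch: rely on the http client's connect and socket timeouts instead of
// maxRetryTimeout. The values below are illustrative only.
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
    .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
        .setConnectTimeout(5_000)     // time allowed to establish a connection, in ms
        .setSocketTimeout(60_000))    // max inactivity between two data packets, in ms
    .build();

Note that neither of these bounds the overall duration of a request across retries, which is exactly the trade-off this change accepts.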
Pinging @elastic/es-core-features
@nik9000 we have talked in the past about rewriting and fixing the max retry timeout mechanism. Curious what you think of removing it completely; given all the problems it's causing, I think this is better than trying to fix it and complicating the codebase. I wish I had not added it in the first place.
run elasticsearch-ci/2
I'm fine with dropping the timeout, though I think it'd be nice to be clear about what the behavior is now. I think we are now relying on whatever timeouts we give to the async http client, and it will have to time out. Right?
I don't know enough about the history of this to have any opinion.
LGTM.
Looks great!
👍
I left a small thing but LGTM.
@@ -212,66 +222,50 @@ private Response performRequest(final NodeTuple<Iterator<Node>> nodeTuple,
     } catch(Exception e) {
         RequestLogger.logFailedRequest(logger, request.httpRequest, context.node, e);
         onFailure(context.node);
-        Exception cause = unwrapExecutionException(e);
+        Exception cause = extractAndWrapCause(e);
         addSuppressedException(previousException, cause);
         if (nodeTuple.nodes.hasNext()) {
             return performRequest(nodeTuple, request, cause);
         }
         if (cause instanceof IOException) {
Huh. I think you might be able to drop these instanceof checks if you moved some stuff into extractAndWrapCause. You already know the type. I'm not sure you need to do that though. It'd be slightly cleaner, I think, but it isn't worth holding up the PR for it.
the thing is that there are cases where I don't re-throw the exception gotten from extractAndWrapCause, I may pass it over to performRequest if I can retry on another node :)
Ah ha! I see that two lines up. That makes sense.
     }
-    throw new RuntimeException(cause);
+    throw new IllegalStateException("cause must be either RuntimeException or IOException", cause);
Maybe "unexpected exception type so wrapping into an expected one to prevent even more chaos"?
/**
 * Wrap whatever exception we received, copying the type where possible so the synchronous API looks as much as possible
 * like the asynchronous API. We wrap the exception so that the caller's signature shows up in any exception we throw.
I think I'd reverse the sentences. "Wrap the exception so the caller's signature shows up in the stack trace, taking care to copy the original type and message where possible so async and sync code don't have to check different exceptions."
Or something like that.
I'm aware you may be copying a comment that I wrote and I'm effectively commenting on my own way of writing javadoc....
:)
I was excited to lose the […]. It is better than before though: we no longer wrap ResponseException, and we wrap each exception rather than only the final (outer) one, which means the suppressed exceptions end up on the returned exception rather than on the cause.
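For readers following along, here is a minimal sketch of the type-copying idea being discussed; it illustrates the approach, not the exact code in this PR:

import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.ExecutionException;

// Sketch: unwrap the ExecutionException thrown by the async client and
// re-create the cause with the same type and message, so the synchronous
// caller's frame shows up in the stack trace of the thrown exception.
static Exception extractAndWrapCause(Exception exception) {
    if (exception instanceof ExecutionException == false) {
        return exception;
    }
    Throwable cause = exception.getCause();
    if (cause instanceof ConnectException) {
        // ConnectException has no (message, cause) constructor, hence initCause
        ConnectException wrapped = new ConnectException(cause.getMessage());
        wrapped.initCause(cause);
        return wrapped;
    }
    if (cause instanceof IOException) {
        return new IOException(cause.getMessage(), cause);
    }
    if (cause instanceof RuntimeException) {
        return new RuntimeException(cause.getMessage(), cause);
    }
    return new RuntimeException("unexpected exception type", cause);
}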
run elasticsearch-ci/2
run elasticsearch-ci/default-distro
…on-leases-recovery
* elastic/master:
  SQL: Allow look-ahead resolution of aliases for WHERE clause (elastic#38450)
  Add API key settings documentation (elastic#38490)
  Remove support for maxRetryTimeout from low-level REST client (elastic#38085)
  Update IndexTemplateMetaData to allow unknown fields (elastic#38448)
  Mute testRetentionLeasesSyncOnRecovery (elastic#38488)
  Change the min supported version to 6.7.0 for API keys (elastic#38481)
@javanna We're seeing the connection leak in 6.1.2 even with the following settings: connection timeout: 5s, socket timeout: 5m. Issue #33342 seems to indicate that a max retry timeout greater than the socket timeout is an effective workaround, but that doesn't seem to be valid for our use case, so I'm considering applying this patch to a local fork of 6.1.2 instead. Was there any reason you chose to put this into 7.x besides the backwards compatibility implications of losing the maxRetryTimeout setting?
@coryfklein I'm afraid you are experiencing a different problem, or maybe the same problem caused by different circumstances. Setting a high enough max retry timeout should achieve the same as applying this patch to 6.1.2, as it effectively guarantees that the max retry timeout mechanism never kicks in; you can set it even higher just to make sure. I generally don't think that setting the socket timeout to 5 minutes is a good idea: remember that it is not the overall duration of the request, but is applied at a lower level, meaning the socket timeout expires only when nothing goes through the wire for 5 minutes in your case. Could you open a separate issue and provide more info on what you are experiencing, please?
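For anyone stuck on a 6.x client, the workaround described above amounts to raising maxRetryTimeout so the socket timeout always fires first. A sketch, assuming the 6.x RestClientBuilder API (where setMaxRetryTimeoutMillis still exists) and the timeouts from the report above:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Sketch of the 6.x workaround: set maxRetryTimeout high enough that the
// retry-timeout mechanism never kicks in before the socket timeout does.
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
    .setMaxRetryTimeoutMillis(Integer.MAX_VALUE)   // effectively disables the mechanism
    .setRequestConfigCallback(b -> b
        .setConnectTimeout(5_000)      // 5s connection timeout, as in the report
        .setSocketTimeout(300_000))    // 5m socket timeout, as in the report
    .build();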