Fix a bug where a connection may not be reused when using RetryingClient
#5290
…client.

Motivation:

Armeria's `HttpChannelPool` is bound to an `EventLoop`. Different `EventLoop`s have different `HttpChannelPool`s. In other words, in order to reuse a connection for an `Endpoint`, the same `EventLoop` must be selected.

When creating a derived client in `RetryingClient`, a new endpoint is selected for each retry, but the `EventLoop` of the parent is used as is. As a result, the `Endpoint` can't use the existing connection pool for multiplexing, and a new connection is created.

Modifications:

- Use `EventLoopScheduler` to return constant `EventLoop`s for the same endpoint.
- Allow setting `EndpointGroup` in `ClientRequestContextBuilder` for testability.

Result:

- You no longer see a connection leak when using `RetryingClient` with `EndpointGroup`.
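The motivation above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyEventLoop`, `ToyScheduler`, and `newConnectionsOnRetry` are hypothetical names invented for illustration and are not Armeria classes or APIs. Each toy event loop owns its own connection pool, and the toy scheduler always hands out the same loop for the same endpoint.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConnectionReuseDemo {

    /** Hypothetical stand-in for an event loop that owns its own connection pool. */
    static final class ToyEventLoop {
        // Endpoints for which this loop already holds an open connection.
        private final Set<String> pool = new HashSet<>();

        /** Returns true if a pooled connection was reused, false if a new one was opened. */
        boolean send(String endpoint) {
            // Set.add() returns false when the endpoint was already pooled.
            return !pool.add(endpoint);
        }
    }

    /** Hypothetical scheduler that returns a constant loop for the same endpoint. */
    static final class ToyScheduler {
        private final List<ToyEventLoop> loops;
        private final Map<String, ToyEventLoop> assigned = new HashMap<>();

        ToyScheduler(List<ToyEventLoop> loops) {
            this.loops = loops;
        }

        ToyEventLoop acquire(String endpoint) {
            ToyEventLoop loop = assigned.get(endpoint);
            if (loop == null) {
                // Assign loops round-robin the first time an endpoint is seen.
                loop = loops.get(assigned.size() % loops.size());
                assigned.put(endpoint, loop);
            }
            return loop;
        }
    }

    /** Counts connections opened by one request that is retried against another endpoint. */
    static int newConnectionsOnRetry(boolean reuseParentLoop) {
        ToyScheduler scheduler =
                new ToyScheduler(Arrays.asList(new ToyEventLoop(), new ToyEventLoop()));
        // Earlier traffic has already opened one connection per endpoint.
        scheduler.acquire("a").send("a");
        scheduler.acquire("b").send("b");

        int opened = 0;
        // Initial attempt goes to endpoint "a" on its assigned loop.
        ToyEventLoop parentLoop = scheduler.acquire("a");
        if (!parentLoop.send("a")) {
            opened++;
        }
        // The retry selects endpoint "b"; the loop choice decides whether the pool is hit.
        ToyEventLoop retryLoop = reuseParentLoop ? parentLoop : scheduler.acquire("b");
        if (!retryLoop.send("b")) {
            opened++;
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("parent loop kept for retry: " + newConnectionsOnRetry(true));
        System.out.println("loop re-acquired per endpoint: " + newConnectionsOnRetry(false));
    }
}
```

In this model, keeping the parent's loop for the retry forces a new connection to endpoint "b", while asking the scheduler again lands on the loop that already holds one.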
RetryingClient
acquireEventLoop(endpoint);
}

log = RequestLog.builder(this);
It seems like we should do this before calling acquireEventLoop
Left some minor questions 🙇
@@ -531,6 +528,14 @@ private DefaultClientRequestContext(DefaultClientRequestContext ctx,

this.endpointGroup = endpointGroup;
updateEndpoint(endpoint);
if (ctx.endpoint() == endpoint || endpoint == null) {
Question: Is this comparison by reference, as opposed to equality, intentional?
Originally, reference comparison was used to know the derivation was the initial attempt. The logic is changed to check the initial attempt more explicitly.
reference comparison was used to know the derivation was the initial attempt
Are you suggesting that `ctx.endpoint() == endpoint` implies that it is the initial attempt?
Anyway, I was actually implying that it is possible that an endpoint group updates its endpoints, and comparing by reference won't reflect it. I think it's fine if you are aware of this.
`ctx.log.children().isEmpty()` would be enough, but I added reference equality as a double check.
`ctx.endpoint() == endpoint && ctx.log.children().isEmpty()`
If a derivation is not an initial attempt, `acquireEventLoop()` needs to be called even for the same `Endpoint` to update the internal state of `DefaultEventLoopScheduler`.
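To make the reference-versus-equality point in this thread concrete, here is a small sketch. It assumes a hypothetical value class `ToyEndpoint` (not Armeria's `Endpoint`) and models `ctx.log.children().isEmpty()` with a plain attempt counter; the helper `isInitialAttempt` only mirrors the shape of the check quoted above.

```java
import java.util.Objects;

final class ReferenceVsEquality {

    /** Hypothetical value class standing in for an endpoint. */
    static final class ToyEndpoint {
        final String host;
        final int port;

        ToyEndpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ToyEndpoint)) {
                return false;
            }
            ToyEndpoint that = (ToyEndpoint) o;
            return port == that.port && host.equals(that.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    /**
     * Same instance AND no previous attempts. The int parameter is an assumed
     * stand-in for checking that the parent log has no children yet.
     */
    static boolean isInitialAttempt(ToyEndpoint parent, ToyEndpoint derived,
                                    int numPreviousAttempts) {
        return parent == derived && numPreviousAttempts == 0;
    }

    public static void main(String[] args) {
        ToyEndpoint a = new ToyEndpoint("example.com", 8080);
        ToyEndpoint b = new ToyEndpoint("example.com", 8080);
        System.out.println("a.equals(b): " + a.equals(b));
        System.out.println("a == b: " + (a == b));
        System.out.println("initial attempt (same instance, no attempts): "
                           + isInitialAttempt(a, a, 0));
        System.out.println("retry with same instance: " + isInitialAttempt(a, a, 1));
    }
}
```

Two separately created instances with the same host and port are equal but not identical, so a reference check alone treats them as different endpoints; the attempt counter is what distinguishes a retry that happens to pick the same instance.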
core/src/main/java/com/linecorp/armeria/internal/client/DefaultClientRequestContext.java
core/src/main/java/com/linecorp/armeria/client/ClientRequestContextBuilder.java
...rc/test/java/com/linecorp/armeria/internal/client/DerivedClientRequestContextClientTest.java
if (ctx.endpoint() == endpoint || endpoint == null) {
eventLoop = ctx.eventLoop().withoutContext();
} else {
acquireEventLoop(endpoint);
}
What happens if we just always call `acquireEventLoop()`? (maybe except when `endpoint == null`)
It sounds like a good idea to update the `lastActivityTimeNanos`.
However, if we always call `acquireEventLoop()`, the number of active requests may be increased for the initial attempt. Let me add an exception for the initial attempt.
Also, when `ctx.endpoint() == endpoint` and the `maxNumEventLoopPerEndpoint` is greater than 1, the different eventLoop might be used which is inefficient.
the different eventLoop might be used which is inefficient.
Although it is inefficient I think it would be good to try a different connection than the connection that has already failed.
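The two concerns raised in this thread (an inflated active-request count for the initial attempt, and a different loop being picked when more than one loop is allowed per endpoint) can be sketched with a toy scheduler. Everything here is an assumption made for illustration: `ToyScheduler`, `Entry`, and the least-active selection rule are hypothetical and do not reproduce `DefaultEventLoopScheduler`, and releasing is not modelled at all.

```java
final class ActiveRequestCountingDemo {

    /** Hypothetical per-loop bookkeeping for a single endpoint. */
    static final class Entry {
        final int loopId;
        int activeRequests;

        Entry(int loopId) {
            this.loopId = loopId;
        }
    }

    /** Hypothetical scheduler: picks the loop with the fewest active requests. */
    static final class ToyScheduler {
        private final Entry[] entries;

        ToyScheduler(int maxLoopsPerEndpoint) {
            entries = new Entry[maxLoopsPerEndpoint];
            for (int i = 0; i < entries.length; i++) {
                entries[i] = new Entry(i);
            }
        }

        Entry acquire() {
            Entry best = entries[0];
            for (Entry e : entries) {
                // Strict comparison, so the earliest loop wins ties.
                if (e.activeRequests < best.activeRequests) {
                    best = e;
                }
            }
            best.activeRequests++;
            return best;
        }

        int totalActive() {
            int total = 0;
            for (Entry e : entries) {
                total += e.activeRequests;
            }
            return total;
        }
    }

    /** Active count after one in-flight request whose initial attempt is derived once. */
    static int activeAfterInitialAttempt(boolean alwaysAcquire) {
        ToyScheduler scheduler = new ToyScheduler(1);
        scheduler.acquire(); // the parent context acquires its loop
        if (alwaysAcquire) {
            scheduler.acquire(); // the derived context acquires again for the same request
        }
        return scheduler.totalActive();
    }

    /** Whether acquiring twice for the same endpoint yields two different loops. */
    static boolean differentLoopOnReacquire(int maxLoopsPerEndpoint) {
        ToyScheduler scheduler = new ToyScheduler(maxLoopsPerEndpoint);
        Entry first = scheduler.acquire();
        Entry second = scheduler.acquire();
        return first.loopId != second.loopId;
    }

    public static void main(String[] args) {
        System.out.println("active, initial attempt skipped: " + activeAfterInitialAttempt(false));
        System.out.println("active, always acquire: " + activeAfterInitialAttempt(true));
        System.out.println("different loop with 2 loops: " + differentLoopOnReacquire(2));
    }
}
```

Under these assumptions, always acquiring counts one request twice, and with two loops per endpoint the second acquisition lands on a different loop than the first, which is the trade-off the reviewers weigh above.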
And can we also add an integration test that involves |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #5290 +/- ##
===========================================
+ Coverage 0 73.95% +73.95%
- Complexity 0 20104 +20104
===========================================
Files 0 1730 +1730
Lines 0 74139 +74139
Branches 0 9460 +9460
===========================================
+ Hits 0 54831 +54831
- Misses 0 14832 +14832
- Partials 0 4476 +4476
☔ View full report in Codecov by Sentry. |
Happy with the changes once the CI passes 👍 Thanks @ikhoon 🙇 👍 🙇
I've reverted all changes in Let me handle it in a separate PR to build a RequestContext with an |
By the way, was this ever commented on/addressed? |
Just one nit - LGTM 🙇
core/src/test/java/com/linecorp/armeria/client/retry/RetyingClientEventLoopSchedulerTest.java
Commented. JFYI, |
Thanks for fixing this. 🙇
Motivation:

Armeria's `HttpChannelPool` is bound to an `EventLoop`. Different `EventLoop`s have different `HttpChannelPool`s. In other words, in order to reuse a connection for an `Endpoint`, the same `EventLoop` must be selected.

When creating a derived client in `RetryingClient`, a new endpoint is selected for each retry, but the `EventLoop` of the parent is used as is. As a result, the `Endpoint` can't use the existing connection pool for multiplexing, and a new connection is created.

Modifications:

- Use `EventLoopScheduler` to return constant `EventLoop`s for the same endpoint.
- Allow setting `EndpointGroup` in `ClientRequestContextBuilder` for testability.

Result:

- You no longer see a connection leak when using `RetryingClient` with `EndpointGroup`.