-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rls: fix child lb leak when client channel is shutdown #8750
Conversation
@sergiitk A gentle ping. |
Per IRL discussion, this needs more testing/fixing, so the review is on hold. |
Oh, I misunderstood you. I just meant it's higher priority than the other RLS PR. It's ready for review. The test was added (https://github.com/grpc/grpc-java/pull/8750/files#diff-14b2d9691a6f8fc60385a0bcf960a20b51b7415d9119e02e22d17cccc8103349R176). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions
@@ -346,6 +392,7 @@ public ChildPolicyWrapper returnObject(Object object) { | |||
long newCnt = refCnt.decrementAndGet(); | |||
checkState(newCnt != -1, "Cannot return never pooled childPolicyWrapper"); | |||
if (newCnt == 0) { | |||
childPolicyWrapper.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible that in concurrent environment one thread sets childPolicyWrapper = null
, and another will call childPolicyWrapper.shutdown()
causing NPE?
I'm not sure if AtomicLong
volatility visibility guarantee is sufficient. IRRC because childPolicyWrapper = null
comes after the write to the volatile variable, it's not guaranteed to flush immediately.
Should returnObject()
be fully synchronized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the concurrency issue in 3ae4e3d
new Runnable() { | ||
@Override | ||
public void run() { | ||
lb.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care if this operation competes before returning a ChildPolicyWrapper object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! There's a (pre-existing) concurrency issue with RefCountedChildPolicyWrapperFactory.createOrGet()
and release()
. As long as they become thread-safe, it doesn't matter when shutdown()
operation completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the concurrency issue in 3ae4e3d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Plug those pesky leaks 💧 😄
@sergiitk Thank you so much for the review. I know this PR as well as RLS is so complex and hard to read. |
When client channel is shutting down, the RlsLoadBalancer is shutting down. However, the child loadbalancers of RlsLoadBalancer are not shut down. This is causing the issue b/209831670
When client channel is shutting down, the RlsLoadBalancer is shutting down. However, the child loadbalancers of RlsLoadBalancer are not shut down. This is causing the issue b/209831670
When client channel is shutting down, the RlsLoadBalancer is shutting down. However, the child loadbalancers of RlsLoadBalancer are not shut down. This is causing the issue b/209831670
This PR contains two commits and it will be easier to review separately. The 1st commit change the lifecycle of child lb to make it easier to manage. The 2nd commit is to shut down all child lbs when the parent lb is shut down.