fix(go-client): loopForRequest not return and retry forever #1444
should update config and change target
During testing, we found that timeouts still occurred, which led us to another problem: the SDK should not retry on
Furthermore, if we remove the user-side log call:
Codecov Report
@@            Coverage Diff            @@
##           master    #1444   +/-   ##
=========================================
  Coverage        ?   53.58%
=========================================
  Files           ?       27
  Lines           ?     2538
  Branches        ?        0
=========================================
  Hits            ?     1360
  Misses          ?     1132
  Partials        ?       46
Thanks for the contribution~
What problem does this PR solve?
fix #1385
What is changed and how does it work?
Expected behavior:
After a replica server restarts, the client should be able to reconnect to the node independently of any client configuration update (reselecting the primary replica).
The client configuration update is a separate issue that we should talk about; I'd like to discuss it in another issue.
In this PR, we focus on why the client cannot reconnect to the restarted server.
The SDK uses several loops to monitor rpc.conn. When the server is closed, loopForDialing keeps retrying the dial until it connects to the server (that is, once the server has restarted). Then two new loops, loopForRequest and loopForResponse, are created to handle requests.
However, loopForRequest does not return when its corresponding loopForResponse returns because of IsNetworkClosed (EOF), since the latter only returns nil and does not shut down tom. Thus, there are more live loopForRequest loops than loopForResponse loops in this case.
The SDK should not retry on a timeout error; however, we wrapped the timeout error incorrectly here:
incubator-pegasus/go-client/session/session.go
Line 338 in d16e65c
multiset continues to retry here, since the error passed to the outer caller is no longer a timeout error:
incubator-pegasus/go-client/pegasus/table_connector.go
Line 719 in d16e65c
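The loop lifecycle issue described earlier in this section (a loopForRequest that outlives its loopForResponse) can be sketched in isolation. This is a simplified model rather than the actual fix: the `session` type, `runSession`, and the `read` callback are hypothetical, and a standard-library `context` cancel is used as the shared shutdown signal instead of the `tom` field the SDK uses.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"sync"
	"time"
)

// session is a hypothetical stand-in for an RPC session, not the go-client type.
type session struct {
	cancel context.CancelFunc
	reqCh  chan string
}

// loopForResponse reads until the connection fails (here: io.EOF), then
// cancels the shared context so the sibling request loop stops as well.
func (s *session) loopForResponse(read func() error) {
	defer s.cancel() // without this, loopForRequest would block forever
	for {
		if err := read(); err != nil {
			return
		}
	}
}

// loopForRequest drains pending requests until the shared context is cancelled.
func (s *session) loopForRequest(ctx context.Context) {
	for {
		select {
		case <-s.reqCh:
			// a real client would write the request to the connection here
		case <-ctx.Done():
			return
		}
	}
}

// runSession reports whether both loops exit within one second after the
// simulated connection returns io.EOF following readsBeforeEOF successful reads.
func runSession(readsBeforeEOF int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	s := &session{cancel: cancel, reqCh: make(chan string)}
	remaining := readsBeforeEOF
	read := func() error {
		if remaining == 0 {
			return io.EOF
		}
		remaining--
		return nil
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.loopForResponse(read) }()
	go func() { defer wg.Done(); s.loopForRequest(ctx) }()

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(runSession(3)) // true: both loops exited after EOF
}
```

Removing the `defer s.cancel()` line reproduces the leak in this model: the response loop returns on EOF, the request loop keeps waiting, and `runSession` reports false.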
Checklist
Monitoring capture based on this PR:
Timeouts are eliminated after the server restarts.