-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: busy loop through resolver list during join process #53713
Conversation
pkg/server/init.go
Outdated
return nil | ||
} | ||
|
||
if errors.Is(err, ErrJoinRPCUnsupported) || errors.Is(err, ErrIncompatibleBinaryVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should handle ctx.Done()
and ctx.Err()
here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an intentional omission. This API is a bit crufty as written (I've just added a TODO for what needs to change), but basically as written it needs to return nil errors on context cancellation. Which is what we're doing below. Here I just wanted to avoid doing a non-nil error check and return a nil error anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea there were some pretty crappy hacks in there, so I'll wait for #53934 to land and have this sit on top of that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I'll let you rebase and then we can wrap this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL.
6aa2985
to
c6b518f
Compare
Previously we used two separate channels to relay the results of using the join RPC, one for possible error in attempting to do and one for the actual result if successful. There was no real reason for this, and the code gluing them together was a bit fragile (and hid a latent bug). Previously we could deadlock if the select loop in ServeAndWait, for a successful join attempt, drained the errCh before draining joinCh. Because the successful joinCh drain then followed up with trying to drain the errCh (already drained by now), it would just block indefinitely. We just simplify all this to using one channel to encapsulate the results of the join attempt. Release justification: bug fixes and low-risk updates to new functionality Release note: None
Deferred doing this in cockroachdb#52526. Probably a good idea to do have it, it'll bring down the cluster convergence time (time taken for all nodes to find out about the initialization) by a bit. Release justification: low risk, high benefit changes to existing functionality Release note: None
c6b518f
to
c6bc053
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
TFTR! bors r+ |
Build succeeded: |
Deferred doing this in #52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.
Release justification: low risk, high benefit changes to existing functionality
Release note: None