Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: busy loop through resolver list during join process #53713

Merged
merged 2 commits into from
Sep 4, 2020

Conversation

irfansharif
Copy link
Contributor

Deferred doing this in #52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.

Release justification: low risk, high benefit changes to existing functionality
Release note: None

@irfansharif irfansharif requested a review from tbg August 31, 2020 22:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

pkg/server/init.go Outdated Show resolved Hide resolved
return nil
}

if errors.Is(err, ErrJoinRPCUnsupported) || errors.Is(err, ErrIncompatibleBinaryVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should handle ctx.Done() and ctx.Err() here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an intentional omission. This API is a bit crufty as written (I've just added a TODO for what needs to change), but basically as written it needs to return nil errors on context cancellation. Which is what we're doing below. Here I just wanted to avoid doing a non-nil error check and return a nil error anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea there were some pretty crappy hacks in there, so I'll wait for #53934 to land and have this sit on top of that instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll let you rebase and then we can wrap this up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL.

@irfansharif irfansharif force-pushed the 200831.join-busy-loop branch from 6aa2985 to c6b518f Compare September 3, 2020 14:58
Previously we used two separate channels to relay the results of using
the join RPC, one for possible error in attempting to do and one for the
actual result if successful. There was no real reason for this, and the
code gluing them together was a bit fragile (and hid a latent bug).

Previously we could deadlock if the select loop in ServeAndWait, for a
successful join attempt, drained the errCh before draining joinCh.
Because the successful joinCh drain then followed up with trying to
drain the errCh (already drained by now), it would just block
indefinitely.

We just simplify all this to using one channel to encapsulate the
results of the join attempt.

Release justification: bug fixes and low-risk updates to new functionality
Release note: None
Deferred doing this in cockroachdb#52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.

Release justification: low risk, high benefit changes to existing functionality
Release note: None
@irfansharif irfansharif force-pushed the 200831.join-busy-loop branch from c6b518f to c6bc053 Compare September 4, 2020 17:37
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@irfansharif
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 4, 2020

Build succeeded:

@craig craig bot merged commit 7cd72a7 into cockroachdb:master Sep 4, 2020
@irfansharif irfansharif deleted the 200831.join-busy-loop branch September 4, 2020 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants