Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add fast-path for ticking quiesced/dormant replicas #17617

Merged
merged 1 commit into from
Aug 14, 2017

Conversation

petermattis
Copy link
Collaborator

Add fast-path for ticking quiesced/dormant replicas which avoids going
through the Raft scheduler and avoids grabbing Replica.raftMu which can be
held for significant periods of time.

Fixes #17609

@petermattis petermattis requested a review from a team August 12, 2017 13:52
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis petermattis requested a review from bdarnell August 12, 2017 13:52
@petermattis
Copy link
Collaborator Author

@bdarnell If there are caveats to changing the Raft tick settings as is done by the first commit, I'll document the rationale behind the current settings thoroughly.

@bdarnell
Copy link
Contributor

:lgtm:

I described the issues with changing the raft values in #17609 (comment). I don't think we've done any experimentation to show whether the coarser randomization of election timeouts makes a difference in practice.

I'd rather just do the fast-path for now and leave the constants alone.


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/base/config.go, line 77 at r2 (raw file):

var defaultRaftElectionTimeoutTicks = envutil.EnvOrDefaultInt(
	"COCKROACH_RAFT_ELECTION_TIMEOUT_TICKS", 3)

This change is backwards-incompatible for anyone who might have been using this environment variable. (Have we recommended this to anyone or have we just been using it internally? It was added in #15781)


Comments from Reviewable

} else if r.mu.quiescent {
done = true
if tickQuiesced {
// NB: It is safe to call TickQuiesced without holding Replica.raftMu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You couldn't have two goroutines trying to tick the replica at the same time and thus clobber the counter in undefined ways?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you hold the replica mu. Nevermind.

@petermattis petermattis force-pushed the pmattis/raft-ticking branch from 36f9837 to 48fe84d Compare August 14, 2017 15:33
@petermattis
Copy link
Collaborator Author

Removed the first commit, leaving just the fast-path for ticking.


Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions.


pkg/base/config.go, line 77 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This change is backwards-incompatible for anyone who might have been using this environment variable. (Have we recommended this to anyone or have we just been using it internally? It was added in #15781)

Removed this commit from the PR.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/raft-ticking branch from 48fe84d to a44aa8e Compare August 14, 2017 15:52
@tamird
Copy link
Contributor

tamird commented Aug 14, 2017

:lgtm:


Reviewed 1 of 2 files at r2, 2 of 2 files at r3.
Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions.


pkg/storage/replica.go, line 3560 at r3 (raw file):

			// because that method simply increments a counter without performing any
			// other logic.
			r.mu.internalRaftGroup.TickQuiesced()

it's a shame that this same logic is in tickRaftMuLocked.


pkg/storage/store.go, line 3593 at r3 (raw file):

				// disruption. Replica.maybeTickQuiesced only grabs short-duration
				// locks and not locks that are held during disk I/O.
				if r := v.(*Replica); r.maybeTickQuiesced() {

nit: what's the point of binding r? this looked like a checked type assertion to me briefly. could just be if v.(*Replica).maybeTickQuiesced() {

while we're at it, it's perhaps slightly clearer as:

if !v.(*Replica).maybeTickQuiesced() {
  rangeIDs = append(rangeIDs, k.(roachpb.RangeID))
}
return true

Comments from Reviewable

Add fast-path for ticking quiesced/dormant replicas which avoids going
through the Raft scheduler and avoids grabbing Replica.raftMu which can
be held for significant periods of time.

Fixes cockroachdb#17609
@petermattis petermattis force-pushed the pmattis/raft-ticking branch from a44aa8e to 2186866 Compare August 14, 2017 15:56
@petermattis
Copy link
Collaborator Author

Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/storage/replica.go, line 3560 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

it's a shame that this same logic is in tickRaftMuLocked.

I feel no shame about it. Trying to share the code seems more trouble than its worth (IMO).


pkg/storage/store.go, line 3593 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: what's the point of binding r? this looked like a checked type assertion to me briefly. could just be if v.(*Replica).maybeTickQuiesced() {

while we're at it, it's perhaps slightly clearer as:

if !v.(*Replica).maybeTickQuiesced() {
  rangeIDs = append(rangeIDs, k.(roachpb.RangeID))
}
return true

Done.


Comments from Reviewable

@petermattis petermattis merged commit bd6ed9e into cockroachdb:master Aug 14, 2017
@petermattis petermattis deleted the pmattis/raft-ticking branch August 14, 2017 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants