Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Avoid holding raftMu during evaluation #15935

Merged
merged 2 commits into from
May 16, 2017

Conversation

bdarnell
Copy link
Contributor

Now that the command-queue issues in #10084 have been fixed, the only
reason to hold raftMu during evaluation was because it controlled
access to the replicaStateLoader (and lock-ordering requirements
forced us to give the lock a broad scope). By forking the
replicaStateLoader into several copies (one protected by Replica.mu
and one by Replica.raftMu, plus a few temporary loaders created off
the critical path), we can reduce the scope of this coarse-grained
lock (which in turn facilitates #15802).

@bdarnell bdarnell requested review from tbg and petermattis May 15, 2017 22:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member

tbg commented May 15, 2017

:lgtm: This should give a performance bump in the hotspot case (when the hotspot isn't sufficiently split).

@tamird would be interested in seeing whether your in-the-makings benchmark stuff can measure the improvement that no doubt exists.


Reviewed 9 of 9 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 54 at r1 (raw file):

// InitialState implements the raft.Storage interface.
// InitialState requires that the replica lock be held.

while you're here: "the replica lock" is awfully unspecific. From looking at the body it looks like we need both locks.


pkg/storage/replica_raftstorage.go, line 236 at r1 (raw file):

// first current entry. This includes both entries that have been compacted away
// and the dummy entries that make up the starting point of an empty log.
// raftTruncatedStateLocked requires that the replica lock be held.

ditto.


pkg/storage/replica_raftstorage.go, line 435 at r1 (raw file):

// r.mu.lastIndex and r.mu.raftLogSize, and returns new values. We do this
// rather than modifying them directly because these modifications need to be
// atomic with the commit of the batch.

Lock comment here also wouldn't hurt. The comment applies elsewhere as well (I'm not going to repeat it).


pkg/storage/store.go, line 3039 at r1 (raw file):

		r.mu.Unlock()

		appliedIndex, _, err := r.raftMu.stateLoader.loadAppliedIndex(ctx, r.store.Engine())

Not having looked at this code in a while, I kept scrolling up to learn about raftMu being held, but never got satisfied.


Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm:


Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

		return *r.mu.state.TruncatedState, nil
	}
	ts, err := r.mu.stateLoader.loadTruncatedState(ctx, r.store.Engine())

Should this be raftMu? I thought the raft.Storage routines were always called with raftMu held, not mu.


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 54 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

while you're here: "the replica lock" is awfully unspecific. From looking at the body it looks like we need both locks.

Changed this method to only need r.mu. Updated all comments in this file to be more precise. (we need fewer locks than you'd think here)


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should this be raftMu? I thought the raft.Storage routines were always called with raftMu held, not mu.

Methods in the raft.Storage interface are always called with both locks (when called by raft), but this method is not a part of raft.Storage. It definitely holds r.mu because it is referenced on the preceding lines.


pkg/storage/store.go, line 3039 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Not having looked at this code in a while, I kept scrolling up to learn about raftMu being held, but never got satisfied.

There's a defer raftMu.Unlock() at the start of the function. It's subtle because getOrCreateReplica returns a replica whose raftMu is already locked.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 8 of 9 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Methods in the raft.Storage interface are always called with both locks (when called by raft), but this method is not a part of raft.Storage. It definitely holds r.mu because it is referenced on the preceding lines.

I might be missing something (I often am), but I can't see where we're holding Replica.mu in evalTruncateLog which calls FirstIndex. I would expect the following diff to cause deadlock if we were already holding Replica.mu (it does not):

diff --git a/pkg/storage/replica_state.go b/pkg/storage/replica_state.go
index 47910f8..9269996 100644
--- a/pkg/storage/replica_state.go
+++ b/pkg/storage/replica_state.go
@@ -619,6 +619,8 @@ func (rec ReplicaEvalContext) pushTxnQueue() *pushTxnQueue {

 // FirstIndex returns the oldest index in the raft log.
 func (rec ReplicaEvalContext) FirstIndex() (uint64, error) {
+       rec.repl.mu.Lock()
+       defer rec.repl.mu.Unlock()
        return rec.repl.FirstIndex()
 }

Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 16, 2017

Review status: 8 of 9 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I might be missing something (I often am), but I can't see where we're holding Replica.mu in evalTruncateLog which calls FirstIndex. I would expect the following diff to cause deadlock if we were already holding Replica.mu (it does not):

diff --git a/pkg/storage/replica_state.go b/pkg/storage/replica_state.go
index 47910f8..9269996 100644
--- a/pkg/storage/replica_state.go
+++ b/pkg/storage/replica_state.go
@@ -619,6 +619,8 @@ func (rec ReplicaEvalContext) pushTxnQueue() *pushTxnQueue {

 // FirstIndex returns the oldest index in the raft log.
 func (rec ReplicaEvalContext) FirstIndex() (uint64, error) {
+       rec.repl.mu.Lock()
+       defer rec.repl.mu.Unlock()
        return rec.repl.FirstIndex()
 }

We erroneously don't grab r.mu in evalTruncateLock. I think the call there needs to be GetFirstIndex() (which grabs the lock), and then you'd get your deadlock. Not sure about other call paths.


Comments from Reviewable

bdarnell added 2 commits May 16, 2017 12:23
Now that the command-queue issues in cockroachdb#10084 have been fixed, the only
reason to hold raftMu during evaluation was because it controlled
access to the replicaStateLoader (and lock-ordering requirements
forced us to give the lock a broad scope). By forking the
replicaStateLoader into several copies (one protected by Replica.mu
and one by Replica.raftMu, plus a few temporary loaders created off
the critical path), we can reduce the scope of this coarse-grained
lock (which in turn facilitates cockroachdb#15802).
@bdarnell
Copy link
Contributor Author

Review status: 8 of 9 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

We erroneously don't grab r.mu in evalTruncateLock. I think the call there needs to be GetFirstIndex() (which grabs the lock), and then you'd get your deadlock. Not sure about other call paths.

Yep, that's a bug. It was fine before because all evaluation was under raftMu. We don't appear to do enough log truncation in tests for testrace to turn this up. Also, GetFirstIndex needs to get a write lock and not a read lock.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 7 of 9 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yep, that's a bug. It was fine before because all evaluation was under raftMu. We don't appear to do enough log truncation in tests for testrace to turn this up. Also, GetFirstIndex needs to get a write lock and not a read lock.

Can we rename FirstIndex to FirstIndexLocked?


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: 7 of 9 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can we rename FirstIndex to FirstIndexLocked?

No, the name FirstIndex is imposed by the etcd/raft.Storage interface. We could do some refactoring to define these methods on a type alias instead of Replica so they're not visible under those names elsewhere.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 7 of 9 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 243 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

No, the name FirstIndex is imposed by the etcd/raft.Storage interface. We could do some refactoring to define these methods on a type alias instead of Replica so they're not visible under those names elsewhere.

Doh, I forgot that FirstIndex is part of raft.Storage. Let's file an issue about said refactoring, and just fix the problem in a straightforward manner in this PR.


Comments from Reviewable

@bdarnell bdarnell merged commit 8f94ee0 into cockroachdb:master May 16, 2017
@bdarnell bdarnell deleted the narrow-raftmu branch May 16, 2017 17:08
@tbg
Copy link
Member

tbg commented May 16, 2017

Reviewed 2 of 2 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 54 at r2 (raw file):

// InitialState implements the raft.Storage interface.
// InitialState requires that r.mu is held.

my understanding now is that this method in itself doesn't require raftMu held, but in practice, raftMu would be held, plus if you didn't hold raftMu while calling this your initial state isn't worth much because it might change from under you any time. Is that about right?


pkg/storage/replica_raftstorage.go, line 73 at r2 (raw file):

// and this method will always return at least one entry even if it exceeds
// maxBytes. Passing maxBytes equal to zero disables size checking.
// Entries requires that the replica lock is held.

"replica lock"


pkg/storage/replica_raftstorage.go, line 236 at r2 (raw file):

// first current entry. This includes both entries that have been compacted away
// and the dummy entries that make up the starting point of an empty log.
// raftTruncatedStateLocked requires that the replica lock be held.

"replica lock"


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/replica_raftstorage.go, line 54 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

my understanding now is that this method in itself doesn't require raftMu held, but in practice, raftMu would be held, plus if you didn't hold raftMu while calling this your initial state isn't worth much because it might change from under you any time. Is that about right?

Yeah, in practice both locks will be held because this has no callers but raft. If it were called anywhere else, I'm not sure that holding both locks would be sufficient; I think this also relies on the fact that it is called only at raft group initialization time.


pkg/storage/replica_raftstorage.go, line 73 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

"replica lock"

You're looking at an older revision; that comment is now removed.


pkg/storage/replica_raftstorage.go, line 236 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

"replica lock"

You're looking at an older revision; it now says r.mu.


Comments from Reviewable

bdarnell added a commit to bdarnell/cockroach that referenced this pull request May 17, 2017
The call to assertStateLocked may fail if called without the raft lock
as the on-disk state may be changed concurrently. This fixes a
regression introduced in cockroachdb#15935.

Fixes cockroachdb#15975
Fixes cockroachdb#15979
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants