Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: compute quorum commit index instead of using raft.Status.Commit #10482

Conversation

petermattis
Copy link
Collaborator

@petermattis petermattis commented Nov 5, 2016

Compute the index at which a quorum of nodes have committed instead of using raft.Status.Commit. The latter can be in advance of the computed quorum commit index just after a replica has been added to the group. And by computing the value ourselves we can include the pending snapshot index in the computation.

Fixes #10409


This change is Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 5, 2016

:lgtm:


Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Nov 6, 2016

:lgtm_strong: as a quick fix for the current issues, although I think there are times where we'd want to truncate the logs even at the cost of a second snapshot, just like we don't require that 100% of the current replicas are up to date. The best way to do this may be to pass the pending snapshot index into getBehindIndex, where it gets appended to the list of log positions from the current replicas (treating the pending snapshot as if it's already been added to the configuration)


Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@petermattis
Copy link
Collaborator Author

The best way to do this may be to pass the pending snapshot index into getBehindIndex, where it gets appended to the list of log positions from the current replicas (treating the pending snapshot as if it's already been added to the configuration)

Don't we already effectively accomplish this by processing pendingSnapshotIndex after calling getBehindIndex in computeTruncatableIndex?


Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/preserve-raft-log-during-upreplication branch from edc9721 to 567ef20 Compare November 6, 2016 20:11
@petermattis petermattis changed the title storage: preserve raft log during up-replication storage: compute quorum commit index instead of using raft.Status.Commit Nov 6, 2016
@petermattis
Copy link
Collaborator Author

Rewrote this so that we compute the quorum commit index ourselves and include the pending snapshot index in the computation. That accomplishes the same effect as the previous incarnation of this PR. PTAL.

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

Reviewed 1 of 2 files at r2.
Review status: 1 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

      // way that will require that new replica to be caught up via a Raft
      // snapshot.
      if pendingSnapshotIndex > 0 && truncatableIndex > pendingSnapshotIndex {

why is this still necessary? getQuorumIndex appears to take this into account now.


pkg/storage/raft_log_queue.go, line 159 at r2 (raw file):

      truncatableIndex = firstIndex
  }
  // Never truncate past the quorum committed index.

i think you can remove this now; it is no longer possible for this case to occur.


Comments from Reviewable

@petermattis
Copy link
Collaborator Author

Review status: 1 of 2 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why is this still necessary? getQuorumIndex appears to take this into account now.

There is a case in `TestComputeTruncatableIndex` which demonstrates the need. The `quorumIndex` might be greater than `pendingSnapshotIndex` if `pendingSnapshotIndex` is the furthest behind. We still want to maintain the raft log for the `pendingSnapshotIndex` if the raft log is small enough.

pkg/storage/raft_log_queue.go, line 159 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i think you can remove this now; it is no longer possible for this case to occur.

I don't think it could have occurred before. I think of this more as a sanity check than a required bit of code.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

Review status: 1 of 2 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There is a case in TestComputeTruncatableIndex which demonstrates the need. The quorumIndex might be greater than pendingSnapshotIndex if pendingSnapshotIndex is the furthest behind. We still want to maintain the raft log for the pendingSnapshotIndex if the raft log is small enough.

I see. This formula is pretty confusing; all this code wants is the minimum of a bunch of `uint64`s, naively:
a := uint64Slice{quorumIndex}
if pendingSnapshotIndex >0 {
  a = append(a, pendingSnapshoIndex)
}
for _, p := range raftStatus.Progress {
  a = append(a, p.Match)
}
sort.Sort(a)
idx := a[0]

The quantity of functions and control flow employed here to express this outcome bizarrely outweighs the conceptual complexity. I guess I'm arguing for the removal of getBehindindex as that method does nothing to enhance readability though it does currently have unit test coverage (which are also of dubious value compared to tests of getTruncatableIndex).


pkg/storage/raft_log_queue.go, line 159 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I don't think it could have occurred before. I think of this more as a sanity check than a required bit of code.

OK. I find this pretty confusing, but I won't insist.

pkg/storage/raft_log_queue.go, line 166 at r2 (raw file):

}

// getBehindIndex returns the raft log index of the oldest node or the quorum

this comment is kind of odd now; this function doesn't compute the quorum commit index, it just trusts the caller. Perhaps a more accurate phrasing would be "returns the minimum of the raft log indices and the supplied quorum index".


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/preserve-raft-log-during-upreplication branch from 567ef20 to 2d9708b Compare November 6, 2016 21:01
@petermattis
Copy link
Collaborator Author

Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I see. This formula is pretty confusing; all this code wants is the minimum of a bunch of uint64s, naively:

a := uint64Slice{quorumIndex}
if pendingSnapshotIndex >0 {
  a = append(a, pendingSnapshoIndex)
}
for _, p := range raftStatus.Progress {
  a = append(a, p.Match)
}
sort.Sort(a)
idx := a[0]

The quantity of functions and control flow employed here to express this outcome bizarrely outweighs the conceptual complexity. I guess I'm arguing for the removal of getBehindindex as that method does nothing to enhance readability though it does currently have unit test coverage (which are also of dubious value compared to tests of getTruncatableIndex).

What you have is functionality equivalent to the existing code, though slower because it builds up a slice. I'm sure there are various ways we can make this code more elegant. Do you really think we should spend time doing so? It's not like the previous code was buggy because if its structure. It was working exactly as advertised.

pkg/storage/raft_log_queue.go, line 166 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this comment is kind of odd now; this function doesn't compute the quorum commit index, it just trusts the caller. Perhaps a more accurate phrasing would be "returns the minimum of the raft log indices and the supplied quorum index".

Done.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

What you have is functionality equivalent to the existing code, though slower because it builds up a slice. I'm sure there are various ways we can make this code more elegant. Do you really think we should spend time doing so? It's not like the previous code was buggy because if its structure. It was working exactly as advertised.

Yes, it is slower - that's why I used the word "naively". Do I think we should clarify this code? Well, yes, since the cognitive overhead of parsing it is a tax paid on every visit.

Such a simplification shouldn't hold up this change (I've already approved it), but I mention it here because you (and now I) have already paid the parsing tax.


pkg/storage/raft_log_queue_test.go, line 63 at r3 (raw file):

}

func TestGetQuorumIndex(t *testing.T) {

as with TestGetBehindIndex, I am not sure this test provides much value; it is explicitly a subset of TestComputeTruncatableIndex


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/preserve-raft-log-during-upreplication branch from 2d9708b to 15bdc87 Compare November 6, 2016 21:27
@petermattis
Copy link
Collaborator Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Yes, it is slower - that's why I used the word "naively". Do I think we should clarify this code? Well, yes, since the cognitive overhead of parsing it is a tax paid on every visit.

Such a simplification shouldn't hold up this change (I've already approved it), but I mention it here because you (and now I) have already paid the parsing tax.

Ok, I inlined `getBehindIndex`.

pkg/storage/raft_log_queue_test.go, line 63 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

as with TestGetBehindIndex, I am not sure this test provides much value; it is explicitly a subset of TestComputeTruncatableIndex

True, but unlike `getBehindIndex`, `getQuorumIndex` is a bit more involved and worth testing independently IMO.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

Reviewed 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


pkg/storage/raft_log_queue.go, line 151 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ok, I inlined getBehindIndex.

Thanks.

pkg/storage/raft_log_queue.go, line 144 at r4 (raw file):

  if raftLogSize <= targetSize {
      // Only truncate to one of the behind indexes if the raft log is less than

s/behind/follower/?


Comments from Reviewable

Compute the index at which a quorum of nodes have committed instead of
using raft.Status.Commit. The latter can be in advance of the computed
quorum commit index just after a replica has been added to the
group. And by computing the value ourselves we can include the pending
snapshot index in the computation.

Fixes cockroachdb#10409
@petermattis petermattis force-pushed the pmattis/preserve-raft-log-during-upreplication branch from 15bdc87 to 561398f Compare November 6, 2016 22:16
@petermattis
Copy link
Collaborator Author

Review status: 1 of 2 files reviewed at latest revision, 1 unresolved discussion.


pkg/storage/raft_log_queue.go, line 144 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/behind/follower/?

Done.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

Reviewed 1 of 1 files at r5.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@petermattis petermattis merged commit 09c0f88 into cockroachdb:master Nov 7, 2016
@petermattis petermattis deleted the pmattis/preserve-raft-log-during-upreplication branch November 7, 2016 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants