storage: allow acquisition from closed quota pools #16413

Merged 1 commit on Jun 12, 2017

irfansharif
Contributor

(actually) Fixes #16376, reverts #16399.

TestRaftRemoveRace hit the short window of time in which the lease
holder and the raft leader are not the same replica (raft leadership
can change out from under us while the lease holder stays put).

Consider the following sequence of events:

  • the lease holder and the raft leader are co-located
  • 'add replica' commands get queued up on the replicate queue
  • the leader replica steps down as raft leader, which closes the quota
    pool on the lease holder (they're the same replica)
  • the commands come off the queue, cannot acquire quota because the quota
    pool on the lease holder is closed, and fail with an error saying so

We make two observations:

  • quotaPool.close() is only called when a raft leader becomes a
    follower, and it causes all ongoing acquisitions to fail
  • ongoing acquisitions only take place on the lease holder replica

The quota pool was implemented so that it is effectively disabled when
the lease holder and the raft leader are not co-located. Failing with an
error here (now that the raft leader has changed, the lease holder and
raft leader are no longer co-located) runs contrary to this. What we
really want is to "fail open" in this case instead, i.e. allow the
acquisition to proceed as if the quota pool were disabled.
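
To make the "fail open" behavior concrete, here is a minimal sketch of a quota pool whose acquire returns nil once the pool is closed. It is not the code in pkg/storage: the struct layout, the `changed` channel, `newQuotaPool`, `add` and `demo` are all assumptions made for illustration; only the names `quotaPool`, `acquire` and `close` come from the discussion above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// quotaPool is a simplified, hypothetical stand-in for the pool discussed
// in this PR. It is not the implementation in pkg/storage.
type quotaPool struct {
	mu     sync.Mutex
	quota  int64
	closed bool
	// changed is closed (and, unless the pool is closed, replaced) on every
	// state change so that all blocked acquirers wake up and re-check.
	changed chan struct{}
}

func newQuotaPool(q int64) *quotaPool {
	return &quotaPool{quota: q, changed: make(chan struct{})}
}

// acquire blocks until v units are available, the context is done, or the
// pool is closed. On a closed pool it fails open: it returns nil without
// deducting anything, as if the pool were disabled.
func (qp *quotaPool) acquire(ctx context.Context, v int64) error {
	for {
		qp.mu.Lock()
		if qp.closed {
			qp.mu.Unlock()
			return nil // fail open
		}
		if qp.quota >= v {
			qp.quota -= v
			qp.mu.Unlock()
			return nil
		}
		changed := qp.changed
		qp.mu.Unlock()

		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-changed:
			// State changed (quota added or pool closed); re-check.
		}
	}
}

// add returns v units to the pool and wakes blocked acquirers.
func (qp *quotaPool) add(v int64) {
	qp.mu.Lock()
	defer qp.mu.Unlock()
	if qp.closed {
		return
	}
	qp.quota += v
	close(qp.changed)
	qp.changed = make(chan struct{})
}

// close permanently releases all current and future acquirers. It is safe
// to call more than once.
func (qp *quotaPool) close() {
	qp.mu.Lock()
	defer qp.mu.Unlock()
	if !qp.closed {
		qp.closed = true
		close(qp.changed)
	}
}

// demo replays the sequence from the description: an acquisition is blocked
// on the lease holder, the pool is closed, and the acquisition proceeds.
func demo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	qp := newQuotaPool(1)
	if err := qp.acquire(ctx, 1); err != nil {
		return fmt.Errorf("initial acquisition: %v", err)
	}
	errCh := make(chan error, 1)
	go func() { errCh <- qp.acquire(ctx, 1) }() // blocks: no quota left
	qp.close()
	if err := <-errCh; err != nil {
		return fmt.Errorf("blocked acquisition: %v", err)
	}
	if err := qp.acquire(ctx, 1); err != nil {
		return fmt.Errorf("acquisition after close: %v", err)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("all acquisitions proceeded")
}
```

In this sketch a closed pool never hands out an error, so a command that comes off the replicate queue after the leadership change proceeds instead of failing.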

@irfansharif irfansharif requested a review from bdarnell June 8, 2017 21:23
@tbg tbg left a comment

I associate broadcast with an operation that frees everyone waiting once, but not permanently. I'm not sure I know a better name, though. Was close() worse?

@@ -122,14 +121,14 @@ func TestQuotaPoolClose(t *testing.T) {
errCh <- qp.acquire(ctx, 1)

Might as well have five of those to test a little more.

@@ -122,14 +121,14 @@ func TestQuotaPoolClose(t *testing.T) {
errCh <- qp.acquire(ctx, 1)
}()

-	qp.close()
+	qp.broadcast()

call it twice just to prove that you can.

@@ -122,14 +121,14 @@ func TestQuotaPoolClose(t *testing.T) {
errCh <- qp.acquire(ctx, 1)
}()

-	qp.close()
+	qp.broadcast()

select {

Would also run 5x if you have 5 acquisitions above.

@irfansharif
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions.


pkg/storage/quota_pool_test.go, line 121 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Might as well have five of those to test a little more.

Done.


pkg/storage/quota_pool_test.go, line 124 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

call it twice just to prove that you can.

lol, Done.


pkg/storage/quota_pool_test.go, line 126 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Would also run 5x if you have 5 acquisitions above.

Done.


@irfansharif
Contributor Author

Was close() worse?

guess not, renamed.
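
For reference, the test shape the review converged on (five concurrent acquisitions, close() called twice, one select per acquisition) might look roughly like the sketch below. This is not the test in pkg/storage/quota_pool_test.go: `stubPool`, `newStubPool` and `checkCloseReleasesAll` are hypothetical names, and the stub is a zero-quota pool whose acquire can only return once the pool is closed or the context is done.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// stubPool is a hypothetical zero-quota pool used only to illustrate the
// test pattern; it is not the quotaPool from pkg/storage.
type stubPool struct {
	once sync.Once
	done chan struct{}
}

func newStubPool() *stubPool { return &stubPool{done: make(chan struct{})} }

// acquire returns nil once the pool has been closed, or the context error
// if the context is done first.
func (p *stubPool) acquire(ctx context.Context) error {
	select {
	case <-p.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// close releases every waiter; sync.Once makes repeated calls harmless.
func (p *stubPool) close() { p.once.Do(func() { close(p.done) }) }

// checkCloseReleasesAll starts numAcquirers blocked acquisitions, closes the
// pool twice, and verifies that every acquisition returns nil.
func checkCloseReleasesAll(numAcquirers int) error {
	p := newStubPool()
	ctx := context.Background()
	errCh := make(chan error, numAcquirers)
	for i := 0; i < numAcquirers; i++ {
		go func() { errCh <- p.acquire(ctx) }()
	}

	p.close()
	p.close() // second call must be a no-op

	for i := 0; i < numAcquirers; i++ {
		select {
		case err := <-errCh:
			if err != nil {
				return fmt.Errorf("acquisition %d: expected nil, got %v", i, err)
			}
		case <-time.After(5 * time.Second):
			return fmt.Errorf("acquisition %d did not return after close", i)
		}
	}
	return nil
}

func main() {
	if err := checkCloseReleasesAll(5); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("every acquisition returned nil after close")
}
```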

@irfansharif irfansharif merged commit 7b08c1a into cockroachdb:master Jun 12, 2017
@irfansharif irfansharif deleted the quota-leaseholder branch June 12, 2017 15:37
Development

Successfully merging this pull request may close these issues.

storage: TestRaftRemoveRace failed under stress
3 participants