
storage: flow control throttling replica operations #15802

Merged: irfansharif merged 2 commits into cockroachdb:master from proposal-quota on May 27, 2017

Conversation

@irfansharif (Contributor) commented May 9, 2017

Revisiting #13869.

The leader maintains a pool of "proposal quota". Before proposing a Raft
command, we acquire proposal quota equal to the size of the
command. When all of the healthy followers have committed the entry,
quota equal to the size of the corresponding command is returned to the
pool. We maintain on Replica a map associating storagebase.CmdIDKey to
the corresponding command size. Downstream of raft we extract the same
storagebase.CmdIDKey from the command and append the command size to
quotaReleaseQueue. This queue is subsequently used to return quota back
to the pool once all the followers have caught up.

We only consider "healthy" followers to be those that have "healthy" RPC
connections when determining if a unit of quota should be returned to the
pool.
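
For concreteness, a simplified sketch of this bookkeeping follows (illustrative only: the real fields live on Replica and are protected by Replica.mu, and locking and error handling are elided; only the names mentioned above and elsewhere in this PR are taken from the actual change):

type proposalQuotaState struct {
	proposalQuota     *quotaPool                   // leader-only pool of proposal quota
	commandSizes      map[storagebase.CmdIDKey]int // recorded upstream of raft, at propose time
	quotaReleaseQueue []int                        // sizes of applied commands, awaiting release
}

// Upstream of raft (leader only): acquire quota and remember the command size.
func (q *proposalQuotaState) onPropose(ctx context.Context, id storagebase.CmdIDKey, size int) error {
	if err := q.proposalQuota.acquire(ctx, int64(size)); err != nil {
		return err
	}
	q.commandSizes[id] = size
	return nil
}

// Downstream of raft: look the size back up and queue it for release.
func (q *proposalQuotaState) onApply(id storagebase.CmdIDKey) {
	if size, ok := q.commandSizes[id]; ok {
		delete(q.commandSizes, id)
		q.quotaReleaseQueue = append(q.quotaReleaseQueue, size)
	}
}

// Once all the healthy followers have caught up past the first n queued
// entries, the corresponding quota is returned to the pool.
func (q *proposalQuotaState) onFollowersCaughtUp(n int) {
	var sum int64
	for _, size := range q.quotaReleaseQueue[:n] {
		sum += int64(size)
	}
	q.quotaReleaseQueue = q.quotaReleaseQueue[n:]
	q.proposalQuota.add(sum)
}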

cc @bdarnell.

@irfansharif requested a review from petermattis on May 9, 2017 14:59
@cockroach-teamcity (Member):

This change is Reviewable

@petermattis (Collaborator):

:lgtm:

Can the call to submitProposalLocked in refreshProposalsLocked foul up the proposal quota adjustments? I think the answer is no because we reset the proposal quota upon Raft leadership changes. Cc @bdarnell


Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions.


pkg/storage/client_raft_test.go, line 1685 at r1 (raw file):

// slower replica catching up. By throttling write throughput we avoid having
// to constantly catch up the slower node via snapshots. See #8659.
func TestQuotaPool(t *testing.T) {

Nice test!


pkg/storage/client_raft_test.go, line 1696 at r1 (raw file):

	mtc.replicateRange(rangeID, 1, 2)

	mtc.stores[0].SetRaftLogQueueActive(false)

I'm guessing that you needed to disable the raft log queue because the truncations it was generating were fouling up the counting below. Might want to add a comment to that effect. Or whatever the real reason is.


Comments from Reviewable

@petermattis (Collaborator):

@tamird This is another PR that I'd be interested in seeing the impact on performance benchmarks.


Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

@bdarnell (Contributor) commented May 9, 2017

:lgtm:


Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


pkg/storage/replica.go, line 816 at r1 (raw file):

		} else {
			// We're becoming a follower.
			r.mu.proposalQuota = nil

Another thread could be blocked in quotaPool.acquire when we transition to follower. We need to close the old pool's channel to unblock all waiting threads.


Comments from Reviewable

@tschottdorf (Member) commented on the diff (pkg/storage/client_raft_test.go):

// blocked.
incArgs := incrementArgs([]byte("k"), 1)
for i := 0; i < quota; i++ {
if _, err := client.SendWrapped(context.Background(), repl1, incArgs); err != nil {

Couldn't this already block if some other moving part in the system gets a proposal in?

@tamird (Contributor) commented May 9, 2017

Indeed. Good stuff, @irfansharif!


Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 11 unresolved discussions.


pkg/storage/client_raft_test.go, line 1696 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm guessing that you needed to disable the raft log queue because the truncations it was generating was fouling up the counting below. Might want to add a comment to that effect. Or whatever the real reason is.

also, consider doing this by iterating through the stores:

for _, store := range mtc.stores {
  store.SetRaftLogQueueActive(false)
}

pkg/storage/client_raft_test.go, line 1707 at r1 (raw file):

	}

	repl1, err := mtc.stores[0].GetReplica(rangeID)

This assumes that replica 1 is the leader. Could you make that more explicit? I think this test may end up flaky under stress.

You might also consider using a more descriptive variable name here, perhaps leaderRepl? Similarly, s/repl3/followerRepl/ below?


pkg/storage/client_raft_test.go, line 1743 at r1 (raw file):

		defer close(ch)
		if _, err := client.SendWrapped(context.Background(), repl1, incArgs); err != nil {
			t.Fatal(err)

It is not permitted to call t.Fatal in the non-main goroutine. Consider changing ch to a chan error and sending the error on it, and then checking it in the main goroutine.


pkg/storage/client_raft_test.go, line 1748 at r1 (raw file):

	}()

	ticker := time.After(15 * time.Millisecond)

there's no need for this local, it can be inlined below


pkg/storage/client_raft_test.go, line 1751 at r1 (raw file):

	select {
	case <-ch:
		t.Fatal(errors.New("write not throttled by the quota pool"))

Fatal just takes interface{}, so you can pass a string.


pkg/storage/quota_pool.go, line 17 at r1 (raw file):

/*
 *
 * Copyright 2014, Google Inc.

where did this code come from?


pkg/storage/quota_pool.go, line 58 at r1 (raw file):

	// TODO(peter): This setting needs additional thought. Should it be adjusted
	// dynamically?
	defaultProposalQuota = 1000

how come this constant is in this file, while it is only used elsewhere?


pkg/storage/quota_pool.go, line 64 at r1 (raw file):

	syncutil.Mutex

	c chan int64

Can you help me understand this structure? It looks like there's always a numeric value parked in this channel, which is read and written under lock -- why do we need a channel at all? Is it just for composition with context cancellation? The pattern is unusual enough that I think it'd benefit from a comment.


Comments from Reviewable

@irfansharif (Contributor, author) commented May 11, 2017

Thanks! Will be following this up with another PR with quota acquisitions based on the size of the raft log entry itself, instead of the unary acquisitions used currently.

Note to self re: Tobi's drive-by comment: we might want to delay proposer evaluation until after acquiring quota, but if we're doing size-based quota acquisitions we'd need the 'size' of the evaluation.


Review status: 1 of 4 files reviewed at latest revision, 11 unresolved discussions, some commit checks pending.


pkg/storage/client_raft_test.go, line 1685 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nice test!

Thanks!


pkg/storage/client_raft_test.go, line 1696 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

also, consider doing this by iterating through the stores:

for _, store := range mtc.stores {
  store.SetRaftLogQueueActive(false)
}

Done (both).


pkg/storage/client_raft_test.go, line 1707 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This assumes that replica 1 is the leader. Could you make that more explicit? I think this test may end up flaky under stress.

You might also consider using a more descriptive variable name here, perhaps leaderRepl? Similarly, s/repl3/followerRepl/ below?

Done.


pkg/storage/client_raft_test.go, line 1734 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Couldn't this already block if some other moving part in the system gets a proposal in?

Yes, at the time of writing the only possible moving parts that could do this are node liveness heartbeats and raft log truncations. This will be revisited once quota acquisitions are based on raft log entry sizes (in bytes). In the interest of keeping the PR small I've added a comment to this effect and will add size-based quota in a follow-up PR.


pkg/storage/client_raft_test.go, line 1743 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

It is not permitted to call t.Fatal in the non-main goroutine. Consider changing ch to a chan error and sending the error on it, and then checking it in the main goroutine.

Done.


pkg/storage/client_raft_test.go, line 1748 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

there's no need for this local, it can be inlined below

Done.


pkg/storage/client_raft_test.go, line 1751 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Fatal just takes interface{}, so you can pass a string.

Done.


pkg/storage/quota_pool.go, line 17 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

where did this code come from?

grpc-go, added a comment for future readers.


pkg/storage/quota_pool.go, line 58 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

how come this constant is in this file, while it is only used elsewhere?

nit: the constant is only used with the quota pool defined below, thought it best to keep it in the same place.


pkg/storage/quota_pool.go, line 64 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Can you help me understand this structure? It looks like there's always a numeric value parked in this channel, which is read and written under lock -- why do we need a channel at all? Is it just for composition with context cancellation? The pattern is unusual enough that I think it'd benefit from a comment.

Added a comment. Yes, it helps with composition with context cancellation but also makes for clean acquisition given we're essentially using it as a counting semaphore.
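
For concreteness, a minimal sketch of the parked-value pattern follows. It is not the actual implementation (which, as discussed further down in this review, also keeps a FIFO of waiting acquirers for fairness and uses syncutil.Mutex), but it shows how parking the quota in a buffered channel composes with context cancellation and pool closure, assuming unary acquisitions as in this PR:

import (
	"context"
	"errors"
	"sync"
)

type quotaPool struct {
	sync.Mutex // serializes add so parked quota is merged, never overwritten

	// c is buffered (capacity 1) and holds the currently available quota; a
	// value of 0 is never parked. done is closed when the pool is torn down
	// (e.g. on a Raft leadership change).
	c    chan int64
	done chan struct{}
}

func newQuotaPool(q int64) *quotaPool {
	qp := &quotaPool{c: make(chan int64, 1), done: make(chan struct{})}
	qp.c <- q // park the initial quota
	return qp
}

// add returns v units of quota, merging them with whatever is already parked.
func (qp *quotaPool) add(v int64) {
	if v <= 0 {
		return
	}
	qp.Lock()
	defer qp.Unlock()
	select {
	case q := <-qp.c:
		v += q
	default:
	}
	qp.c <- v
}

// acquire blocks until quota is parked, the context is cancelled, or the pool
// is closed. v is assumed to be 1 here, matching this PR's unary acquisitions.
func (qp *quotaPool) acquire(ctx context.Context, v int64) error {
	select {
	case q := <-qp.c:
		qp.add(q - v) // re-park any remainder for the next acquirer
		return nil
	case <-ctx.Done():
		return ctx.Err()
	case <-qp.done:
		return errors.New("quota pool no longer in use")
	}
}

// close unblocks all current and future acquisitions.
func (qp *quotaPool) close() {
	close(qp.done)
}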


pkg/storage/replica.go, line 816 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Another thread could be blocked in quotaPool.acquire when we transition to follower. We need to close the old pool's channel to unblock all waiting threads.

Done.


Comments from Reviewable

@tbg (Member) commented May 11, 2017

Note to self re: Tobi's drive-by comment: we might want to delay proposer evaluation until after acquiring quota, but if we're doing size-based quota acquisitions we'd need the 'size' of the evaluation.

One more drive-by comment: there's also a risk in this (PR's) change in that now there's more "waiting" downstream of the command queue. If you already know you're going to take proposal quota, it would make sense to do so before entering the command queue in a future change. That way, you could have thousands of writers blocked on the quota, without negatively impacting other ops that can go ahead (reads).

@irfansharif force-pushed the proposal-quota branch 2 times, most recently from ee921a6 to b809275 on May 11, 2017 15:03
@tamird (Contributor) commented May 11, 2017

Will be following this up with another PR with quota acquisitions based on the size of the raft log entry itself instead of the unary acquisitions as it is currently.

What is the impact of merging this as-is without this follow up work? Perhaps we want to include that here - it doesn't seem likely to me that this stuff will rot in the interim, though I could be convinced otherwise.


Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 15 unresolved discussions, some commit checks pending.


pkg/storage/client_raft_test.go, line 1707 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Done.

Heh, we have a utility function for this already.

leaderRepl := mtc.getRaftLeader(rangeID)

pkg/storage/client_raft_test.go, line 1730 at r2 (raw file):

	leaderRepl.SetQuotaPool(quota)

	followerRepl, err := mtc.stores[2].GetReplica(rangeID)

consider avoiding hardcoding 2 here as well:

followerRepl := func() *storage.Replica {  
	for _, store := range mtc.stores {
	  repl, err := store.GetReplica(rangeID)
	  if err != nil {
	    t.Fatal(err)
	  }
	  if repl == leaderRepl {
	    continue
	  }
	  return repl
	}
	return nil
}()
if followerRepl == nil {
  t.Fatal(...)
}

pkg/storage/client_raft_test.go, line 1748 at r2 (raw file):

	// We verify this by writing this many keys and ensuring the next write is
	// blocked.
	// NB: This can block if some other moving part of the system gets a

add an empty comment line above here to preserve the paragraph separation:

// block.
//
// NB:

pkg/storage/client_raft_test.go, line 1752 at r2 (raw file):

	// liveness heartbeats and raft log truncations, both of which are disabled
	// for the purposes of this test.
	// TODO(irfansharif): Once we move to quota acquisitions based on the size

ditto


pkg/storage/client_raft_test.go, line 1766 at r2 (raw file):

		defer close(ch)
		if _, err := client.SendWrapped(context.Background(), leaderRepl, incArgs); err != nil {
			ch <- errors.New("write not throttled by the quota pool")

I think this should be:

_, err := client.SendWrapped(context.Background(), leaderRepl, incArgs)
ch <- err

pkg/storage/client_raft_test.go, line 1773 at r2 (raw file):

	select {
	case err := <-ch:
		t.Fatal(err)

This should be t.Fatalf("write not throttled by the quota pool: err=%v", err)


pkg/storage/quota_pool.go, line 58 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

nit: the constant is only used with the the quota pool defined below, thought it best to keep it in the same place.

Understood, but I think it might be better to put it in the file where it is used, in the interest of reducing context switches.


pkg/storage/quota_pool.go, line 73 at r2 (raw file):

	// We use a channel to 'park' our quota value for easier composition with
	// context cancellation and leadership changes (see quotaPool.acquire).
	// NB: A value of '0' is never allowed to be parked in the

consider adding an empty comment line above.


pkg/storage/quota_pool.go, line 120 at r2 (raw file):

		return nil
	case <-qp.done:
		return errors.New("raft leadership changed, quota pool no longer in use")

I don't think it's appropriate to say that raft leadership changed here; this structure doesn't know anything about raft, all it knows is that it was closed.

You might say that at the caller, though.


pkg/storage/quota_pool.go, line 129 at r2 (raw file):

		close(qp.done)
	}
	qp.done = nil

move inside the conditional?


pkg/storage/replica.go, line 820 at r2 (raw file):

				r.mu.proposalQuota.close()
			}
			r.mu.proposalQuota = nil

you could move this inside the nil check, and make the else an else if


Comments from Reviewable

@tbg (Member) commented May 12, 2017

Having looked at this change a little bit together with @irfansharif last night, I'm nervous about acquiring the quota on the client's goroutine but releasing it on application. There are various error conditions which prevent a proposal from going into Raft in the first place (reproposals, etc). What would be lost if the proposal quota were released in the client's goroutine as well? That way, you'd get a simple defer and less error-prone guarantees required between processRaft and the client goroutine. You would release quota an iota later, but I don't see that there's a big issue with that. Or maybe I'm completely missing the point.


Review status: all files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.


pkg/storage/helpers_test.go, line 264 at r2 (raw file):

	r.mu.proposalQuotaBaseIndex = r.mu.lastIndex
	r.mu.proposalQuota = newQuotaPool(quota)

You should drain the old pool if there is one, I think.


pkg/storage/replica.go, line 813 at r2 (raw file):

			// where we acquire quota from the pool. To offset this we reset
			// the quota pool whenever leadership changes hands.
			r.mu.proposalQuota = newQuotaPool(defaultProposalQuota)

is proposalQuota always nil here? If so I wouldn't mind an assertion.


Comments from Reviewable

@tbg (Member) commented May 12, 2017

That way, you'd get a simple defer and less error-prone guarantees required between processRaft and the client goroutine. You would release quota an iota later, but I don't see that there's a big issue with that. Or maybe I'm completely missing the point.

Ah yeah, completely missed the point, as @irfansharif kindly reminded me. We figured out that the current code isn't really a sound way of doing things: the goroutine that handles raft.Ready isn't activated on incoming MsgAppResp, and so essentially you would release quota for command n only when n+1 commits, which poses a problem when you don't ever propose n+1 until n releases quota. You need something that gets activated (at least) on MsgAppResp.

This off-by-one only happens with 3+ replicas:

  • 1 appends the command
  • 2 appends the command
  • 1 hears back from itself and 2, commits -> releases quota, but too little since 3 hasn't caught up yet
  • 3 appends the command
  • 1 hears back from 3, but since the command is already committed, no new Ready is emitted.

@irfansharif force-pushed the proposal-quota branch 5 times, most recently from 94f4e65 to e7cdb1b on May 13, 2017 05:43
@irfansharif (Contributor, author):

What is the impact of merging this as-is without this follow up work? Perhaps we want to include that here - it doesn't seem likely to me that this stuff will rot in the interim, though I could be convinced otherwise.

No impact, should be fine. I've worked on proposal-size-based quota acquisition, reimplementing quotaPool as seen here for arbitrary-sized acquisitions, and I'd prefer to introduce that in a separate PR.

What would be lost if the proposal quota were released in the clients goroutine as well?

Yup, ran into this failure for contentious workloads (TestSingleKey kept running out of quota); addressed the failure conditions by returning quota on failure via a callback (see returnQuota). One thing I'm not sure about is re: tryAbandon, not sure if I understood the usage completely. 'Abandoning a command only abandons the associated context', so do we have to release the acquired quota back in this case? I think not.

off-by-one only happens with 3+ replicas

I've addressed this by returning acquired quota before the early hasReady check. I've revised this in my work for size based quota acquisitions but this should be fine as is.


Review status: 0 of 4 files reviewed at latest revision, 16 unresolved discussions, all commit checks successful.


pkg/storage/helpers_test.go, line 264 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

You should drain the old pool if there is one, I think.

Done, good hygiene.


pkg/storage/quota_pool.go, line 73 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider adding an empty comment line above.

Done.


pkg/storage/quota_pool.go, line 120 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I don't think it's appropriate to say that raft leadership changed here; this structure doesn't know anything about raft, all it knows is that it was closed.

You might say that at the caller, though.

Done.


pkg/storage/quota_pool.go, line 129 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

move inside the conditional?

Done.


pkg/storage/replica.go, line 813 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

is proposalQuota always nil here? If so I wouldn't mind an assertion.

Done.


pkg/storage/replica.go, line 820 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you could move this inside the nil check, and make the else an else if

Done.


Comments from Reviewable

@irfansharif (Contributor, author):

Additionally, PTAL at Replica.propose: I've plumbed the quota acquisition down below, but given it's a blocking call I release raftMu before acquisition and lock it again after. This appears to be a safe change to me given we need only hold raftMu during evaluation, but I could be missing something. Plumbing it down was motivated by wanting to situate it so that I have access to the proposal size before acquisition.


Review status: 0 of 4 files reviewed at latest revision, 16 unresolved discussions, all commit checks successful.


pkg/storage/client_raft_test.go, line 1707 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Heh, we have a utility function for this already.

leaderRepl := mtc.getRaftLeader(rangeID)

Done.


pkg/storage/client_raft_test.go, line 1730 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider avoiding hardcoding 2 here as well:

followerRepl := func() *storage.Replica {  
	for _, store := range mtc.stores {
	  repl, err := store.GetReplica(rangeID)
	  if err != nil {
	    t.Fatal(err)
	  }
	  if repl == leaderRepl {
	    continue
	  }
	  return repl
	}
	return nil
}()
if followerRepl == nil {
  t.Fatal(...)
}

Done.


pkg/storage/client_raft_test.go, line 1748 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

add an empty comment line above here to preserve the paragraph separation:

// block.
//
// NB:

Done.


pkg/storage/client_raft_test.go, line 1752 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.


pkg/storage/client_raft_test.go, line 1766 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think this should be:

_, err := client.SendWrapped(context.Background(), leaderRepl, incArgs)
ch <- err

Done.


pkg/storage/client_raft_test.go, line 1773 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This should be t.Fatalf("write not throttled by the quota pool: err=%v", err)

Done.


Comments from Reviewable

@irfansharif force-pushed the proposal-quota branch 2 times, most recently from e5ad92b to 188b45d on May 15, 2017 13:24
@tbg (Member) commented May 15, 2017

I'll review this afternoon.

@petermattis (Collaborator):

Review status: 0 of 4 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2619 at r3 (raw file):

	// maybeAcquireProposalQuota is a blocking command, will block if there's
	// no quota available. We temporarily unlock the raftMu mutex while waiting
	// for quota to be made available, we lock it immediately after.

@tschottdorf Do you recall why we need to hold raftMu while calling requestToProposal?


Comments from Reviewable

@tamird (Contributor) commented May 15, 2017

Reviewed 4 of 4 files at r3.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/storage/client_raft_test.go, line 1773 at r3 (raw file):

		defer close(ch)
		_, err := client.SendWrapped(context.Background(), leaderRepl, incArgs)
		ch <- errors.New(err.String())

why can't you just ch <- err?


pkg/storage/client_raft_test.go, line 1791 at r3 (raw file):

	select {
	case <-ch:

I think you still want to check the error here:

case err := <-ch:
  if err != nil {
    t.Fatal(err)
  }

pkg/storage/client_raft_test.go, line 1793 at r3 (raw file):

	case <-ch:
	case <-time.After(15 * time.Millisecond):
		t.Fatal(errors.New("throttled write not unblocked"))

no need for the errors.New here.


pkg/storage/replica.go, line 832 at r3 (raw file):

			// hands.
			if r.mu.proposalQuota != nil {
				log.Fatal(ctx, "proposalQuota was not nil before becoming the leader")

is there any interesting information to be gained by giving proposalQuota a String() string method and including it in this message?


pkg/storage/replica.go, line 885 at r3 (raw file):

the delta which can be released back to the quota
// pool.

There's something wrong with this sentence.


pkg/storage/replica.go, line 2620 at r3 (raw file):

	// no quota available. We temporarily unlock the raftMu mutex while waiting
	// for quota to be made available, we lock it immediately after.
	r.raftMu.Unlock()

it may be clearer to call it maybeAcquireProposalQuotaRaftMuLocked and to handle the unlocking/locking within.
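
Roughly this shape, for illustration (a sketch of the suggestion only; the blocking acquisition is represented by a placeholder callback rather than the real call):

// Expected to be called with raftMu held; the mutex is dropped around the
// potentially long blocking acquisition and re-acquired before returning.
func (r *Replica) maybeAcquireProposalQuotaRaftMuLocked(
	ctx context.Context, acquire func(context.Context) error,
) error {
	r.raftMu.Unlock()
	defer r.raftMu.Lock()
	return acquire(ctx)
}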


pkg/storage/replica.go, line 2636 at r3 (raw file):

	// proposals that never get cleared.
	if err := r.mu.destroyed; err != nil {
		returnQuota()

this pattern seems very fragile. I think you might want to wrap this bit in a func() error so you can call returnQuota in just one place, or else use a named return so you can check the error in a defer.
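
For illustration, with a named error return the release collapses into a single defer (a heavily elided sketch using names that appear elsewhere in this review, not the actual body of Replica.propose):

func (r *Replica) propose(
	ctx context.Context, // ...remaining parameters elided...
) (_ chan proposalResult, _ func() bool, err error) {
	// Release the acquired quota in exactly one place if anything below fails.
	defer func() {
		if err != nil {
			returnQuota()
		}
	}()

	if err = r.mu.destroyed; err != nil {
		return nil, nil, err
	}
	// ...evaluate the request and submit the proposal; any error assigned to
	// err on the way out triggers the deferred release above...
	return proposalCh, tryAbandon, nil
}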


Comments from Reviewable

@irfansharif (Contributor, author):

Review status: 2 of 4 files reviewed at latest revision, 11 unresolved discussions.


pkg/storage/client_raft_test.go, line 1773 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why can't you just ch <- err?

Whoops, left over from before using chan error. Done.


pkg/storage/client_raft_test.go, line 1791 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think you still want to check the error here:

case err := <-ch:
  if err != nil {
    t.Fatal(err)
  }

yup, Done.


pkg/storage/client_raft_test.go, line 1793 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

no need for the errors.New here.

Done.


pkg/storage/replica.go, line 832 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is there any interesting information to be gained by giving proposalQuota a String() string method and including it in this message?

Not really, not that I can think of at least.


pkg/storage/replica.go, line 885 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

the delta which can be released back to the quota
// pool.

There's something wrong with this sentence.

Fixed.


pkg/storage/replica.go, line 2620 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

it may be clearer to call it maybeAcquireProposalQuotaRaftMuLocked and to handle the unlocking/locking within.

Done.


pkg/storage/replica.go, line 2636 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this pattern seems very fragile. I think you might want to wrap this bit in a func() error so you can call returnQuota in just one place, or else use a named return so you can check the error in a defer.

Done (named return, checking the error in a defer).


Comments from Reviewable

@tamird (Contributor) commented May 15, 2017

Reviewed 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/storage/client_raft_test.go, line 1773 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Whoops, left over from before using chan error. Done.

ah, it's a different type. we try to name those variables pErr for clarity. Optional.


pkg/storage/client_raft_test.go, line 1791 at r4 (raw file):

	select {
	case err := <-ch:

here too, if you decide to rename.


pkg/storage/replica.go, line 2559 at r4 (raw file):

	endCmds *endCmds,
	spans *SpanSet,
) (ch chan proposalResult, tryAbandon func() bool, err error) {

nit: you're not using the names ch, tryAbandon; they can be left unnamed: (_ chan proposalResult, _ func() bool, err error)


Comments from Reviewable

@tbg (Member) commented May 26, 2017

@irfansharif sorry, should've been clearer: I'm fine with use of approximate quota.

@irfansharif force-pushed the proposal-quota branch 2 times, most recently from f52cc89 to f32229f on May 26, 2017 19:02
@bdarnell (Contributor):

:lgtm:


Reviewed 2 of 5 files at r18, 5 of 5 files at r19.
Review status: 1 of 9 files reviewed at latest revision, 31 unresolved discussions, some commit checks pending.


pkg/storage/quota_pool.go, line 154 at r19 (raw file):

		case <-ctx.Done():
			qp.Lock()
			// We no longer need to notified but we need to be careful and check

s/to notified/to be notified/


pkg/storage/quota_pool.go, line 158 at r19 (raw file):

			// next acquisition thread and clean up the waiting queue while doing
			// so.
			// Otherwise we simply 'unregister' ourselves from the queue by filling

Why do we fill up the channel instead of setting it to nil or removing it from the slice completely? Both sides of this operation are carried out under the lock.


pkg/storage/quota_pool.go, line 243 at r19 (raw file):

	qp.Lock()
	for i, ch := range qp.queue[1:] {

This loop appears three times; how about a notifyNextLocked method?


Comments from Reviewable

@tbg (Member) commented May 26, 2017

:lgtm: (please add the approximate quota in the RangeInfo so that it shows up in _status/ranges/local).

Excited to see this go in!


Reviewed 8 of 8 files at r20, 1 of 1 files at r21, 5 of 5 files at r22, 2 of 2 files at r23, 7 of 7 files at r24.
Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.


pkg/storage/quota_pool_test.go, line 275 at r19 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Don't see how it would 'test more', each thread returns twice what it acquired but yes, might as well.

Yeah, don't think I have a real point there. Thinking was that before you weren't ever exceeding the available quota, so less of the internal quota pool code was exercised.


Comments from Reviewable

@petermattis (Collaborator):

Cc @BramGruneir. Approximate quota should be added to the range debug page.


Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.


Comments from Reviewable

@tamird (Contributor) commented May 26, 2017

nit: looks like the last 4 commits should be squashed into one, wdyt?

Apologies for the volume of comments, I missed a few revisions of this work. Excited to see it nearing completion!


Reviewed 1 of 1 files at r6, 10 of 10 files at r10, 7 of 7 files at r14, 3 of 5 files at r15, 8 of 8 files at r20, 1 of 5 files at r22, 7 of 7 files at r24.
Review status: all files reviewed at latest revision, 40 unresolved discussions, some commit checks failed.


pkg/storage/client_raft_test.go, line 1775 at r24 (raw file):

leaderRepl.QuotaAvailable()

does this second call return the same value as the first? might want to extract a local (as you do below)


pkg/storage/client_test.go, line 194 at r20 (raw file):

	rpcContext  *rpc.Context

	nodeIDtoAddrMu *syncutil.RWMutex

can you use the anonymous struct pattern?

nodeIDtoAddrMu struct {
  *syncutil.RWMutex
  nodeIDtoAddr map[roachpb.NodeID]net.Addr
}

Also, does this need to be a RWMutex, or can it just be a plain mutex?


pkg/storage/helpers_test.go, line 260 at r24 (raw file):

leader replica.

lease holder or raft leader?


pkg/storage/helpers_test.go, line 275 at r24 (raw file):

range leader.

raft leader?


pkg/storage/quota_pool.go, line 16 at r24 (raw file):

//
// Author: Irfan Sharif ([email protected])
//

consider separating this from the above header, since this isn't part of our license - it really explains the following header.


pkg/storage/quota_pool.go, line 69 at r24 (raw file):

type quotaPool struct {
	syncutil.Mutex

are all members protected by this mutex? can this use the anonymous struct pattern?

type quotePool struct {
  mu struct {
    syncutil.Mutex

    queue []chan struct{}

    ....
  }
}

pkg/storage/quota_pool.go, line 72 at r24 (raw file):

so as to in order

too many phrasings


pkg/storage/quota_pool.go, line 73 at r24 (raw file):

from a

by a


pkg/storage/quota_pool.go, line 76 at r24 (raw file):

i.e. their

double space

also, can we avoid the term "thread"? either a "waiter", a "channel", or a "goroutine"


pkg/storage/quota_pool.go, line 155 at r24 (raw file):

		select {
		case <-slowTimer.C:
			timeWaitedFor++

you don't reset the timer here, so this will only ever fire once. I think you can do without timeWaitedFor.

If you did want to reset the timer here, note that the value sent on slowTimer.C is the current time, so you could record the start time and then subtract it from the value received on the channel instead of doing this slightly inaccurate arithmetic.
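
For reference, the re-arming variant might look roughly like this (illustrative only: warnTimeout, notifyCh, ctx, and the log call are stand-ins for whatever the surrounding code uses):

start := time.Now()
slowTimer := time.NewTimer(warnTimeout)
defer slowTimer.Stop()
for {
	select {
	case t := <-slowTimer.C:
		// Re-arm so the warning repeats, and compute the wait from the time
		// value delivered on the channel rather than counting intervals.
		slowTimer.Reset(warnTimeout)
		log.Warningf(ctx, "have been waiting %s attempting to acquire quota", t.Sub(start))
	case <-notifyCh:
		// Quota may now be available; the actual acquisition is elided here.
	case <-ctx.Done():
		// Cancellation handling elided.
	}
}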


pkg/storage/quota_pool.go, line 161 at r24 (raw file):

to notified

to be notified


pkg/storage/quota_pool.go, line 165 at r24 (raw file):

			// next acquisition thread and clean up the waiting queue while doing
			// so.
			// Otherwise we simply 'unregister' ourselves from the queue by filling

nit (throughout): note that some editors will eagerly re-wrap this unless you add an empty comment line above this line.


pkg/storage/quota_pool.go, line 184 at r24 (raw file):

				// Threads are not a risk of getting notified and finding out
				// they're not first in line.
				for i, ch := range qp.queue[1:] {

I think you can simplify this to avoid the extra Unlock and return (applies in all 3 places, I think):

  qp.queue = qp.queue[1:]
chanLoop:
  for _, ch := range qp.queue {
    select {
    case ch <- struct{}{}:
      // The next waiter has been notified; we're done.
      break chanLoop
    default:
      // The next waiter is no longer interested, move on.
      qp.queue = qp.queue[1:]
    }
  }
} else {
  notifyCh <- struct{}{}
}
qp.Unlock()
return ctx.Err()

pkg/storage/quota_pool_test.go, line 86 at r24 (raw file):

	}

	errCh := make(chan error, 1)

throughout: how come buffered?


pkg/storage/quota_pool_test.go, line 88 at r24 (raw file):

	errCh := make(chan error, 1)
	go func() {
		if err := qp.acquire(ctx, 1); !testutils.IsError(err, "context canceled") {

throughout: might be easier to just always send the error and check it at the receiver since you are checking it at the receiver anyhow.

go func() {
  errChan <- gp.acquire(ctx, 1)
}()

pkg/storage/replica.go, line 868 at r24 (raw file):

			r.mu.proposalQuotaBaseIndex = r.mu.lastIndex

			// Raft may propose commands itself (specifically the empty

this comment is now misplaced (it should come after these assertions)


pkg/storage/replica.go, line 874 at r24 (raw file):

			// hands.
			if r.mu.proposalQuota != nil {
				log.Fatal(ctx, "proposalQuota was not nil before becoming the leader")

might be good to include the unexpected value in all of these assertions. possibly nothing to log here, but the others are numeric and possibly useful.


pkg/storage/replica.go, line 886 at r24 (raw file):

		} else if r.mu.proposalQuota != nil {
			// We're either becoming a follower or simply observing a
			// leadership change.

how can proposalQuota be non-nil if we're observing a leadership change (and not becoming a follower)?


pkg/storage/replica.go, line 898 at r24 (raw file):

			return
		}
		panic("unreachable")

this looks like an assertion, but it's randomly a different style than the ones above.

How about:

if r.mu.replicaID == r.mu.leaderID {
  log.Fatal(.....)
}
// We're a follower.
return

pkg/storage/replica.go, line 950 at r24 (raw file):

		// Hence we only process min(delta, len(r.mu.quotaReleaseQueue))
		// quota releases.
		delta := int64(minIndex - r.mu.proposalQuotaBaseIndex)

delta seems redundant with numReleases. can we just have one?


pkg/storage/replica.go, line 956 at r24 (raw file):

		}
		sum := 0
		for i := int64(0); i < numReleases; i++ {

Perhaps this can be:

sum := 0
for _, rel := range r.mu.quotaReleaseQueue[:numReleases] {
  sum += rel
}

the int64 is somewhat confusing as written (it makes me think quotaReleaseQueue is a map).


pkg/storage/replica.go, line 2758 at r24 (raw file):

	// Add size of proposal to commandSizes map.
	if r.mu.commandSizes != nil {

when is it possible for this to be nil?


pkg/storage/replica.go, line 2985 at r24 (raw file):

	//     If we don't release quota back at the end of
	//     handleRaftReadyRaftMuLocked, the next write will get blocked.
	defer func() {

nit: no need for the func here, you can write defer r.updateProposalQuotaRaftMuLocked(ctx, lastLeaderID) since you don't need late binding here.


pkg/storage/replica_test.go, line 6259 at r24 (raw file):

func TestReplicaCancelRaft(t *testing.T) {
	defer leaktest.AfterTest(t)()

nit: looks like an errant addition


pkg/storage/replica_test.go, line 6799 at r24 (raw file):

			repl.mu.Lock()

			if repl.mu.commandSizes != nil {

how come it's ok to proceed if repl.mu.commandSizes is nil?


pkg/storage/replica_test.go, line 6800 at r24 (raw file):

			if repl.mu.commandSizes != nil {
				repl.mu.commandSizes[proposal.idKey] = proposal.command.Size()

can you add a comment explaining why this has to happen before insertProposalLocked (and then conditionally undone) rather than occurring in the non-noop branch of the if structure below?


pkg/storage/replica_test.go, line 6803 at r24 (raw file):

			}

			undoQuotaAcquisitionLocked := func() {

nit: called in one place; perhaps better to inline it?


Comments from Reviewable

@irfansharif (Contributor, author):

Add the approximate quota in the RangeInfo

Done.

nit: looks like the last 4 commits should be squashed into one, wdyt?

I don't mind, less work for me. I'll still keep the separate commit messages however given how many things change across the commits.

Thank you all for your reviews!!


Review status: 2 of 14 files reviewed at latest revision, 40 unresolved discussions.


pkg/storage/client_raft_test.go, line 1775 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

leaderRepl.QuotaAvailable()

does this second call return the same value as the first? might want to extract a local (as you do below)

Done.


pkg/storage/client_test.go, line 194 at r20 (raw file):

can you use the anonymous struct pattern?

Done.

does this need to be a RWMutex, or can it just be a plain mutex?

We were using RLock in getNodeIDAddress before, so leaving that behavior unchanged.


pkg/storage/helpers_test.go, line 260 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

leader replica.

lease holder of raft leader?

Done.


pkg/storage/helpers_test.go, line 275 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

range leader.

raft leader?

Done.


pkg/storage/quota_pool.go, line 154 at r19 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/to notified/to be notified/

Done.


pkg/storage/quota_pool.go, line 158 at r19 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why do we fill up the channel instead of setting it to nil or removing it from the slice completely? Both sides of this operation are carried out under the lock.

We cannot remove it from the slice because its position may have changed since we first added it to the queue (it may have moved further ahead in the queue if previous acquisitions ran to completion/got cancelled).
As for setting the channel to nil, within acquire if we have the following:

notifyCh := make(chan struct{}, 1)
qp.queue = append(qp.queue, notifyCh)
...
notifyCh = nil

All we've done is set our local variable to nil, in order for the nil to be reflected in qp.queue we'll need qp.queue to be of type []*chan struct{} and do the following:

notifyCh := make(chan struct{})
qp.queue = append(qp.queue, &notifyCh)
...
notifyCh = nil

where the notifyNext routine would look like:

for i, ch := range qp.queue[1:] {
    if *ch != nil {
	qp.queue = qp.queue[i+1:]
	close(*ch)
	qp.Unlock()
	return ctx.Err()
    }
}
qp.queue = qp.queue[:0]

Which works too but I find one cleaner than the other. Additionally when setting notifyCh = nil, this too has to be done under lock, lest it gets changed underneath a thread currently traversing qp.queue. For side by side comparisons both implementations can be found here and here.


pkg/storage/quota_pool.go, line 243 at r19 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This loop appears three times; how about a notifyNextLocked method?

Done.


pkg/storage/quota_pool.go, line 16 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider separating this from the above header, since this isn't part of our license - it really explains the following header.

Sure.


pkg/storage/quota_pool.go, line 69 at r24 (raw file):

are all members protected by this mutex?

Not exactly; we also use the mutex to guarantee only one add operation is going on at a time, and approximateQuota only behaves correctly when other adds are excluded. We do this to preserve the invariant that total quota never exceeds a maximum amount.
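
For illustration, the kind of thing the mutex makes safe (a sketch reusing the c channel and embedded mutex from the struct quoted earlier in this review; not the actual code):

func (qp *quotaPool) approximateQuota() int64 {
	qp.Lock()
	defer qp.Unlock()
	// Holding the mutex excludes concurrent adds, so nothing can park a value
	// between the receive and the send below.
	select {
	case q := <-qp.c:
		qp.c <- q // put the value straight back; we only wanted to look at it
		return q
	default:
		// The quota is currently held by an acquirer (or fully used up).
		return 0
	}
}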


pkg/storage/quota_pool.go, line 72 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

so as to in order

too many phrasings

Done.


pkg/storage/quota_pool.go, line 73 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

from a

by a

Done, this is why I failed the ELPE :(


pkg/storage/quota_pool.go, line 76 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i.e. their

double space

also, can we avoid the term "thread"? either a "waiter", a "channel", or a "goroutine"

Done.


pkg/storage/quota_pool.go, line 155 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you don't reset the timer here, so this will only ever fire once. I think you can do without timeWaitedFor.

If you did want to reset the timer here, note that the value sent on slowTimer.C is the current time, so you could record the start time and then subtract it from the value received on the channel instead of doing this slightly inaccurate arithmetic.

Done.


pkg/storage/quota_pool.go, line 161 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

to notified

to be notified

Done.


pkg/storage/quota_pool.go, line 165 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit (throughout): note that some editors will eagerly re-wrap this unless you add an empty comment line above this line.

Done.


pkg/storage/quota_pool.go, line 184 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think you can simply this to avoid the extra Unlock and return (applies in all 3 places, I think):

  qp.queue = qp.queue[1:]
chanLoop:
  for _, ch := range qp.queue {
    select {
    case ch <- struct{}{}:
      // The next waiter has been notified; we're done.
      break chanLoop
    default:
      // The next waiter is no longer interested, move on.
      qp.queue = qp.queue[1:]
    }
  }
} else {
  notifyCh <- struct{}{}
}
qp.Unlock()
return ctx.Err()

pulled this out into a separate notifyNextLocked method instead.
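
One plausible shape for such a helper, for illustration (it assumes the pool's mutex is held and that qp.queue is the FIFO of buffered, capacity-1 notification channels described in this review; this is not the exact code):

func (qp *quotaPool) notifyNextLocked() {
	// The caller is the head of the queue and is giving up its spot.
	qp.queue = qp.queue[1:]
	for len(qp.queue) > 0 {
		select {
		case qp.queue[0] <- struct{}{}:
			// Woke the next waiter; it is now first in line.
			return
		default:
			// This waiter already 'unregistered' itself by filling its channel
			// (e.g. its context was cancelled); skip past it.
			qp.queue = qp.queue[1:]
		}
	}
}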


pkg/storage/quota_pool_test.go, line 86 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

throughout: how come buffered?

Unnecessary, removed.


pkg/storage/quota_pool_test.go, line 88 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

throughout: might be easier to just always send the error and check it at the receiver since you are checking it at the receiver anyhow.

go func() {
  errChan <- gp.acquire(ctx, 1)
}()

Done.


pkg/storage/replica.go, line 868 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this comment is now misplaced (it should come after these assertions)

Done.


pkg/storage/replica.go, line 874 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

might be good to include the unexpected value in all of these assertions. possibly nothing to log here, but the others are numeric and possibly useful.

Done.


pkg/storage/replica.go, line 886 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

how can proposalQuota be non-nil if we're observing a leadership change (and not becoming a follower)?

ah, artifact. Good eye, only happens if we're becoming a follower.


pkg/storage/replica.go, line 898 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this looks like an assertion, but it's randomly a different style than the ones above.

How about:

if r.mu.replicaID == r.mu.leaderID {
  log.Fatal(.....)
}
// We're a follower.
return

Done.


pkg/storage/replica.go, line 950 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

delta seems redundant with numReleases. can we just have one?

Done.


pkg/storage/replica.go, line 956 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Perhaps this can be:

sum := 0
for _, rel := range r.mu.quotaReleaseQueue[:numReleases] {
  sum += rel
}

the int64 is somewhat confusing as written (it makes me think quotaReleaseQueue is a map).

Done.


pkg/storage/replica.go, line 2758 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

when is it possible for this to be nil?

when it's not the leader replica.


pkg/storage/replica.go, line 2985 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: no need for the func here, you can write defer r.updateProposalQuoteRaftMuLocked(ctx, lastLeaderID) since you don't need late binding here.

Done.


pkg/storage/replica_test.go, line 6259 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: looks like an errant addition

Done.


pkg/storage/replica_test.go, line 6799 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

how come it's ok to proceed if repl.mu.commandSizes is nil?

It won't; in this specific test there's only one replica, therefore repl.mu.commandSizes is always initialized. I wrote this to have a similar structure to Replica.propose but it doesn't need to be. Anyhow, recent changes have removed the need to set up repl.mu.commandSizes, so removed entirely.


pkg/storage/replica_test.go, line 6800 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you add a comment explaining why this has to happen before insertProposalLocked (and then conditionally undone) rather than occurring in the non-noop branch of the if structure below?

Removed entirely.


pkg/storage/replica_test.go, line 6803 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: called in one place; perhaps better to inline it?

Removed entirely.


Comments from Reviewable

@bdarnell (Contributor):

Review status: 2 of 14 files reviewed at latest revision, 33 unresolved discussions, some commit checks pending.


pkg/storage/quota_pool.go, line 158 at r19 (raw file):

We cannot remove it from the slice because it's position may have changed since we first added it to the queue

Yes, I was assuming we'd iterate over the queue to find its current position.

we'll need qp.queue to be of type []*chan struct{}

I don't think the pointer is necessary. We don't need to overwrite our local variable, just the one in the queue:

for i := range qp.queue {
    if qp.queue[i] == notifyCh {
        qp.queue[i] = nil
        break
    } 
}

Additionally when setting notifyCh = nil, this too has to be done under lock

Yes, but it's already all happening under the lock. (If you were able to move some of this out of the lock, that would be a reason to use the non-standard "fill up the channel" pattern)


Comments from Reviewable

@tbg (Member) commented May 26, 2017

Reviewed 12 of 12 files at r25.
Review status: all files reviewed at latest revision, 20 unresolved discussions, some commit checks pending.


Comments from Reviewable

@tamird (Contributor) commented May 26, 2017

Note that keeping the commits separate isn't free either - there was a fair amount of churn in this PR, and keeping these commits separate will make it much harder to dig through the git history if that's ever needed (and experience suggests it will be needed).


Reviewed 1 of 12 files at r8, 12 of 12 files at r25.
Review status: all files reviewed at latest revision, 20 unresolved discussions, some commit checks pending.


pkg/storage/helpers_test.go, line 275 at r24 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Done.

nit: you write "leaseholder" here but "lease holder" above.


pkg/storage/quota_pool.go, line 76 at r24 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Done.

heh, the word thread still appears dozens of times in this PR.


pkg/storage/quota_pool_test.go, line 96 at r25 (raw file):

		t.Fatal("context cancellation did not unblock acquisitions within 5s")
	case err := <-errCh:
		if !testutils.IsError(err, "context canceled") {

I think you can do a direct comparison here against context.Canceled instead of using IsError.


pkg/storage/quota_pool_test.go, line 97 at r25 (raw file):

	case err := <-errCh:
		if !testutils.IsError(err, "context canceled") {
			t.Fatal("expected acquisition to fail with ctx cancellation")

you should include the unexpected error received in this string.


pkg/storage/quota_pool_test.go, line 132 at r25 (raw file):

	case err := <-errCh:
		if !testutils.IsError(err, "quota pool no longer in use") {
			t.Fatal("expected acquisition to fail with pool closing")

ditto


pkg/storage/quota_pool_test.go, line 164 at r25 (raw file):

			t.Fatal("context cancellations did not unblock acquisitions within 5s")
		case err := <-errCh:
			if !testutils.IsError(err, "context canceled") {

ditto


pkg/storage/replica.go, line 2758 at r24 (raw file):

Previously, irfansharif (irfan sharif) wrote…

when it's not the leader replica.

Ah, of course.


pkg/storage/replica.go, line 951 at r25 (raw file):

		// min(minIndex - r.mu.proposalQuotaBaseIndex, len(r.mu.quotaReleaseQueue))
		// quota releases.
		numReleases := int64(minIndex - r.mu.proposalQuotaBaseIndex)

this is casting uint64 to int64, then you cast int to int64 to match, and then you cast this back to uint64 when you add it to r.mu.proposalQuotaBaseIndex. can it just be uint64 the whole time, with a cast to int when it's used as an array index (if that's necessary, i'm not sure)


Comments from Reviewable

@irfansharif (Contributor, author):

Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/storage/helpers_test.go, line 275 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: you write "leaseholder" here but "lease holder" above.

Done.


pkg/storage/quota_pool.go, line 158 at r19 (raw file):

I was assuming we'd iterate over the queue to find its current position.

I see. Not that I've verified this experimentally, but I imagine the case where a large command causes a backlog of queued acquisitions: subsequent cancellations (say due to timeouts, for example; I assume this is a thing) could potentially mean looping over a large number of entries, all the while holding the lock (we're modifying qp.queue). Not sure what this buys us over simply stuffing the channel (sized 1). The performance for the common case is unchanged.


pkg/storage/quota_pool.go, line 76 at r24 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

heh, the word thread still appears dozens of times in this PR.

whoops, done.


pkg/storage/quota_pool_test.go, line 96 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think you can do a direct comparison here against context.Canceled instead of using IsError.

ah, didn't know this was a thing. Done.


pkg/storage/quota_pool_test.go, line 97 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you should include the unexpected error received in this string.

Done.


pkg/storage/quota_pool_test.go, line 132 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.


pkg/storage/quota_pool_test.go, line 164 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.


pkg/storage/replica.go, line 951 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this is casting uint64 to int64, then you cast int to int64 to match, and then you cast this back to uint64 when you add it to r.mu.proposalQuotaBaseIndex. can it just be uint64 the whole time, with a cast to int when it's used as an array index (if that's necessary, i'm not sure)

Done, and apparently it's not necessary. TIL.


Comments from Reviewable

@irfansharif force-pushed the proposal-quota branch 2 times, most recently from ebcbdc5 to b4df264 on May 26, 2017 23:48
@tamird (Contributor) commented May 26, 2017

Just nits at this point, thanks for addressing my comments! :lgtm: as far as I'm concerned.


Reviewed 4 of 4 files at r26.
Review status: all files reviewed at latest revision, 17 unresolved discussions.


pkg/storage/quota_pool.go, line 236 at r26 (raw file):

queue. Normally

double space >.<


pkg/storage/quota_pool_test.go, line 132 at r25 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Done.

Doesn't look done (include the unexpected error)


pkg/storage/quota_pool_test.go, line 29 at r26 (raw file):

threads

bunch of occurrences of "thread" in this file, too.


pkg/storage/quota_pool_test.go, line 97 at r26 (raw file):

	case err := <-errCh:
		if err != context.Canceled {
			t.Fatalf("expected context cancellation error, got %s", err.Error())

err might be nil (so you should use %v rather than %s), and there's never a need to call Error when formatting errors. ... got %v", err)


pkg/storage/quota_pool_test.go, line 165 at r26 (raw file):

		case err := <-errCh:
			if err != context.Canceled {
				t.Fatalf("expected context cancellation error, got %s", err.Error())

err might be nil (so you should use %v rather than %s), and there's never a need to call Error when formatting errors. ... got %v", err)


Comments from Reviewable

Use another mutex to protect access to multiTestContext.nodeIDtoAddr.
Reusing the same mutex guarding mtc.{senders,stores,nodeLivenesses}
leads to deadlocks due to inconsistent lock ordering.

Consider multiTestContext.FirstRange()
  - We acquire a read lock on mtc.mu when iterating over mtc.senders
  - Internally we call Replica.Desc() thus acquiring a read lock on
    Replica.mu.
  (lock ordering: multiTestContext.mu, Replica.mu)

Consider the case where we want to check for "healthy" RPC connections
to follower replicas when handling raft.Ready
 - We acquire a lock on Replica.mu to handle raft.Ready
 - We call Replica.store.cfg.Transport.resolver(replica.NodeID). The
   default resolver function for multiTestContext is
   mtc.getNodeIDAddress, which accesses the mtc.nodeIDtoAddr map and
   consequently acquires a read lock on mtc.mu.
 (lock ordering: Replica.mu, multiTestContext.mu)

The non-test implementation of Transport and the corresponding resolver
does not have this lock ordering issue.

The leader maintains a pool of "proposal quota". Before proposing a Raft
command, we acquire proposal quota equal to the size of the
command. When all of the healthy followers have committed the entry,
quota equal to the size of the corresponding command is returned to the
pool. We maintain on Replica a map associating storagebase.CmdIDKey to
the corresponding command size. Downstream of raft we extract the same
storagebase.CmdIDKey from the command and append the command size to
quotaReleaseQueue. This queue is subsequently used to return quota back
to the pool once all the followers have caught up.

We only consider "healthy" followers to be those that have "healthy" RPC
connections when determining if a unit of quota should be returned to the
pool.
@tamird (Contributor) commented May 27, 2017

Reviewed 13 of 13 files at r28.
Review status: all files reviewed at latest revision, 12 unresolved discussions.


Comments from Reviewable

@irfansharif (Contributor, author):

thanks again tamir! rebased to two commits like asked for.


Review status: all files reviewed at latest revision, 16 unresolved discussions, all commit checks successful.


pkg/storage/quota_pool.go, line 236 at r26 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

queue. Normally

double space >.<

Done.


pkg/storage/quota_pool_test.go, line 132 at r25 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Doesn't look done (include the unexpected error)

Done.


pkg/storage/quota_pool_test.go, line 29 at r26 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

threads

bunch of occurrences of "thread" in this file, too.

Done.


pkg/storage/quota_pool_test.go, line 97 at r26 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

err might be nil (so you should use %v rather than %s), and there's never a need to call Error when formatting errors. ... got %v", err)

Done.


pkg/storage/quota_pool_test.go, line 165 at r26 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

err might be nil (so you should use %v rather than %s), and there's never a need to call Error when formatting errors. ... got %v", err)

Done.


Comments from Reviewable

@irfansharif merged commit db8da7c into cockroachdb:master on May 27, 2017
@irfansharif deleted the proposal-quota branch on May 27, 2017 00:54
@tbg (Member) commented May 27, 2017


also, we're gonna need to test/exercise this end-to-end in some way or another. File an issue if you haven't already (perhaps also bring up Toxiproxy, seemed really useful from what you told me about it).

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Aug 23, 2017
This was an oversight introduced in cockroachdb#15802, the `undoQuotaAcquisition`
was intended to be used for all proposal errors but we seem to have
skipped the very first one. It seems to have been a relic of the earlier
structure where in the event of proposal errors the acquired quota was
released within `Replica.propose` itself, and not left to the caller.

Fixes (hopefully) cockroachdb#17826.