
storage: reapply the rule solver #10252

Closed
wants to merge 2 commits into from

Conversation

BramGruneir
Member

@BramGruneir BramGruneir commented Oct 27, 2016

This adds back in 3 commits that were removed to facilitate the merge of develop
back to master. One other commit is no longer required.

Closes #9336

  1. 4446345
    storage: add constraint rule solver for allocation

Rules are represented as a single function that returns the candidacy of the
store as well as a float value representing the score. These scores are then
aggregated across all rules and the stores are returned sorted by them (a rough sketch of this shape follows the commit list below).

Current rules:

  • ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
  • ruleConstraints enforces that required and prohibited constraints are
    followed, and that stores with more positive constraints are ranked higher.
  • ruleDiversity ensures that nodes that have the fewest locality tiers in common
    are given higher priority.
  • ruleCapacity prioritizes placing data on empty nodes when the choice is
    available and prevents data from going onto mostly full nodes.
  2. dd3229a
    storage: implemented RuleSolver into allocator

  3. 27353a8
    storage: removed unused rangeCountBalancer

There was a 4th commit that is no longer required. The simulation was already
converging since adding a rebalance threshold.
4e29a36
storage/simulation: only rebalance 50% of ranges on each iteration so it will
converge
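
For illustration, here is a minimal, hedged Go sketch of the shape the first commit describes: a rule is a single function returning candidacy plus a score, and the solver sums the scores across all rules and sorts the stores. Every name and signature below (storeDescriptor, solveState, rule, candidate, solve) is a simplified stand-in, not the code from this PR.

package storage

import "sort"

// Simplified stand-ins for the real types; the actual solveState also
// carries the constraints and locality information the rules inspect.
type storeDescriptor struct {
    storeID, nodeID int
}

type solveState struct {
    store    storeDescriptor
    existing []storeDescriptor
}

// A rule reports whether the store is a valid candidate and a score;
// scores from all rules are summed into the store's total.
type rule func(state solveState) (score float64, valid bool)

type candidate struct {
    store storeDescriptor
    score float64
}

type byScore []candidate

func (c byScore) Len() int           { return len(c) }
func (c byScore) Less(i, j int) bool { return c[i].score > c[j].score } // best first
func (c byScore) Swap(i, j int)      { c[i], c[j] = c[j], c[i] }

// solve runs every rule against every store and returns the valid
// candidates sorted from best to worst aggregate score.
func solve(rules []rule, stores, existing []storeDescriptor) []candidate {
    var candidates []candidate
    for _, s := range stores {
        state := solveState{store: s, existing: existing}
        total, valid := 0.0, true
        for _, r := range rules {
            score, ok := r(state)
            if !ok {
                valid = false
                break
            }
            total += score
        }
        if valid {
            candidates = append(candidates, candidate{store: s, score: total})
        }
    }
    sort.Sort(byScore(candidates))
    return candidates
}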


This change is Reviewable

@BramGruneir BramGruneir added this to the 1.0 milestone Oct 27, 2016
@BramGruneir BramGruneir added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Oct 27, 2016
@tamird
Contributor

tamird commented Oct 27, 2016

Reviewed 12 of 14 files at r1.
Review status: 12 of 14 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


pkg/storage/allocator.go, line 115 at r1 (raw file):

  storePool  *StorePool
  options    AllocatorOptions
  ruleSolver *ruleSolver

this can be a value


pkg/storage/allocator.go, line 174 at r1 (raw file):

func (a *Allocator) AllocateTarget(
  constraints config.Constraints, existing []roachpb.ReplicaDescriptor, rangeID roachpb.RangeID,
) (*roachpb.StoreDescriptor, error) {

hm, seems like this wants to return a value now instead of a pointer


pkg/storage/allocator.go, line 185 at r1 (raw file):

      }
  }
  return &candidates[0].store, nil

how come the randomness that was previously here is no longer needed?


pkg/storage/allocator.go, line 201 at r1 (raw file):

  }

  // TODO(bram): Is this getStoreList call required?

expand this rationale; why is it currently required, why might it not be required? why is it OK to use a bogus range ID?


pkg/storage/allocator.go, line 207 at r1 (raw file):

  found := false
  var worst roachpb.ReplicaDescriptor
  var worstScore float64

seems like you can initialize this to math.MaxFloat64 and remove found.
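
A hedged sketch of that suggestion, assuming the math and roachpb imports; scoreOf is a hypothetical stand-in for the per-replica scoring done in the surrounding loop:

// findWorst seeds the running minimum with math.MaxFloat64, so the
// separate found flag becomes unnecessary: a result was found exactly
// when the final worstScore is below the seed.
func findWorst(
    existing []roachpb.ReplicaDescriptor, scoreOf func(roachpb.ReplicaDescriptor) float64,
) (roachpb.ReplicaDescriptor, bool) {
    var worst roachpb.ReplicaDescriptor
    worstScore := math.MaxFloat64
    for _, repl := range existing {
        if s := scoreOf(repl); s < worstScore {
            worst, worstScore = repl, s
        }
    }
    return worst, worstScore < math.MaxFloat64
}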


pkg/storage/allocator.go, line 217 at r1 (raw file):

      }
      // If it's not a valid candidate, score will be zero.
      candidate, _ := a.ruleSolver.computeCandidate(solveState{

why is it OK to skip checking the error?


pkg/storage/allocator.go, line 269 at r1 (raw file):

  }

  // TODO(bram): getStoreList is called twice, once now, and again in solve,

getStoreList is also the only function on storePool that is ever called from the solver, and that function (Solve()) is only ever called from the allocator, which can pass the store list.

Seems easy enough to fix in this PR by passing the store list as an argument to Solve; WDYT?


pkg/storage/allocator.go, line 299 at r1 (raw file):

  // Find a candidate that is better than one of the existing stores, otherwise
  // return nil.
  candidatesFound := 0

I had to squint to figure out what this counter was doing. How about this instead:

if len(candidates) > len(existing) {
  candidates = candidates[:len(existing)]
}

pkg/storage/allocator_test.go, line 674 at r1 (raw file):

      a.storePool.mu.Lock()
      a.storePool.mu.deterministic = true

doesn't this want to be inside createTestAllocator?


pkg/storage/allocator_test.go, line 1435 at r1 (raw file):

  // Output:
  // generation    0:   1 88%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%,  0  0%  1 11%

please revert this, or put it in a separate commit. This test is important for validating that the allocator's behaviour didn't inadvertently change, and bundling this graphical change with the rest makes it difficult to spot the interesting changes.


pkg/storage/client_raft_test.go, line 1917 at r1 (raw file):

      actual := countReplicas()
      if !reflect.DeepEqual(expected, actual) {
          return errors.Errorf("replicas are not distributed as expected = %+v, want %+v, %s", actual, expected, pretty.Diff(expected, actual))

why'd this change? the diff alone isn't enough?


pkg/storage/store_pool.go, line 216 at r1 (raw file):

      storeDetails  map[roachpb.StoreID]*storeDetail
      queue         storePoolPQ
      deterministic bool

this doesn't need to be under lock, does it?


pkg/storage/store_pool_test.go, line 42 at r1 (raw file):

// TestSetDeterministic makes StorePool return results in a deterministic way.
func (sp *StorePool) TestSetDeterministic(deterministic bool) {

this has one caller.


Comments from Reviewable

@petermattis
Collaborator

Review status: 12 of 14 files reviewed at latest revision, 15 unresolved discussions, all commit checks successful.


pkg/storage/allocator.go, line 174 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

hm, seems like this wants to return a value now instead of a pointer

Let's not change this. Returning values isn't always a win.

pkg/storage/allocator.go, line 185 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

how come the randomness that was previously here is no longer needed?

+1. The returned candidates are sorted by score, which makes this deterministic, which is Very Bad.

pkg/storage/allocator.go, line 310 at r1 (raw file):

      }
      if !found {
          return &store, nil

We need randomness here too. See my other comment about the candidates being sorted by score at this point making this loop deterministic.


pkg/storage/store_pool.go, line 518 at r1 (raw file):

  if sp.mu.deterministic {
      sort.Sort(sl)
  }

Relying on the map iteration to randomize the StoreList is subtle. We should shuffle if !sp.mu.deterministic. See ReplicaSlice.Shuffle.
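
In other words, something along these lines (a sketch only, assuming a math/rand import; ReplicaSlice.Shuffle presumably does the equivalent in-place swap for replicas):

// shuffleStores randomizes the order of the store list explicitly
// (Fisher-Yates) instead of relying on map iteration order.
func shuffleStores(stores []roachpb.StoreDescriptor) {
    for i := len(stores) - 1; i > 0; i-- {
        j := rand.Intn(i + 1)
        stores[i], stores[j] = stores[j], stores[i]
    }
}

getStoreList would then call something like this when !sp.mu.deterministic and keep the sort.Sort(sl) path otherwise.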


Comments from Reviewable

@petermattis
Collaborator

This change is significant enough that I think we want to enable the new rule-based allocator via an env var and have it disabled by default.
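
For what it's worth, a hedged sketch of that gating (assuming an os import; the flag name matches the COCKROACH_ENABLE_RULE_SOLVER variable mentioned later in this thread, and the real code would more likely go through the envutil helpers than os.Getenv directly):

// enableRuleSolver gates the rule-based allocator; it stays disabled
// unless the environment variable is explicitly set.
var enableRuleSolver = os.Getenv("COCKROACH_ENABLE_RULE_SOLVER") == "true"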


Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.


pkg/storage/rule_solver.go, line 81 at r1 (raw file):

// which they are run. For optimization purposes, less computationally intense
// rules should run first to eliminate candidates.
func makeRuleSolver(storePool *StorePool, rules []rule) *ruleSolver {

The generality of a configurable list of rules feels like overkill to me.


pkg/storage/rule_solver.go, line 122 at r1 (raw file):

  }
  sort.Sort(byScore(candidates))
  return candidates, nil

Does any caller ever use the list? Might be better to return a single candidate that is randomly selected here.

One thought on reintroducing randomness: shuffle sl and iterate over sl.stores until you find two potential candidates and then choose the one with the better score. This would be a power-of-two-choices style selection (https://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf).


pkg/storage/rule_solver.go, line 127 at r1 (raw file):

// computeCandidate runs all the rules for the store and returns the candidacy
// information. Returns false if not a candidate.
func (rs *ruleSolver) computeCandidate(state solveState) (candidate, bool) {

It is super confusing that candidate is a type and we see candidate, bool here while in ruleReplicasUniqueNodes we see candidate bool.


Comments from Reviewable

@BramGruneir
Member Author

Oh, I like that idea a lot. Going to take a bit of plumbing since this touches a lot. But I think it's doable.

And a note on this PR. My goal here is to bring back the code from the original commits more than entirely fixing them. So by adding the env flag, that should allow some of the details (like how it's not correctly random) to be addressed in follow up PRs. Otherwise, this PR would just get out of hand.

So easy fixes are going in, but substantial changes I'm going to reserve for follow-ups.


Comments from Reviewable

@petermattis
Collaborator

So easy fixes are going in, but substantial changes I'm going to reserve for follow-ups.

I'd be fine with seeing this go in without any fixes as long as it is disabled by default.


Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.


Comments from Reviewable

@d4l3k
Contributor

d4l3k commented Oct 27, 2016

pkg/storage/allocator.go, line 185 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

+1. The returned candidates are sorted by score, which makes this deterministic, which is Very Bad.

Why is deterministic order "Very Bad"? Isn't the whole idea of ranking by score such that you can return the best possible candidate? If you're randomizing, doesn't that defeat the whole point of positive vs required attributes?

Isn't it being deterministic a net win since it's then also much easier to understand what's happening?


Comments from Reviewable

This adds back in 3 commits that were removed to facilitate the merge of develop
back to master. One other commit is no longer required.

Follow up fixes are tracked in cockroachdb#10275.

Closes cockroachdb#9336

1) 4446345
storage: add constraint rule solver for allocation

Rules are represented as a single function that returns the candidacy of the
store as well as a float value representing the score. These scores are then
aggregated across all rules and the stores are returned sorted by them.

Current rules:
- ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
- ruleConstraints enforces that required and prohibited constraints are
  followed, and that stores with more positive constraints are ranked higher.
- ruleDiversity ensures that nodes that have the fewest locality tiers in common
  are given higher priority.
- ruleCapacity prioritizes placing data on empty nodes when the choice is
  available and prevents data from going onto mostly full nodes.

2) dd3229a
storage: implemented RuleSolver into allocator

3) 27353a8
storage: removed unused rangeCountBalancer

There was a 4th commit that is no longer required. The simulation was already
converging since adding a rebalance threshold.
4e29a36
storage/simulation: only rebalance 50% of ranges on each iteration so it will
converge
@d4l3k
Contributor

d4l3k commented Oct 27, 2016

Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.


pkg/storage/allocator.go, line 217 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why is it OK to skip checking the error?

That's not an error. That's a bool indicating whether or not that store is a potential candidate. If it's not a candidate, the score will be zero and thus will be the worst. Just makes the logic simpler.

pkg/storage/allocator.go, line 269 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

getStoreList is also the only function on storePool that is ever called from the solver, and that function (Solve()) is only ever called from the allocator, which can pass the store list.

Seems easy enough to fix in this PR by passing the store list as an argument to Solve; WDYT?

Passing in the storePool was done to support viewing this data in the UI. Since getStoreList wasn't exported, the easiest thing was passing in storePool which was accessible from the admin RPC endpoints.

4601e62


pkg/storage/rule_solver.go, line 122 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Does any caller ever use the list? Might be better to return a single candidate that is randomly selected here.

One thought on reintroducing randomness: shuffle sl and iterate over sl.stores until you find two potential candidates and then choose the one with the better score. This would be a power-of-two-choices style selection (https://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf).

There was another PR (that might have been reverted too) that exposed this data in the UI. Made it really easy to see exactly how constraints would be applied to the stores.

Comments from Reviewable

@BramGruneir
Member Author

Fixed a number of the issues. Haven't added the env variable yet. That will be next.


Review status: 9 of 14 files reviewed at latest revision, 18 unresolved discussions.


pkg/storage/allocator.go, line 115 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this can be a value

Done. And I also changed the rule solver's functions to have value receivers.

pkg/storage/allocator.go, line 185 at r1 (raw file):

Previously, d4l3k (Tristan Rice) wrote…

Why is deterministic order "Very Bad"? Isn't the whole idea of ranking by score such that you can return the best possible candidate? If you're randomizing doesn't that defeat the whole point of positive vs required attributes?

Isn't it being deterministic a net win since it's then also much easier to understand what's happening?

I've added this to the follow up issue.

pkg/storage/allocator.go, line 201 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

expand this rationale; why is it currently required, why might it not be required? why is it OK to use a bogus range ID?

Done.

pkg/storage/allocator.go, line 207 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

seems like you can initialize this to math.MaxFloat64 and remove found.

Done.

pkg/storage/allocator.go, line 217 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why is it OK to skip checking the error?

Not an error being returned, but a bool that isn't required since we're looking for the worst candidate, and that may be `0, false`.

pkg/storage/allocator.go, line 269 at r1 (raw file):

Previously, d4l3k (Tristan Rice) wrote…

Passing in the storePool was done to support viewing this data in the UI. Since getStoreList wasn't exported, the easiest thing was passing in storePool which was accessible from the admin RPC endpoints.

4601e62

Much cleaner. And a simple fix, I like it. Done.

I'm going to work on that part in the future. Might just export GetStoreList if needed. Added to the follow up issues.


pkg/storage/allocator.go, line 299 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I had to squint to figure out what this counter was doing. How about this instead:

if len(candidates) > len(existing) {
  candidates = candidates[:len(existing)]
}

This whole check was very ugly. I've re-written it to be more explicit.

pkg/storage/allocator.go, line 310 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We need randomness here too. See my other comment about the candidates being sorted by score at this point making this loop deterministic.

Yep. I'll address this in a follow up PR.

pkg/storage/allocator_test.go, line 674 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

doesn't this want to be inside createTestAllocator?

Not all tests want a deterministic store pool.

pkg/storage/allocator_test.go, line 1435 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

please revert this, or put it in a separate commit. This test is important for validating that the allocator's behaviour didn't inadvertently change, and bundling this graphical change with the rest makes it difficult to spot the interesting changes.

Sure, I can split it into a second commit.

pkg/storage/client_raft_test.go, line 1917 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why'd this change? the diff alone isn't enough?

Not sure, was part of the change. I'll just revert it.

pkg/storage/rule_solver.go, line 81 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The generality of a configurable list of rules feels like overkill to me.

I agree. But I didn't want to make too many structural changes to this PR. I'll put in a todo to cut this down in the future.

pkg/storage/rule_solver.go, line 122 at r1 (raw file):

Previously, d4l3k (Tristan Rice) wrote…

There was an another PR (that might have been reverted too) that exposed this data in the UI. Made it really easy to see exactly how constraints would be applied to the stores.

Yeah, power of 2 is by far the easiest way to go here and I was thinking of something along the same lines. Added as a todo and to the follow up issue.

pkg/storage/rule_solver.go, line 127 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It is super confusing that candidate is a type and we see candidate, bool here while in ruleReplicasUniqueNodes we see candidate bool.

Indeed. Fixing.

pkg/storage/store_pool.go, line 216 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this doesn't need to be under lock, does it?

It's used in getStoreList which needs to be thread safe.

pkg/storage/store_pool.go, line 518 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Relying on the map iteration to randomize the StoreList is subtle. We should shuffle if !sp.mu.deterministic. See ReplicaSlice.Shuffle.

Good idea. I'll address this in a follow up.

pkg/storage/store_pool_test.go, line 42 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this has one caller.

Two now.

Comments from Reviewable

@petermattis
Collaborator

Review status: 9 of 14 files reviewed at latest revision, 15 unresolved discussions, some commit checks pending.


pkg/storage/allocator.go, line 115 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Done. And I also changed the rule solver's functions to have value receivers.

Changing this field to a value is fine, but I object to the knee-jerk use of value receivers whenever possible. Doing so costs stack space, especially when used pervasively. More stack space means more stack growth, which is not free.

pkg/storage/allocator.go, line 185 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

I've added this to the follow up issue.

@d4l3k https://brooker.co.za/blog/2012/01/17/two-random.html. The allocator doesn't have perfect knowledge of the current system, nor of what decisions other nodes are making. If multiple nodes all select the same "best" candidate, that candidate will quickly become overloaded. Randomization doesn't mean tossing out constraints, but applying randomness within the constraints. For example, we select a set of candidates that match the required and positive attributes. If that set is empty, we drop the positive attributes. From that list we select 2 candidates and pick the one with the best score.
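
To make that concrete, here is a hedged, self-contained sketch of the selection Peter describes; Store, attrs, and score are toy stand-ins for the real store descriptors and rule scores:

package storage

import "math/rand"

type Store struct {
    attrs map[string]bool // attributes/constraints this store satisfies
    score float64         // aggregate rule score; higher is better
}

func hasAll(s Store, attrs []string) bool {
    for _, a := range attrs {
        if !s.attrs[a] {
            return false
        }
    }
    return true
}

// selectTarget filters by required (and, when possible, positive)
// attributes, then applies a power-of-two-choices pick among the rest.
func selectTarget(stores []Store, required, positive []string) (Store, bool) {
    var cands []Store
    for _, s := range stores {
        if hasAll(s, required) && hasAll(s, positive) {
            cands = append(cands, s)
        }
    }
    if len(cands) == 0 {
        // Relax: drop the positive attributes but keep the required ones.
        for _, s := range stores {
            if hasAll(s, required) {
                cands = append(cands, s)
            }
        }
    }
    switch len(cands) {
    case 0:
        return Store{}, false
    case 1:
        return cands[0], true
    }
    // Pick two distinct candidates at random and keep the better score.
    i := rand.Intn(len(cands))
    j := rand.Intn(len(cands) - 1)
    if j >= i {
        j++
    }
    if cands[i].score >= cands[j].score {
        return cands[i], true
    }
    return cands[j], true
}

The randomness keeps different nodes from piling onto the same "best" store, while the score comparison still strongly favors good placements, which is the point of the two-random-choices result linked above.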

Comments from Reviewable

@tamird
Contributor

tamird commented Oct 28, 2016

Please move the second commit into a separate PR altogether. I would like to see this PR go in with no change in the output of Example_rebalancing.


Reviewed 3 of 5 files at r2.
Review status: 11 of 14 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


pkg/storage/allocator.go, line 115 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Done. And I also changed the rule solver's functions to have value receivers.

FYI: there is no need to make the rule solver's methods have value receivers - the Allocator is almost certainly on the heap already, and so taking the address of one of its members is likely allocation-free. Not saying you should change anything, just pointing this out.

pkg/storage/allocator.go, line 213 at r2 (raw file):

  // requires a store list, but we should be able to create one using only
  // the stores that belong to the range.
  // Use an invalid range ID as we don't care about a corrupt replicas since

"since as"?


pkg/storage/allocator.go, line 334 at r2 (raw file):

  // TODO(bram): #10275 Need some randomness here!
  for _, cand := range candidates {

the candidates are sorted.


pkg/storage/allocator_test.go, line 1437 at r2 (raw file):

  // Output:
  // 999 129 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

this is alarming. why did this change?


pkg/storage/store_pool.go, line 216 at r1 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

It's used in getStoreList which needs to be thread safe.

What? I mean that it shouldn't really be modified after creation.

Comments from Reviewable

@BramGruneir
Member Author

Closing this PR; I'll be reapplying the rule solver as a collection of smaller PRs.

@BramGruneir BramGruneir closed this Nov 2, 2016
BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Nov 7, 2016
Instead of applying
cockroachdb@1ef40f3
or cockroachdb#10252, this finishes the reapplication of the rule solver. However, this
also puts the rule solver under the environment flag
COCKROACH_ENABLE_RULE_SOLVER for ease of testing and defaults to not enabled.

The follow up to this commit is cockroachdb#10275 and a lot of testing to ensure that the
rule solver does indeed perform as expected.

Closes cockroachdb#9336
BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Nov 9, 2016
Instead of applying 1ef40f3
or cockroachdb#10252, this finishes the reapplication of the rule solver. However, this
also puts the rule solver under the environment flag
COCKROACH_ENABLE_RULE_SOLVER for ease of testing and defaults to not enabled.

This commit re-applies the rule solver, specifically the following commits:

1) 4446345
storage: add constraint rule solver for allocation

Rules are represented as a single function that returns the candidacy of the
store as well as a float value representing the score. These scores are then
aggregated across all rules and the stores are returned sorted by them.

Current rules:
- ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
- ruleConstraints enforces that required and prohibited constraints are
  followed, and that stores with more positive constraints are ranked higher.
- ruleDiversity ensures that nodes that have the fewest locality tiers in common
  are given higher priority.
- ruleCapacity prioritizes placing data on empty nodes when the choice is
  available and prevents data from going onto mostly full nodes.

2) dd3229a
storage: implemented RuleSolver into allocator

The follow up to this commit is cockroachdb#10275 and a lot of testing to ensure that the
rule solver does indeed perform as expected.

Closes cockroachdb#9336
@BramGruneir BramGruneir deleted the 9336 branch December 19, 2016 16:43