Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver,allocator: decouple allocator from storepool #91941

Merged
merged 2 commits into from
Dec 22, 2022

Conversation

AlexTalks
Copy link
Contributor

@AlexTalks AlexTalks commented Nov 15, 2022

This change refactors the allocator to accept the store pool as an
argument, moving it closer to being stateless and purely functional.
This is done by moving to using the storepool through its primary
owner (the store configuration), or through references held by
users of the allocator. The refactoring enables the the allocator to
consider potential scenarios rather than those simply based on the
current liveness.

Part of #91570.

Release Note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch 2 times, most recently from 0ac8798 to 874eae8 Compare November 16, 2022 02:29
@AlexTalks AlexTalks marked this pull request as ready for review November 16, 2022 02:32
@AlexTalks AlexTalks requested a review from a team as a code owner November 16, 2022 02:32
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch 3 times, most recently from a1a2ace to c35fe91 Compare November 18, 2022 22:32
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Nov 23, 2022
Adds support for using the allocator to compute the action, if any,
needed to "repair" the range and, if the action requires a replica
addition (including replacements), the target of that addition. This can
be checked by using the new `CheckRangeAction(..)` function as part of a
store's replicate queue, and can use the default store pool or an
override store pool, so that potential states can be evaluated prior to
transition to those states. As such, this feature adds support for
allocator action and target validation prior to decommission, in order
to support decommission pre-checks.

While this is similar to the replicate queue's `PlanOneChange(..)`, this
new check supports evaluation based on a range descriptor, rather than
an individual replica.

Depends on cockroachdb#91941

Part of cockroachdb#91570.

Release note: None
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch from c35fe91 to 1468022 Compare November 23, 2022 05:06
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Nov 23, 2022
Adds support for using the allocator to compute the action, if any,
needed to "repair" the range and, if the action requires a replica
addition (including replacements), the target of that addition. This can
be checked by using the new `CheckRangeAction(..)` function as part of a
store's replicate queue, and can use the default store pool or an
override store pool, so that potential states can be evaluated prior to
transition to those states. As such, this feature adds support for
allocator action and target validation prior to decommission, in order
to support decommission pre-checks.

While this is similar to the replicate queue's `PlanOneChange(..)`, this
new check supports evaluation based on a range descriptor, rather than
an individual replica.

Depends on cockroachdb#91941

Part of cockroachdb#91570.

Release note: None
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch from 1468022 to 758b541 Compare December 15, 2022 04:26
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Dec 15, 2022
Adds support for using the allocator to compute the action, if any,
needed to "repair" the range and, if the action requires a replica
addition (including replacements), the target of that addition. This can
be checked by using the new `CheckRangeAction(..)` function as part of a
store's replicate queue, and can use the default store pool or an
override store pool, so that potential states can be evaluated prior to
transition to those states. As such, this feature adds support for
allocator action and target validation prior to decommission, in order
to support decommission pre-checks.

While this is similar to the replicate queue's `PlanOneChange(..)`, this
new check supports evaluation based on a range descriptor, rather than
an individual replica.

Depends on cockroachdb#91941

Part of cockroachdb#91570.

Release note: None
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch from 758b541 to 21b28d3 Compare December 19, 2022 23:19
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Dec 19, 2022
Adds support for using the allocator to compute the action, if any,
needed to "repair" the range and, if the action requires a replica
addition (including replacements), the target of that addition. This can
be checked by using the new `CheckRangeAction(..)` function as part of a
store's replicate queue, and can use the default store pool or an
override store pool, so that potential states can be evaluated prior to
transition to those states. As such, this feature adds support for
allocator action and target validation prior to decommission, in order
to support decommission pre-checks.

While this is similar to the replicate queue's `PlanOneChange(..)`, this
new check supports evaluation based on a range descriptor, rather than
an individual replica.

Depends on cockroachdb#91941

Part of cockroachdb#91570.

Release note: None
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the decorated methods ...WithStorePool(...). It adds to the complexity of the allocator API by exposing additional methods.

Would it be possible to instead pass it as an argument for every top level function instead?

This also relates to the approach taken on subsequent PRs - is it possible to use planOneChange in place of any forked methods.

Reviewed 2 of 2 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @andrewbaptist)


-- commits line 4 at r1:
nit: drop the "to be"

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doable - let me know if you think it makes more sense to always pass in the StorePool on every allocator method (which may complicate things with callers), or if it would make sense to simply add an override parameter, like:

func (a *Allocator) ComputeAction(..., overrideStorePool storepool.AllocatorStorePool) {
  storePool := a.StorePool
  if overrideStorePool != nil {
     storePool = overrideStorePool
  }
  ...
}

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)

@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch 2 times, most recently from 26a63b1 to e9507ff Compare December 21, 2022 02:02
@AlexTalks AlexTalks requested a review from a team December 21, 2022 02:02
@AlexTalks AlexTalks changed the title allocator: refactor allocator to accept StorePool arguments kvserver,allocator: decouple allocator from storepool Dec 21, 2022
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch from e9507ff to 79e7b0a Compare December 21, 2022 02:08
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: It would be good to add the storepool to the replicate queue similar to the store rebalancer.

Reviewed 1 of 9 files at r3, 23 of 23 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks and @andrewbaptist)


pkg/kv/kvserver/replicate_queue.go line 614 at r4 (raw file):

) (shouldQueue bool, priority float64) {
	desc, conf := repl.DescAndSpanConfig()
	action, priority := rq.allocator.ComputeAction(ctx, rq.store.cfg.StorePool, conf, desc)

I'm thinking it would be better if the replicate queue were instantiated with a ref to the StorePool similar to the store rebalancer.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 512 at r4 (raw file):

// MakeAllocator creates a new allocator.
// The deterministic flag indicates that this allocator is intended to be used

What if it is not? This seems pretty fragile. There is a deterministic test which should catch this however, so it should be okay if any unlucky soul runs into it.

This change adds methods to evaluate allocator actions and targets
utilizing a passed-in `StorePool` object, allowing for the allocator to
consider potential scenarios rather than those simply based on the
current liveness.

Depends on cockroachdb#91461, cockroachdb#91965.

Part of cockroachdb#91570.

Release note: None
This fully decouples the StorePool from the allocator by moving
ownership of the configured view of peer stores to the store
configuration (the original source), the store rebalancer, and various
structures in the allocator simulator.

Part of cockroachdb#91570.

Release note: None
@AlexTalks AlexTalks force-pushed the dpf_allocator_refactor branch from 79e7b0a to 5236ee1 Compare December 21, 2022 22:48
@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 22, 2022

Build succeeded:

@craig craig bot merged commit a2d9a0e into cockroachdb:master Dec 22, 2022
craig bot pushed a commit that referenced this pull request Jan 7, 2023
94114: kvserver: refactor replicate queue to enable allocator code reuse r=AlexTalks a=AlexTalks

This change refactors parts of the replicate queue's `PlanOneChange(..)`
and `addOrRemove{Non}Voters(..)` functions to reusable helper functions
that simplify usage of the allocator and deduplicate repeated code
paths. The change also adds convenience methods to the `AllocatorAction`
enum, to move certain determinations (such as if a computed allocator
action is a remove or a replace) closer to the allocator type it is
based on. These changes further enable the ability to use the allocator
outside of the replicate queue, and enable the ability for some of this
logic to move into the allocator itself in the future.

Depends on #91941.

Epic: none

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants