Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Consider which replica will be removed when adding a replica… #18364

Merged
merged 1 commit into from
Oct 11, 2017
Merged

storage: Consider which replica will be removed when adding a replica… #18364

merged 1 commit into from
Oct 11, 2017

Conversation

a6802739
Copy link
Contributor

@a6802739 a6802739 commented Sep 8, 2017

@a-robinson

Fixes #17971. @a-robinson , Thanks for your guidance, I just simulate (*Allocator).RemoveTarget before we actually add the replica during the rebalance.

Still don't know how to add a test for this.

@a6802739 a6802739 requested a review from a team September 8, 2017 08:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@a6802739 a6802739 requested a review from a-robinson September 8, 2017 09:14
@a-robinson
Copy link
Contributor

Hmm, I think we may need to take this a little farther. RebalanceTarget might internally find multiple good stores to rebalance a range to. It only returns the one that it thinks is best, but it may have found others that would be really good fits. In such cases, we should rebalance to one of the other candidates, but this code will just return without doing a rebalance. I think we need to do this sort of filtering inside RebalanceTarget so that it can return a different candidate if there is one that would work instead.

As for testing, the replicate queue is unfortunately in need of some test love (#11987). If you could write one that covers this code, that would be great, otherwise some manual testing may be needed. The best tests for it currently are the allocator acceptance tests and running the cmd/allocsim tool and inspecting the output by hand.


Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/storage/replicate_queue.go, line 498 at r1 (raw file):

					StoreID: rebalanceReplica.StoreID,
				}
				candidates := filterUnremovableReplicas(repl.RaftStatus(), desc.Replicas, 0)

I think we'd be best off if we simulate the removal process as closely as possible, which would mean including the new replica in the RaftStatus with a Progress of 0 and passing an ID for it as the brandNewReplicaID to filterUnremovableReplicas rather than appending it to the list of candidates afterwards.


pkg/storage/replicate_queue.go, line 500 at r1 (raw file):

				candidates := filterUnremovableReplicas(repl.RaftStatus(), desc.Replicas, 0)
				candidates = append(candidates, rep)
				removeReplica, _, err := rq.allocator.RemoveTarget(ctx, zone.Constraints, candidates, rangeInfo)

I'm a little concerned about how this will interact with #18425, since we aren't using the updated stats here. Do you have any thoughts on how to make this more accurately reflect the updated stats? I think we may be best off not doing anything about it, but I imagine this discrepancy will probably be a problem at some point.


pkg/storage/replicate_queue.go, line 505 at r1 (raw file):

				}
				if removeReplica.StoreID == rebalanceReplica.StoreID {
					log.VEventf(ctx, 1, "we may immediately remove the same replica after we add this replica for rebalancing")

If you don't mind me rewording this, I'd phrase it as "not rebalancing to candidate %+v because it would be removed immediately after being added"


Comments from Reviewable

@a6802739
Copy link
Contributor Author

a6802739 commented Sep 13, 2017

@a-robinson , Sorry, I couldn't login in Reviewable right Now.

If we already pass the new replica ID as brandNewReplicaID to filterUnremovableReplicas, why we still include the new replica in the RaftStatus with a Progress of 0?

And what do you mean I'm a little concerned about how this will interact with #18425, since we aren't using the updated stats here, I'm sorry, I don't quite understand here? Do you mean we should add the update stats first when we try to simulate the remove procedure?

@a-robinson
Copy link
Contributor

Sorry, I'll try to stick to Github comments if Reviewable is inconvenient for you.

If we already pass the new replica ID as brandNewReplicaID to filterUnremovableReplicas, why we still include the new replica in the RaftStatus with a Progress of 0?

Because it makes a difference in what filterUnremovableReplicas will do. If the new replica isn't included in the RaftStatus then filterUnremovableReplicas will behave as if it doesn't exist (even if we pass it in as the brandNewReplicaID). If we add the new replica to the RaftStatus but don't pass it in as the brandNewReplicaID, then filterUnremovableReplicas will consider the new replica to be behind, which will make it likely to be selected.

And what do you mean I'm a little concerned about how this will interact with #18425, since we aren't using the updated stats here, I'm sorry, I don't quite understand here? Do you mean we should add the update stats first when we try to simulate the remove procedure?

The goal here is to simulate the actual removal process as accurately as possible. If the actual removal process will be acting on updated stats and this simulated removal process isn't, then the simulated process may be incorrect sometimes. Modifying the stats for the simulation in a way that's easy to roll back and doesn't affect anything else happening in the system may be difficult, though. What do you think?

@a6802739
Copy link
Contributor Author

The goal here is to simulate the actual removal process as accurately as possible. If the actual removal process will be acting on updated stats and this simulated removal process isn't, then the simulated process may be incorrect sometimes. Modifying the stats for the simulation in a way that's easy to roll back and doesn't affect anything else happening in the system may be difficult, though. What do you think?

I think the actual intent is just to simulate RemoveTarget, but we didn't actually did the rebalance before the simulation. If we want to be very close to the real RemoveTarget, I think we we just to to update the local copy of target rebalance store, I mean add WritesPerSecond for the target store, and recover it after the simulation?

@a-robinson
Copy link
Contributor

That would be ideal, yes. It'll probably require a fair amount of refactoring, so use your judgment, but that would be best.

@a6802739
Copy link
Contributor Author

@a-robinson, Thank you for your review.
I have made a simulation in RemoveTarget. Still not sure how to add a test for it.

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for forgetting about this for a while, @a6802739! We definitely still want this to get in.

Regarding testing, we have a few options of increasing difficulty:

  1. Do something like in storage: immediately update target store write stats after rebalance #18425, where we create a configuration that would previously have caused thrashing and test that RebalanceTarget will prefer not rebalancing over rebalancing to something that will immediately be removed.
  2. Create a test cluster with multiple nodes via something like multiTestContext or TestCluster with localities configured in a way that would normally trigger this bug and ensure that the rebalancing settles down. For an example configuration, see trouble adding a fourth node - will not become functional #19013. To summarize that issue, a cluster with 2 nodes in locality X, 1 node in locality Y, and 1 node in locality Z will exhibit the kind of thrashing that this PR should fix.
  3. Most complicated would be to randomly generate cluster/replica configurations to run through a unit test that tries runs RebalanceTarget, makes (or simulates) the requested change, and then runs RemoveTarget. The test would fail if any configuration chose to remove the replica that it just added.

@@ -374,6 +374,22 @@ func (a *Allocator) AllocateTarget(
}
}

func (a Allocator) simulationRemoveTarget(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but I'd change this to simulateRemoveTarget

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Done.

rangeInfo RangeInfo,
) (roachpb.ReplicaDescriptor, string, error) {
// Update statics first
a.storePool.updateLocalStoreAfterRebalance(targetStore, replica, roachpb.ADD_REPLICA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the following comment here?

// TODO(a-robinson): This could theoretically interfere with decisions made by other goroutines, but as of October 2017 calls to the Allocator are mostly serialized by the ReplicateQueue (with the main exceptions being Scatter and the status server's allocator debug endpoint). Try to make this interfere less with other callers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Done.

candidates []roachpb.ReplicaDescriptor,
rangeInfo RangeInfo,
) (roachpb.ReplicaDescriptor, string, error) {
// Update statics first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/statics/statistics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. Done.

@@ -491,6 +511,33 @@ func (a Allocator) RebalanceTarget(
if target == nil {
return nil, ""
}
// We could make a simulation here to verify whether we'll remove the target we'll rebalance to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could avoid this code duplication with what's in replicate_queue.go and instead create a shared function they can both use? If they're separate like this, it seems likely that they'll accidentally diverge in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, Sorry, I couldn't quite understand what do you mean here. I didn't make any change to replicate_queue.go.
Would you please point out the code duplication with replicate_queue.go? Thank you very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the code for the AllocatorRemove case in processOneChange, but on looking at it again it doesn't look like there's much that could be de-duped. Sorry for the confusion!

if newTarget == nil {
return nil, ""
}
target = newTarget
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually want this logic to happen in a loop. What if the newTarget is also going to be immediately removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, You are right. My mistake, Done.

@a-robinson
Copy link
Contributor

Reviewed 4 of 4 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


Comments from Reviewable

@a6802739
Copy link
Contributor Author

a6802739 commented Oct 8, 2017

@a-robinson, Thank you for your reference about unit test. Please have a review again. Thank you very much.

@a6802739
Copy link
Contributor Author

a6802739 commented Oct 9, 2017

@a-robinson , Sorry, I don't know why the teamcity failed. And I didn't see any important information from the teamcity.

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that test failure in teamcity is just a flake - #18554

ctx context.Context, constraints config.Constraints, rangeInfo RangeInfo, filter storeFilter,
ctx context.Context,
constraints config.Constraints,
repl *Replica,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing in a Replica is pretty broad. It looks like we could shrink this down to just using the RangeInfo if we made simulateRemoveTarget just take a RangeInfo instead of a Replica.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should guarantee the duration for WritesPerSecond in RangeInfo exceeds MinStatsDuration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose so. Since this code only runs on the leaseholder, it pretty much always should be, anyway.

@@ -685,9 +694,164 @@ func TestAllocatorRebalance(t *testing.T) {
}
}

// TestAllocatorRebalanceTarget could help us to verify whether we'll rebalance to a target that
// we'll immediately remove.
func TestAllocatorRebalanceTarget(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that this test passes with your change but fails without it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I could confirm that.

},
},
{
StoreID: 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining these stores and the fake replica stats below? I'm not really sure why this fifth store is here or why are the values are set the way that they are. And if I'm not sure, then future readers are even less likely to understand what you were thinking while creating the test.

Copy link
Contributor Author

@a6802739 a6802739 Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my original thinking is that if we didn't add the fifth store, it will have no store to make an rebalance, So I just add a fifth store here. If we didn't make the simulation, it will choose store 2 as the target to make an rebalance, but will be removed immediately after the rebalance. So I just want to make sure that we will not choose store 2 as the target store.

sg.GossipStores(stores, t)

st := a.storePool.st
EnableStatsBasedRebalancing.Override(&st.SV, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? In 1.1, this problem occurs even if stats-based rebalancing is disabled. Testing the default setting seems more valuable than testing the non-default setting. Keeping this false would also let us remove the capacity, available, logicalbytes, and writespersecond fields from all the store descriptors above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, Yeah, that will be more easier. Thank you very much.

testRangeInfo(replicas, firstRange),
storeFilterThrottled,
)
if expected := roachpb.StoreID(5); result.StoreID != expected {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having a fifth store here is any better than just having 4 stores and verifying that we don't want to move a replica to the fourth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right. 4 stores will be enough.

@a-robinson
Copy link
Contributor

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


Comments from Reviewable

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small comments left. Should be good to go after the last couple changes :)

ctx context.Context, constraints config.Constraints, rangeInfo RangeInfo, filter storeFilter,
ctx context.Context,
constraints config.Constraints,
repl *Replica,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose so. Since this code only runs on the leaseholder, it pretty much always should be, anyway.

@@ -491,6 +516,37 @@ func (a Allocator) RebalanceTarget(
if target == nil {
return nil, ""
}
// We could make a simulation here to verify whether we'll remove the target we'll rebalance to.
for len(candidates) >= 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not noticing this last time, but for len(candidates) >= 0 is a pretty meaningless loop condition. Did you mean for len(candidates) > 0? Or to just make this an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just thought it will break in the loop condition. I have changed it as your suggestion.

// We're going to manually mark stores dead in this test.
stopper, g, _, a, _ := createTestAllocator( /* deterministic */ false)
defer stopper.Stop(context.Background())
stores := []*roachpb.StoreDescriptor{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining in the code review, but a comment in the code explaining this configuration of stores would be very helpful to future maintainers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. I have already added a comment about the configuration of stores.

@a-robinson
Copy link
Contributor

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


Comments from Reviewable

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@a-robinson
Copy link
Contributor

Reviewed 5 of 5 files at r5.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


Comments from Reviewable

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it fails to build, though, according to teamcity:

[TestStyle/TestErrCheck] style_test.go:819: err=exit status 2, stderr=/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_pool_test.go:434:45: cannot use replica (variable of type *Replica) as RangeInfo value in argument to sp.updateLocalStoreAfterRebalance

@a6802739
Copy link
Contributor Author

@a-robinson, Thank you very much. I have changed it already. Wait for a successful CI of the teamcity. :)

@a6802739 a6802739 merged commit de7337d into cockroachdb:master Oct 11, 2017
@a6802739 a6802739 deleted the rebalance_improve branch October 11, 2017 04:56
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Dec 14, 2017
If the first target attempted was rejected due to the simulation
claiming that it would be immediately removed, we would reuse the
modified `rangeInfo.Desc.Replicas` that had the target added to it,
messing with future iterations of the loop.

Also, we weren't properly modifying the `candidates` slice, meaning that
we could end up trying the same replica multiple times.

I have a test for this, but it doesn't pass yet because the code in cockroachdb#18364
actually isn't quite sufficient for fixing cases like cockroachdb#20241. I'll send
that out tomorrow once I have a fix done.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants