qa: Load-based rebalancing of leases and replicas #30007

Closed
a-robinson opened this issue Sep 10, 2018 · 9 comments

@a-robinson
Contributor

Original issues:
#17979 for replicas
#21419 for leases

PRs:
#28340
#28852

Docs issue:
cockroachdb/docs#2051

@vivekmenezes added this to the 2.1 milestone Sep 10, 2018
@vivekmenezes assigned vilterp and unassigned vivekmenezes Sep 10, 2018
@vivekmenezes
Contributor

@vilterp you've been randomly assigned this issue. The goal here is to take this feature for a spin and find problems.

@a-robinson
Contributor Author

Any estimate of when this will be QA'ed? Sooner is obviously better if it manages to turn up any problems.

@vilterp
Contributor

vilterp commented Sep 19, 2018

Hey @a-robinson, looking at it today. It's taken me a bit to wrap my head around how this differs from the old stats-based rebalancing approach (it seems the docs aren't written yet), but I'm getting a handle on it.

@a-robinson
Contributor Author

Thanks! Let me know if you want to chat at all about it.

@vilterp
Contributor

vilterp commented Sep 24, 2018

So far I've only done the basics: I ran the roachtests for this feature and watched various metrics while they were running, and I read through the unit test to understand how the allocator makes this decision.

I think we should chat about it though, since I still don't fully grok how this interacts with "follow the workload" (is that different from the prior "stats-based rebalancing"?). For example, if a lot of QPS is coming from nodes in a certain locality, we want to move leases there, but we also want to balance leaseholders. These goals seem like they could compete in some scenarios; a more thorough QA should probably explore that.
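
To make the possible conflict concrete, here is a toy sketch. It is not CockroachDB's allocator: the `Node` type, the function names, and all the numbers are hypothetical, and each heuristic is reduced to a single rule (locality with the most requests vs. node with the lowest QPS). It only shows that two such rules can pick different lease targets for the same cluster state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    locality: str
    qps: float  # queries per second currently served by leases on this node


def follow_the_workload_target(nodes, requests_by_locality):
    """Pick the least-loaded node in the locality that sends the most requests."""
    hot_locality = max(requests_by_locality, key=requests_by_locality.get)
    candidates = [n for n in nodes if n.locality == hot_locality]
    return min(candidates, key=lambda n: n.qps)


def load_based_target(nodes):
    """Pick the node serving the fewest queries per second, ignoring locality."""
    return min(nodes, key=lambda n: n.qps)


def goals_conflict(nodes, requests_by_locality):
    """True when the two toy heuristics would move the lease to different nodes."""
    return follow_the_workload_target(nodes, requests_by_locality) != load_based_target(nodes)


# Hypothetical cluster: most requests originate in us-east, whose nodes are already busy.
nodes = [
    Node("n1", "us-east", qps=900.0),
    Node("n2", "us-east", qps=850.0),
    Node("n3", "us-west", qps=100.0),
]
requests_by_locality = {"us-east": 1500.0, "us-west": 50.0}

print(follow_the_workload_target(nodes, requests_by_locality).name)  # n2
print(load_based_target(nodes).name)  # n3
print(goals_conflict(nodes, requests_by_locality))  # True
```

In this made-up state the locality rule wants the lease on `n2` while the load rule wants it on `n3`, which is the kind of scenario a more adversarial test could try to construct against the real allocator.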

@vivekmenezes
Contributor

Is there any follow-up work remaining on this issue?

@vilterp
Contributor

vilterp commented Nov 26, 2018

I took it for a spin and things seemed to work as promised. I wasn't able to find the time to construct more complicated scenarios that would cause failure modes like thrashing, conflict between different rebalancing methods, etc. It would be good to do that testing to shake out any issues, but I'm not sure who has the bandwidth.

@a-robinson
Contributor Author

Yeah, from my perspective this never really got the adversarial testing I was hoping for. Including it in the QA rotation would be ideal, although I understand if other QA issues come first.

@vivekmenezes
Contributor

@vilterp thanks for taking it this far. I'll keep this issue open and unassign you. Thanks!


4 participants