storage: Rebalance replicas based on resource utilization not completed QPS #34590

Closed
bdarnell opened this issue Feb 5, 2019 · 3 comments
Labels
A-admission-control
A-kv-distribution: Relating to rebalancing and leasing.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team

Comments

bdarnell commented Feb 5, 2019

The load-based rebalancing system uses QPS as its metric. This is subject to an interesting kind of negative feedback: When a node is overloaded, it starts to slow down, reducing its QPS and the urgency of rebalancing. We recently saw one cluster where this effect was so severe that the overloaded node actually had below-average QPS for the cluster, so ranges weren't getting rebalanced away from it.

Instead of QPS, we should be tracking lower-level metrics like CPU and disk utilization. In this case the workload was write-heavy and was being throttled by the disk (I think disk I/O may have increased super-linearly with the query load because of the write amplification caused by compactions).
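
To make the failure mode concrete, here is a minimal Go sketch of the two signals side by side. It is not the allocator's actual code: the `nodeLoad` type, its fields, the scoring functions, and all the numbers are hypothetical and chosen only to mirror the situation described above, where a disk-throttled node ends up with below-average completed QPS.

```go
package main

import "fmt"

// nodeLoad is a hypothetical per-node load summary. The type and field
// names are illustrative and do not correspond to real allocator types.
type nodeLoad struct {
	ID       int
	QPS      float64 // completed queries per second
	CPUUtil  float64 // fraction of CPU in use, 0..1
	DiskUtil float64 // fraction of disk bandwidth in use, 0..1
}

// byQPS scores a node by its completed QPS.
func byQPS(n nodeLoad) float64 { return n.QPS }

// byUtilization scores a node by its most saturated resource.
func byUtilization(n nodeLoad) float64 {
	if n.CPUUtil > n.DiskUtil {
		return n.CPUUtil
	}
	return n.DiskUtil
}

// hottest returns the node with the highest score, i.e. the node a
// rebalancer using that score would most want to move load away from.
// It assumes nodes is non-empty.
func hottest(nodes []nodeLoad, score func(nodeLoad) float64) nodeLoad {
	best := nodes[0]
	for _, n := range nodes[1:] {
		if score(n) > score(best) {
			best = n
		}
	}
	return best
}

// sampleCluster returns made-up numbers: node 3 is throttled by its disk,
// so it completes fewer queries than its peers.
func sampleCluster() []nodeLoad {
	return []nodeLoad{
		{ID: 1, QPS: 1200, CPUUtil: 0.55, DiskUtil: 0.40},
		{ID: 2, QPS: 1100, CPUUtil: 0.50, DiskUtil: 0.45},
		{ID: 3, QPS: 600, CPUUtil: 0.35, DiskUtil: 0.98},
	}
}

func main() {
	nodes := sampleCluster()
	fmt.Println("hottest by QPS:", hottest(nodes, byQPS).ID)
	fmt.Println("hottest by utilization:", hottest(nodes, byUtilization).ID)
}
```

With these made-up numbers, the QPS score points at node 1 while the utilization score points at the throttled node 3. Taking the maximum across resources is just one simple way to combine them; a real implementation would need to decide how to weigh and attribute CPU and disk usage.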

awoods187 added this to the 2.2 milestone Feb 7, 2019
awoods187 added the C-enhancement and A-kv-distribution labels Mar 6, 2019
@irfansharif

This is still relevant.

@irfansharif

Labeling with A-admission-control somewhat liberally because I think it's well suited to provide resource utilization + attribution signals (though it doesn't have to, we could get to it through other means).

@irfansharif

Filed #83490 to carry this forward.
