[DocDB] LoadBalancerMiniClusterTest.CheckLoadBalanceDriveAware fails in release builds #25441

Open
spolitov opened this issue Dec 27, 2024 · 0 comments
Labels: area/docdb, kind/enhancement, kind/failing-test, priority/medium, status/awaiting-triage


spolitov commented Dec 27, 2024

Jira Link: DB-14682

Description

Reason 1

../../src/yb/integration-tests/load_balancer_mini_cluster-test.cc:656
Value of: found
Actual: false
Expected: true

Reason 2

../../src/yb/integration-tests/load_balancer_mini_cluster-test.cc:137
Failed
Bad status: Timed out (yb/util/backoff_waiter.cc:78): Operation 'IsLoadBalancerIdle' didn't complete within 30000ms

Issue Type

kind/failing-test

@spolitov spolitov added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Dec 27, 2024
@spolitov spolitov self-assigned this Dec 27, 2024
@yugabyte-ci yugabyte-ci added kind/failing-test Tests and testing infra priority/medium Medium priority issue kind/enhancement This is an enhancement of an existing feature labels Dec 27, 2024
spolitov added a commit that referenced this issue Dec 28, 2024
Summary:
The cluster balancer tries to pick tablets to move in a way that guarantees even disk load after the move.
But a move consists of two steps. In the first step, we add a new replica of the tablet on the destination tablet server.
In the second step, we remove a replica of the now over-replicated tablet from one of its tablet servers.
So there is no guarantee that the removed replica is the one residing on the most loaded disk.

Fixed by taking disk load into account when picking the replica to remove.
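A minimal C++ sketch of the removal-side choice, assuming the balancer knows, for each replica of the over-replicated tablet, how loaded its tablet server is and how many tablets sit on the drive holding that replica. `ReplicaCandidate` and `PickReplicaToRemove` are hypothetical names, not YugabyteDB's API, and ordering by tablet-server load first and drive load second is an assumption; the commit only says that disk load is now taken into account.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical view of one replica of an over-replicated tablet.
struct ReplicaCandidate {
  std::string ts_uuid;        // tablet server hosting this replica
  size_t ts_tablet_count;     // total tablets on that tablet server
  size_t drive_tablet_count;  // tablets on the drive that holds this replica
};

// Returns the index of the replica to remove: the most loaded tablet server
// wins; among equally loaded servers, the replica on the busier drive wins.
// Ties on both keys keep the earlier candidate. Returns nullopt if empty.
std::optional<size_t> PickReplicaToRemove(
    const std::vector<ReplicaCandidate>& candidates) {
  if (candidates.empty()) return std::nullopt;
  size_t best = 0;
  for (size_t i = 1; i < candidates.size(); ++i) {
    const auto& c = candidates[i];
    const auto& b = candidates[best];
    // std::tie compares lexicographically: server load, then drive load.
    if (std::tie(c.ts_tablet_count, c.drive_tablet_count) >
        std::tie(b.ts_tablet_count, b.drive_tablet_count)) {
      best = i;
    }
  }
  return best;
}
```

With three equally loaded tablet servers, a disk-unaware choice could drop any of them; this version drops the replica whose drive holds the most tablets, which is what keeps per-drive load even after the second step of the move.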

Also, the test LoadBalancerMiniClusterTest.CheckLoadBalanceDriveAware could fail by timing out while waiting for the cluster to become balanced.
This happens because leaders cannot be balanced in time: if the protege has already lost a leader election, we have to wait 20s before stepping down again.
Fixed by decreasing this wait to 0s in this test.
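The leader-balancing stall can be pictured as a throttle keyed by (tablet, protege). A minimal C++ sketch, assuming the wait is tracked per tablet and intended new leader; `StepDownThrottle` and its methods are hypothetical, not YugabyteDB code. With a 20s interval, a retry toward a protege that just lost an election is refused until the interval elapses; with a 0s interval, as in the fixed test, the retry is allowed immediately.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical throttle: remembers when a stepdown toward a given
// (tablet, protege) pair last failed because the protege lost the election.
class StepDownThrottle {
 public:
  explicit StepDownThrottle(std::chrono::milliseconds retry_interval)
      : retry_interval_(retry_interval) {}

  // Record that `protege` lost the leader election for `tablet` at `now`.
  void RecordLostElection(const std::string& tablet, const std::string& protege,
                          Clock::time_point now) {
    last_failure_[{tablet, protege}] = now;
  }

  // A stepdown is allowed if there was no recorded failure for this pair,
  // or if at least `retry_interval_` has passed since that failure.
  bool CanStepDown(const std::string& tablet, const std::string& protege,
                   Clock::time_point now) const {
    auto it = last_failure_.find({tablet, protege});
    if (it == last_failure_.end()) return true;
    return now - it->second >= retry_interval_;
  }

 private:
  std::chrono::milliseconds retry_interval_;
  std::map<std::pair<std::string, std::string>, Clock::time_point> last_failure_;
};
```

If the balancer keeps choosing the same protege, each failed election blocks that move for the whole interval, so a 30s idle-wait in the test has room for only one retry; shrinking the interval removes that bottleneck.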
Jira: DB-14682

Test Plan: ./yb_build.sh release -n 800 --cxx-test integration-tests_load_balancer_mini_cluster-test --gtest_filter LoadBalancerMiniClusterTest.CheckLoadBalanceDriveAware -- -p 8

Reviewers: zdrudi, asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D40922