CI Failure (TimeoutError in wait_for_partitions_rebalanced) in ScalingUpTest.test_on_demand_rebalancing
#10024
Comments
This issue may be fixed by 4bdccb2. If it is not, we need to consider it a Redpanda bug because the unevenness of
This also may be a duplicate of #7756. The build https://buildkite.com/redpanda/redpanda/builds/27434#0187969e-69c6-4591-97e1-eb29f9ed90f3 (reported above by @michael-redpanda) is on the v22.3 branch, where #9622 is not backported yet.
That was a Still a bit of analysis:
In this case, domain 0 ends up totally unbalanced. However, this is what the last reconciliation loop looked like before the cluster got to that state:
So the balancer decided that this distribution was fine and gave up.
Seen during a backport to v22.3.x: https://buildkite.com/redpanda/redpanda/builds/27949#0187b8cf-b153-437c-ad1e-daefc5d194f0
on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/29202#0188230e-0d2e-49a1-91c3-149aeea19128
Sometimes it fails in debug mode because of a stuck partition movement problem (that @bharathv tracked to a possible RPC bug).
Closing old issues that have not occurred in 2 months.
https://buildkite.com/redpanda/redpanda/builds/26854#018770fe-e373-4dff-8222-d485b1468767
May also happen in test_adding_nodes_to_cluster.
This one is related to #7418 and is a follow-up to its fix made in #9947.
The failure is manifested with this:
The distribution of replicas across nodes here is [2,2,4,2,2]. 12 replicas across 5 nodes gives expected_per_node == 2.4, the upper bound of expected_range is 2.4*1.2 = 2.88, and that value rounds up to 3. So it appears that when the overall count of replicas is low, 20% tolerance is not enough in the test criteria.
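The arithmetic above can be reproduced with a minimal sketch. This is not the actual test code: the helper names replica_bounds and nodes_out_of_range are hypothetical, and rounding the lower bound down is an assumption (the issue only states that the upper bound rounds up).

```python
import math


def replica_bounds(replicas_per_node, tolerance=0.2):
    """Return (expected_per_node, lower, upper) for a replica distribution.

    Hypothetical helper: the upper bound is rounded up as described in the
    issue; rounding the lower bound down is an assumption.
    """
    total = sum(replicas_per_node)
    expected_per_node = total / len(replicas_per_node)
    lower = math.floor(expected_per_node * (1 - tolerance))
    upper = math.ceil(expected_per_node * (1 + tolerance))
    return expected_per_node, lower, upper


def nodes_out_of_range(replicas_per_node, tolerance=0.2):
    """Return indices of nodes whose replica count falls outside the bounds."""
    _, lower, upper = replica_bounds(replicas_per_node, tolerance)
    return [
        i
        for i, count in enumerate(replicas_per_node)
        if not lower <= count <= upper
    ]


distribution = [2, 2, 4, 2, 2]
expected, lower, upper = replica_bounds(distribution)
offenders = nodes_out_of_range(distribution)
print(f"expected_per_node={expected}, allowed range=[{lower}, {upper}]")
print(f"nodes outside the range: {offenders}")
```

With 20% tolerance the upper bound is 3, so the node holding 4 replicas fails the check even though the balancer considers the distribution acceptable.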