CI Failure (timeout on wait_until recommissioned) in RandomNodeOperationsTest.test_node_operations
#8179
https://buildkite.com/redpanda/vtools/builds/5252#0185b99b-8c5e-4e0a-b084-b0ac2f2cebe0
https://buildkite.com/redpanda/redpanda/builds/21730#0185e271-3c89-4a4d-b423-66adeeb98842
The test is trying to update a topic that was already deleted. It looks like the same issue that #8437 fixes.
It looks like the failure is back: https://buildkite.com/redpanda/redpanda/builds/22029
I scrolled up further in the log before the timeout error and I see
which makes me think the timeout error is a symptom of this other error in the failing test. It looks a bit like another parsing issue?
@dotnwat keep scrolling :) This error happens after the original one. The test then shuts down Redpanda and empties started_nodes, and the still-active background thread fails with
oh my! i didn't scroll far enough!
This is a race condition that I need to fix. It is already fixed by the new simplified raft configuration handling.
This issue is related to the recent change in Raft where we do not allow canceling a raft reconfiguration if a replica is a learner in the old raft configuration.
…d to learner A recent change in the raft configuration cancellation logic prevents canceling a raft reconfiguration when the nodes to be removed have been demoted to learners. Since 'moving back' is no longer possible, we should prevent recommissioning a node when it is already a learner in the previous part of the raft joint configuration. The fix doesn't eliminate the race condition entirely, but it reduces the likelihood of it occurring. The race condition will be fixed with the introduction of the simplified raft configuration. Also, the race condition isn't really breaking anything: a user may receive success from the recommission API, but the node will simply be removed. This is behavior that we already accepted for the partition movement cancel API. Fixes: redpanda-data#8179 Signed-off-by: Michal Maslanka <[email protected]>
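The guard described in this commit message can be modeled roughly as follows. This is a simplified Python sketch under stated assumptions, not Redpanda's actual implementation: `GroupConfig`, `JointConfig`, and `can_recommission` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupConfig:
    # Hypothetical model of one half of a raft joint configuration.
    voters: set = field(default_factory=set)
    learners: set = field(default_factory=set)


@dataclass
class JointConfig:
    # Hypothetical model: 'old' is the previous configuration, 'new' the target.
    old: GroupConfig
    new: GroupConfig


def can_recommission(node_id, cfg):
    """Reject recommissioning once the node is a learner in the old configuration.

    At that point the reconfiguration can no longer be canceled, so the node
    would be removed even if the recommission request reported success.
    """
    return node_id not in cfg.old.learners
```

For example, with node 3 already demoted in the old configuration, `can_recommission(3, cfg)` is false while `can_recommission(2, cfg)` stays true for a node that is still a voter. As the commit message notes, a check like this narrows the race window rather than closing it, because the demotion can still happen after the check passes.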
Again on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6551#0186b501-1833-4bba-bfcd-c07fed590e5d
…arner A recent change in the raft configuration cancellation logic prevents canceling a raft reconfiguration when the nodes to be removed have been demoted to learners. Since 'moving back' is no longer possible, we should prevent recommissioning a node when it is already a learner in the previous part of the raft joint configuration. The fix doesn't eliminate the race condition entirely, but it reduces the likelihood of it occurring. The race condition will be fixed with the introduction of the simplified raft configuration. Also, the race condition isn't really breaking anything: a user may receive success from the recommission API, but the node will simply be removed. This is behavior that we already accepted for the partition movement cancel API. Fixes: redpanda-data#8179 Signed-off-by: Michal Maslanka <[email protected]> (cherry picked from commit 2a3bf5e)
https://buildkite.com/redpanda/redpanda/builds/21032#0185a377-7e27-497a-8ff3-ce678f93793a
see 58th retry