tests/integration: deflake Corruption cases #14824
If the corrupted member has been elected as leader, the memberID in the alert response won't be the corrupted one. It will be a smaller follower ID, since raftCluster.Members always sorts by ID. We should check the leader ID and decide which memberID to use.
Fixes: etcd-io#14823
Signed-off-by: Wei Fu <[email protected]>
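A minimal Go sketch of the check this commit message describes. The helper `expectedAlarmMemberID` and its parameters are hypothetical and are not taken from the etcd test code; it assumes that a corrupted leader reports the first other member in ID order.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedAlarmMemberID returns the member ID a test could expect to see in
// the alert response. Hypothetical helper for illustration only.
//
// Assumption: when the corrupted member is the leader, the alert names the
// smallest-ID member other than the leader, because members are sorted by ID.
func expectedAlarmMemberID(memberIDs []uint64, corruptedID, leaderID uint64) uint64 {
	if leaderID != corruptedID {
		// A healthy leader reports the corrupted follower directly.
		return corruptedID
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]uint64(nil), memberIDs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for _, id := range sorted {
		if id != leaderID {
			return id
		}
	}
	// Single-member cluster: there is no follower to report.
	return corruptedID
}

func main() {
	ids := []uint64{30, 10, 20}
	fmt.Println(expectedAlarmMemberID(ids, 30, 10)) // healthy leader 10, corrupted follower 30
	fmt.Println(expectedAlarmMemberID(ids, 10, 10)) // corrupted member 10 is also the leader
}
```

With a healthy leader the helper returns the corrupted ID itself; with a corrupted leader it returns the smallest follower ID instead.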
Great find!
This flake shows that there is an issue with the meaning of
Might be worth creating an issue to fix it, however I don't think this can be easily fixed. At least I don't have a solution now. First thoughts: validating which member is corrupted requires a separate quorum of members. In 3-node clusters you need all 3 members up (2 members for quorum + 1 corrupted member); in 5-node clusters you need 4 members up (3 members for quorum + 1 corrupted member). One possible fix would be reporting
Thanks @fuweid for fixing the flaky test case, but this is a product/etcd issue rather than a test case issue, so we should fix the product code. At the very least, we should leave a TODO comment in the modified test case. I actually raised this point when I was reviewing the code more than 4 months ago, #14120 (comment). We should depend on quorum to identify the corrupted member; let me deliver a PR for this.
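A rough Go sketch of the quorum idea discussed above, for illustration only. It is not the follow-up PR's implementation: `corruptedByQuorum`, its parameters, and the use of a plain `uint32` hash per member are all assumptions. A member is flagged only when a quorum of the cluster agrees on a different hash at the same revision.

```go
package main

import (
	"fmt"
	"sort"
)

// corruptedByQuorum returns the IDs of members whose hash differs from the
// hash agreed on by a quorum of the cluster. Hypothetical helper.
//
// clusterSize is the total number of members; hashes holds the hash reported
// by each reachable member (assumed to be taken at the same revision).
// The second result is false when no hash is backed by a quorum, in which
// case the corrupted member cannot be identified.
func corruptedByQuorum(clusterSize int, hashes map[uint64]uint32) ([]uint64, bool) {
	quorum := clusterSize/2 + 1

	// Count how many members reported each hash.
	counts := make(map[uint32]int)
	for _, h := range hashes {
		counts[h]++
	}

	// At most one hash can reach quorum, since quorum is more than half.
	var majority uint32
	found := false
	for h, c := range counts {
		if c >= quorum {
			majority, found = h, true
			break
		}
	}
	if !found {
		return nil, false
	}

	var corrupted []uint64
	for id, h := range hashes {
		if h != majority {
			corrupted = append(corrupted, id)
		}
	}
	sort.Slice(corrupted, func(i, j int) bool { return corrupted[i] < corrupted[j] })
	return corrupted, true
}

func main() {
	// 3-node cluster, all members up: members 1 and 2 form the quorum.
	ids, ok := corruptedByQuorum(3, map[uint64]uint32{1: 0xaa, 2: 0xaa, 3: 0xbb})
	fmt.Println(ids, ok)

	// Only 2 of 3 members up and they disagree: no quorum, cannot decide.
	ids, ok = corruptedByQuorum(3, map[uint64]uint32{1: 0xaa, 3: 0xbb})
	fmt.Println(ids, ok)
}
```

The second call reflects the constraint noted in the review: in a 3-node cluster, two healthy members plus the corrupted one must all be reachable before the corrupted member can be named.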
The change made in etcd-io#14824 fixed the test instead of the product code, which isn't correct. Once the product code is fixed in this PR, we can revert the change from that PR. Signed-off-by: Benjamin Wang <[email protected]>
@ahrtr Looking forward to your PR!
The change made in etcd-io#14824 fixed the test instead of the product code, which isn't correct. Once the product code is fixed in this PR, we can revert the change from that PR. Signed-off-by: Benjamin Wang <[email protected]> Signed-off-by: Marek Siarkowicz <[email protected]>