Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: return membership.ErrIDNotFound when the memberID not found #15095

Merged
merged 1 commit into from
Jan 16, 2023

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Jan 13, 2023

This PR should fix the failure in https://github.com/etcd-io/etcd/actions/runs/3908224614/jobs/6678181444

cluster_test.go:259: expecting promote not ready learner to fail, got no error

The existing test cases TestMemberPromote* in cluster_test.go already covers the code path updated in this PR.

Signed-off-by: Benjamin Wang [email protected]

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

ahrtr added a commit to ahrtr/etcd that referenced this pull request Jan 13, 2023
Backport etcd-io#15095.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
ahrtr added a commit to ahrtr/etcd that referenced this pull request Jan 13, 2023
Backport etcd-io#15095 to 3.4.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2023

Codecov Report

Merging #15095 (8ed20e8) into main (677e528) will decrease coverage by 0.11%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##             main   #15095      +/-   ##
==========================================
- Coverage   74.72%   74.61%   -0.12%     
==========================================
  Files         415      415              
  Lines       34340    34343       +3     
==========================================
- Hits        25661    25624      -37     
- Misses       7045     7077      +32     
- Partials     1634     1642       +8     
Flag Coverage Δ
all 74.61% <71.42%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/etcdserver/server.go 85.37% <71.42%> (-0.28%) ⬇️
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-11.63%) ⬇️
client/v3/leasing/util.go 91.66% <0.00%> (-6.67%) ⬇️
server/etcdserver/api/v3rpc/watch.go 82.53% <0.00%> (-5.08%) ⬇️
client/v3/concurrency/session.go 85.10% <0.00%> (-4.26%) ⬇️
client/v3/experimental/recipes/priority_queue.go 63.33% <0.00%> (-3.34%) ⬇️
server/etcdserver/api/v3rpc/util.go 70.96% <0.00%> (-3.23%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
client/v3/experimental/recipes/double_barrier.go 68.83% <0.00%> (-2.60%) ⬇️
client/v3/concurrency/election.go 79.68% <0.00%> (-2.35%) ⬇️
... and 13 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@ptabor ptabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. The logic seems better than it used to.

Still I'm not confident the code will work well if e.g. during waitAppliedIndex the leader changes]... It's likely rs.Progress will report NULL in such a situation.

Shell we fetch

waitAppliedIndex(...)
progress:=rs.Progress
if (progress == nil) ... 
...
fir memberId, progress := range progress { ...

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
@ahrtr ahrtr force-pushed the promote_non_exist_id_20230113 branch from 2398d57 to 8ed20e8 Compare January 16, 2023 22:18
@ahrtr
Copy link
Member Author

ahrtr commented Jan 16, 2023

during waitAppliedIndex the leader changes]...

It seems not a problem. Even the leader changes during waitAppliedIndex, it might be also OK to promote the learner member as long as it already catches up to the (old) leader's log.

But actually it doesn't make sense to execute waitAppliedIndex after rs := s.raftStatus(), because the rs is a copy and will not change anymore. So Moved the waitAppliedIndex to the beginning of the method. PTAL.

ahrtr added a commit to ahrtr/etcd that referenced this pull request Jan 16, 2023
Backport etcd-io#15095.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
ahrtr added a commit to ahrtr/etcd that referenced this pull request Jan 16, 2023
Backport etcd-io#15095 to 3.4.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
@ahrtr
Copy link
Member Author

ahrtr commented Jan 16, 2023

thx @ptabor for the quick review.

@ahrtr ahrtr merged commit 81b3592 into etcd-io:main Jan 16, 2023
tjungblu pushed a commit to tjungblu/etcd that referenced this pull request Jul 26, 2023
Backport etcd-io#15095.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
tjungblu pushed a commit to tjungblu/etcd that referenced this pull request Jul 26, 2023
Backport etcd-io#15095.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants