Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: support raft learner in etcd - part 3 #10730

Merged
merged 13 commits into from
May 29, 2019
Merged

Conversation

jingyih
Copy link
Contributor

@jingyih jingyih commented May 16, 2019

Last part of #10645. Continuation of #10727.

This PR incldues the last 12 commits in #10645. Some of the commits are modified during rebasing to #10727.

(ordered by: latest commits on top)

b433162c0b9ea4d13e9a0dd49b1f51e20896cbcc integration: update TestMemberPromote test
ae08d06eb991b66d1de77912da0fe4c8ff03a35b etcdserver: update raftStatus
3f937cc459db33b9fd385d14a98b6747e11e93df etcdserver: check http StatusCode before unmarshal
ec1928e7b44fb14b53bd878eb37283ef1a144dec etcdserver: use etcdserver ErrLearnerNotReady
a8f87b83be4fe9f826104c8ac62361c30d779de3 etcdserver: forward member promote to leader
dc79587cd1b05007516544db0bb576b70436aca3 etcdserver: add mayPromote check
c6298843d77162ebb62a256fc2bcde4bcd190153 Doc: add learner in runtime-configuration.md
90956f77372a2deb5094259f9f02ebcf384bf9d3 clientv3/integration: better way to deflake test
2e87a9a2f66b70e0151b8a388e8760f99910b2d7 etcdserver: allow 1 learner in cluster
678d86e2db594368e12e868223e2edde4e3bf183 etcdserver: check IsMemberExist before IsLearner
6d8abba69fd85b1cc5e33b79054ebadc321aa148 etcdserver: learner return Unavailable for unsupported RPC
d93fecc3750f0cfb7b302de599b7ade99037127b etcdserver: adjust StrictReconfigCheck

cc @xiang90

(see [error cases when promoting a member] section for more details).
In this case, user should wait and retry later.

In v3.4, etcd server limits the number of learners that cluster can have to one. The main consideration is to limit the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should make this configurable. One seems to be a small number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some people want to increase read perf. So they want more learners for example.

Copy link
Contributor Author

@jingyih jingyih May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @gyuho

I kind of agree that 1 is not enough for some use cases, such as upsizing cluster from 1 to 3, or 3 to 5, and live-migrating a 3-node cluster to a new 3-node cluster. On the other hand, we do not want users to add too many learners at the same time, which might result in too much overhead for the leader. One goal of using learner is to make add members safer - not if it results in too much overhead on leader and then cause the cluster to fail. I think we can hard-code the limit to a small number, and make it configurable as an experimental feature in 3.4.

Copy link
Contributor Author

@jingyih jingyih May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuring this limit is cluster-wide reconfiguration (very similar to member change), which means we need to add an API, and maybe a new config change type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiang90 Do we keep the limit of 1? I do not have strong opinions on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us keep this as it is for now. we can remove the limit later.

@jingyih
Copy link
Contributor Author

jingyih commented May 24, 2019

ping @xiang90

@gyuho
Copy link
Contributor

gyuho commented May 28, 2019

CIs have been broken. Should be fix now. Could you rebase?

@@ -1635,6 +1635,38 @@ func (s *EtcdServer) RemoveMember(ctx context.Context, id uint64) ([]*membership

// PromoteMember promotes a learner node to a voting node.
func (s *EtcdServer) PromoteMember(ctx context.Context, id uint64) ([]*membership.Member, error) {
resp, err := s.promoteMember(ctx, id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wont this go through raft? why do we need to forward the promote request to leader in a side channel after it goes through raft?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide if the learner is ready to be promoted, before sending the request to raft. Only etcd server whose local raft node is leader has the information on learner's progress. So the request is first routed to leader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. now i understand we actually check if this local node is leader inside the s.promoteMember func. I guess we should just move that checking code out of the s.promoteMember func. so it would be more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead I added comments 23511d2 to explain checking local node is leader is part of the function.


cctx, cancel := context.WithTimeout(ctx, s.Cfg.ReqTimeout())
defer cancel()
// forward to leader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably comment on why do we need to forward to leader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@xiang90
Copy link
Contributor

xiang90 commented May 29, 2019

LGTM after resolving the merge conflicts and the nits.

jingyih and others added 13 commits May 28, 2019 18:47
Adjust StrictReconfigCheck logic to accommodate learner members in the
cluster.
Make learner return code.Unavailable when the request is not supported
by learner. Client balancer will retry a different endpoint.
If member does not exist in cluster, IsLearner will panic.
Hard-coded the maximum number of learners to 1.
Use ReadyNotify instead of time.Sleep to wait for server ready.
If learner is not ready to be promoted, use etcdserver.ErrLearnerNotReady
instead of using membership.ErrLearnerNotReady.
Check http StatusCode. Only Unmarshal body if StatusCode is statusOK.
Update TestMemberPromote to include both learner not-ready and learner
ready test cases.

Removed unit test TestPromoteMember, it requires underlying raft node to
be started and running. The member promote is covered by the integration
test.
@jingyih
Copy link
Contributor Author

jingyih commented May 29, 2019

Rebased.

@xiang90
Copy link
Contributor

xiang90 commented May 29, 2019

@jingyih can you check why the CI failed?

@jingyih
Copy link
Contributor Author

jingyih commented May 29, 2019

@jingyih can you check why the CI failed?

Travis failed with a known flaky test:
https://travis-ci.com/etcd-io/etcd/jobs/203753990#L2118

Semaphore failed with 20m timeout, which is also known:
#10700 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants