Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added member interaction into EtcdProcessCluster #14404

Closed
wants to merge 1 commit into from

Conversation

biosvs
Copy link
Contributor

@biosvs biosvs commented Aug 30, 2022

As I promised in #14354, trying to introduce to EtcdProcessCluster functions for changing cluster member list.

As an example rewrote e2e tests from mentioned pull request.

@biosvs biosvs force-pushed the member-interaction-into-epc branch from ac47a2e to 6ce3d4a Compare August 30, 2022 11:33
@ahrtr
Copy link
Member

ahrtr commented Sep 5, 2022

Please resolve the conflict.

@biosvs biosvs force-pushed the member-interaction-into-epc branch from 6ce3d4a to 3c46475 Compare September 5, 2022 08:21
@biosvs biosvs force-pushed the member-interaction-into-epc branch from 3c46475 to 76ab474 Compare September 5, 2022 08:41
@biosvs
Copy link
Contributor Author

biosvs commented Sep 5, 2022

@ahrtr done, rebased on main

@@ -516,6 +611,10 @@ func (epc *EtcdProcessCluster) Stop() (err error) {
return err
}

func (epc *EtcdProcessCluster) CtlClient() *EtcdctlV3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (epc *EtcdProcessCluster) CtlClient() *EtcdctlV3 {
func (epc *EtcdProcessCluster) Client() *EtcdctlV3 {

Comment on lines +187 to +196
func (cfg *EtcdServerProcessConfig) SetInitialCluster(nodes []string, initialClusterState string) {
cfg.InitialCluster = strings.Join(nodes, ",")
cfg.Args = append(cfg.Args, "--initial-cluster", cfg.InitialCluster)
cfg.Args = append(cfg.Args, "--initial-cluster-state", initialClusterState)
}

func (cfg *EtcdServerProcessConfig) EnableDiscovery(token string, endpoints []string) {
cfg.Args = append(cfg.Args, fmt.Sprintf("--discovery-token=%s", token))
cfg.Args = append(cfg.Args, fmt.Sprintf("--discovery-endpoints=%s", strings.Join(endpoints, ",")))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not implement setters for configuration structs. I think we should avoid mutating args when constructing EtcdServerProcess. One solution would be to separate args provided by caller (extraArgs) from args generated from configuration fields (configArgs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biosvs After you resolve #14404 (comment), then this comment should can be resolved automatically.

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm not big fan of complicating EtcdProcessCluster code. From brief look at the code looks like you implement cluster resizes just for this test. I think dynamic cluster sizes over-complicate the code and are not worth it.

Instead of complicating library used by hundreds of tests, could we make just your test simpler?

@biosvs
Copy link
Contributor Author

biosvs commented Sep 5, 2022

I was asked by @ahrtr to do it. In general I'm completely okay with the way my test is currently written.
Honestly, idk how to simplify test without changing in common libraries. Maybe it's better to not to change anything at all (then feel free to reject PR, I'm okay with it).

@serathius
Copy link
Member

For me problem is making list of members dynamic, which would be only used for this test. Could we maybe check if the test itself can be simpler? For testing endpoints-auto-sync-interval do we really need to start a whole etcd process?

My understanding is that we are just looking for a Resolver state updated log in proxy. I think that just calling a MemberAdd on cluster should be enough.

@biosvs
Copy link
Contributor Author

biosvs commented Sep 5, 2022

Hmm, can't agree here.
Even using logs for making progress in tests make me uncomfortable, honestly. But I definitely would not say that something is working based on logs. I prefer to think that "it doesn't work until you run it explicitly and get expected result".

@ahrtr
Copy link
Member

ahrtr commented Sep 9, 2022

I do not get time to have a deeper review on this PR, but my immediate feeling is it's a little over complicated.

The gap mentioned in #14354 (comment) is that the current test framework has interface MemberAdd, but doesn't have interface method something like MemberLaunch or LaunchMember or StartMember or MemberStart. I think we do need to support dynamically changing the member in the framework.

Please always keep OCP (Open Close principle) in mind: Open for extension, but close for modification. When updating existing framework, please try not to change the existing logic. Instead, please try to make your new code adapt to the existing framework.

@biosvs
Copy link
Contributor Author

biosvs commented Sep 12, 2022

I'm afraid we're stuck with this PR (because we're in a kinda closed loop).

Changed e2e test cannot be rewritten with common or integration framework, because we're testing interaction through different running instances (including updates via network).

The same story is here: https://github.com/etcd-io/etcd/blob/main/tests/e2e/ctl_v3_snapshot_test.go#L258

Both these tests need actual member list changes. Both tests do it via raw commands. I tried to add these commands to e2e framework, but you think it's unnecessary. You're contributors, and if you're saying that it's easer to use raw commands for couple of tests - let it be.

func (cfg *EtcdServerProcessConfig) EnableDiscovery(token string, endpoints []string) {
cfg.Args = append(cfg.Args, fmt.Sprintf("--discovery-token=%s", token))
cfg.Args = append(cfg.Args, fmt.Sprintf("--discovery-endpoints=%s", strings.Join(endpoints, ",")))
}
Copy link
Member

@ahrtr ahrtr Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods (SetInitialCluster and EnableDiscovery) are only used by SetInitialOrDiscovery, it seems that we can just get the code included in it directly instead of creating two separate methods.

@ahrtr
Copy link
Member

ahrtr commented Sep 13, 2022

When I tried to enhance the existing e2e test framework, eventually it's similar to this PR. But the interface definition in my commit looks clearer, I also get rid of the nextSeq. Please consider to enhance this PR per ahrtr@9649fd7

@codecov-commenter
Copy link

Codecov Report

Merging #14404 (9649fd7) into main (4d57eb8) will decrease coverage by 0.72%.
The diff coverage is 48.48%.

❗ Current head 9649fd7 differs from pull request most recent head 76ab474. Consider uploading reports for the commit 76ab474 to get more accurate results

@@            Coverage Diff             @@
##             main   #14404      +/-   ##
==========================================
- Coverage   75.47%   74.75%   -0.73%     
==========================================
  Files         457      457              
  Lines       37183    37207      +24     
==========================================
- Hits        28065    27813     -252     
- Misses       7361     7612     +251     
- Partials     1757     1782      +25     
Flag Coverage Δ
all 74.75% <48.48%> (-0.73%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
client/v3/mock/mockserver/mockserver.go 0.00% <0.00%> (ø)
server/etcdserver/raft.go 88.29% <71.42%> (-1.48%) ⬇️
etcdctl/ctlv3/command/move_leader_command.go 73.68% <100.00%> (+2.63%) ⬆️
server/etcdserver/server.go 84.63% <100.00%> (-0.93%) ⬇️
server/proxy/grpcproxy/election.go 9.09% <0.00%> (-72.73%) ⬇️
server/proxy/grpcproxy/lock.go 33.33% <0.00%> (-66.67%) ⬇️
...ver/proxy/grpcproxy/adapter/lock_client_adapter.go 33.33% <0.00%> (-66.67%) ⬇️
...proxy/grpcproxy/adapter/election_client_adapter.go 6.89% <0.00%> (-62.07%) ⬇️
etcdutl/etcdutl/migrate_command.go 21.05% <0.00%> (-53.95%) ⬇️
...xy/grpcproxy/adapter/maintenance_client_adapter.go 17.14% <0.00%> (-17.15%) ⬇️
... and 42 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@biosvs
Copy link
Contributor Author

biosvs commented Sep 13, 2022

Maybe I didn't get the idea, but code looks broken for me:

  1. Cluster state is omitted
    It's necessary to pass "initial-cluster-state existing" when new member is added (I think that was the reason to add SetInitialCluster to EtcdServerProcessConfig).

  2. nextSeq
    With "epc.Cfg.ClusterSize", you'll try to reuse occupied ports in case of this sequence:

  • Start cluster of >1 nodes
  • Remove member (with least ports number)
  • Add member (it will try to use the same ports as the member with highest ports number)

@ahrtr
Copy link
Member

ahrtr commented Sep 13, 2022

The commit is just to demonstrate the high level idea instead of a PR ready for review.

@biosvs
Copy link
Contributor Author

biosvs commented Sep 13, 2022

Then I didn't get the idea, sorry.

You got rid of things that was introduced in order to make code workable.

@ahrtr
Copy link
Member

ahrtr commented Sep 14, 2022

2. nextSeq
With "epc.Cfg.ClusterSize", you'll try to reuse occupied ports in case of this sequence:

The initial value for nextSq is cfg.ClusterSize, and incremental by 1 each time calling StartNewProc. I don't understand why it can't be removed and use cfg.ClusterSize or len(epc.Procs) directly. You just need to update cfg.ClusterSize and epc.Procs each time after calling StartNewProc.

@biosvs
Copy link
Contributor Author

biosvs commented Sep 15, 2022

Let's assume that initial configuration is:

  • base port 2000
  • 2 nodes

Then node A has port 2000 + 15 = 2005, node B has port 2000 + 25 = 2010.

Now imaging that I'm first removing node A and then adding new node C. What will I get?

If I use cfg.ClusterSize or len(epc.Procs), I'll get port for new node = 2000 + 2*5 = 2010, which is overlapping with already running node B.
That's why I need some counter that doesn't depend on the current state of cluster. That's why nextSq was introduced.

@ahrtr
Copy link
Member

ahrtr commented Sep 15, 2022

Let's assume that initial configuration is:

* base port 2000

* 2 nodes

Then node A has port 2000 + 1_5 = 2005, node B has port 2000 + 2_5 = 2010.

Now imaging that I'm first removing node A and then adding new node C. What will I get?

If I use cfg.ClusterSize or len(epc.Procs), I'll get port for new node = 2000 + 2*5 = 2010, which is overlapping with already running node B. That's why I need some counter that doesn't depend on the current state of cluster. That's why nextSq was introduced.

Make sense to me, thanks for the clarification.

@ahrtr
Copy link
Member

ahrtr commented Oct 12, 2022

@biosvs Could you resolve the inline comments? Please also rebase this PR.

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2022

Continue to work on this PR in #14589

@ahrtr ahrtr closed this Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants