refactor: yurtctl convert use node-servant in dispatch job #617
Conversation
@DrmagicE: GitHub didn't allow me to assign the following users: your_reviewer. Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
	klog.Errorf("fail to run ServantJobs: %s", err)
```
The klog.Error() seems a bit redundant, because the returned error message will be printed by the caller. How about we remove those klog.Error() calls and just return the error? For example:

```go
if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
	return fmt.Errorf("fail to run ServantJobs: %s", err)
}
```
agree +1
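A self-contained illustration of the suggestion (runServantJobs and convert here are stand-ins, not the real yurtctl functions): return a wrapped error and log only at the top level. Using fmt.Errorf with %w also keeps the underlying cause inspectable via errors.Is.

```go
package main

import (
	"errors"
	"fmt"
)

// runServantJobs is a stand-in for kubeutil.RunServantJobs.
func runServantJobs(nodes []string) error {
	if len(nodes) == 0 {
		return errors.New("no nodes specified")
	}
	return nil
}

// convert returns the error to its caller instead of logging it in place,
// so the message is printed exactly once at the top level.
func convert(nodes []string) error {
	if err := runServantJobs(nodes); err != nil {
		return fmt.Errorf("fail to run ServantJobs: %w", err)
	}
	return nil
}

func main() {
	if err := convert(nil); err != nil {
		fmt.Println(err) // fail to run ServantJobs: no nodes specified
	}
}
```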
pkg/yurtctl/cmd/convert/convert.go
Outdated
&node, projectinfo.GetEdgeWorkerLabelKey(), "true"); err != nil { | ||
return | ||
} | ||
// mark edge node as autonomous |
Why are edge nodes labeled with autonomy by default?
Can the user choose to turn the autonomy on or off?
@Peeknut Thanks for the review.
Labeling with autonomy by default is the current behaviour of yurtctl convert, and I think enabling node autonomy by default is reasonable, but I agree that we should add an option for customization. yurtctl markautonomous provides the --autonomous-nodes option to let users choose the autonomous nodes. I suggest adding this option to yurtctl convert too. What do you think?
Yes, I agree.
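A rough sketch of such an option, using the stdlib flag package for brevity (the real yurtctl is built on cobra, and parseAutonomousNodes is a hypothetical helper); the flag name mirrors the --autonomous-nodes option of yurtctl markautonomous:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseAutonomousNodes is a hypothetical helper mimicking the proposed
// --autonomous-nodes option: a comma-separated node list, where an empty
// value means "all edge nodes" (today's default behaviour).
func parseAutonomousNodes(args []string) ([]string, error) {
	fs := flag.NewFlagSet("convert", flag.ContinueOnError)
	raw := fs.String("autonomous-nodes", "",
		"comma-separated nodes to mark autonomous; empty means all edge nodes")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if *raw == "" {
		return nil, nil // default: treat all edge nodes as autonomous
	}
	return strings.Split(*raw, ","), nil
}

func main() {
	nodes, _ := parseAutonomousNodes([]string{"--autonomous-nodes=node1,node2"})
	fmt.Println(nodes) // [node1 node2]
}
```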
/assign @adamzhoul @Peeknut
@DrmagicE if
@rambohe-ch thanks for the review. openyurt/pkg/yurtctl/constants/yurt-tunnel-agent-tmpl.go Lines 45 to 47 in 3b02997
@DrmagicE ok, I have found it.
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
```
What about using a const, e.g. util.WorkingModeCloud?
Thanks for the review. Agree, I will fix it.
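A minimal sketch of the suggestion: declaring the working modes as typed constants (the names mirror the suggested util.WorkingModeCloud, but this is a standalone example, not the actual util package) lets the compiler catch typos that raw strings like "cloud" would not.

```go
package main

import "fmt"

// WorkingMode replaces raw strings such as "cloud" and "edge".
type WorkingMode string

const (
	WorkingModeCloud WorkingMode = "cloud"
	WorkingModeEdge  WorkingMode = "edge"
)

func main() {
	ctx := map[string]string{"sub_command": "cloudnode"}
	// A typo in the constant name fails to compile; a typo in "cloud" would not.
	ctx["working_mode"] = string(WorkingModeCloud)
	fmt.Println(ctx["working_mode"]) // cloud
}
```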
```diff
@@ -501,8 +500,8 @@ func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nod
 	}
 	switch action {
 	case "convert":
-		servantJobTemplate = constants.ConvertServantJobTemplate
-		jobBaseName = ConvertJobNameBase
+		servantJobTemplate = nodeservant.ConvertServantJobTemplate
```
Can we try to use node-servant/job.go's func RenderNodeServantJob(action string, tmplCtx map[string]string, nodeName string) (*batchv1.Job, error)? I set that up as the interface between modules, instead of reading the template directly.
@adamzhoul Oh, I didn't see this. I agree this is a better way. I will adjust the RunServantJobs function as follows to suit the change:

```diff
 // RunServantJobs launch servant jobs on specified nodes
-func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nodeNames []string) error {
+func RunServantJobs(cliSet *kubernetes.Clientset, job *batchv1.Job, nodeNames []string) error {
```
@adamzhoul Hi, I just realized that the job name depends on the name of the node it runs on, so RunServantJobs should be able to get the job object for each specific node. For example, the new RunServantJobs may look like:

```go
// RunServantJobs launches servant jobs on the specified nodes
func RunServantJobs(cliSet *kubernetes.Clientset, getJob func(nodeName string) *batchv1.Job, nodeNames []string) error {
	var wg sync.WaitGroup
	for _, nodeName := range nodeNames {
		job := getJob(nodeName)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := RunJobAndCleanup(cliSet, job,
				WaitServantJobTimeout, CheckServantJobPeriod); err != nil {
				klog.Errorf("fail to run servant job(%s): %s",
					job.GetName(), err)
			} else {
				klog.Infof("servant job(%s) has succeeded", job.GetName())
			}
		}()
	}
	wg.Wait()
	return nil
}
```

As RunServantJobs is also used by yurtctl revert, this change will lead to lots of changes. How about we leave a TODO comment here and refactor it in another PR?
agree
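The per-node getJob closure proposed above can be sketched in a dependency-free way; job here is a stand-in for *batchv1.Job, and the goroutine body just records the rendered name where RunJobAndCleanup would run:

```go
package main

import (
	"fmt"
	"sync"
)

// job is a stand-in for *batchv1.Job; only its name matters for this sketch.
type job struct{ name string }

// runServantJobs shows the proposed shape: the caller passes a getJob
// closure so every node gets a job rendered with its own name.
func runServantJobs(getJob func(nodeName string) job, nodeNames []string) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		done []string
	)
	for _, nodeName := range nodeNames {
		j := getJob(nodeName) // render per-node job before starting the goroutine
		wg.Add(1)
		go func(j job) {
			defer wg.Done()
			// RunJobAndCleanup would run here; we only record the job name.
			mu.Lock()
			done = append(done, j.name)
			mu.Unlock()
		}(j)
	}
	wg.Wait()
	return done
}

func main() {
	names := runServantJobs(func(n string) job {
		return job{name: "node-servant-convert-" + n}
	}, []string{"node1"})
	fmt.Println(names) // [node-servant-convert-node1]
}
```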
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
			return
		}
	}
}

// 1.3 check the nodes label
```
You see, we label the node before we actually do the converting. So a failed convert may leave the node labeled, yet users have to run yurtctl convert again because the log tells them it failed. That leaves us with three choices:

1. Keep it as-is. Users must know that they have to run yurtctl revert after a failure before trying again, and they may not know and often forget.
2. Make yurtctl convert idempotent, meaning it doesn't matter whether nodes have already been converted or not; users can run it whenever they want, with no mental overhead or side effects.
3. Label the node only after it has been converted successfully, which means either unlabeling the node after a failure or moving the labeling to the end.

Personally I like way 2, because it gives the best experience, but it also introduces complicated work.
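A minimal sketch of one building block for way 2 or way 3: filtering out nodes that already carry the converted label so a re-run skips them instead of failing. nodesToConvert is hypothetical; the real check would inspect the edge-worker label on live Node objects.

```go
package main

import "fmt"

// nodesToConvert drops nodes that are already labeled as converted,
// so re-running convert skips them instead of failing.
func nodesToConvert(all []string, labeled map[string]bool) []string {
	var pending []string
	for _, n := range all {
		if labeled[n] {
			continue // already converted: skip, don't fail
		}
		pending = append(pending, n)
	}
	return pending
}

func main() {
	fmt.Println(nodesToConvert(
		[]string{"node1", "node2"},
		map[string]bool{"node1": true},
	)) // [node2]
}
```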
way 2 <-- agree +1
@adamzhoul @rambohe-ch I agree that way 2 is the most user-friendly, but it also increases the complexity of yurtctl convert. If we make yurtctl convert idempotent, we have to consider every change it makes and figure out an idempotent way to apply each one, which sometimes leads to tricky situations.
For example, the user has converted the cluster and enabled autonomy on node1 and node2. Then the user finds that only node2 should enable the autonomy feature, so in the next convert the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?
IMO, since we are going to introduce the precheck command for yurtctl, I think that is enough to prevent yurtctl convert failures.
Besides, kubeadm init/join is not idempotent either. As long as we can provide an acceptable success rate for yurtctl convert, idempotency is not strictly necessary.
What do I think of idempotency?
When a convert fails, I don't know why it failed and don't know what to do. Trying again is the most natural reaction and the only way I know. The command is something like an ultimate state: when I type it, I want it.

If we implemented idempotency:

> For example, the user has converted the cluster and was enabling autonomy on node1 and node2. Then the user finds that only node2 should enable autonomy feature, so in the next converting, the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?

I think the command yurtctl convert ... represents the ultimate state the user expects. So we don't care what the current state is (failed jobs, different labels, etc.); we just lead the cluster to what the user expects. As for this case: yes, we should cancel node2.

> IMO, as we are going to introduce the precheck command for yurtctl and I think it is enough to prevent users from yurtctl convert failure.

Some failures cannot be caught by precheck, such as an image pull timeout. We have gotten a lot of feedback from the community along the lines of:

E1125 21:09:13.772921 19268 util.go:539] fail to run servant job(yurtctl-servant-convert-k8smaster): jobs.batch "yurtctl-servant-convert-k8smaster" already exists

Users know that yurtctl convert failed, and that's all; they don't know what the job log means or what to do next.
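One possible way to stop the quoted "already exists" failure from blocking retries is to delete the leftover job before creating a new one. This sketch models the Jobs API with an in-memory store; store and createServantJob are hypothetical, and the real code would use the clientset's delete-then-create (with an appropriate deletion propagation policy).

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("jobs.batch already exists")

// store models the cluster's Jobs API in memory for this sketch.
type store map[string]bool

func (s store) create(name string) error {
	if s[name] {
		return errAlreadyExists
	}
	s[name] = true
	return nil
}

func (s store) remove(name string) { delete(s, name) }

// createServantJob deletes a leftover job before retrying the create,
// so a failed previous run no longer blocks the next attempt.
func createServantJob(s store, name string) error {
	err := s.create(name)
	if errors.Is(err, errAlreadyExists) {
		s.remove(name)
		return s.create(name)
	}
	return err
}

func main() {
	// Simulate a leftover job from a failed previous run.
	s := store{"yurtctl-servant-convert-k8smaster": true}
	fmt.Println(createServantJob(s, "yurtctl-servant-convert-k8smaster")) // <nil>
}
```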
Do we have to implement it this time?
Not necessarily, I think. It may lead to too many updates, and we should think it through more carefully.

> I think as long as we can provide an acceptable success rate for yurtctl convert, then idempotent is not very necessary.

If the goal is only to handle failures, idempotency is indeed unnecessary once we achieve an acceptable success rate, as you said. But if we treat the command as an ultimate state, users can execute it as many times as they like to apply updates; that becomes a capability we give to users.

What I think we should fix this time: a leftover label or failed job should not stop users from trying again, unless we are sure the label really means the conversion completed successfully.
@adamzhoul Hi, according to our previous discussion, we will not cover idempotency in this PR, so I have left it unchanged.
1. Remove convert cloudnode/edgenode sub command. 2. Replace the "yurt-servant" job in yurtctl with "node-servant". 3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert". 4. Add "--autonomous-nodes" option to "yurtctl convert".
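Change 3 above (automatically deriving the tunnel server certificate IP from a user-supplied address) could look roughly like the following; certIPFromAddr is a hypothetical helper and the port in the example is illustrative, so the real yurtctl logic may differ:

```go
package main

import (
	"fmt"
	"net"
)

// certIPFromAddr extracts the IP to add to yurt-tunnel-server's certificate
// when the user passes a tunnel server address (with or without a port).
func certIPFromAddr(addr string) (string, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		host = addr // no port in the address
	}
	if ip := net.ParseIP(host); ip != nil {
		return ip.String(), nil
	}
	return "", fmt.Errorf("%q is not an IP address", host)
}

func main() {
	ip, _ := certIPFromAddr("1.2.3.4:10262")
	fmt.Println(ip) // 1.2.3.4
}
```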
@Peeknut @rambohe-ch @adamzhoul Thanks again for all your reviews. The code has been updated according to your review comments. Please have a look.
/lgtm
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: DrmagicE, rambohe-ch. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
A question suddenly comes to mind: since node-servant handles
@adamzhoul No, it doesn't.
…o#617) 1. Remove convert cloudnode/edgenode sub command. 2. Replace the "yurt-servant" job in yurtctl with "node-servant". 3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert". 4. Add "--autonomous-nodes" option to "yurtctl convert". Co-authored-by: [email protected] <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR makes the following changes:
The documentation needs to be updated too, I will create another PR to revise the doc later.
Which issue(s) this PR fixes:
Fixes #546
Special notes for your reviewer:
/assign @rambohe-ch @adamzhoul
Does this PR introduce a user-facing change?
Other Note