Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: yurtctl convert use node-servant in dispatch job #617

Merged
merged 1 commit into from
Dec 1, 2021
Merged

refactor: yurtctl convert use node-servant in dispatch job #617

merged 1 commit into from
Dec 1, 2021

Conversation

DrmagicE
Copy link
Member

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:
/kind bug
/kind documentation
/kind enhancement
/kind good-first-issue
/kind feature
/kind question
/kind design
/sig ai
/sig iot
/sig network
/sig storage
/sig storage

/kind feature

What this PR does / why we need it:

This PR makes the following changes:

  1. Remove convert cloudnode/edgenode sub command.
  2. Replace the "yurt-servant" job in yurtctl with "node-servant".
  3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert".

The documentation needs to be updated too, I will create another PR to revise the doc later.

Which issue(s) this PR fixes:

Fixes #546

Special notes for your reviewer:

/assign @rambohe-ch @adamzhoul

Does this PR introduce a user-facing change?

The subcommand  "cloudnode" and "edgenode" are removed from "yurtctl convert" .
Flag "--yurtctl-servant-image" in "yurtctl convert" is replaced by "--node-servant-image".

other Note

@openyurt-bot
Copy link
Collaborator

@DrmagicE: GitHub didn't allow me to assign the following users: your_reviewer.

Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:
/kind bug
/kind documentation
/kind enhancement
/kind good-first-issue
/kind feature
/kind question
/kind design
/sig ai
/sig iot
/sig network
/sig storage
/sig storage

/kind feature

What this PR does / why we need it:

This PR makes the following changes:

  1. Remove convert cloudnode/edgenode sub command.
  2. Replace the "yurt-servant" job in yurtctl with "node-servant".
  3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert".

The documentation needs to be updated too, I will create another PR to revise the doc later.

Which issue(s) this PR fixes:

Fixes #546

Special notes for your reviewer:

/assign @rambohe-ch @adamzhoul

Does this PR introduce a user-facing change?

The subcommand  "cloudnode" and "edgenode" are removed from "yurtctl convert" .
Flag "--yurtctl-servant-image" in "yurtctl convert" is replaced by "--node-servant-image".

other Note

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openyurt-bot openyurt-bot added the kind/feature kind/feature label Nov 23, 2021
@openyurt-bot openyurt-bot added the size/XL size/XL: 500-999 label Nov 23, 2021
@DrmagicE DrmagicE changed the title feature: yurtctl convert use node-servant in dispatch job refactor: yurtctl convert use node-servant in dispatch job Nov 23, 2021

// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
klog.Errorf("fail to run ServantJobs: %s", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The klog.Error() seems a bit redundant, because the returned error message will be printed by the caller.
How about we remove those klog.Error() and just return the error message? For examples:

if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
    return fmt.Errorf("fail to run ServantJobs: %s", err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree +1

&node, projectinfo.GetEdgeWorkerLabelKey(), "true"); err != nil {
return
}
// mark edge node as autonomous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are edge nodes labeled with autonomy by default?
Can user can choose to turn on or off the autonomy?

Copy link
Member Author

@DrmagicE DrmagicE Nov 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Peeknut Thanks for the review.
Label with autonomy by default is the current behaviour of yurtctl convert, and I think to enable node autonomy by default is reasonable, but I agree that we should add an option for customization.
yurtctl markautonomous provides --autonomous-nodes option to let users choose the autonomous nodes.
I suggest adding this option to yurtctl convert too. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree.

@rambohe-ch
Copy link
Member

/assign @adamzhoul @Peeknut

@rambohe-ch
Copy link
Member

@DrmagicE if yurttunnel-server-address is setup, we also need to add parameter --tunnelserver-addr=xxxx for yurt-tunenl-agent.

@DrmagicE
Copy link
Member Author

@DrmagicE if yurttunnel-server-address is setup, we also need to add parameter --tunnelserver-addr=xxxx for yurt-tunenl-agent.

@rambohe-ch thanks for the review.
I think --tunnelserver-addr=xxxx is already there.

{{if .tunnelServerAddress }}
- --tunnelserver-addr={{.tunnelServerAddress}}
{{end}}

@rambohe-ch
Copy link
Member

@DrmagicE if yurttunnel-server-address is setup, we also need to add parameter --tunnelserver-addr=xxxx for yurt-tunenl-agent.

@rambohe-ch thanks for the review. I think --tunnelserver-addr=xxxx is already there.

{{if .tunnelServerAddress }}
- --tunnelserver-addr={{.tunnelServerAddress}}
{{end}}

@DrmagicE ok, i have found it.


// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if use const? util.WorkingModeCloud

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Agree, I will fix it.

@@ -501,8 +500,8 @@ func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nod
}
switch action {
case "convert":
servantJobTemplate = constants.ConvertServantJobTemplate
jobBaseName = ConvertJobNameBase
servantJobTemplate = nodeservant.ConvertServantJobTemplate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to use node-servant/job.go

func RenderNodeServantJob(action string, tmplCtx map[string]string, nodeName string) (*batchv1.Job, error) 

I set that as the interface between modules, instead of reading the template directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamzhoul Oh, I didn't see this. I agree this is a better way.
I will adjust the RunServantJobs function as follow to suit the change.

// RunServantJobs launch servant jobs on specified nodes
- func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nodeNames []string) error {
+ func RunServantJobs(cliSet *kubernetes.Clientset, job *batchv1.Job, nodeNames []string) error {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamzhoul Hi, I just realize that the job name is related to the name of the node that it is running on. So the RunServantJobs should be able to get the job object for specific nodes. For example, the new RunServantJobs may look like:

// RunServantJobs launch servant jobs on specified nodes
func RunServantJobs(cliSet *kubernetes.Clientset, getJob func(nodeName string) *batchv1.Job, nodeNames []string) error {
	var wg sync.WaitGroup
	for _, nodeName := range nodeNames {
		job := getJob(nodeName)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := RunJobAndCleanup(cliSet, job,
				WaitServantJobTimeout, CheckServantJobPeriod); err != nil {
				klog.Errorf("fail to run servant job(%s): %s",
					job.GetName(), err)
			} else {
				klog.Infof("servant job(%s) has succeeded", job.GetName())
			}
		}()
	}
	wg.Wait()
	return nil
}

As RunServantJobs is also used by yurtctl revert, this change will lead to lots of changes. How about we leave a TODO comment here and refactor it in another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

return
}
}
}

// 1.3 check the nodes label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you see, we label the node before we actually do the converting thing.
so, converting failed may lead to the node being labeled but users have to yurtctl convert again
because the log tells them failed.

so it leads to chooses:

  1. keep it. Users must know, they have to run yurt revert after failing when they want to do it again. And they may not know and often forget.
  2. we make yurtctl convert idempotent which means I don't care if nodes have been converted or not, I can do it whenever I want, with no mind pressure or side effect.
  3. we make sure only if the node has been converted successfully we label the node. so we have to unlabeled the node after failure or move labeling to the end.

personal I like the way 2, because it brings a good experience. but it introduces complicated work too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

way 2 <-- agree +1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamzhoul @rambohe-ch I agree with you that way 2 is the most user-friendly way, but it also increases the complexity of yurtctl convert. If we make yurtctl convert idempotent, we have to consider every change made by yurtctl convert and figure an idempotent way and sometimes lead to tricky situations.

For example, the user has converted the cluster and was enabling autonomy on node1 and node2. Then the user finds that only node2 should enable autonomy feature, so in the next converting, the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?

IMO, as we are going to introduce the precheck command for yurtctl and I think it is enough to prevent users from yurtctl convert failure.

Besides, kubeadm init/join is not idempotent. I think as long as we can provide an acceptable success rate for yurtctl convert, then idempotent is not very necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do I think of idempotent?

I don't know why I failed last time, I don't know why and don't know what to do.
try again is the most natural reaction and the only way I know.

and it is something like ultimate state, when I typed it I want it.

if implemented idempotent

For example, the user has converted the cluster and was enabling autonomy on node1 and node2. Then the user finds that only node2 should enable autonomy feature, so in the next converting, the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?

I think the command: yurtctl convert ... represents the ultimate state user expect.
so, we don't care what the current state is, failed job、 different label, etc...
just, leads the cluster to what users expect.

as for this case: yes, we should cancel node2

IMO, as we are going to introduce the precheck command for yurtctl and I think it is enough to prevent users from yurtctl convert failure.

some failing may not be checked by precheck. like, pull image timeout.
I think we have got much feedback from the community
something like:

E1125 21:09:13.772921   19268 util.go:539] fail to run servant job(yurtctl-servant-convert-k8smaster): jobs.batch "yurtctl-servant-convert-k8smaster" already exists

they do know yurtctl convert failed, that's all.
they don't know what the job log means and don't know what to do.

Do we have to implement it this time?

not necessary I think.
it may lead to too many updates, we should think more carefully.

I think as long as we can provide an acceptable success rate for yurtctl convert, then idempotent is not very necessary.

if we want to handle failure issues, it's unnecessary if we achieved an acceptable success rate. like you said.
if we consider it is something like ultimate state, users can execute many many times for updates. it's gonna be a capability we give to users.

What should we fix this time I think?

the label or failed job should not stop users from trying again.
unless we are sure the label can really represent the job is done well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamzhoul Hi, according to our previous discussion, we will not cover idempotent in this PR, so I leave it unchanged.

1. Remove convert cloudnode/edgenode sub command.
2. Replace the "yurt-servant" job in yurtctl with "node-servant".
3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert".
4. Add "--autonomous-nodes" option to "yurtctl convert".
@DrmagicE
Copy link
Member Author

@Peeknut @rambohe-ch @adamzhoul Thanks again for your all reviews. The code has been updated according to your review comments. Please have a look.

@rambohe-ch
Copy link
Member

rambohe-ch commented Nov 30, 2021

/lgtm

@adamzhoul @Peeknut

@openyurt-bot openyurt-bot added the lgtm lgtm label Nov 30, 2021
@adamzhoul
Copy link
Member

/lgtm

1 similar comment
@Peeknut
Copy link
Member

Peeknut commented Dec 1, 2021

/lgtm

@rambohe-ch
Copy link
Member

/approve

@openyurt-bot openyurt-bot added the approved approved label Dec 1, 2021
@openyurt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DrmagicE, rambohe-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openyurt-bot openyurt-bot merged commit dd2b2b4 into openyurtio:master Dec 1, 2021
@adamzhoul
Copy link
Member

a question suddenly comes to mind.
after this is merged.
is yurtctl revert still working?

since node-servant handles kubelet in a different approach.

@DrmagicE @rambohe-ch

@DrmagicE
Copy link
Member Author

DrmagicE commented Dec 1, 2021

is yurtctl revert still working?

@adamzhoul No, it doesn't. yurtctl revert will work again after finishing #547
I will start this weekend.

MrGirl pushed a commit to MrGirl/openyurt that referenced this pull request Mar 29, 2022
…o#617)

1. Remove convert cloudnode/edgenode sub command.
2. Replace the "yurt-servant" job in yurtctl with "node-servant".
3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert".
4. Add "--autonomous-nodes" option to "yurtctl convert".

Co-authored-by: [email protected] <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved approved kind/feature kind/feature lgtm lgtm size/XL size/XL: 500-999
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] yurtctl convert use node-servant in dispatch job
5 participants