refactor: yurtctl convert use node-servant in dispatch job #617
Conversation
@DrmagicE: GitHub didn't allow me to assign the following users: your_reviewer. Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
	klog.Errorf("fail to run ServantJobs: %s", err)
```
The klog.Error() seems a bit redundant, because the returned error message will be printed by the caller. How about we remove those klog.Error() calls and just return the error? For example:

```go
if err = kubeutil.RunServantJobs(co.clientSet, ctx, co.CloudNodes); err != nil {
	return fmt.Errorf("fail to run ServantJobs: %s", err)
}
```
agree +1
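A self-contained illustration of the suggestion (runServantJobs and convert here are stand-ins, not the real yurtctl functions): return a wrapped error and log only at the top level. Using fmt.Errorf with %w also keeps the underlying cause inspectable via errors.Is.

```go
package main

import (
	"errors"
	"fmt"
)

// runServantJobs is a stand-in for kubeutil.RunServantJobs.
func runServantJobs(nodes []string) error {
	if len(nodes) == 0 {
		return errors.New("no nodes specified")
	}
	return nil
}

// convert returns the error to its caller instead of logging it in place,
// so the message is printed exactly once at the top level.
func convert(nodes []string) error {
	if err := runServantJobs(nodes); err != nil {
		return fmt.Errorf("fail to run ServantJobs: %w", err)
	}
	return nil
}

func main() {
	if err := convert(nil); err != nil {
		fmt.Println(err) // fail to run ServantJobs: no nodes specified
	}
}
```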
pkg/yurtctl/cmd/convert/convert.go
Outdated
&node, projectinfo.GetEdgeWorkerLabelKey(), "true"); err != nil { | ||
return | ||
} | ||
// mark edge node as autonomous |
Why are edge nodes labeled with autonomy by default?
Can the user choose to turn the autonomy on or off?
@Peeknut Thanks for the review.
Labeling with autonomy by default is the current behaviour of yurtctl convert, and I think enabling node autonomy by default is reasonable, but I agree that we should add an option for customization. yurtctl markautonomous provides the --autonomous-nodes option to let users choose the autonomous nodes. I suggest adding this option to yurtctl convert too. What do you think?
Yes, I agree.
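A rough sketch of such an option, using the stdlib flag package for brevity (the real yurtctl is built on cobra, and parseAutonomousNodes is a hypothetical helper); the flag name mirrors the --autonomous-nodes option of yurtctl markautonomous:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseAutonomousNodes is a hypothetical helper mimicking the proposed
// --autonomous-nodes option: a comma-separated node list, where an empty
// value means "all edge nodes" (today's default behaviour).
func parseAutonomousNodes(args []string) ([]string, error) {
	fs := flag.NewFlagSet("convert", flag.ContinueOnError)
	raw := fs.String("autonomous-nodes", "",
		"comma-separated nodes to mark autonomous; empty means all edge nodes")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if *raw == "" {
		return nil, nil // default: treat all edge nodes as autonomous
	}
	return strings.Split(*raw, ","), nil
}

func main() {
	nodes, _ := parseAutonomousNodes([]string{"--autonomous-nodes=node1,node2"})
	fmt.Println(nodes) // [node1 node2]
}
```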
/assign @adamzhoul @Peeknut
@DrmagicE if
@rambohe-ch thanks for the review. openyurt/pkg/yurtctl/constants/yurt-tunnel-agent-tmpl.go Lines 45 to 47 in 3b02997
@DrmagicE ok, I have found it.
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
// 10. deploy yurt-hub and reset the kubelet service on cloud nodes
klog.Infof("deploying the yurt-hub and resetting the kubelet service on cloud nodes")
ctx["sub_command"] = "cloudnode"
ctx["working_mode"] = "cloud"
```
What about using a const, e.g. util.WorkingModeCloud?
Thanks for the review. Agree, I will fix it.
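A minimal sketch of the suggestion: declaring the working modes as typed constants (the names mirror the suggested util.WorkingModeCloud, but this is a standalone example, not the actual util package) lets the compiler catch typos that raw strings like "cloud" would not.

```go
package main

import "fmt"

// WorkingMode replaces raw strings such as "cloud" and "edge".
type WorkingMode string

const (
	WorkingModeCloud WorkingMode = "cloud"
	WorkingModeEdge  WorkingMode = "edge"
)

func main() {
	ctx := map[string]string{"sub_command": "cloudnode"}
	// A typo in the constant name fails to compile; a typo in "cloud" would not.
	ctx["working_mode"] = string(WorkingModeCloud)
	fmt.Println(ctx["working_mode"]) // cloud
}
```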
```diff
@@ -501,8 +500,8 @@ func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nod
 	}
 	switch action {
 	case "convert":
-		servantJobTemplate = constants.ConvertServantJobTemplate
-		jobBaseName = ConvertJobNameBase
+		servantJobTemplate = nodeservant.ConvertServantJobTemplate
```
Can we try to use node-servant/job.go's func RenderNodeServantJob(action string, tmplCtx map[string]string, nodeName string) (*batchv1.Job, error)? I set that up as the interface between modules, instead of reading the template directly.
@adamzhoul Oh, I didn't see this. I agree this is a better way. I will adjust the RunServantJobs function as follows to suit the change:

```diff
 // RunServantJobs launch servant jobs on specified nodes
-func RunServantJobs(cliSet *kubernetes.Clientset, tmplCtx map[string]string, nodeNames []string) error {
+func RunServantJobs(cliSet *kubernetes.Clientset, job *batchv1.Job, nodeNames []string) error {
```
@adamzhoul Hi, I just realized that the job name depends on the name of the node it runs on, so RunServantJobs should be able to get the job object for each specific node. For example, the new RunServantJobs may look like:

```go
// RunServantJobs launches servant jobs on the specified nodes
func RunServantJobs(cliSet *kubernetes.Clientset, getJob func(nodeName string) *batchv1.Job, nodeNames []string) error {
	var wg sync.WaitGroup
	for _, nodeName := range nodeNames {
		job := getJob(nodeName)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := RunJobAndCleanup(cliSet, job,
				WaitServantJobTimeout, CheckServantJobPeriod); err != nil {
				klog.Errorf("fail to run servant job(%s): %s",
					job.GetName(), err)
			} else {
				klog.Infof("servant job(%s) has succeeded", job.GetName())
			}
		}()
	}
	wg.Wait()
	return nil
}
```

As RunServantJobs is also used by yurtctl revert, this change will lead to lots of changes. How about we leave a TODO comment here and refactor it in another PR?
agree
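The per-node getJob closure proposed above can be sketched in a dependency-free way; job here is a stand-in for *batchv1.Job, and the goroutine body just records the rendered name where RunJobAndCleanup would run:

```go
package main

import (
	"fmt"
	"sync"
)

// job is a stand-in for *batchv1.Job; only its name matters for this sketch.
type job struct{ name string }

// runServantJobs shows the proposed shape: the caller passes a getJob
// closure so every node gets a job rendered with its own name.
func runServantJobs(getJob func(nodeName string) job, nodeNames []string) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		done []string
	)
	for _, nodeName := range nodeNames {
		j := getJob(nodeName) // render per-node job before starting the goroutine
		wg.Add(1)
		go func(j job) {
			defer wg.Done()
			// RunJobAndCleanup would run here; we only record the job name.
			mu.Lock()
			done = append(done, j.name)
			mu.Unlock()
		}(j)
	}
	wg.Wait()
	return done
}

func main() {
	names := runServantJobs(func(n string) job {
		return job{name: "node-servant-convert-" + n}
	}, []string{"node1"})
	fmt.Println(names) // [node-servant-convert-node1]
}
```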
pkg/yurtctl/cmd/convert/convert.go (outdated)

```go
			return
		}
	}
}

// 1.3 check the nodes label
```
You see, we label the node before we actually do the converting. So a failed convert may leave the node labeled, yet users have to run yurtctl convert again because the log tells them it failed. That leaves us with three choices:

1. Keep it as-is. Users must know that they have to run yurtctl revert after a failure before trying again, and they may not know and often forget.
2. Make yurtctl convert idempotent, meaning it doesn't matter whether nodes have already been converted or not; users can run it whenever they want, with no mental overhead or side effects.
3. Label the node only after it has been converted successfully, which means either unlabeling the node after a failure or moving the labeling to the end.

Personally I like way 2, because it gives the best experience, but it also introduces complicated work.
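A minimal sketch of one building block for way 2 or way 3: filtering out nodes that already carry the converted label so a re-run skips them instead of failing. nodesToConvert is hypothetical; the real check would inspect the edge-worker label on live Node objects.

```go
package main

import "fmt"

// nodesToConvert drops nodes that are already labeled as converted,
// so re-running convert skips them instead of failing.
func nodesToConvert(all []string, labeled map[string]bool) []string {
	var pending []string
	for _, n := range all {
		if labeled[n] {
			continue // already converted: skip, don't fail
		}
		pending = append(pending, n)
	}
	return pending
}

func main() {
	fmt.Println(nodesToConvert(
		[]string{"node1", "node2"},
		map[string]bool{"node1": true},
	)) // [node2]
}
```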
way 2 <-- agree +1
@adamzhoul @rambohe-ch I agree that way 2 is the most user-friendly, but it also increases the complexity of yurtctl convert. If we make yurtctl convert idempotent, we have to consider every change it makes and figure out an idempotent way to apply each one, which sometimes leads to tricky situations.
For example, the user has converted the cluster and enabled autonomy on node1 and node2. Then the user finds that only node2 should enable the autonomy feature, so in the next convert the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?
IMO, since we are going to introduce the precheck command for yurtctl, I think that is enough to prevent yurtctl convert failures.
Besides, kubeadm init/join is not idempotent either. As long as we can provide an acceptable success rate for yurtctl convert, idempotency is not strictly necessary.
What do I think of idempotency?
When a convert fails, I don't know why it failed and don't know what to do. Trying again is the most natural reaction and the only way I know. The command is something like an ultimate state: when I type it, I want it.

If we implemented idempotency:

> For example, the user has converted the cluster and was enabling autonomy on node1 and node2. Then the user finds that only node2 should enable autonomy feature, so in the next converting, the user only configures node1 for node autonomy. In this case, should we cancel the node autonomy for node2?

I think the command yurtctl convert ... represents the ultimate state the user expects. So we don't care what the current state is (failed jobs, different labels, etc.); we just lead the cluster to what the user expects. As for this case: yes, we should cancel node2.

> IMO, as we are going to introduce the precheck command for yurtctl and I think it is enough to prevent users from yurtctl convert failure.

Some failures cannot be caught by precheck, such as an image pull timeout. We have gotten a lot of feedback from the community along the lines of:

E1125 21:09:13.772921 19268 util.go:539] fail to run servant job(yurtctl-servant-convert-k8smaster): jobs.batch "yurtctl-servant-convert-k8smaster" already exists

Users know that yurtctl convert failed, and that's all; they don't know what the job log means or what to do next.
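One possible way to stop the quoted "already exists" failure from blocking retries is to delete the leftover job before creating a new one. This sketch models the Jobs API with an in-memory store; store and createServantJob are hypothetical, and the real code would use the clientset's delete-then-create (with an appropriate deletion propagation policy).

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("jobs.batch already exists")

// store models the cluster's Jobs API in memory for this sketch.
type store map[string]bool

func (s store) create(name string) error {
	if s[name] {
		return errAlreadyExists
	}
	s[name] = true
	return nil
}

func (s store) remove(name string) { delete(s, name) }

// createServantJob deletes a leftover job before retrying the create,
// so a failed previous run no longer blocks the next attempt.
func createServantJob(s store, name string) error {
	err := s.create(name)
	if errors.Is(err, errAlreadyExists) {
		s.remove(name)
		return s.create(name)
	}
	return err
}

func main() {
	// Simulate a leftover job from a failed previous run.
	s := store{"yurtctl-servant-convert-k8smaster": true}
	fmt.Println(createServantJob(s, "yurtctl-servant-convert-k8smaster")) // <nil>
}
```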
Do we have to implement it this time?
Not necessarily, I think. It may lead to too many updates, and we should think it through more carefully.

> I think as long as we can provide an acceptable success rate for yurtctl convert, then idempotent is not very necessary.

If the goal is only to handle failures, idempotency is indeed unnecessary once we achieve an acceptable success rate, as you said. But if we treat the command as an ultimate state, users can execute it as many times as they like to apply updates; that becomes a capability we give to users.

What I think we should fix this time: a leftover label or failed job should not stop users from trying again, unless we are sure the label really means the conversion completed successfully.
@adamzhoul Hi, according to our previous discussion, we will not cover idempotency in this PR, so I have left it unchanged.
1. Remove convert cloudnode/edgenode sub command. 2. Replace the "yurt-servant" job in yurtctl with "node-servant". 3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert". 4. Add "--autonomous-nodes" option to "yurtctl convert".
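Change 3 above (automatically deriving the tunnel server certificate IP from a user-supplied address) could look roughly like the following; certIPFromAddr is a hypothetical helper and the port in the example is illustrative, so the real yurtctl logic may differ:

```go
package main

import (
	"fmt"
	"net"
)

// certIPFromAddr extracts the IP to add to yurt-tunnel-server's certificate
// when the user passes a tunnel server address (with or without a port).
func certIPFromAddr(addr string) (string, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		host = addr // no port in the address
	}
	if ip := net.ParseIP(host); ip != nil {
		return ip.String(), nil
	}
	return "", fmt.Errorf("%q is not an IP address", host)
}

func main() {
	ip, _ := certIPFromAddr("1.2.3.4:10262")
	fmt.Println(ip) // 1.2.3.4
}
```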
@Peeknut @rambohe-ch @adamzhoul Thanks again for all your reviews. The code has been updated according to your review comments. Please have a look.
/lgtm
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: DrmagicE, rambohe-ch. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
A question suddenly comes to mind: since node-servant handles
@adamzhoul No, it doesn't.
…o#617) 1. Remove convert cloudnode/edgenode sub command. 2. Replace the "yurt-servant" job in yurtctl with "node-servant". 3. Automatically set "--cert-ip" to yurt-tunnel-server if the user specify the tunnel server address in "yurtctl convert". 4. Add "--autonomous-nodes" option to "yurtctl convert". Co-authored-by: [email protected] <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR makes the following changes:
The documentation needs to be updated too, I will create another PR to revise the doc later.
Which issue(s) this PR fixes:
Fixes #546
Special notes for your reviewer:
/assign @rambohe-ch @adamzhoul
Does this PR introduce a user-facing change?
Other Note