Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DaemonSet, Add Priority class #42

Merged
merged 1 commit into from
Jun 17, 2021
Merged

DaemonSet, Add Priority class #42

merged 1 commit into from
Jun 17, 2021

Conversation

oshoval
Copy link
Contributor

@oshoval oshoval commented Jun 16, 2021

As part of https://bugzilla.redhat.com/show_bug.cgi?id=1953482
We are adding Priority Class [1] to each network component.

The motivation is to make the control plane pods less sensitive to preemption
than user workloads.

Pods that are node specific will have the higher build-in priority,
since preempting them from a specific node, makes them unavailable
until they are rescheduled on that specific node.
Pods that are network control plane, but are not node specific
will have system-cluster-critical, which would still make them more important
than user and custom workloads, but less than system-node-critical.

Add system-node-critical to macvtap-cni DaemonSet.
Since macvtap-cni pod should run on each node,
assign system-node-critical pc to it.

[1] https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/

Signed-off-by: Or Shoval [email protected]

None

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 16, 2021
@kubevirt-bot kubevirt-bot requested review from maiqueb and phoracek June 16, 2021 10:57
@kubevirt-bot kubevirt-bot added size/XS release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 16, 2021
@oshoval
Copy link
Contributor Author

oshoval commented Jun 16, 2021

@maiqueb can you review please?

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this might be a really good catch.

I think it makes sense, but I find it strange I found no mention whatsoever of this when I checked the multus / kubevirt / ... projects.

For instance, I find it strange to find the macvtap-(dp+cni) pod more important than virt-handler.

Do you mind pointing out some examples of where this is used ?

templates/macvtap.yaml.in Show resolved Hide resolved
@oshoval
Copy link
Contributor Author

oshoval commented Jun 17, 2021

Do you mind pointing out some examples of where this is used ?

Some PRs that are freshly merged (there are more on the way):
kubevirt/bridge-marker#23
nmstate/kubernetes-nmstate#777
kubevirt/cluster-network-addons-operator#895

As part of the effort on https://bugzilla.redhat.com/show_bug.cgi?id=1953482
We created a design that assign to all network pods either system-cluster-critical or system-node-critical,
the way we choose which one of those is if it should run on all nodes or not.
In case it should run on all nodes we put it on higher priority (system-node-critical as this PR),
because if it will be preempted it won't help to just schedule it on another node.

Its true that now network component will have higher priority than kubevirt, but its per design,
since those are mandatory pods for healthy system
(Kubevirt uses custom kubevirt-cluster-critical)

Will update the PR with more info as the 2nd comment say as well.

@oshoval oshoval requested a review from maiqueb June 17, 2021 09:39
@phoracek
Copy link
Member

Its true that now network component will have higher priority than kubevirt, but its per design,
since those are mandatory pods for healthy system
(Kubevirt uses custom kubevirt-cluster-critical)

Note that KubeVirt should switch to native classes too https://bugzilla.redhat.com/show_bug.cgi?id=1953479.

The relative priority between this CNI and KubeVirt is not really that important. To give you a backstory for this - on environments where a single node is used for both user workload and control-plane, control-plane pods should never starve out on account of user workload. If we preempt all user workload and there is still not enough space for both CNI and KubeVirt, the cluster is just flawed and can never function.

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2021
Copy link
Member

@phoracek phoracek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add system-node-critical to macvtap-cni DaemonSet.
Since macvtap-cni pod should run on each node,
assign system-node-critical pc to it.
This will make the control plane less sensitive to preemption
than user workloads.

Signed-off-by: Or Shoval <[email protected]>
@oshoval
Copy link
Contributor Author

oshoval commented Jun 17, 2021

You have to rebuild manifests: https://github.com/kubevirt/macvtap-cni/blob/main/manifests/macvtap.yaml

Nice done thanks (using make manifests)

we should add check for that i think ?
like we have on kubevirt

@maiqueb
Copy link
Collaborator

maiqueb commented Jun 17, 2021

You have to rebuild manifests: https://github.com/kubevirt/macvtap-cni/blob/main/manifests/macvtap.yaml

Nice done thanks

we should add check for that i think ?
like we have on kubevirt

I guess we should, at least when the templates are updated.

You're welcome to implement that, if you want.

Copy link
Member

@phoracek phoracek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2021
@kubevirt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maiqueb, oshoval, phoracek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot merged commit a21466c into kubevirt:main Jun 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants