Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node: cpumanager: add options to reject non SMT-aligned workload #2626

Merged

Conversation

ffromani
Copy link
Contributor

@ffromani ffromani commented Apr 14, 2021

We propose a change in cpumanager to make the behaviour of latency-sensitive applications more predictable when running on SMT-enabled systems. The existing static policy is allowed to allocate single cpus (aka virtual threads on SMT-enabled systems), and this can cause physical core sharing which in turn is a major cause for noisy neighbours, which harms the behaviour of latency-sensitive applications.

We add extra constraints to the aformenentioned policy, while preserving all its properties, to avoid physical core sharing.

Implementation PR link: kubernetes/kubernetes#101432

Signed-off-by: Francesco Romani [email protected]

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 14, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 14, 2021
@ffromani
Copy link
Contributor Author

ffromani commented Apr 14, 2021

On Apr 13 sig-node meeting we agreed this proposal need more discussion, hence I posted this PR to foster that discussion, but for the same reason I'm not adding approver yet.

@xiaoxubeii
Copy link
Member

There are some discussion about external cpu manager policies. I think you may be interesting: https://mail.google.com/mail/u/0?ik=4e2f4a7bba&view=om&permmsgid=msg-f%3A1696952803188250466

@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from 7dddd86 to 50bd260 Compare April 16, 2021 13:17
Summary to be filled once WIP is removed

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from 50bd260 to bfa538b Compare April 16, 2021 13:22
swatisehgal and others added 2 commits April 19, 2021 12:01
Capture implementation details of enabling smtaware CPU Manager policy
@ffromani
Copy link
Contributor Author

There are some discussion about external cpu manager policies. I think you may be interesting: https://mail.google.com/mail/u/0?ik=4e2f4a7bba&view=om&permmsgid=msg-f%3A1696952803188250466

yes, we need to agree in the community about which way is the best to extend the cpumanager in the first place. This is a very important point I'll make sure to mention in the sig-node meeting.

This KEP describes the behaviour we want to achieve in the end and the PoC implementation we have: to reimplement it using external policies is a perfectly fine approach.

@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from 6a2667c to 1c26db7 Compare April 19, 2021 17:02
swatisehgal and others added 4 commits April 20, 2021 13:29
…policy

- Fix minor formatting and typos
- Disambiguate the text in smtaware policy to explain when SMTAlignmentError occurs.

Signed-off-by: Swati Sehgal <[email protected]>
Minor formatting/typo fixes and disambiguate explanation of smtaware policy
Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani changed the title WIP: node: add cpumanager policies with grained thread allocation node: add cpumanager policies with grained thread allocation Apr 23, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2021
@ffromani
Copy link
Contributor Author

overhaul and update after sig-node discussion on April 20, 2021. Link to the PR TBD. Will cc relevant people ASAP.

swatisehgal and others added 2 commits April 23, 2021 17:14
…ementation

Recent identification of a bug in the resourceAdmitHandlers Admit handler,
led to changes in the implementation whereby ContainerManagerAdmissionError
is no longer returned. This patch ensures that implementation strategy
captured in the KEP matches the latest smtaware CPUManager implementation.

Signed-off-by: Swati Sehgal <[email protected]>
Ensure Implementation stategy captured in the KEP matches latest implementation
Signed-off-by: Francesco Romani <[email protected]>
@johnbelamaric
Copy link
Member

PRR looks good, will await SIG approval before approving

Copy link
Member

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits in wordings...

In overall: I'd suggest to re-consider to have dedicated policy name that would re-use as much as possible code from static policy, instead of creating new flag/option field in kubelet config. Reasons for that is that there are other components that can consume settings of Kubelet for some reasons (e.g. NFD or third party operators). Adding new field that alter behaviour of static policy means that all of those needs to be changed.

But beside that, the whole cause of adding SMT aligned workload placement is good thing to add.


### Proposed Change

We propose to add a new flag in Kubelet called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` which allows the user to specify the CPU Manager policy option. If the value of this option is specified to be `reject-non-smt-aligned`, it results in further refinements of the existing static policy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly speaking, I'd prefer as separate policy name "smt-aligned-static" instead of option field to existing policy.
There are might be other components in the wild that reads kubelet config to determine node state and configured TopologyManager/CPUManager/... states. Instead of trying to teach those new field to parse (policy=static, option=xxx), might be better to have different policy names for different behaviours (policy=a vs. policy=b).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a discussion on sig-node slack channel. The main issue of the options approach is that options can alter the behaviour of the static policy, but if legacy (= not updated) consumers/unaware operators looking only at the policy name can fail to detect this and run with wrong assumptions about the kubelet behaviour.
So, we will add a new policy name (static-with-options? static-enhanced?) which will be an alias of static, but will enable the consumption of the options. This means that to enable this new behaviour we are proposing, a operator would need to:

  1. enable the feature gate (needed anyway)
  2. change the policy name
  3. actually provide options ( :) )

Copy link
Contributor Author

@ffromani ffromani May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from f90b869 to af201ed Compare May 11, 2021 17:48
@ffromani
Copy link
Contributor Author

/assign @derekwaynecarr
/assign @dchen1107
Sorry, I can't recall who we discussed will be the approver, so tagging both! :)

@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from af201ed to def5c71 Compare May 11, 2021 18:03
During the review it was pointed out that existing operators (software
component or humans) make logic on the policy name. Options may change
the behaviour of the policy in a non-backward-compatible way[1], thus
we need a way to signal this to consumers; we agreed to add an alias
for the static policy whose sole purpose is to make this change evident.

+++

[1] the change we are proposing in this KEP is not, but the cpumanager
infra change we are proposing which in turn makes the change possible
would enable these future options.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the cpumanager-policies-thread-placement branch from def5c71 to 7dd48d6 Compare May 12, 2021 12:44
@ffromani
Copy link
Contributor Author

looking for a more descriptive name, changed from static-enhanced to configurable

@klueska
Copy link
Contributor

klueska commented May 12, 2021

/lgtm

Let’s get this merged and we can always come back and tweak naming details like this after we’ve iterated on the implementation a bit.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2021
@kad
Copy link
Member

kad commented May 12, 2021

/lgtm from my side as well. Thanks @fromanirh for updates.

Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the configurable alias and then I think this is good to proceed.

- add a new flag in Kubelet called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` which allows the user to specify the CPU Manager policy option.
- finally, add a new cpu manager option called `reject-non-smt-aligned`; if present, this option will enable further refinements of the existing static policy.

The addition of an alias for the `static` policy is motivated by the fact the new CPUManager options can alter the behaviour of the policy in a backward incompatible way.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static policy as understood today is a baseline. I am not sure I understand the configurable alias as that alias has the same meaning as the existing static policy. At the moment, I think its fine to just have the policy options bit because 'configurable' has no meaning yet until such time we enumerate each behavior as named policy tokens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically, let's remove configurable alias at the moment until a future time lets you declare all attributes as policy options that inform a policy.

### Implementation strategy of reject-non-smt-aligned CPU Manager policy option

- In order to introduce the SMT-alignment check in CPU Manager, we introduce a new flag in Kubelet to allow the user to specify `cpumanager-policy-options` which when specified with `reject-non-smt-aligned` as its value provides the capability to modify the behaviour of static policy to strictly guarantee allocation of whole cores to a workload.
- We add a new policy called `configurable`. Only if this policy is selected the options, if given, will be considered. Otherwise, they will be ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove configurable until a future time.

- [X] Feature gate (also fill in values in `kep.yaml`).
- Feature gate name: `CPUManagerPolicyOptions`.
- Components depending on the feature gate: kubelet
- [X] Change the kubelet configuration to set the CPUManager policy to `configurable`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this and just set 'static' as existing policy.

- [X] Change the kubelet configuration to set the CPUManager policy to `configurable`
- [X] Change the kubelet configuration adding the CPUManager policy option to `reject-non-smt-aligned`
* **Does enabling the feature change any default behavior?**
- Yes, it makes the behaviour of the CPUManager static policy more restrictive and can lead to pod admission rejection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for others that cpu manager static policy can already reject pods, this just adds another reason for said rejection.

Address review comments

Signed-off-by: Francesco Romani <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2021
@derekwaynecarr
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2021
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
for PRR

#2626 (comment)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, ehashman, fromanirh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit 01abf7d into kubernetes:master May 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 13, 2021
@ffromani ffromani deleted the cpumanager-policies-thread-placement branch June 30, 2021 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.