Fedora CoreOS nodes report half of actual CPU capacity #902

bendrucker · 2020-12-04T00:35:14Z

Description

On Fedora CoreOS on AWS, nodes report half as many CPU cores as the vCPUs the instance actually offers. Running nproc on the host returns half the expected value, and Kubernetes nodes, via /proc/cpuinfo, report half the expected CPU capacity.

Steps to Reproduce

Deploy a Fedora CoreOS cluster on AWS using any instance type that has 2 or more vCPUs (e.g. t3.medium, which has 2). Then run:

kubectl get node -o json | jq -r '.items[] | .status.capacity.cpu'

Result for a 2-node cluster:

1
1

The scheduler observes this capacity and will not schedule a pod requesting 1100m CPU.
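
To illustrate why a 1100m request cannot land on a node reporting 1 CPU, here is a minimal sketch of the comparison in millicores. The helper names `cpu_to_millicores` and `fits` are hypothetical, and the real scheduler compares against the node's allocatable CPU (capacity minus reservations), so this is only an upper bound on what fits.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity
# ("2", "0.5", "1100m") to an integer number of millicores.
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Hypothetical helper: succeeds when request ($1) fits within capacity ($2).
fits() {
  [ "$(cpu_to_millicores "$1")" -le "$(cpu_to_millicores "$2")" ]
}

# A node reporting 1 CPU versus a pod requesting 1100m.
if fits 1100m 1; then echo "fits"; else echo "does not fit"; fi
```

With the expected capacity of 2, the same check (`fits 1100m 2`) succeeds.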

Expected behavior

2
2

Each node should report 2 CPUs.

Environment

Platform: aws
OS: fedora-coreos (32.20201104.3.0)
Release: fa8f68f
Terraform: 0.13.5
Plugins: [email protected]

Possible Solution

This probably applies to other cloud providers, but I've only reproduced it on AWS. It seems like it might not affect bare metal.

This issue has been discussed in the CoreOS issue tracker:

coreos/fedora-coreos-tracker#413
coreos/fedora-coreos-tracker#181

Running the following removes Fedora CoreOS's default simultaneous multithreading (SMT) restriction and results in a correct node size:

rpm-ostree kargs --delete mitigations --reboot

The current official recommendation is to run this as a systemd unit:

https://docs.fedoraproject.org/be/fedora-coreos/kernel-args/

I've tested this and adding the following unit config fixes the capacity:

systemd:
  units:
    - name: enable-smt.service
      enabled: true
      contents: |
        # https://docs.fedoraproject.org/be/fedora-coreos/kernel-args/
        [Unit]
        Description=Enable simultaneous multithreading
        Before=kubelet.service
        # We run after `systemd-machine-id-commit.service` to ensure that
        # `ConditionFirstBoot=true` services won't rerun on the next boot.
        After=systemd-machine-id-commit.service
        ConditionKernelCommandLine=mitigations
        
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/bin/rpm-ostree kargs --delete mitigations --reboot

        [Install]
        RequiredBy=kubelet.service
        WantedBy=multi-user.target

Undoing this default has potentially important implications, but it's the only way I can find to ensure correct capacity detection.
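
To check on a host whether the restriction is in effect, before or after removing the kernel argument, something like the following sketch may help. It assumes the kernel exposes SMT state at /sys/devices/system/cpu/smt/control (typical values include on, off, forceoff and notsupported); the function names are hypothetical, and the optional path arguments exist only so the helpers can be pointed at other files.

```shell
# Hypothetical helper: print the mitigations= argument from a kernel
# command line file (default /proc/cmdline); prints nothing if absent.
mitigations_karg() {
  cmdline_file="${1:-/proc/cmdline}"
  [ -r "$cmdline_file" ] || return 0
  tr ' ' '\n' < "$cmdline_file" | grep '^mitigations=' || true
}

# Hypothetical helper: print the SMT control state, or "unknown" when
# the sysfs file is not readable on this system.
smt_state() {
  smt_file="${1:-/sys/devices/system/cpu/smt/control}"
  if [ -r "$smt_file" ]; then cat "$smt_file"; else echo "unknown"; fi
}

echo "mitigations karg: $(mitigations_karg)"
echo "SMT control:      $(smt_state)"
echo "online CPUs:      $(getconf _NPROCESSORS_ONLN)"
```

On an affected node, the expectation is a mitigations argument that includes nosmt, an SMT state of off, and an online CPU count that is half the instance's vCPU count; after the unit runs and the node reboots, the argument should be gone.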


dghubble · 2020-12-04T02:40:51Z

Fedora CoreOS has SMT disabled on certain platforms. I'm fine with those defaults and their judgement. You should discuss with https://github.com/coreos/fedora-coreos-tracker if you think the default protections are no longer needed.

In your example, the vCPU count detected on the host matches the count detected by Kubernetes, which is expected. That count is half of what EC2 quotes and half of what an SMT-enabled OS would show.

Related: In the past, Kubernetes detection did mismatch the host's vCPU count on SMT-disabled systems. That was fixed upstream; kubernetes/kubernetes#91795 has some tangential reading.

dghubble · 2020-12-04T02:49:46Z

As always, folks can use snippets to add systemd units (at your own risk). Fedora CoreOS treating a t3.medium as having 1 vCPU isn't flatly incorrect, just a different choice.

bendrucker mentioned this issue Dec 4, 2020

enable SMT on Fedora CoreOS Kubernetes nodes TakeScoop/typhoon#61

Merged

dghubble closed this as completed Dec 4, 2020
