
disable disk eviction by default #293

Merged

Conversation

BenTheElder
Member

Hopefully resolves #55.

With this PR, the default configuration for kind nodes simply won't have hard eviction.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 12, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 12, 2019
@@ -169,6 +181,11 @@ kind: JoinConfiguration
# no-op entry that exists solely so it can be patched
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
memory.available: "1"
Member Author

cc @neolit123 @dashpole: for some reason cluster up fails when this is memory.available: "0". I can't find any reason that value should be invalid yet, though.

Member

I cannot find where in the source 0 would be a special case; this also seems undocumented.

Member Author

We're no longer setting this key, but this is still... curious.

@BenTheElder
Member Author

I also confirmed that a default cluster from this has configz like:

kind create cluster --name=test
export KUBECONFIG=$(kind get kubeconfig-path --name=test)
kubectl proxy --port=8001 &
curl -sSL "http://localhost:8001/api/v1/nodes/test-control-plane/proxy/configz"

-> (pretty-printed after the fact):

{
  "kubeletconfig": {
    "staticPodPath": "/etc/kubernetes/manifests",
    "syncFrequency": "1m0s",
    "fileCheckFrequency": "20s",
    "httpCheckFrequency": "20s",
    "address": "0.0.0.0",
    "port": 10250,
    "tlsCertFile": "/var/lib/kubelet/pki/kubelet.crt",
    "tlsPrivateKeyFile": "/var/lib/kubelet/pki/kubelet.key",
    "rotateCertificates": true,
    "authentication": {
      "x509": {
        "clientCAFile": "/etc/kubernetes/pki/ca.crt"
      },
      "webhook": {
        "enabled": true,
        "cacheTTL": "2m0s"
      },
      "anonymous": {
        "enabled": false
      }
    },
    "authorization": {
      "mode": "Webhook",
      "webhook": {
        "cacheAuthorizedTTL": "5m0s",
        "cacheUnauthorizedTTL": "30s"
      }
    },
    "registryPullQPS": 5,
    "registryBurst": 10,
    "eventRecordQPS": 5,
    "eventBurst": 10,
    "enableDebuggingHandlers": true,
    "healthzPort": 10248,
    "healthzBindAddress": "127.0.0.1",
    "oomScoreAdj": -999,
    "clusterDomain": "cluster.local",
    "clusterDNS": [
      "10.96.0.10"
    ],
    "streamingConnectionIdleTimeout": "4h0m0s",
    "nodeStatusUpdateFrequency": "10s",
    "nodeStatusReportFrequency": "1m0s",
    "nodeLeaseDurationSeconds": 40,
    "imageMinimumGCAge": "2m0s",
    "imageGCHighThresholdPercent": 85,
    "imageGCLowThresholdPercent": 80,
    "volumeStatsAggPeriod": "1m0s",
    "cgroupsPerQOS": true,
    "cgroupDriver": "cgroupfs",
    "cpuManagerPolicy": "none",
    "cpuManagerReconcilePeriod": "10s",
    "runtimeRequestTimeout": "2m0s",
    "hairpinMode": "promiscuous-bridge",
    "maxPods": 110,
    "podPidsLimit": -1,
    "resolvConf": "/etc/resolv.conf",
    "cpuCFSQuota": true,
    "cpuCFSQuotaPeriod": "100ms",
    "maxOpenFiles": 1000000,
    "contentType": "application/vnd.kubernetes.protobuf",
    "kubeAPIQPS": 5,
    "kubeAPIBurst": 10,
    "serializeImagePulls": true,
    "evictionHard": {
      "imagefs.available": "0%",
      "memory.available": "1",
      "nodefs.available": "0%",
      "nodefs.inodesFree": "0%"
    },
    "evictionPressureTransitionPeriod": "5m0s",
    "enableControllerAttachDetach": true,
    "makeIPTablesUtilChains": true,
    "iptablesMasqueradeBit": 14,
    "iptablesDropBit": 15,
    "failSwapOn": false,
    "containerLogMaxSize": "10Mi",
    "containerLogMaxFiles": 5,
    "configMapAndSecretChangeDetectionStrategy": "Watch",
    "enforceNodeAllocatable": [
      "pods"
    ]
  }
}
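
For a quicker check of just the eviction settings, the same endpoint can be filtered (a sketch, assuming the kubectl proxy above is still running and that jq is installed):

# print only the evictionHard section of the kubelet config
curl -sSL "http://localhost:8001/api/v1/nodes/test-control-plane/proxy/configz" | jq '.kubeletconfig.evictionHard'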

@BenTheElder
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
@neolit123 neolit123 left a comment
Member

One concern here:

Is this cluster still passing conformance, i.e. do we have conformance tests that rely on hard eviction?
If we merge this PR we are solving a somewhat strange user case that I think we need more evidence about, but hopefully we are not breaking something else.

@BenTheElder
Member Author

BenTheElder commented Feb 12, 2019 via email

@mitar
Contributor

mitar commented Feb 12, 2019

It does solve a problem for me and now I can run kind on my laptop.

@BenTheElder
Member Author

> It does solve a problem for me and now I can run kind on my laptop.

Awesome! Now the question is which resource is responsible, and whether we should adjust this back a bit... the defaults are:

evictionHard:
  memory.available:  "100Mi"
  nodefs.available:  "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"

@mitar
Contributor

mitar commented Feb 12, 2019

See my comment in #55.

@BenTheElder
Member Author

BenTheElder commented Feb 12, 2019

So specifically we probably just need the imagefs.available threshold and the rest are probably fine. Can you try:

# config.yaml
kind: Config
apiVersion: kind.sigs.k8s.io/v1alpha2
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    evictionHard:
      memory.available:  "100Mi"
      nodefs.available:  "10%"
      nodefs.inodesFree: "5%"
      imagefs.available: "0%"

kind create cluster --config=config.yaml
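
To verify the patch took effect, the configz check from earlier can be repeated (a sketch; --name=test is added here only so the node name matches the earlier example):

kind create cluster --name=test --config=config.yaml
export KUBECONFIG=$(kind get kubeconfig-path --name=test)
kubectl proxy --port=8001 &
# evictionHard in the output should now show the thresholds from the patch above
curl -sSL "http://localhost:8001/api/v1/nodes/test-control-plane/proxy/configz"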

@mitar
Contributor

mitar commented Feb 12, 2019

I think this example config is invalid YAML.

@BenTheElder BenTheElder mentioned this pull request Feb 12, 2019
@BenTheElder
Member Author

Fixed; it was missing a nodes: line before - role: control-plane.

@BenTheElder BenTheElder changed the title disable hard eviction by default disable disk eviction by default Feb 12, 2019
@BenTheElder
Member Author

Thanks to @mitar's testing here, we're now just setting disk-related thresholds; memory can be left alone for now.
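
Roughly, that means the default patch now only sets the disk-related thresholds and leaves memory.available at the kubelet default; a sketch of what it ends up as, based on the configz output above:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "0%"
  nodefs.inodesFree: "0%"
  imagefs.available: "0%"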

@BenTheElder
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
@neolit123
Member

neolit123 commented Feb 12, 2019

Hm, so between this and the other thread: what are the technical details on why the kubelet is "failing", and under what conditions does it fail? I saw the user's host node has many GB of images.

I.e., @mitar was claiming that he has enough disk space on the host:

# disable disk resource management by default
# kubelet will see the host disk that the inner container runtime
# is ultimately backed by and attempt to recover disk space. we don't want that.

Yet it seems to me we were passing the default thresholds.

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 12, 2019
@tao12345666333 tao12345666333 left a comment
Member

lgtm

@mitar
Contributor

mitar commented Feb 12, 2019

So the problem is that having a percentage as the limit might not really be reasonable for an end-user laptop. For example, I have a 2 TB drive, so 10% is 200 GB. So if I have 190 GB of free disk space (which I would claim is enough for my intended work with kind), Kubernetes would see it as under the limit.

I would say that I agree with setting the limits to 0. It is not like you are losing any data when running kind, so I do not think we should worry about available disk space. If it can run, it runs; otherwise it fails. And that is it.

Moreover, I can imagine that this can also help on CI if disks there get close to full as well. One does not expect the free space on your CI worker to influence whether your CI run will fail or succeed (unless it is actually out of disk space).
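
For context, the disk the kubelet measures inside a kind node is ultimately the host filesystem backing the node container; a quick way to see the free space it observes (a sketch, assuming a node container named test-control-plane as in the example above):

# df on the node container's root reports the space of the backing host filesystem
docker exec test-control-plane df -h /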

@BenTheElder
Member Author

> Yet it seems to me we were passing the default thresholds.

Those thresholds are intended for use on dedicated VMs with reserved resources; kind runs on hosts that have lots of extra noise / resource usage on them.

Basically @mitar's comment above :-) #293 (comment)

@BenTheElder
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
@BenTheElder
Member Author

This won't auto-merge due to the dual Netlify checks we somehow have now. I think I know how this happened (one we don't control) and will follow up... Manual merge in the meantime.

@BenTheElder BenTheElder merged commit 8eadb59 into kubernetes-sigs:master Feb 12, 2019
@BenTheElder BenTheElder deleted the who-needs-eviction-right branch February 12, 2019 18:22