This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

kube-dns & kubernetes-dashboard pods crashing on Kubernetes 1.9 #2100

Closed
ghost opened this issue Jan 19, 2018 · 26 comments · Fixed by #2139

Comments

@ghost

ghost commented Jan 19, 2018

Is this a request for help?:
NO

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
Version: v0.12.0
GitCommit: 1d33229
GitTreeState: clean


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.9

What happened:
When deploying a hybrid (Linux/Windows) Kubernetes cluster, the kube-dns and kubernetes-dashboard pods land in a CrashLoopBackOff state:

$ kubectl get pods --namespace=kube-system 
NAME                                            READY     STATUS             RESTARTS   AGE
heapster-7768c79696-79bxl                       2/2       Running            0          34m
kube-addon-manager-k8s-master-18208945-0        1/1       Running            0          33m
kube-apiserver-k8s-master-18208945-0            1/1       Running            0          33m
kube-controller-manager-k8s-master-18208945-0   1/1       Running            0          33m
kube-dns-v20-55498dbf49-rpzhb                   1/3       CrashLoopBackOff   20         34m
kube-dns-v20-55498dbf49-zpgz4                   1/3       CrashLoopBackOff   20         34m
kube-proxy-5lnt9                                1/1       Running            0          34m
kube-proxy-xnvl6                                1/1       Running            0          34m
kube-scheduler-k8s-master-18208945-0            1/1       Running            0          33m
kubernetes-dashboard-868965c888-ghgnz           0/1       CrashLoopBackOff   10         34m
tiller-deploy-589f6788d7-lr4h5                  1/1       Running            0          34m

This prevents me from using the Kubernetes dashboard, and service discovery within the cluster is impossible.
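
A minimal way to demonstrate the broken service discovery from inside the cluster (a sketch; the busybox image and pod name are illustrative, not part of the original report):

# this should resolve to the kubernetes service ClusterIP (10.0.0.1) on a healthy cluster,
# but fails while kube-dns is crash-looping
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup kubernetes.default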

What you expected to happen:
kube-dns and kubernetes-dashboard to be in Running state.

How to reproduce it (as minimally and precisely as possible):
acs-engine template:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9"
    },

    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-sl31065",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
	      "storageProfile": "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
      },
      {
        "name": "windowspool1",
        "count": 1,
        "vmSize": "Standard_D2_v3",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],

    "linuxProfile": {
      "adminUsername": "[REDACTED]",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "[REDACTED]"
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "[REDACTED]",
      "adminPassword": "[REDACTED]"
    },
    "servicePrincipalProfile": {
      "clientId": "[REDACTED]",
      "secret": "[REDACTED]"
    }
  }
}

Anything else we need to know:

@feiskyer
Member

Dashboard depends on kube-dns, so kube-dns should be fixed first. Could you check the kube-dns status? E.g.

kubectl -n kube-system describe pod kube-dns-v20-55498dbf49-rpzhb
kubectl -n kube-system logs kube-dns-v20-55498dbf49-rpzhb -c kubedns

@ghost
Author

ghost commented Jan 22, 2018

Here you are:

$ kubectl -n kube-system describe pod kube-dns-v20-55498dbf49-qmk9k
Name:           kube-dns-v20-55498dbf49-qmk9k
Namespace:      kube-system
Node:           k8s-linuxpool1-18208945-0/10.240.0.4
Start Time:     Fri, 19 Jan 2018 16:43:36 +0100
Labels:         k8s-app=kube-dns
                kubernetes.io/cluster-service=true
                pod-template-hash=1105486905
                version=v20
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.244.1.4
Controlled By:  ReplicaSet/kube-dns-v20-55498dbf49
Containers:
  kubedns:
    Container ID:  docker://2f9701cff82fdb45f0fa441856a0fc349481ab1479a30c7aacd0a2975e795d30
    Image:         k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.5
    Image ID:      docker-pullable://k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64@sha256:1a3fc069de481ae690188f6f1ba4664b5cc7760af37120f70c86505c79eea61d
    Ports:         10053/UDP, 10053/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --v=2
      --config-dir=/kube-dns-config
    State:          Running
      Started:      Mon, 22 Jan 2018 07:55:34 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 22 Jan 2018 07:51:46 +0100
      Finished:     Mon, 22 Jan 2018 07:52:46 +0100
    Ready:          False
    Restart Count:  86
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8081/readiness delay=30s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-lsg94 (ro)
  dnsmasq:
    Container ID:  docker://d1e46a32bf07314d5dbaf3ee385af448c303463c54171a7ac8d998ef46f2fd61
    Image:         k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.5
    Image ID:      docker-pullable://k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64@sha256:46b933bb70270c8a02fa6b6f87d440f6f1fce1a5a2a719e164f83f7b109f7544
    Ports:         53/UDP, 53/TCP
    Args:
      -v=2
      -logtostderr
      -configDir=/kube-dns-config
      -restartDnsmasq=true
      --
      -k
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
      --server=/in-addr.arpa/127.0.0.1#10053
      --server=/ip6.arpa/127.0.0.1#10053
      --log-facility=-
    State:          Running
      Started:      Mon, 22 Jan 2018 07:44:56 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 19 Jan 2018 16:43:41 +0100
      Finished:     Mon, 22 Jan 2018 07:43:18 +0100
    Ready:          True
    Restart Count:  1
    Environment:    <none>
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-lsg94 (ro)
  healthz:
    Container ID:  docker://15239f66cd0586133da7d053c4156d72dc56ce9229e33155ea63584007326078
    Image:         k8s-gcrio.azureedge.net/exechealthz-amd64:1.2
    Image ID:      docker-pullable://k8s-gcrio.azureedge.net/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
    Port:          8080/TCP
    Args:
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
      --url=/healthz-dnsmasq
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      --url=/healthz-kubedns
      --port=8080
      --quiet
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 22 Jan 2018 07:51:44 +0100
      Finished:     Mon, 22 Jan 2018 07:53:21 +0100
    Ready:          False
    Restart Count:  117
    Limits:
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     50Mi
    Liveness:     http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-lsg94 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  kube-dns-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-dns
    Optional:  true
  kube-dns-token-lsg94:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-dns-token-lsg94
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
Events:
  Type     Reason                 Age                 From                                Message
  ----     ------                 ----                ----                                -------
  Warning  BackOff                2d (x1287 over 2d)  kubelet, k8s-linuxpool1-18208945-0  Back-off restarting failed container
  Warning  Unhealthy              2d (x442 over 2d)   kubelet, k8s-linuxpool1-18208945-0  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff                2d (x1922 over 2d)  kubelet, k8s-linuxpool1-18208945-0  Back-off restarting failed container
  Warning  NetworkNotReady        11m                 kubelet, k8s-linuxpool1-18208945-0  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
  Normal   SuccessfulMountVolume  11m                 kubelet, k8s-linuxpool1-18208945-0  MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SuccessfulMountVolume  11m                 kubelet, k8s-linuxpool1-18208945-0  MountVolume.SetUp succeeded for volume "kube-dns-token-lsg94"
  Normal   SandboxChanged         11m                 kubelet, k8s-linuxpool1-18208945-0  Pod sandbox changed, it will be killed and re-created.
  Normal   Created                10m                 kubelet, k8s-linuxpool1-18208945-0  Created container
  Normal   Started                10m                 kubelet, k8s-linuxpool1-18208945-0  Started container
  Normal   Pulling                10m                 kubelet, k8s-linuxpool1-18208945-0  pulling image "k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.5"
  Normal   Pulled                 10m                 kubelet, k8s-linuxpool1-18208945-0  Successfully pulled image "k8s-gcrio.azureedge.net/k8s-dns-dnsmasq-nanny-amd64:1.14.5"
  Normal   Created                10m                 kubelet, k8s-linuxpool1-18208945-0  Created container
  Normal   Started                10m                 kubelet, k8s-linuxpool1-18208945-0  Started container
  Normal   Pulling                10m                 kubelet, k8s-linuxpool1-18208945-0  pulling image "k8s-gcrio.azureedge.net/exechealthz-amd64:1.2"
  Normal   Pulled                 10m                 kubelet, k8s-linuxpool1-18208945-0  Successfully pulled image "k8s-gcrio.azureedge.net/exechealthz-amd64:1.2"
  Normal   Created                10m                 kubelet, k8s-linuxpool1-18208945-0  Created container
  Normal   Started                10m                 kubelet, k8s-linuxpool1-18208945-0  Started container
  Warning  Unhealthy              10m (x3 over 10m)   kubelet, k8s-linuxpool1-18208945-0  Readiness probe failed: Get http://10.244.1.4:8081/readiness: dial tcp 10.244.1.4:8081: getsockopt: connection refused
  Warning  BackOff                9m (x2 over 9m)     kubelet, k8s-linuxpool1-18208945-0  Back-off restarting failed container
  Normal   Pulling                9m (x2 over 11m)    kubelet, k8s-linuxpool1-18208945-0  pulling image "k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.5"
  Normal   Pulled                 9m (x2 over 11m)    kubelet, k8s-linuxpool1-18208945-0  Successfully pulled image "k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.5"
  Warning  Unhealthy              6m (x12 over 9m)    kubelet, k8s-linuxpool1-18208945-0  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff                59s (x9 over 2m)    kubelet, k8s-linuxpool1-18208945-0  Back-off restarting failed container
$ kubectl -n kube-system logs kube-dns-v20-55498dbf49-qmk9k
Error from server (BadRequest): a container name must be specified for pod kube-dns-v20-55498dbf49-qmk9k, choose one of: [kubedns dnsmasq healthz]
$ kubectl -n kube-system logs kube-dns-v20-55498dbf49-qmk9k -c kubedns
I0122 06:55:34.771927       1 dns.go:48] version: 1.14.4-2-g5584e04
I0122 06:55:34.772927       1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0122 06:55:34.773012       1 server.go:113] FLAG: --alsologtostderr="false"
I0122 06:55:34.773030       1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0122 06:55:34.773039       1 server.go:113] FLAG: --config-map=""
I0122 06:55:34.773069       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0122 06:55:34.773086       1 server.go:113] FLAG: --config-period="10s"
I0122 06:55:34.773095       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0122 06:55:34.773121       1 server.go:113] FLAG: --dns-port="10053"
I0122 06:55:34.773136       1 server.go:113] FLAG: --domain="cluster.local."
I0122 06:55:34.773145       1 server.go:113] FLAG: --federations=""
I0122 06:55:34.773154       1 server.go:113] FLAG: --healthz-port="8081"
I0122 06:55:34.773160       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0122 06:55:34.773167       1 server.go:113] FLAG: --kube-master-url=""
I0122 06:55:34.773177       1 server.go:113] FLAG: --kubecfg-file=""
I0122 06:55:34.773183       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0122 06:55:34.773194       1 server.go:113] FLAG: --log-dir=""
I0122 06:55:34.773202       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0122 06:55:34.773210       1 server.go:113] FLAG: --logtostderr="true"
I0122 06:55:34.773217       1 server.go:113] FLAG: --nameservers=""
I0122 06:55:34.773223       1 server.go:113] FLAG: --stderrthreshold="2"
I0122 06:55:34.773231       1 server.go:113] FLAG: --v="2"
I0122 06:55:34.773238       1 server.go:113] FLAG: --version="false"
I0122 06:55:34.773289       1 server.go:113] FLAG: --vmodule=""
I0122 06:55:34.773619       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0122 06:55:34.773886       1 server.go:200] Skydns metrics not enabled
I0122 06:55:34.773935       1 dns.go:147] Starting endpointsController
I0122 06:55:34.774276       1 dns.go:150] Starting serviceController
I0122 06:55:34.774023       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0122 06:55:34.774920       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0122 06:55:35.275264       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:35.775423       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:36.275361       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:36.775323       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:37.275334       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:37.775288       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:38.275349       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:38.775307       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:39.275312       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:39.775244       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:40.275310       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:40.775289       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:41.275252       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:41.775293       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:42.275275       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:42.775252       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:43.275257       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:43.775353       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:44.275332       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:44.775317       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:45.275278       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:45.775285       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:46.275333       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:46.775348       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:47.275280       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:47.775301       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:48.275276       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:48.775312       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:49.275307       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:49.775266       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:50.275400       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:50.775380       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:51.275287       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:51.775344       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:52.275282       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:52.775271       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:53.275285       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:53.775396       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:54.275458       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:54.775282       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:55.275364       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:55.775328       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:56.275335       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:56.775290       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:57.275286       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:57.775374       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:58.275282       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:58.775366       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:59.275263       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:55:59.775296       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:00.275276       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:00.775358       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:01.275356       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:01.775296       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:02.275363       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:02.775371       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:03.275279       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:03.775275       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:04.275260       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
E0122 06:56:04.774977       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
I0122 06:56:04.775358       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
E0122 06:56:04.775574       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
I0122 06:56:05.275295       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:05.775182       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:06.275288       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:06.775296       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:07.275346       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:07.775285       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:08.275321       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:08.775284       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:09.275290       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:09.775270       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:10.275244       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:10.775382       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:11.275359       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:11.775307       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:12.275382       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:12.775373       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:13.275318       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:13.775274       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:14.275290       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:14.775273       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:15.275348       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:15.775400       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:16.275370       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:16.775302       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:17.275343       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:17.775269       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:18.275392       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:18.775463       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:19.275298       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:19.775249       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:20.275260       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:20.775264       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:21.275410       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:21.775459       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:22.275278       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:22.775315       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:23.275368       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:23.775360       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:24.275374       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:24.775447       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:25.275279       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:25.775347       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:26.275349       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:26.775428       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:27.275321       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:27.775330       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:28.275306       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:28.775299       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:29.275267       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:29.775275       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:30.275369       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:30.775339       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:31.275264       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:31.775354       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:32.275421       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:32.775260       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:33.275249       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:33.775322       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0122 06:56:34.275314       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
F0122 06:56:34.775266       1 dns.go:168] Timeout waiting for initialization

$ kubectl -n kube-system logs kube-dns-v20-55498dbf49-qmk9k -c dnsmasq
I0122 06:44:57.048367       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=-] true} /kube-dns-config 10000000000}
I0122 06:44:57.048639       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=-]
I0122 06:44:58.399819       1 nanny.go:111] 
W0122 06:44:58.399976       1 nanny.go:112] Got EOF from stdout
I0122 06:44:58.400651       1 nanny.go:108] dnsmasq[9]: started, version 2.78-security-prerelease cachesize 1000
I0122 06:44:58.400716       1 nanny.go:108] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0122 06:44:58.400755       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0122 06:44:58.400791       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0122 06:44:58.400824       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053
I0122 06:44:58.400902       1 nanny.go:108] dnsmasq[9]: read /etc/hosts - 7 addresses
$ kubectl -n kube-system logs kube-dns-v20-55498dbf49-qmk9k -c healthz
2018/01/22 06:57:21 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-01-22 06:57:14.929495647 +0000 UTC, error exit status 1
2018/01/22 06:57:31 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-01-22 06:57:14.929495647 +0000 UTC, error exit status 1
2018/01/22 06:57:41 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-01-22 06:57:14.929495647 +0000 UTC, error exit status 1
2018/01/22 06:57:51 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-01-22 06:57:14.929495647 +0000 UTC, error exit status 1

@feiskyer
Member

@odauby From the kube-dns logs: 10.0.0.1:443 i/o timeout. This is the kubernetes service used for connecting to the kube-apiserver. Could you verify whether it is healthy? E.g.

  • kubectl get endpoints kubernetes, and check whether the endpoints are working
  • If OK, check the connection to 10.0.0.1:443 on the node running kube-dns (get the node with kubectl -n kube-system get pod -o wide kube-dns-v20-55498dbf49-qmk9k)

@ghost
Author

ghost commented Jan 22, 2018

Here you are:

$ kubectl get endpoints kubernetes
NAME         ENDPOINTS          AGE
kubernetes   10.240.255.5:443   2d
$ kubectl -n kube-system get pod -o wide kube-dns-v20-55498dbf49-qmk9k
NAME                            READY     STATUS             RESTARTS   AGE       IP           NODE
kube-dns-v20-55498dbf49-qmk9k   2/3       CrashLoopBackOff   255        2d        10.244.1.4   k8s-linuxpool1-18208945-0

From the node itself:

stylelabs@k8s-linuxpool1-18208945-0:~$ curl -I --insecure https://10.0.0.1:443
HTTP/1.1 401 Unauthorized
Content-Type: application/json
Date: Mon, 22 Jan 2018 09:02:15 GMT
Content-Length: 165
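
For comparison, the same reachability check can be attempted from inside the pod network; a minimal sketch (the curl image and pod name are assumptions, not part of the original report):

# a 401 response here would prove the apiserver is reachable from the pod network;
# an i/o timeout reproduces what kube-dns sees
kubectl run -it --rm api-test --image=curlimages/curl --restart=Never --command -- curl -k -I https://10.0.0.1:443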

@oivindoh
Contributor

oivindoh commented Jan 22, 2018

I think I'm affected by the same issue, but I might be muddling this issue a bit, because I appear to only get the problem when using a nonstandard dockerEngineVersion.

E.g. the following apimodel gives me a working 1-master, 1-node k8s 1.9.1 Ubuntu cluster on acs-engine 0.12.2 (and 0.12.1):

{
  "apiVersion": "vlabs",
  "properties": {
    "windowsProfile": {
      "adminPassword": "x",
      "adminUsername": "x"
    },
    "servicePrincipalProfile": {
      "clientId": "x",
      "secret": "x"
    },
    "agentPoolProfiles": [
      {
        "availabilityProfile": "AvailabilitySet",
        "distro": "ubuntu",
        "name": "linpool1",
        "storageProfile": "ManagedDisks",
        "vmSize": "Standard_D2_v3",
        "count": 1
      }
    ],
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {},
      "orchestratorRelease": "1.9"
    },
    "linuxProfile": {
      "ssh": {
        "publicKeys": [
          {
            "keyData": "x"
          }
        ]
      },
      "adminUsername": "x"
    },
    "masterProfile": {
      "distro": "ubuntu",
      "count": 1,
      "vmSize": "Standard_D2_v3",
      "dnsPrefix": "dockerorig"
    }
  }
}

Whereas the following gives the exact same symptoms mentioned by odauby:

{
  "apiVersion": "vlabs",
  "properties": {
    "windowsProfile": {
      "adminUsername": "x",
      "adminPassword": "x"
    },
    "linuxProfile": {
      "adminUsername": "x",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "x"
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "vmSize": "Standard_D2_v3",
      "distro": "ubuntu",
      "dnsPrefix": "dockerce"
    },
    "servicePrincipalProfile": {
      "clientId": "x",
      "secret": "x"
    },
    "orchestratorProfile": {
      "orchestratorRelease": "1.9",
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    },
    "agentPoolProfiles": [
      {
        "vmSize": "Standard_D2_v3",
        "availabilityProfile": "AvailabilitySet",
        "distro": "ubuntu",
        "count": 1,
        "name": "linpool1",
        "storageProfile": "ManagedDisks"
      }
    ]
  }
}

The apiserver appears fine in both cases, but does log instances of this:

I0122 10:47:36.851306 1 logs.go:41] http: TLS handshake error from 168.63.129.16:62114: EOF

Similarly, kube-dns logs instances of:

I0122 15:26:48.932129 1 logs.go:41] skydns: failure to forward request "read udp 10.240.0.18:47848->168.63.129.16:53: i/o timeout"

@oivindoh
Contributor

Tested stopping the VMs and booting them in order: same result as above when the nodes came back to life. Didn't expect this to have any effect, but it was worth a shot.

Also attempted deploying with an unspecified docker version (so 1.12.6) vs. a specified one (still 17.05.*), with networkPolicy explicitly set to "none" instead of the implicit "azure" from above, yielding the same results:

  • docker 17.05: kube-dns/dashboard crashes continuously
  • docker 1.12: all running OK, no restarts

@odauby are you getting this on docker 1.12, not just with 17.* releases? I was hoping to find that acs-engine 0.12.0 had 17.* as the default for k8s 1.9.1, which would explain the differences we observe in behaviour, but it appears that is not the case.

@feiskyer
Member

@odauby @oivindoh For docker versions >= 1.13, docker changes the default policy of the FORWARD chain to DROP. Could you run iptables -P FORWARD ACCEPT and check whether kube-dns and the dashboard come back?
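
A minimal sketch of that check and workaround, run as root on each affected node:

# show the FORWARD chain's default policy; docker >= 1.13 sets it to DROP
sudo iptables -L FORWARD -n | head -1
# switch the default policy back to ACCEPT so pod-to-pod traffic can be forwarded again
sudo iptables -P FORWARD ACCEPT

Note this change is not persistent: it is lost on reboot and may be reverted when the docker daemon restarts, so treat it as a diagnostic step rather than a fix.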

@oivindoh
Contributor

@feiskyer yep, that's it! I added ACCEPT, deleted the pods to make them restart faster, and now they're connecting properly, leaving me with a working cluster. Is this something I can fix via acs-engine while deploying, or will I have to keep doing this manually? I suppose defaulting to ACCEPT isn't the most ideal solution to the problem, but I'm guessing this is a k8s issue more than an acs-engine issue.

@ghost
Author

ghost commented Jan 23, 2018

@feiskyer Does not help for me: since I did not set any dockerEngineVersion, I end up with docker 1.12.6 for the master and Linux nodes, and 17.6.2 for the Windows agent.
So, on a Linux agent the default policy for the FORWARD chain is already ACCEPT:

stylelabs@k8s-linuxpool1-18208945-0:~$ iptables -L
iptables v1.6.0: can't initialize iptables table `filter': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
stylelabs@k8s-linuxpool1-18208945-0:~$ sudo !!
sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forward rules */
DOCKER-ISOLATION  all  --  anywhere             anywhere            
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.244.0.0/16        anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.244.0.0/16        /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
REJECT     udp  --  anywhere             10.0.0.10            /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.0.0.10            /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             anywhere             /* kube-system/kubernetes-dashboard: has no endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:30431 reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.0.135.69          /* kube-system/kubernetes-dashboard: has no endpoints */ tcp dpt:https reject-with icmp-port-unreachable

@feiskyer
Member

@oivindoh The Kubernetes community has already added FORWARD rules in kube-proxy (kubernetes/kubernetes#52569). We need to figure out why it is not working on Azure.
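
A sketch of how to check whether those kube-proxy rules are actually present on a node:

# the KUBE-FORWARD chain should contain ACCEPT rules for marked packets and the pod CIDR,
# as seen in the iptables output quoted earlier in this thread
sudo iptables -L KUBE-FORWARD -n -v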

@odauby I couldn't repro the problem with docker v1.12. Did recreating the kube-dns pods help in your case? E.g.

# new pods will be created automatically after this.
kubectl delete pod kube-dns-v20-55498dbf49-rpzhb kube-dns-v20-55498dbf49-zpgz4 

@ghost
Author

ghost commented Jan 23, 2018

@feiskyer

$ kubectl delete -n kube-system pod kube-dns-v20-55498dbf49-qmk9k kube-dns-v20-55498dbf49-z7sq6
pod "kube-dns-v20-55498dbf49-qmk9k" deleted
pod "kube-dns-v20-55498dbf49-z7sq6" deleted

$ kubectl delete  -n kube-system pod kubernetes-dashboard-868965c888-ghgnz
pod "kubernetes-dashboard-868965c888-ghgnz" deleted

They get recreated and then crash-loop again:

$ kubectl get pods -n kube-system 
NAME                                            READY     STATUS             RESTARTS   AGE
heapster-7768c79696-79bxl                       2/2       Running            4          3d
kube-addon-manager-k8s-master-18208945-0        1/1       Running            2          3d
kube-apiserver-k8s-master-18208945-0            1/1       Running            2          3d
kube-controller-manager-k8s-master-18208945-0   1/1       Running            2          3d
kube-dns-v20-55498dbf49-fw4hx                   2/3       CrashLoopBackOff   8          7m
kube-dns-v20-55498dbf49-g5cbb                   2/3       CrashLoopBackOff   8          7m
kube-proxy-5lnt9                                1/1       Running            2          3d
kube-proxy-xnvl6                                1/1       Running            2          3d
kube-scheduler-k8s-master-18208945-0            1/1       Running            2          3d
kubernetes-dashboard-868965c888-crzjx           0/1       CrashLoopBackOff   1          1m
tiller-deploy-589f6788d7-lr4h5                  1/1       Running            2          3d

I also Azure-reallocated the master and agent nodes, with no improvement.

@feiskyer
Member

@odauby Seems there is something wrong with the pod networking. As you confirmed above, the node itself could access https://10.0.0.1:443 while kubedns couldn't. Have you created other NSGs that could potentially block pod networking?
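
A sketch of how such NSGs could be audited with the Azure CLI (the resource group name is taken from the report; <nsg-name> is a placeholder):

# list NSGs in the cluster's resource group, then dump the rules of a given NSG
az network nsg list --resource-group k8s-sl31065 --output table
az network nsg rule list --resource-group k8s-sl31065 --nsg-name <nsg-name> --output table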

@ghost
Author

ghost commented Jan 24, 2018

@feiskyer no, I did not. This is a vanilla acs-engine deploy:

acs-engine deploy --subscription-id $azureSubscriptionID --location $azureLocation --auto-suffix  --api-model ${clusterName}.json --resource-group $clusterName

Where:
$azureLocation = westeurope
$clusterName = k8s-sl31065

@feiskyer
Member

@odauby Could you upgrade to latest acs-engine and try again?

@feiskyer
Member

@oivindoh The Kubernetes community has already added FORWARD rules in kube-proxy (kubernetes/kubernetes#52569). We need to figure out why it is not working on Azure.

Just noticed kubernetes/kubernetes#52569 only fixes the problem for NodePort services; we still need to enable FORWARD.

@ghost
Author

ghost commented Jan 24, 2018

@feiskyer just tried with latest acs-engine, no improvement.

Version: v0.12.4
GitCommit: 069d9e4
GitTreeState: clean

api-model:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9"
    },

    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-sl31075",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
	      "storageProfile": "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
      },
      {
        "name": "windowspool1",
        "count": 1,
        "vmSize": "Standard_D2_v3",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],

    "linuxProfile": {
      "adminUsername": "[REDACTED]",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "[REDACTED]"
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "[REDACTED]",
      "adminPassword": "[REDACTED]"
    },
    "servicePrincipalProfile": {
      "clientId": "[REDACTED]",
      "secret": "[REDACTED]"
    }
  }
}

Deployment:

acs-engine deploy --subscription-id $azureSubscriptionID --location $azureLocation --auto-suffix  --api-model ${clusterName}.json --resource-group $clusterName

where:
$azureLocation is westeurope
$clusterName is k8s-sl31075

Outcome:

$ kubectl -n kube-system get pods
NAME                                            READY     STATUS             RESTARTS   AGE
heapster-7768c79696-h9cf4                       2/2       Running            0          14m
kube-addon-manager-k8s-master-32130662-0        1/1       Running            0          13m
kube-apiserver-k8s-master-32130662-0            1/1       Running            0          13m
kube-controller-manager-k8s-master-32130662-0   1/1       Running            0          13m
kube-dns-v20-55498dbf49-nmvsj                   1/3       CrashLoopBackOff   12         14m
kube-dns-v20-55498dbf49-sbs55                   1/3       CrashLoopBackOff   12         14m
kube-proxy-kd9mn                                1/1       Running            0          14m
kube-proxy-tgpfn                                1/1       Running            0          14m
kube-scheduler-k8s-master-32130662-0            1/1       Running            0          13m
kubernetes-dashboard-868965c888-xbhw2           0/1       CrashLoopBackOff   6          14m
tiller-deploy-589f6788d7-2kl7z                  1/1       Running            0          14m

@ghost
Author

ghost commented Jan 24, 2018

Just tried with Kubernetes 1.8 by updating this line in the api-model:

"orchestratorRelease": "1.8"

Same CrashLoopBackOff issue on the kube-dns pods. So it does not seem to be Kubernetes 1.9 related IMHO.

@ghost
Author

ghost commented Jan 24, 2018

@feiskyer just noticed that if I remove the Windows machine from my api-model, the kube-dns-v20 deployment is successful and the kube-dns pods don't crash anymore.

Could it be that, for some odd reason, the kube-dns-v20 deployment is scheduled on a Windows agent?
I remember that ACS Kubernetes schedules containers on Windows agents by default, and that you have to add a node selector to get them scheduled on Linux agents:

      nodeSelector:
        beta.kubernetes.io/os: linux
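
A quick way to verify where the DNS pods actually land (a sketch; the label matches the pod labels shown in the describe output earlier in this thread):

# the NODE column shows whether each pod was scheduled on a Linux or a Windows agent
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide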

@feiskyer
Member

Actually I couldn't repro the problem with or without Windows nodes. I have verified the api-models from both #2100 (comment) and #2100 (comment).

@odauby have you joined the Kubernetes Slack? If so, I can help check what's wrong tomorrow.

@msorby

msorby commented Feb 5, 2018

For anyone looking, #2174 fixes the issue if you use networkPolicy azure.

@saiyan86
Contributor

saiyan86 commented Feb 6, 2018

Yes. If you want to use networkPolicy azure, please use #2174. It should be merged soon.
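
For reference, networkPolicy is configured in the apimodel's kubernetesConfig; a minimal sketch of the relevant fragment:

    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9",
      "kubernetesConfig": {
        "networkPolicy": "azure"
      }
    }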

@magnock

magnock commented Feb 15, 2018

Yes, indeed, the service principal didn't have the "Contributor" role on the resource group.
After setting that, all system pods were running without problems.

@jackfrancis
Member

Thanks for confirming your fix @magnock. FYI, the v0.13.0 release includes #2174 above.

@magnock

magnock commented Feb 19, 2018

Thank you @jackfrancis, I used v0.13.0 and all pods are running fine, but we still have the internal Windows DNS problem: Windows containers can't resolve hostnames and can't access the Internet. I've seen some quick fixes:

  • using a container from this image: microsoft/windowsservercore:1709_KB4074588
    or
  • by setting the DNS client using PowerShell:

    $adapter = Get-NetAdapter
    Set-DnsClientServerAddress -InterfaceIndex $adapter.ifIndex -ServerAddresses 10.244.0.2,10.244.0.3
    Set-DnsClient -InterfaceIndex $adapter.ifIndex -ConnectionSpecificSuffix "default.svc.cluster.local"
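
The kube-dns pod IPs used above (10.244.0.2, 10.244.0.3) vary per cluster; a sketch of how to look them up:

# the ENDPOINTS column lists the IPs of the ready kube-dns pods
kubectl -n kube-system get endpoints kube-dns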

Are these changes already merged to master? Any new release that will include them?
Thx

@msorby

msorby commented Feb 20, 2018

@magnock Why do you consider using microsoft/windowsservercore:1709_KB4074588 a quick fix?

@magnock

magnock commented Feb 21, 2018

@msorby There is some discussion about that on #2027, but it doesn't really help.
If I restart my cluster, I need to check the DNS containers' IPs, rebuild the Windows image with the new DNS IPs, and then redeploy the Windows containers :( (or use env variables)
