This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Hybrid Cluster with Custom Vnet Doesn't seem to work with acs-engine v0.11 #1949

Closed
jwalker343 opened this issue Dec 19, 2017 · 8 comments



jwalker343 commented Dec 19, 2017

Is this a request for help?:

YES


Is this an ISSUE or FEATURE REQUEST? (choose one):

ISSUE


What version of acs-engine?:

0.11.0


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
kubernetes 1.7.9

What happened:
I downloaded the latest release of acs-engine and created a hybrid cluster in a custom vnet with the template below. Windows pods fail to start with an "Error syncing pod" error.

What you expected to happen:
Windows Pods should run properly.

How to reproduce it (as minimally and precisely as possible):

I have a vnet VN-Sandbox1-useast with 3 subnets:
k8smaster = 10.201.150.0/26
k8sagent = 10.201.155.0/26
k8sclustersubnet = 10.201.240.0/21

template.json:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
          "clusterSubnet": "10.201.240.0/21",
          "networkPolicy": "none"
      }
    },
    "masterProfile": {
        "count": 3,
        "dnsPrefix": "sandbox1-useast",
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8smaster",
        "firstConsecutiveStaticIP": "10.201.150.10",
        "vnetCidr": "10.201.0.0/16"
    },
    "agentPoolProfiles": [
    {
        "name": "linuxpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
    },
    {
        "name": "winpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
    }
    ],
    "windowsProfile": {
        "adminUsername": "k8sagentadmin",
        "adminPassword": "xxxx"
    },
    "linuxProfile": {
        "adminUsername": "azureuser",
        "ssh": {
            "publicKeys": [
                {
                    "keyData": "ssh-rsa xxxx [email protected]"
                }
            ]
        }
    },
    "servicePrincipalProfile": {
        "clientId": "xxxx",
        "secret": "xxxx"
    }
}
}
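As a quick sanity check on the address plan above, the following is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the values quoted in this issue are mutually consistent (node subnets inside `vnetCidr`, no overlapping ranges, the first static IP inside the master subnet). The function name `check_ranges` is made up for illustration, and the checks are assumptions about what a sensible layout looks like, not acs-engine's own validation rules.

```python
import ipaddress

# Values copied from the issue text and template.json.
vnet = ipaddress.ip_network("10.201.0.0/16")
node_subnets = {
    "k8smaster": ipaddress.ip_network("10.201.150.0/26"),
    "k8sagent": ipaddress.ip_network("10.201.155.0/26"),
}
cluster_subnet = ipaddress.ip_network("10.201.240.0/21")


def check_ranges(vnet, node_subnets, cluster_subnet):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    # Each node subnet should sit inside the VNet address space.
    for name, net in sorted(node_subnets.items()):
        if not net.subnet_of(vnet):
            problems.append(f"{name} {net} is outside {vnet}")
    # No two ranges (node subnets or pod range) should overlap.
    all_ranges = dict(node_subnets, clusterSubnet=cluster_subnet)
    names = sorted(all_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if all_ranges[a].overlaps(all_ranges[b]):
                problems.append(f"{a} overlaps {b}")
    return problems


problems = check_ranges(vnet, node_subnets, cluster_subnet)
# firstConsecutiveStaticIP from masterProfile should fall in the master subnet.
first_ip_ok = ipaddress.ip_address("10.201.150.10") in node_subnets["k8smaster"]
# The per-node pod range seen later in kubelet.log should fall in clusterSubnet.
pod_cidr_ok = ipaddress.ip_network("10.201.244.0/24").subnet_of(cluster_subnet)
print(problems, first_ip_ok, pod_cidr_ok)
```

If this reports no problems, the address plan itself is unlikely to be the cause, which points toward the generated template or the node setup scripts instead.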

Run ./acs-engine generate template.json

Edit the azuredeploy.json file and add the subnet variable due to #1767 (comment)

Run az group deployment create --template-file "azuredeploy.json" --parameters "azuredeploy.parameters.json" -g RG-Sandbox1-useast -n VN-Sandbox1-useast

Update route tables:
az network vnet subnet update -n k8smaster -g RG-Sandbox1-useast --vnet-name VN-sandbox1-useast --route-table <RT_NAME>

az network vnet subnet update -n k8sagent -g RG-Sandbox1-useast --vnet-name VN-Sandbox1-useast --route-table <RT_NAME>

Run a standard aspnet image:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: scratch-dep
spec:
  replicas: 1
  template:
    metadata:
      name: scratch-dep
      labels:
        app: asp
    spec:
      restartPolicy: Always
      nodeSelector:
        beta.kubernetes.io/os: windows
      containers:
      - name: asp
        imagePullPolicy: Always
        image: microsoft/aspnet:4.7.1-windowsservercore-1709
        tty: true
        stdin: true
        ports:
        - containerPort: 80

Anything else we need to know:

C:\k\kubelet.log

PS C:\k> cat .\kubelet.log
waiting to discover pod CIDR
Sleeping for 10s, and then waiting to discover pod CIDR
Ok.

No HNS network found, creating a new one...
VERBOSE: Invoke-HNSRequest Method[POST] Path[/networks/] Data[{
    "Subnets":  [
                    {
                        "GatewayAddress":  "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1",
                        "AddressPrefix":  "10.201.244.0/24"
                    }
                ],
    "Name":  "l2bridge",
    "Type":  "L2Bridge"
}]
VERBOSE: Result: { "Error" : "The parameter is incorrect. ", "Success" : false }
Generated CNI Config [@{cniVersion=0.2.0; name=l2bridge; type=wincni.exe; master=Ethernet; capabilities=; ipam=; dns=; AdditionalArgs=System.Object[]}]
PS C:\k>

kubelet.err.log

~~~
{"level":"debug","msg":"[cni-net] Processing ADD command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:47Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:47Z"}
{"level":"debug","msg":"[cni-net] Plugin stopped.","time":"2017-12-19T16:41:47Z"}
E1219 16:41:47.936922     852 cni.go:238] Error adding network: unexpected end of JSON input
E1219 16:41:47.936922     852 cni.go:206] Error while adding to cni network: unexpected end of JSON input
E1219 16:41:49.457152     852 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_manager.go:624] createPodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 pod_workers.go:182] Error syncing pod 7b60a790-e4db-11e7-b16b-000d3a14ca19 ("scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)"), skipping: failed to "CreatePodSandbox" for "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" with CreatePodSandboxError: "CreatePodSandbox for pod \"scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"scratch-dep-1585851119-h6jp0_default\" network: unexpected end of JSON input"
I1219 16:41:49.870640     852 kubelet.go:1917] SyncLoop (PLEG): "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)", event: &pleg.PodLifecycleEvent{ID:"7b60a790-e4db-11e7-b16b-000d3a14ca19", Type:"ContainerDied", Data:"c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d"}
W1219 16:41:49.870640     852 pod_container_deletor.go:77] Container "c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d" not found in pod's containers
I1219 16:41:50.268678     852 kuberuntime_manager.go:389] No ready sandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" can be found. Need to start a new one
I1219 16:41:50.268678     852 kuberuntime_manager.go:463] Container {Name:asp Image:microsoft/aspnet:4.7.1-windowsservercore-1709 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-14lt1 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:nil Stdin:true StdinOnce:false TTY:true} is dead, but RestartPolicy says that we should restart it.
{"level":"debug","msg":"[cni-net] Plugin wcn-net version .","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:3 MTU:1500 Name:Ethernet 3 HardwareAddr:00:0d:3a:14:ce:87 Flags:up|broadcast|multicast} with IP addresses: [fe80::1ce9:58b:36af:8180/64 10.201.155.5/26]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:11 MTU:1500 Name:vEthernet (nat) HardwareAddr:00:15:5d:ce:fd:51 Flags:up|broadcast|multicast} with IP addresses: [fe80::58c1:329:b498:9a1a/64 172.31.48.1/20]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128 127.0.0.1/8]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Plugin started.","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Processing DEL command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:50Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:50Z"}
~~~
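Both logs point at the same value: the `GatewayAddress` sent to HNS is a fragment of an Azure resource ID ending in `.1` rather than an IP address, and the CNI plugin later rejects it as "invalid IP address". The sketch below shows the kind of check that would flag this payload before it is submitted. It is an illustration only: `validate_hns_subnets` is a hypothetical helper, not part of acs-engine or HNS, and the "good" gateway `10.201.244.1` is an assumed value chosen for the example.

```python
import ipaddress


def validate_hns_subnets(payload):
    """Return a list of error strings for the Subnets entries of an HNS network request."""
    errors = []
    for subnet in payload.get("Subnets", []):
        prefix = subnet.get("AddressPrefix", "")
        gateway = subnet.get("GatewayAddress", "")
        try:
            network = ipaddress.ip_network(prefix)
        except ValueError:
            errors.append(f"invalid AddressPrefix: {prefix!r}")
            continue
        try:
            address = ipaddress.ip_address(gateway)
        except ValueError:
            errors.append(f"invalid GatewayAddress: {gateway!r}")
            continue
        if address not in network:
            errors.append(f"GatewayAddress {address} is not inside {network}")
    return errors


# Payload as it appears in kubelet.log above.
bad_payload = {
    "Subnets": [
        {
            "GatewayAddress": "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1",
            "AddressPrefix": "10.201.244.0/24",
        }
    ],
    "Name": "l2bridge",
    "Type": "L2Bridge",
}

# Same payload with an assumed, well-formed gateway for comparison.
good_payload = {
    "Subnets": [{"GatewayAddress": "10.201.244.1", "AddressPrefix": "10.201.244.0/24"}],
    "Name": "l2bridge",
    "Type": "L2Bridge",
}

bad_errors = validate_hns_subnets(bad_payload)
good_errors = validate_hns_subnets(good_payload)
print(bad_errors)
print(good_errors)
```

The shape of the bad value suggests that something upstream treated a subnet resource ID as if it were a CIDR string. That is an inference from the log, not something confirmed in this thread.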

I also tried this with "networkPolicy": "azure" and got the same result. Please let me know if you need any more information.

@jwalker343
Author

Linux Containers run fine:

  Normal  Scheduled              23s   default-scheduler                   Successfully assigned scratch-dep-1136686056-xjpdv to k8s-linuxpool1-40233731-0
  Normal  SuccessfulMountVolume  23s   kubelet, k8s-linuxpool1-40233731-0  MountVolume.SetUp succeeded for volume "default-token-14lt1"
  Normal  Pulling                22s   kubelet, k8s-linuxpool1-40233731-0  pulling image "nginx"
  Normal  Pulled                 18s   kubelet, k8s-linuxpool1-40233731-0  Successfully pulled image "nginx"
  Normal  Created                17s   kubelet, k8s-linuxpool1-40233731-0  Created container
  Normal  Started                17s   kubelet, k8s-linuxpool1-40233731-0  Started container

jwalker343 changed the title from "Hybrid Cluster with Custom Vnet Doesn't seem to work." to "Hybrid Cluster with Custom Vnet Doesn't seem to work with acs-engine v0.11" on Dec 20, 2017
@jwalker343
Author

I rolled back to acs-engine v0.9.1 and I'm at least able to run pods:

./acs-engine version
Version: v0.9.1
GitCommit: f9d0e574
GitTreeState: clean

@chweidling

I tested the custom VNET for Windows nodes with the snapshot 12d7fc5 from 2018-01-09. I patched the generated azuredeploy.json file, so that it contains a variable subnet with the value of the subnet range of my custom VNET like this:

"subnet": "10.1.0.0/16"

The deployment worked and the Windows nodes were successfully created. I could deploy Windows containers. But inside the Windows pods, there was a problem with DNS: I could not resolve domain names, that is, I could not reach services inside my cluster.
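The patch described in this comment (adding a `subnet` variable holding the VNet's subnet range to the generated azuredeploy.json) can be sketched as below. This is a minimal illustration, assuming the generated file is a standard ARM template with a top-level `variables` object. `add_subnet_variable` is a hypothetical helper, and the small `template` dict stands in for the real, much larger file.

```python
import ipaddress
import json


def add_subnet_variable(template, cidr):
    """Return a copy of an ARM template dict with variables.subnet set to cidr."""
    # Reject anything that is not a CIDR, such as a subnet resource ID.
    ipaddress.ip_network(cidr)
    patched = dict(template)
    variables = dict(patched.get("variables", {}))
    variables["subnet"] = cidr
    patched["variables"] = variables
    return patched


# Stand-in for the generated file; in practice load it with json.load().
template = {"parameters": {}, "variables": {"existingVariable": "value"}}

patched = add_subnet_variable(template, "10.1.0.0/16")
print(json.dumps(patched["variables"], indent=2))

# A resource ID is refused instead of being written into the template.
try:
    add_subnet_variable(template, "/subscriptions/xxxx/resourceGroups/example")
    rejected = False
except ValueError:
    rejected = True
```

Validating the value as a CIDR before writing it guards against the failure seen earlier in the thread, where a resource ID ended up where an address was expected.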

@jwalker343
Author

@chweidling I have not rebuilt a cluster using the latest snapshot. However, according to #558 (comment), you may just have to wait a little while for DNS to start resolving.

@chweidling

The problem does not disappear even after waiting for an hour.

@pushkar-bitwise

Hi @jwalker343, we are facing a similar issue. Were you able to find the root cause or a solution?

@SachinL9

SachinL9 commented Oct 3, 2018

I am facing problems deploying a hybrid cluster in a custom vnet.

Error: The template parameter 'masterSubnet' is not found.
acs-engine version: v0.21.2
k8s version: 1.11

Any idea when support for "Hybrid Cluster with Custom Vnet" will be added?
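One way to narrow down an error like the one above is to compare the parameters a generated template declares with the ones it references. The sketch below assumes the message means that some expression uses `parameters('masterSubnet')` without a matching entry in the template's `parameters` section. That reading is an assumption, and so are the helper `undeclared_parameters` and the toy template. Nothing here is taken from the real v0.21.2 output.

```python
import json
import re

# Matches ARM template expressions such as parameters('masterSubnet').
PARAM_REF = re.compile(r"parameters\('([^']+)'\)")


def undeclared_parameters(template):
    """Return parameter names referenced anywhere in the template but never declared."""
    declared = set(template.get("parameters", {}))
    # Serialising the whole template lets one regex scan every nested string.
    referenced = set(PARAM_REF.findall(json.dumps(template)))
    return referenced - declared


# Toy template for illustration only.
template = {
    "parameters": {"agentSubnet": {"type": "string"}},
    "variables": {
        "masterSubnet": "[parameters('masterSubnet')]",
        "agentSubnet": "[parameters('agentSubnet')]",
    },
}

missing = undeclared_parameters(template)
print(sorted(missing))
```

Running the same function on a real azuredeploy.json (loaded with `json.load`) would list any referenced-but-undeclared names, which is a useful detail to include when reporting the problem.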

@jwalker343
Author

I was able to successfully deploy a hybrid cluster in a custom vnet with 0.25.3. This issue is almost a year old, so I'm marking it closed.
