Pod stuck in ContainerCreating state with Failed to find the master interface warning. #1945

Closed
Tracked by #3611
nawazkh opened this issue May 4, 2023 · 12 comments
Labels
cni Related to CNI.

Comments

@nawazkh
Contributor

nawazkh commented May 4, 2023

What happened:

  • A test pod named webaaw2ci-84cc489f6f-d22gj is stuck in ContainerCreating status in a self-managed cluster.
  • This self-managed cluster has
    • two worker nodes
      • each worker node has two NICs attached to it
    • three control-plane nodes
      • each control-plane node has one NIC attached to it.
  • Describing the pod webaaw2ci-84cc489f6f-d22gj shows the events below. (Posting the whole pod description to show that no IP was allocated to the pod either.)
Name:             webaaw2ci-84cc489f6f-d22gj
Namespace:        default
Priority:         0
Service Account:  default
Node:             capz-e2e-v80cek-azcni-v1-md-0-7rvl9/10.1.0.4
Start Time:       Tue, 02 May 2023 22:46:58 -0700
Labels:           app=webaaw2ci
                  pod-template-hash=84cc489f6f
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/webaaw2ci-84cc489f6f
Containers:
  webaaw2ci:
    Container ID:
    Image:          httpd
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     10M
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5f799 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-5f799:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               13m                   default-scheduler  Successfully assigned default/webaaw2ci-84cc489f6f-d22gj to capz-e2e-v80cek-azcni-v1-md-0-7rvl9
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4359727928b178b616b93b6c695df7abe09ae29bbf8522bc22f1e82dbe64dc85": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "07fdad5036775927ff0aff9d29f0fb8789c6a3597f93bff1eae78093371233fb": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b45a16a4f4e7ffbf1bdec051b86db4f623af620921181b23050c3bec1dcbd82c": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c4acc04c85fd3816d7371d76e7d333553f1f5064114a63a49f5073e5a0b619c3": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "84ea88ad965e1e9ead092471c5f8f1e85c2431ba93b0d5c48c488120b2da025b": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "90206cdbad6ec1e4a61055186e45d0744ee1edbe78f6d63d108aeb318052ae27": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "874b3ff7f354d00231f3e7fe51f04356571ab958f7f3ea4f59ec81b198c1f3fd": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6b2242b077117b4c9cd10a200a48921cd13818a349726eea2be0afdce56aa62e": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  11m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "63d772fe8b52a638478c50d2843dee6fc4973968a54f15dc12eda7a75b1c7ac7": plugin type="azure-vnet" failed (add): Failed to find the master interface
  Warning  FailedCreatePodSandBox  3m32s (x39 over 11m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "03705bd3bb5929a670a114d2acf293211c5d4ba1b2ef06d8bda5381ee708e4a3": plugin type="azure-vnet" failed (add): Failed to find the master interface

What you expected to happen:

  • For pod webaaw2ci-84cc489f6f-d22gj to be scheduled and reach the Running state.

How to reproduce it:

  • Check out the PR "add azure cni e2e tests with one NIC per linux node" (kubernetes-sigs/cluster-api-provider-azure#3508).
  • In templates/flavors/azure-cni-v1/patches/azure-machine-template.yaml, update the privateIPConfig value of each node-subnet-x to 50.
  • Run the test locally using the command GINKGO_FOCUS="Azure CNI v1" LOCAL_ONLY=true SKIP_LOG_COLLECTION=true SKIP_CLEANUP=true ./scripts/ci-e2e.sh
  • You will observe that the workload cluster brought up by the test has a pod named webaaw2ci-xxxxx stuck in ContainerCreating state.
    • Note that it takes around 10 minutes for the test to progress to this stuck state.

Orchestrator and Version (e.g. Kubernetes, Docker):

  • Kubernetes : v1.25.9

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

Anything else we need to know?:
[Miscellaneous information that will assist in solving the issue.]

@rbtr
Contributor

rbtr commented May 5, 2023

@tamilmani1989 do you recognize this error? This is CNIv1.

@nawazkh
Contributor Author

nawazkh commented May 16, 2023

@rbtr @tamilmani1989 Is there anything I could do to better probe into this challenge?

@tamilmani1989
Member

@nawazkh apologies for the delay; somehow I didn't get a notification. The error looks like the CNI couldn't find the master interface (eth0) based on the subnet prefix returned from azure-vnet-ipam. Can you share all files starting with the azure-vnet prefix?

/var/log/azure-vnet*
/var/run/azure-vnet*
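
One way to gather those files on the node is sketched below. It only uses the two glob patterns quoted above; the archive name `azure-vnet-logs.tar.gz` and the helper names are illustrative assumptions, and reading some of these paths may require root.

```python
import glob
import tarfile

# Glob patterns quoted in the request above.
PATTERNS = ["/var/log/azure-vnet*", "/var/run/azure-vnet*"]


def collect_files(patterns=PATTERNS):
    """Return a sorted, de-duplicated list of paths matching the patterns."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern))
    return sorted(matches)


def bundle(paths, archive_path="azure-vnet-logs.tar.gz"):
    """Write the given paths into a gzip-compressed tar archive.

    The default archive name is a placeholder chosen for this sketch.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path)
    return archive_path
```

Calling `bundle(collect_files())` on the affected node would produce a single archive to attach or share.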

@nawazkh
Contributor Author

nawazkh commented May 19, 2023

Shared access to the above logs via Bastion.

@tamilmani1989 tamilmani1989 assigned vipul-21 and unassigned behzad-mir May 24, 2023
@behzad-mir
Contributor

out-azure-vnet.log
out-ipam.log

Looking at the log at, for example, 23:08:21 (line 19077 of the azure-vnet log), azure-vnet did not get a master interface from the IPAM return call. However, the IPAM log shows that it actually returns an interface (eth1) at that timestamp (line 22652 of the IPAM log).

@vipul-21
Contributor

vipul-21 commented May 26, 2023

Checked the log. One thing I noticed in azure-vnet.log is that eth1 does not have an IP address assigned.

2023/05/18 21:49:47 [4551] [net] Network interface: {Index:3 MTU:1500 Name:eth1 HardwareAddr:60:45:bd:ad:e4:6d Flags:broadcast|multicast} with IP: []

IPAM returns the IP address 10.2.0.99 with mask ffff0000 (from the logs). To find the master interface, we loop through all the interfaces and pick the first one with an IP address in that subnet (10.2.0.0/16). Since no interface has an IP address from that subnet, the master interface is not found, which is the error we see.
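
The lookup described here can be illustrated with a minimal Python sketch. This is not the azure-vnet plugin's code (the plugin is not written in Python); the function names are hypothetical, and the interface data is copied from the azure-vnet log for this node.

```python
import ipaddress

# Interface addresses as reported in the azure-vnet log for this node.
INTERFACES = {
    "lo": ["127.0.0.1/8", "::1/128"],
    "eth0": ["10.1.0.54/16", "fe80::6245:bdff:fead:ef5e/64"],
    "eth1": [],
}


def subnet_from_hex_mask(ip, hex_mask):
    """Build the subnet from an address and a hex mask such as 'ffff0000'."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.ip_network(f"{ip}/{mask}", strict=False)


def find_master_interface(interfaces, subnet):
    """Return the first interface holding an address inside subnet, else None."""
    for name, addrs in interfaces.items():
        for addr in addrs:
            iface = ipaddress.ip_interface(addr)
            if iface.version == subnet.version and iface.ip in subnet:
                return name
    return None


subnet = subnet_from_hex_mask("10.2.0.99", "ffff0000")
print(subnet)                                     # 10.2.0.0/16
print(find_master_interface(INTERFACES, subnet))  # None
```

With eth1 holding no address, nothing falls inside 10.2.0.0/16, so the lookup comes back empty; giving eth1 an address such as 10.2.0.4/16 would make the same function return eth1.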

All the interfaces I can see in the logs:

2023/05/18 21:49:47 [4551] [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP: [127.0.0.1/8 ::1/128]
2023/05/18 21:49:47 [4551] [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:60:45:bd:ad:ef:5e Flags:up|broadcast|multicast} with IP: [10.1.0.54/16 fe80::6245:bdff:fead:ef5e/64]
2023/05/18 21:49:47 [4551] [net] Network interface: {Index:3 MTU:1500 Name:eth1 HardwareAddr:60:45:bd:ad:e4:6d Flags:broadcast|multicast} with IP: []
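
For longer logs, lines like these can be scanned mechanically. The sketch below assumes every "Network interface:" line follows exactly the layout of the three lines above; the regular expression and helper name are illustrative, not part of any Azure CNI tooling.

```python
import re

# The three log lines quoted above.
LOG_LINES = [
    "2023/05/18 21:49:47 [4551] [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP: [127.0.0.1/8 ::1/128]",
    "2023/05/18 21:49:47 [4551] [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:60:45:bd:ad:ef:5e Flags:up|broadcast|multicast} with IP: [10.1.0.54/16 fe80::6245:bdff:fead:ef5e/64]",
    "2023/05/18 21:49:47 [4551] [net] Network interface: {Index:3 MTU:1500 Name:eth1 HardwareAddr:60:45:bd:ad:e4:6d Flags:broadcast|multicast} with IP: []",
]

# Captures the interface name, the flag list, and the bracketed IP list.
PATTERN = re.compile(
    r"Network interface: \{.*?Name:(\S+) .*?Flags:([^}]*)\} with IP: \[([^\]]*)\]"
)


def parse_interface_line(line):
    """Return name, flags and IPs for one log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    name, flags, ips = match.groups()
    return {
        "name": name,
        "flags": flags.split("|") if flags else [],
        "ips": ips.split(),
    }


for line in LOG_LINES:
    info = parse_interface_line(line)
    if info and not info["ips"]:
        state = "up" if "up" in info["flags"] else "not up"
        print(f"{info['name']}: no IP assigned, link {state}")
```

Run against the lines above, this flags only eth1, whose flag list also lacks `up`.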

@nawazkh Is the eth1 interface not expected to have an IP here? Also, can you please run the command below and share the output with us?

sudo curl -X GET "http://168.63.129.16/machine/plugins?comp=nmagent&type=getinterfaceinfov1"

@nawazkh
Contributor Author

nawazkh commented Jun 26, 2023

sudo curl -X GET "http://168.63.129.16/machine/plugins?comp=nmagent&type=getinterfaceinfov1"

Got the following output

<Interfaces><Interface MacAddress="002248254DFD" IsPrimary="true"><IPSubnet Prefix="10.1.0.0/16"><IPAddress Address="10.1.0.4" IsPrimary="true"/><IPAddress Address="10.1.0.5" IsPrimary="false"/><IPAddress Address="10.1.0.6" IsPrimary="false"/><IPAddress Address="10.1.0.7" IsPrimary="false"/><IPAddress Address="10.1.0.8" IsPrimary="false"/><IPAddress Address="10.1.0.9" IsPrimary="false"/><IPAddress Address="10.1.0.10" IsPrimary="false"/><IPAddress Address="10.1.0.11" IsPrimary="false"/><IPAddress Address="10.1.0.12" IsPrimary="false"/><IPAddress Address="10.1.0.13" IsPrimary="false"/><IPAddress Address="10.1.0.14" IsPrimary="false"/><IPAddress Address="10.1.0.15" IsPrimary="false"/><IPAddress Address="10.1.0.16" IsPrimary="false"/><IPAddress Address="10.1.0.17" IsPrimary="false"/><IPAddress Address="10.1.0.18" IsPrimary="false"/><IPAddress Address="10.1.0.19" IsPrimary="false"/><IPAddress Address="10.1.0.20" IsPrimary="false"/><IPAddress Address="10.1.0.21" IsPrimary="false"/><IPAddress Address="10.1.0.22" IsPrimary="false"/><IPAddress Address="10.1.0.23" IsPrimary="false"/><IPAddress Address="10.1.0.24" IsPrimary="false"/><IPAddress Address="10.1.0.25" IsPrimary="false"/><IPAddress Address="10.1.0.26" IsPrimary="false"/><IPAddress Address="10.1.0.27" IsPrimary="false"/><IPAddress Address="10.1.0.28" IsPrimary="false"/><IPAddress Address="10.1.0.29" IsPrimary="false"/><IPAddress Address="10.1.0.30" IsPrimary="false"/><IPAddress Address="10.1.0.31" IsPrimary="false"/><IPAddress Address="10.1.0.32" IsPrimary="false"/><IPAddress Address="10.1.0.33" IsPrimary="false"/><IPAddress Address="10.1.0.34" IsPrimary="false"/><IPAddress Address="10.1.0.35" IsPrimary="false"/><IPAddress Address="10.1.0.36" IsPrimary="false"/><IPAddress Address="10.1.0.37" IsPrimary="false"/><IPAddress Address="10.1.0.38" IsPrimary="false"/><IPAddress Address="10.1.0.39" IsPrimary="false"/><IPAddress Address="10.1.0.40" IsPrimary="false"/><IPAddress Address="10.1.0.41" 
IsPrimary="false"/><IPAddress Address="10.1.0.42" IsPrimary="false"/><IPAddress Address="10.1.0.43" IsPrimary="false"/><IPAddress Address="10.1.0.44" IsPrimary="false"/><IPAddress Address="10.1.0.45" IsPrimary="false"/><IPAddress Address="10.1.0.46" IsPrimary="false"/><IPAddress Address="10.1.0.47" IsPrimary="false"/><IPAddress Address="10.1.0.48" IsPrimary="false"/><IPAddress Address="10.1.0.49" IsPrimary="false"/><IPAddress Address="10.1.0.50" IsPrimary="false"/><IPAddress Address="10.1.0.51" IsPrimary="false"/><IPAddress Address="10.1.0.52" IsPrimary="false"/><IPAddress Address="10.1.0.53" IsPrimary="false"/></IPSubnet></Interface><Interface MacAddress="00224825486B" IsPrimary="false"><IPSubnet Prefix="10.2.0.0/16"><IPAddress Address="10.2.0.4" IsPrimary="true"/><IPAddress Address="10.2.0.5" IsPrimary="false"/><IPAddress Address="10.2.0.6" IsPrimary="false"/><IPAddress Address="10.2.0.7" IsPrimary="false"/><IPAddress Address="10.2.0.8" IsPrimary="false"/><IPAddress Address="10.2.0.9" IsPrimary="false"/><IPAddress Address="10.2.0.10" IsPrimary="false"/><IPAddress Address="10.2.0.11" IsPrimary="false"/><IPAddress Address="10.2.0.12" IsPrimary="false"/><IPAddress Address="10.2.0.13" IsPrimary="false"/><IPAddress Address="10.2.0.14" IsPrimary="false"/><IPAddress Address="10.2.0.15" IsPrimary="false"/><IPAddress Address="10.2.0.16" IsPrimary="false"/><IPAddress Address="10.2.0.17" IsPrimary="false"/><IPAddress Address="10.2.0.18" IsPrimary="false"/><IPAddress Address="10.2.0.19" IsPrimary="false"/><IPAddress Address="10.2.0.20" IsPrimary="false"/><IPAddress Address="10.2.0.21" IsPrimary="false"/><IPAddress Address="10.2.0.22" IsPrimary="false"/><IPAddress Address="10.2.0.23" IsPrimary="false"/><IPAddress Address="10.2.0.24" IsPrimary="false"/><IPAddress Address="10.2.0.25" IsPrimary="false"/><IPAddress Address="10.2.0.26" IsPrimary="false"/><IPAddress Address="10.2.0.27" IsPrimary="false"/><IPAddress Address="10.2.0.28" IsPrimary="false"/><IPAddress 
Address="10.2.0.29" IsPrimary="false"/><IPAddress Address="10.2.0.30" IsPrimary="false"/><IPAddress Address="10.2.0.31" IsPrimary="false"/><IPAddress Address="10.2.0.32" IsPrimary="false"/><IPAddress Address="10.2.0.33" IsPrimary="false"/><IPAddress Address="10.2.0.34" IsPrimary="false"/><IPAddress Address="10.2.0.35" IsPrimary="false"/><IPAddress Address="10.2.0.36" IsPrimary="false"/><IPAddress Address="10.2.0.37" IsPrimary="false"/><IPAddress Address="10.2.0.38" IsPrimary="false"/><IPAddress Address="10.2.0.39" IsPrimary="false"/><IPAddress Address="10.2.0.40" IsPrimary="false"/><IPAddress Address="10.2.0.41" IsPrimary="false"/><IPAddress Address="10.2.0.42" IsPrimary="false"/><IPAddress Address="10.2.0.43" IsPrimary="false"/><IPAddress Address="10.2.0.44" IsPrimary="false"/><IPAddress Address="10.2.0.45" IsPrimary="false"/><IPAddress Address="10.2.0.46" IsPrimary="false"/><IPAddress Address="10.2.0.47" IsPrimary="false"/><IPAddress Address="10.2.0.48" IsPrimary="false"/><IPAddress Address="10.2.0.49" IsPrimary="false"/><IPAddress Address="10.2.0.50" IsPrimary="false"/><IPAddress Address="10.2.0.51" IsPrimary="false"/><IPAddress Address="10.2.0.52" IsPrimary="false"/><IPAddress Address="10.2.0.53" IsPrimary="false"/></IPSubnet></Interface></Interfaces>
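
A response like this is easier to read once summarized per interface. The sketch below parses the same element and attribute names seen in the output above using Python's standard library; `SAMPLE` is a shortened copy (two addresses per interface) and `summarize_interfaces` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above: two addresses kept per interface.
SAMPLE = """<Interfaces>
  <Interface MacAddress="002248254DFD" IsPrimary="true">
    <IPSubnet Prefix="10.1.0.0/16">
      <IPAddress Address="10.1.0.4" IsPrimary="true"/>
      <IPAddress Address="10.1.0.5" IsPrimary="false"/>
    </IPSubnet>
  </Interface>
  <Interface MacAddress="00224825486B" IsPrimary="false">
    <IPSubnet Prefix="10.2.0.0/16">
      <IPAddress Address="10.2.0.4" IsPrimary="true"/>
      <IPAddress Address="10.2.0.5" IsPrimary="false"/>
    </IPSubnet>
  </Interface>
</Interfaces>"""


def summarize_interfaces(xml_text):
    """Return one summary dict per (interface, subnet) pair in the response."""
    root = ET.fromstring(xml_text)
    summary = []
    for iface in root.findall("Interface"):
        for subnet in iface.findall("IPSubnet"):
            addrs = subnet.findall("IPAddress")
            primary = [a.get("Address") for a in addrs
                       if a.get("IsPrimary") == "true"]
            summary.append({
                "mac": iface.get("MacAddress"),
                "is_primary": iface.get("IsPrimary") == "true",
                "prefix": subnet.get("Prefix"),
                "primary_ip": primary[0] if primary else None,
                "address_count": len(addrs),
            })
    return summary


for entry in summarize_interfaces(SAMPLE):
    print(entry)
```

Applied to the full response, the summary would show at a glance which subnet and primary address are reported for the secondary NIC, for comparison with what is actually configured on eth1 inside the VM.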

@vipul-21
Contributor

@nawazkh you probably have to follow up with NMAgent on this. It looks like the eth1 IP is not configured in the VM, which is why the CNI is failing. The eth1 IP is expected to be configured through DHCP by NMAgent.

@tamilmani1989
Member

@nawazkh can we close this if you don't have anything else?

@nawazkh
Contributor Author

nawazkh commented Jul 20, 2023

@tamilmani1989 I did not get a chance to follow up with the NMAgent on this. Can I keep this open since it is still a valid issue from CAPZ's perspective? Or can you point me to the repo to transfer the issue to?

@tamilmani1989
Member

tamilmani1989 commented Jul 29, 2023

Please open a separate support ticket with them via the Microsoft ICM channel.

@nawazkh
Contributor Author

nawazkh commented Jul 31, 2023

Makes sense, closing this issue for now :)

@nawazkh nawazkh closed this as not planned Jul 31, 2023
5 participants