
Sonobuoy I/O timeout issues when using ipv6 address in an air-gapped environment #1136

Closed
johnray21216 opened this issue Jul 1, 2020 · 25 comments

Comments

@johnray21216

johnray21216 commented Jul 1, 2020

What steps did you take and what happened:
I have a remote air-gapped Kubernetes setup that uses IPv6. The architecture is as follows:

The images to be used are hosted in a private registry running in a virtual machine. The Kubernetes setup is a simple single-master cluster with two worker nodes, and it was successfully configured to use IPv6. When trying to run Sonobuoy using images from the private registry, the application runs successfully but returns an i/o timeout error in the logs.

What did you expect to happen:
Sonobuoy pulls the images from the private registry and runs successfully.

Anything else you would like to add:

[2020-06-29 12:15:53.498] [root@opm bin]# sonobuoy logs --kubeconfig $HOME/bin/a-ipv6-k8s_config
[2020-06-29 12:15:56.163] namespace="sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
[2020-06-29 12:15:56.163] time="2020-06-29T04:14:37Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
[2020-06-29 12:15:56.163] time="2020-06-29T04:14:37Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
[2020-06-29 12:15:56.163] time="2020-06-29T04:14:37Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
[2020-06-29 12:15:56.163] time="2020-06-29T04:14:37Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
[2020-06-29 12:15:56.163] time="2020-06-29T04:14:37Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
[2020-06-29 12:15:56.163] time="2020-06-29T04:15:07Z" level=error msg="could not get api group resources: Get https://[<ipv6_address>]:443/api?timeout=32s: dial tcp [<ipv6_address>]:443: i/o timeout"
[2020-06-29 12:15:56.163] time="2020-06-29T04:15:07Z" level=info msg="no-exit was specified, sonobuoy is now blocking"

Environment:

  • Sonobuoy version: 0.17.2 and 0.18
  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

We already created a related issue before, which can be found here. Can someone help me with this one? Thank you.

@tomikonio

tomikonio commented Jul 1, 2020

Hi, I have an IPv6 cluster and had the same problem.
Are the Sonobuoy images already present on the cluster nodes when you execute the run command?
(According to your logs, the images are already present.)
If they are, this is what causes the issue for me.
I talked on Slack with @zubron and she suggested using a newer image of Sonobuoy that was not present on the machine; it worked.
When I tried it again, I got the error again...
When I deleted all Sonobuoy images from all the nodes and ran the regular sonobuoy run command, it pulled the images and it worked!

TL;DR: the Sonobuoy CLI seems to mess something up when the Docker images are already present on the nodes; otherwise it works.
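For anyone wanting to try the same workaround, here is a minimal sketch, assuming Docker as the container runtime and the default sonobuoy/sonobuoy image name (the tag depends on your Sonobuoy version):

# On each node, list the cached Sonobuoy images:
docker images | grep sonobuoy
# Remove them so the next run pulls fresh copies (use the image/tag shown by the listing):
docker rmi sonobuoy/sonobuoy:<tag>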

@johnray21216
Author

> Hi, I have an IPv6 cluster and had the same problem.
> Are the Sonobuoy images already present on the cluster nodes when you execute the run command?
> (According to your logs, the images are already present.)
> If they are, this is what causes the issue for me.
> I talked on Slack with @zubron and she suggested using a newer image of Sonobuoy that was not present on the machine; it worked.
> When I tried it again, I got the error again...
> When I deleted all Sonobuoy images from all the nodes and ran the regular sonobuoy run command, it pulled the images and it worked!
>
> TL;DR: the Sonobuoy CLI seems to mess something up when the Docker images are already present on the nodes; otherwise it works.

Hey @tomikonio! Thanks for the response. We tried your suggestion and it worked. But when we ran it a second time, the i/o timeout issue reappeared.

I wonder where in Sonobuoy the issue is occurring. Do you have any idea? We don't want to constantly remove and reload the images every time we use Sonobuoy.

@zubron, do you have any input on this? I would love to hear from you.

@zubron
Contributor

zubron commented Jul 2, 2020

This is very strange behaviour indeed. The timeout error is occurring when the Sonobuoy pod is attempting to create a Kubernetes client (using kubernetes/client-go). This is required as the client is then used to communicate with the API server to perform actions on the cluster such as starting the plugin pods, updating annotations on the main Sonobuoy pod, etc. The error seems to happen at the point when it first tries to connect to the API server to get the API group resources.
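To reproduce just that connectivity step from inside the cluster, you can replay the same request with the pod's service account token. This is only a rough sketch, not Sonobuoy's actual code path; it assumes the pod image provides a shell and wget (any debug pod in the sonobuoy namespace would do) and relies on the environment variables the kubelet injects into every pod:

kubectl exec -n sonobuoy -it sonobuoy -- /bin/sh
# Inside the pod: replay the discovery request that client-go makes on startup.
# The brackets around the host are needed for the IPv6 service address.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
wget -qO- --no-check-certificate \
  --header "Authorization: Bearer $TOKEN" \
  "https://[$KUBERNETES_SERVICE_HOST]:$KUBERNETES_SERVICE_PORT/api?timeout=32s"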

Given that this error is occurring from a pod already running on the cluster when the image has already been pulled, I'm struggling to understand how it could be related to the image being pulled in the first place. The Sonobuoy CLI generates a Kubernetes manifest which it deploys on the cluster. You can inspect that manifest by looking at the output from sonobuoy gen. The CLI is not involved in how the images are pulled onto the cluster, it just specifies which images should be pulled in the pod definitions.
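For example, to print the exact manifest that would be applied, without running anything (using the kubeconfig path from earlier in this thread):

sonobuoy gen --kubeconfig $HOME/bin/a-ipv6-k8s_config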

Would either of you mind sharing a bit more about your environments? I haven't done any testing on IPv6 and I would like to set up an environment to investigate this issue more. It would help to replicate it as closely as possible.

@tomikonio - Is your environment also an air-gapped environment?

@johnray21216
Author

johnray21216 commented Jul 2, 2020

> Would either of you mind sharing a bit more about your environments? I haven't done any testing on IPv6 and I would like to set up an environment to investigate this issue more. It would help to replicate it as closely as possible.

Thanks for the reply @zubron! My current environment is a simple air-gapped three-node cluster (1 master, 2 workers) with a private registry virtual machine where the images are stored. The files are pulled from an environment with access to the internet and are saved and transferred to the private repository.

The setup is IPv6. Here is the log from when we followed @tomikonio's suggestion to delete the existing images and re-uploaded them to the private repository.

[root@abk-opm ~]# sonobuoy logs --kubeconfig $HOME/bin/a-ipv6-k8s_config --debug
namespace="sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2020-07-02T07:47:50Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2020-07-02T07:47:50Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2020-07-02T07:47:50Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
time="2020-07-02T07:47:50Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
time="2020-07-02T07:47:50Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
time="2020-07-02T07:47:51Z" level=info msg="Filtering namespaces based on the following regex:.*"
time="2020-07-02T07:47:51Z" level=info msg="Namespace default Matched=true"
time="2020-07-02T07:47:51Z" level=info msg="Namespace kube-node-lease Matched=true"
time="2020-07-02T07:47:51Z" level=info msg="Namespace kube-public Matched=true"
time="2020-07-02T07:47:51Z" level=info msg="Namespace kube-system Matched=true"
time="2020-07-02T07:47:51Z" level=info msg="Namespace sonobuoy Matched=true"
time="2020-07-02T07:47:51Z" level=info msg="Starting server Expected Results: [{global e2e}]"
time="2020-07-02T07:47:51Z" level=info msg="Starting annotation update routine"
time="2020-07-02T07:47:51Z" level=info msg="Starting aggregation server" address=0.0.0.0 port=8080
time="2020-07-02T07:47:51Z" level=info msg="Running plugin" plugin=e2e
time="2020-07-02T07:48:36Z" level=info msg="received aggregator request" client_cert=e2e plugin_name=e2e
time="2020-07-02T07:48:36Z" level=info msg="Last update to annotations on exit"
time="2020-07-02T07:48:36Z" level=info msg="Shutting down aggregation server"
time="2020-07-02T07:48:36Z" level=info msg="csinodes not specified in non-nil Resources. Skipping csinodes query."
time="2020-07-02T07:48:36Z" level=info msg="csidrivers not specified in non-nil Resources. Skipping csidrivers query."
time="2020-07-02T07:48:36Z" level=info msg="blockaffinities not specified in non-nil Resources. Skipping blockaffinities query."
time="2020-07-02T07:48:36Z" level=info msg="kubecontrollersconfigurations not specified in non-nil Resources. Skipping kubecontrollersconfigurations query."
time="2020-07-02T07:48:36Z" level=info msg="bgppeers not specified in non-nil Resources. Skipping bgppeers query."
time="2020-07-02T07:48:36Z" level=info msg="ipamblocks not specified in non-nil Resources. Skipping ipamblocks query."
time="2020-07-02T07:48:36Z" level=info msg="hostendpoints not specified in non-nil Resources. Skipping hostendpoints query."
time="2020-07-02T07:48:36Z" level=info msg="clusterinformations not specified in non-nil Resources. Skipping clusterinformations query."
time="2020-07-02T07:48:36Z" level=info msg="ippools not specified in non-nil Resources. Skipping ippools query."
time="2020-07-02T07:48:36Z" level=info msg="ipamhandles not specified in non-nil Resources. Skipping ipamhandles query."
time="2020-07-02T07:48:36Z" level=info msg="bgpconfigurations not specified in non-nil Resources. Skipping bgpconfigurations query."
time="2020-07-02T07:48:36Z" level=info msg="ipamconfigs not specified in non-nil Resources. Skipping ipamconfigs query."
time="2020-07-02T07:48:36Z" level=info msg="globalnetworkpolicies not specified in non-nil Resources. Skipping globalnetworkpolicies query."
time="2020-07-02T07:48:36Z" level=info msg="globalnetworksets not specified in non-nil Resources. Skipping globalnetworksets query."
time="2020-07-02T07:48:36Z" level=info msg="felixconfigurations not specified in non-nil Resources. Skipping felixconfigurations query."
time="2020-07-02T07:48:36Z" level=info msg="runtimeclasses not specified in non-nil Resources. Skipping runtimeclasses query."
time="2020-07-02T07:48:36Z" level=info msg="secrets not specified in non-nil Resources. Skipping secrets query."
time="2020-07-02T07:48:36Z" level=info msg="events not specified in non-nil Resources. Skipping events query."
time="2020-07-02T07:48:36Z" level=info msg="horizontalpodautoscalers not specified in non-nil Resources. Skipping horizontalpodautoscalers query."
time="2020-07-02T07:48:36Z" level=info msg="networksets not specified in non-nil Resources. Skipping networksets query."
time="2020-07-02T07:48:36Z" level=info msg="Collecting Node Configuration and Health..."
time="2020-07-02T07:48:36Z" level=info msg="Creating host results for a-ipv6-k8s-ms1 under /tmp/sonobuoy/76d6610f-3654-402a-b974-58e444cda87d/hosts/a-ipv6-k8s-ms1\n"
time="2020-07-02T07:48:36Z" level=info msg="Creating host results for a-ipv6-k8s-wk1 under /tmp/sonobuoy/76d6610f-3654-402a-b974-58e444cda87d/hosts/a-ipv6-k8s-wk1\n"
time="2020-07-02T07:48:36Z" level=info msg="Creating host results for a-ipv6-k8s-wk2 under /tmp/sonobuoy/76d6610f-3654-402a-b974-58e444cda87d/hosts/a-ipv6-k8s-wk2\n"
time="2020-07-02T07:48:36Z" level=info msg="Running cluster queries"
time="2020-07-02T07:48:36Z" level=info msg="Running ns query (default)"
time="2020-07-02T07:48:36Z" level=info msg="Running ns query (kube-node-lease)"
time="2020-07-02T07:48:37Z" level=info msg="Running ns query (kube-public)"
time="2020-07-02T07:48:37Z" level=info msg="Running ns query (kube-system)"
time="2020-07-02T07:48:38Z" level=info msg="Running ns query (sonobuoy)"
time="2020-07-02T07:48:39Z" level=info msg="Namespace default Matched=false"
time="2020-07-02T07:48:39Z" level=info msg="Namespace kube-node-lease Matched=false"
time="2020-07-02T07:48:39Z" level=info msg="Namespace kube-public Matched=false"
time="2020-07-02T07:48:39Z" level=info msg="Namespace kube-system Matched=false"
time="2020-07-02T07:48:39Z" level=info msg="Namespace sonobuoy Matched=true"
time="2020-07-02T07:48:39Z" level=info msg="Collecting Pod Logs by namespace (sonobuoy)"
time="2020-07-02T07:48:39Z" level=info msg="Collecting Pod Logs by FieldSelectors []"
time="2020-07-02T07:48:39Z" level=info msg="Log lines after this point will not appear in the downloaded tarball."
time="2020-07-02T07:48:39Z" level=info msg="Results available at /tmp/sonobuoy/202007020747_sonobuoy_76d6610f-3654-402a-b974-58e444cda87d.tar.gz"
time="2020-07-02T07:48:39Z" level=info msg="no-exit was specified, sonobuoy is now blocking"

[root@abk-opm ~]# sonobuoy status --kubeconfig $HOME/bin/a-ipv6-k8s_config --debug
   PLUGIN     STATUS   RESULT   COUNT
      e2e   complete   passed       1

Sonobuoy has completed. Use `sonobuoy retrieve` to get results.

Here is the sample e2e test result.

[root@abk-opm ~]# sonobuoy retrieve --kubeconfig $HOME/bin/a-ipv6-k8s_config
202007020747_sonobuoy_76d6610f-3654-402a-b974-58e444cda87d.tar.gz
[root@abk-opm ~]# results=$(sonobuoy retrieve --kubeconfig $HOME/bin/a-ipv6-k8s_config)
[root@abk-opm ~]# mkdir ./results; tar xzf $results -C ./results
[root@abk-opm ~]# less /root/results/plugins/e2e/results/global/e2e.log

I0702 07:48:16.882247      24 test_context.go:414] Using a temporary kubeconfig file from in-cluster config : /tmp/kubeconfig-735256049
I0702 07:48:16.882497      24 e2e.go:92] Starting e2e run "ffa61e7d-49e9-430c-acf7-99316951c2a1" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1593676095 - Will randomize all specs
Will run 1 of 4732 specs

Jul  2 07:48:16.894: INFO: >>> kubeConfig: /tmp/kubeconfig-735256049
Jul  2 07:48:16.898: INFO: Waiting up to 30m0s for all (but 0) nodes to be schedulable
Jul  2 07:48:16.914: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Jul  2 07:48:16.944: INFO: 13 / 13 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Jul  2 07:48:16.944: INFO: expected 3 pod replicas in namespace 'kube-system', 3 are Running and Ready.
Jul  2 07:48:16.944: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Jul  2 07:48:16.951: INFO: 3 / 3 pods ready in namespace 'kube-system' in daemonset 'calico-node' (0 seconds elapsed)
Jul  2 07:48:16.951: INFO: 3 / 3 pods ready in namespace 'kube-system' in daemonset 'kube-proxy' (0 seconds elapsed)
Jul  2 07:48:16.951: INFO: e2e test version: v1.16.3
Jul  2 07:48:16.953: INFO: kube-apiserver version: v1.16.3
Jul  2 07:48:16.953: INFO: >>> kubeConfig: /tmp/kubeconfig-735256049
Jul  2 07:48:16.958: INFO: Cluster IP family: ipv6
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[k8s.io] Pods
  should be submitted and removed [NodeConformance] [Conformance]
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:698
[BeforeEach] [k8s.io] Pods
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:151
STEP: Creating a kubernetes client
Jul  2 07:48:17.004: INFO: >>> kubeConfig: /tmp/kubeconfig-735256049
STEP: Building a namespace api object, basename pods
Jul  2 07:48:17.083: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [k8s.io] Pods
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/pods.go:165
[It] should be submitted and removed [NodeConformance] [Conformance]
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:698
STEP: creating the pod
STEP: setting up watch
STEP: submitting the pod to kubernetes
Jul  2 07:48:17.089: INFO: observed the pod list
STEP: verifying the pod is in kubernetes
STEP: verifying pod creation was observed
STEP: deleting the pod gracefully
STEP: verifying the kubelet observed the termination notice
STEP: verifying pod deletion was observed
[AfterEach] [k8s.io] Pods
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Jul  2 07:48:30.205: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "pods-8436" for this suite.
Jul  2 07:48:36.219: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Jul  2 07:48:36.282: INFO: namespace pods-8436 deletion completed in 6.074236511s

~ [SLOW TEST:19.277 seconds]
[k8s.io] Pods
/workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:693
  should be submitted and removed [NodeConformance] [Conformance]
  /workspace/anago-v1.16.3-beta.0.56+b3cbbae08ec52a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:698
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSJul  2 07:48:36.283: INFO: Running AfterSuite actions on all nodes
Jul  2 07:48:36.283: INFO: Running AfterSuite actions on node 1
Jul  2 07:48:36.283: INFO: Skipping dumping logs from cluster

Ran 1 of 4732 Specs in 19.393 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 4731 Skipped
PASS

Ginkgo ran 1 suite in 21.014974956s
Test Suite Passed


[Master-node]

[root@a-ipv6-k8s-ms1 ~]# kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE   IP                          NODE             NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-77d6cbc65f-nhtz8   1/1     Running             4          8d     <redacted_ipv6_address>  a-ipv6-k8s-ms1   <none>           <none>
kube-system   calico-node-85ldm                          1/1     Running             110        8d       <redacted_ipv6_address>       a-ipv6-k8s-wk1   <none>           <none>
kube-system   calico-node-jbc8g                          1/1     Running             3          8d      <redacted_ipv6_address>        a-ipv6-k8s-wk2   <none>           <none>
kube-system   calico-node-kjj6g                          1/1     Running             2          8d     <redacted_ipv6_address>         a-ipv6-k8s-ms1   <none>           <none>
kube-system   coredns-5644d7b6d9-g2hf5                   1/1     Running             2          8d   <redacted_ipv6_address>   a-ipv6-k8s-ms1   <none>           <none>
kube-system   coredns-5644d7b6d9-kh62b                   1/1     Running             2          8d   <redacted_ipv6_address>    a-ipv6-k8s-ms1   <none>           <none>
kube-system   etcd-a-ipv6-k8s-ms1                        1/1     Running             2          8d      <redacted_ipv6_address>        a-ipv6-k8s-ms1   <none>           <none>
kube-system   kube-apiserver-a-ipv6-k8s-ms1              1/1     Running             2          8d      <redacted_ipv6_address>       a-ipv6-k8s-ms1   <none>           <none>
kube-system   kube-controller-manager-a-ipv6-k8s-ms1     1/1     Running             3          8d     <redacted_ipv6_address>         a-ipv6-k8s-ms1   <none>           <none>
kube-system   kube-proxy-ddjq7                           1/1     Running             3          8d     <redacted_ipv6_address>         a-ipv6-k8s-wk1   <none>           <none>
kube-system   kube-proxy-s7lnn                           1/1     Running             4          8d    <redacted_ipv6_address>          a-ipv6-k8s-wk2   <none>           <none>
kube-system   kube-proxy-sbj6b                           1/1     Running             2          8d     <redacted_ipv6_address>         a-ipv6-k8s-ms1   <none>           <none>
kube-system   kube-scheduler-a-ipv6-k8s-ms1              1/1     Running             3          8d     <redacted_ipv6_address>         a-ipv6-k8s-ms1   <none>           <none>
sonobuoy      sonobuoy                                   1/1     Running             0          22s   <redacted_ipv6_address>    a-ipv6-k8s-wk2   <none>           <none>
sonobuoy      sonobuoy-e2e-job-257e53d9a8224704          0/2     ContainerCreating   0          16s  <redacted_ipv6_address>  <none>                      a-ipv6-k8s-wk2   <none>           <none>
[root@a-ipv6-k8s-ms1 ~]#
[root@a-ipv6-k8s-ms1 ~]#
[root@a-ipv6-k8s-ms1 ~]# mkdir ./results; tar xzf $results -C ./results
tar (child): -C: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

Here are the sonobuoy pod logs:

[root@a-ipv6-k8s-ms1 ~]# kubectl logs -n sonobuoy sonobuoy
time="2020-07-02T05:46:37Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2020-07-02T05:46:37Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2020-07-02T05:46:37Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
time="2020-07-02T05:46:37Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
time="2020-07-02T05:46:37Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
time="2020-07-02T05:46:37Z" level=info msg="Filtering namespaces based on the following regex:.*"
time="2020-07-02T05:46:37Z" level=info msg="Namespace default Matched=true"
time="2020-07-02T05:46:37Z" level=info msg="Namespace kube-node-lease Matched=true"
time="2020-07-02T05:46:37Z" level=info msg="Namespace kube-public Matched=true"
time="2020-07-02T05:46:37Z" level=info msg="Namespace kube-system Matched=true"
time="2020-07-02T05:46:37Z" level=info msg="Namespace sonobuoy Matched=true"
time="2020-07-02T05:46:37Z" level=info msg="Starting server Expected Results: [{global e2e}]"
time="2020-07-02T05:46:37Z" level=info msg="Starting annotation update routine"
time="2020-07-02T05:46:37Z" level=info msg="Starting aggregation server" address=0.0.0.0 port=8080
time="2020-07-02T05:46:37Z" level=info msg="Running plugin" plugin=e2e
time="2020-07-02T05:47:19Z" level=info msg="received aggregator request" client_cert=e2e plugin_name=e2e
time="2020-07-02T05:47:19Z" level=info msg="Last update to annotations on exit"
time="2020-07-02T05:47:19Z" level=info msg="Shutting down aggregation server"
time="2020-07-02T05:47:19Z" level=info msg="csinodes not specified in non-nil Resources. Skipping csinodes query."
time="2020-07-02T05:47:19Z" level=info msg="csidrivers not specified in non-nil Resources. Skipping csidrivers query."
time="2020-07-02T05:47:19Z" level=info msg="runtimeclasses not specified in non-nil Resources. Skipping runtimeclasses query."
time="2020-07-02T05:47:19Z" level=info msg="blockaffinities not specified in non-nil Resources. Skipping blockaffinities query."
time="2020-07-02T05:47:19Z" level=info msg="ippools not specified in non-nil Resources. Skipping ippools query."
time="2020-07-02T05:47:19Z" level=info msg="bgppeers not specified in non-nil Resources. Skipping bgppeers query."
time="2020-07-02T05:47:19Z" level=info msg="kubecontrollersconfigurations not specified in non-nil Resources. Skipping kubecontrollersconfigurations query."
time="2020-07-02T05:47:19Z" level=info msg="hostendpoints not specified in non-nil Resources. Skipping hostendpoints query."
time="2020-07-02T05:47:19Z" level=info msg="bgpconfigurations not specified in non-nil Resources. Skipping bgpconfigurations query."
time="2020-07-02T05:47:19Z" level=info msg="globalnetworkpolicies not specified in non-nil Resources. Skipping globalnetworkpolicies query."
time="2020-07-02T05:47:19Z" level=info msg="felixconfigurations not specified in non-nil Resources. Skipping felixconfigurations query."
time="2020-07-02T05:47:19Z" level=info msg="ipamhandles not specified in non-nil Resources. Skipping ipamhandles query."
time="2020-07-02T05:47:19Z" level=info msg="ipamconfigs not specified in non-nil Resources. Skipping ipamconfigs query."
time="2020-07-02T05:47:19Z" level=info msg="ipamblocks not specified in non-nil Resources. Skipping ipamblocks query."
time="2020-07-02T05:47:19Z" level=info msg="globalnetworksets not specified in non-nil Resources. Skipping globalnetworksets query."
time="2020-07-02T05:47:19Z" level=info msg="clusterinformations not specified in non-nil Resources. Skipping clusterinformations query."
time="2020-07-02T05:47:19Z" level=info msg="events not specified in non-nil Resources. Skipping events query."
time="2020-07-02T05:47:19Z" level=info msg="secrets not specified in non-nil Resources. Skipping secrets query."
time="2020-07-02T05:47:19Z" level=info msg="horizontalpodautoscalers not specified in non-nil Resources. Skipping horizontalpodautoscalers query."
time="2020-07-02T05:47:19Z" level=info msg="networksets not specified in non-nil Resources. Skipping networksets query."
time="2020-07-02T05:47:19Z" level=info msg="Collecting Node Configuration and Health..."
time="2020-07-02T05:47:19Z" level=info msg="Creating host results for a-ipv6-k8s-ms1 under /tmp/sonobuoy/1d0aaae0-371b-4afa-a00a-8855945411f2/hosts/a-ipv6-k8s-ms1\n"
time="2020-07-02T05:47:19Z" level=info msg="Creating host results for a-ipv6-k8s-wk1 under /tmp/sonobuoy/1d0aaae0-371b-4afa-a00a-8855945411f2/hosts/a-ipv6-k8s-wk1\n"
time="2020-07-02T05:47:19Z" level=info msg="Creating host results for a-ipv6-k8s-wk2 under /tmp/sonobuoy/1d0aaae0-371b-4afa-a00a-8855945411f2/hosts/a-ipv6-k8s-wk2\n"
time="2020-07-02T05:47:19Z" level=info msg="Running cluster queries"
time="2020-07-02T05:47:19Z" level=info msg="Running ns query (default)"
time="2020-07-02T05:47:19Z" level=info msg="Running ns query (kube-node-lease)"
time="2020-07-02T05:47:20Z" level=info msg="Running ns query (kube-public)"
time="2020-07-02T05:47:20Z" level=info msg="Running ns query (kube-system)"
time="2020-07-02T05:47:21Z" level=info msg="Running ns query (sonobuoy)"
time="2020-07-02T05:47:22Z" level=info msg="Namespace default Matched=false"
time="2020-07-02T05:47:22Z" level=info msg="Namespace kube-node-lease Matched=false"
time="2020-07-02T05:47:22Z" level=info msg="Namespace kube-public Matched=false"
time="2020-07-02T05:47:22Z" level=info msg="Namespace kube-system Matched=false"
time="2020-07-02T05:47:22Z" level=info msg="Namespace sonobuoy Matched=true"
time="2020-07-02T05:47:22Z" level=info msg="Collecting Pod Logs by namespace (sonobuoy)"
time="2020-07-02T05:47:22Z" level=info msg="Collecting Pod Logs by FieldSelectors []"
time="2020-07-02T05:47:22Z" level=info msg="Log lines after this point will not appear in the downloaded tarball."
time="2020-07-02T05:47:22Z" level=info msg="Results available at /tmp/sonobuoy/202007020546_sonobuoy_1d0aaae0-371b-4afa-a00a-8855945411f2.tar.gz"
time="2020-07-02T05:47:22Z" level=info msg="no-exit was specified, sonobuoy is now blocking"
I really wonder what issue we are hitting in this scenario with IPv6-based clusters.

@zubron
Contributor

zubron commented Jul 2, 2020

@johnray21216 Thanks for the logs.

How did you provision your cluster? I notice you're using Calico, and one of the pods has a high number of restarts, which seems like it could be affecting the network. What versions of each of these components are you using?

@johnray21216
Author

I am using the following components:

kubernetes: 1.16.3
calico: 3.14.1
Sonobuoy version: 0.17.2 and 0.18

@zubron
Contributor

zubron commented Jul 2, 2020

Did you set up the cluster manually or are you using a managed offering? If managed, which provider?

@johnray21216
Author

I set it up manually.

@johnray21216
Author

I checked the logs of the pod with the unusually high number of restarts and grepped them for WARNING entries.

[root@a-ipv6-k8s-ms1 ~]# kubectl logs -n kube-system calico-node-85ldm | grep WARNING
2020-07-02 06:16:30.106 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 06:20:12.270 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 06:29:47.414 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 07:17:00.862 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 07:19:13.464 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 07:46:20.367 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-02 08:01:06.659 [WARNING][50] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"

@zubron
Contributor

zubron commented Jul 2, 2020

I'm not familiar with Calico and its related projects. That warning is coming from Felix. I don't know how policy management works with Calico, but it looks like there might be something in your configuration that is causing the traffic for the sonobuoy-serviceaccount to be dropped?

I would recommend looking at your Calico configuration to see if there is anything in place that would cause this as this is not controlled by Sonobuoy.
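If it helps, something like the following shows whether any policies or profiles are in play. It's only a sketch, and it assumes calicoctl is installed and configured against your cluster's datastore:

# Kubernetes NetworkPolicies across all namespaces:
kubectl get networkpolicies --all-namespaces
# Calico's own policy objects and its per-namespace/per-service-account profiles:
calicoctl get globalnetworkpolicy
calicoctl get profiles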

@johnray21216
Author

Hello @zubron. Apologies for the late response. According to one of my colleagues, the high restart count of the Calico node was caused by a trial-and-error setup of Calico for IPv6. But we will still look into your suggestion.

May I ask if there are other factors that could result in this unusual behavior? Please let me know if you need additional information from us.

@zubron
Contributor

zubron commented Jul 3, 2020

Ah, I see. I think it's still suspicious that you have those warnings regarding the sonobuoy service account. Reading the code from Felix where that warning is generated, it looks like a deny-all rule is being applied to inbound and outbound traffic for the profile of the sonobuoy service account, which would prevent it from talking to the API server.

The fact that it was able to run once successfully leads me to think that it's not an issue with Sonobuoy itself (my first thought was that it was an IPv6-specific issue). Sonobuoy is capable of running on the cluster, but it looks like something in the cluster's networking configuration is preventing it from communicating with the API server on subsequent runs. I would revisit the Calico settings to see if there is anything in place that would default to denying traffic.
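One way to check is to dump the profile named in the Felix warning and see which rules it currently carries. This is a sketch assuming calicoctl is configured for your cluster and that your Calico version exposes service account profiles through calicoctl:

# With a healthy service account this should show allow rules,
# not the generated deny-all dummy profile:
calicoctl get profile ksa.sonobuoy.sonobuoy-serviceaccount -o yaml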

I have a small sample that isolates the failing section of code in Sonobuoy, which you can try out. It might make debugging the networking in your cluster easier: https://github.com/zubron/k8s-in-cluster-config

@tomikonio

tomikonio commented Jul 5, 2020

Hi,
We are using Calico as well.
Our cluster is not air-gapped - but we wish to make it so eventually.

I wonder what configuration in Calico may cause this issue.
Running:

kubectl logs -n kube-system calico-node-722qr | grep WARNING

Yields:

2020-07-05 11:36:01.376 [WARNING][116] int_dataplane.go 357: Failed to query VXLAN device error=Link not found
2020-07-05 11:36:01.376 [WARNING][116] int_dataplane.go 369: Can't enable XDP acceleration. error=kernel is too old (have: 3.10.0 but want at least: 4.16.0)
2020-07-05 11:44:21.823 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.containers-4137.default"
2020-07-05 11:44:57.195 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pod-network-test-2032.default"
2020-07-05 11:45:03.916 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.services-9331.default"
2020-07-05 11:45:16.147 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pods-4949.default"
2020-07-05 11:45:50.692 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sched-pred-5220.default"
2020-07-05 11:46:03.400 [WARNING][116] table.go 566: Detected out-of-sync inserts, marking for resync actualRuleIDs=[]string{"", "O3lYWMrLQYEMJtB5", ""} chainName="POSTROUTING" expectedRuleIDs=[]string{"O3lYWMrLQYEMJtB5", "", ""} ipVersion=0x6 table="nat"
2020-07-05 11:46:18.729 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.webhook-5412.default"
2020-07-05 11:47:59.702 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.container-lifecycle-hook-4331.default"
2020-07-05 11:49:29.593 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.init-container-1831.default"
2020-07-05 11:49:31.560 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.kubelet-test-3456.default"
2020-07-05 11:55:18.520 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.services-2986.default"
2020-07-05 11:55:53.743 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.kubectl-5551.default"
2020-07-05 11:56:13.808 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.container-probe-9229.default"
2020-07-05 11:57:20.109 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.container-lifecycle-hook-9737.default"
2020-07-05 12:04:35.798 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.projected-5355.default"
2020-07-05 12:05:16.672 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.projected-7349.default"
2020-07-05 12:05:24.955 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.deployment-2328.default"
2020-07-05 12:06:41.657 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.projected-5053.default"
2020-07-05 12:06:59.868 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.kubelet-test-9957.default"
2020-07-05 12:13:35.640 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sched-pred-5454.default"
2020-07-05 12:13:42.892 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.e2e-kubelet-etc-hosts-6362.default"
2020-07-05 12:14:19.690 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.downward-api-2335.default"
2020-07-05 12:15:22.249 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pods-5330.default"
2020-07-05 12:16:05.334 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.replicaset-8238.default"
2020-07-05 12:16:07.456 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.configmap-8344.default"
2020-07-05 12:19:10.599 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.secrets-9798.default"
2020-07-05 12:19:57.699 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.prestop-3027.default"
2020-07-05 12:20:23.220 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.kubectl-7732.default"
2020-07-05 12:21:02.217 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.webhook-6048.default"
2020-07-05 12:22:56.629 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sched-pred-106.default"
2020-07-05 12:23:38.979 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.services-5879.default"
2020-07-05 12:23:59.239 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.job-9844.default"
2020-07-05 12:24:23.818 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.container-lifecycle-hook-5295.default"
2020-07-05 12:25:17.021 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.downward-api-7736.default"
2020-07-05 12:25:21.090 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pods-3588.default"
2020-07-05 12:25:55.068 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pod-network-test-6224.default"
2020-07-05 12:26:07.782 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.configmap-2794.default"
2020-07-05 12:26:28.186 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pods-6248.default"
2020-07-05 12:26:45.040 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.replication-controller-6798.default"
2020-07-05 12:27:57.722 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.services-1905.default"
2020-07-05 12:33:55.996 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pod-network-test-629.default"
2020-07-05 12:34:00.112 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.replicaset-9777.default"
2020-07-05 12:34:38.934 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.svc-latency-7098.default"
2020-07-05 12:35:55.493 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.configmap-1806.default"
2020-07-05 12:39:11.055 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.pods-2206.default"
2020-07-05 12:40:02.204 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.gc-2459.default"
2020-07-05 12:42:15.415 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.events-5122.default"
2020-07-05 12:48:15.680 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.kubectl-2695.default"
2020-07-05 14:14:16.436 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"

Hmm, it seems that Sonobuoy is being blocked here.

But we are not using any Calico network policies or profiles.
It seems that only the Sonobuoy-related workloads are being blocked by Calico.
And what does that have to do with the image being present on the machine or not?

@zubron
Contributor

zubron commented Jul 5, 2020

Is there anything relevant in the log lines beyond those warnings?

Also, the warnings for other profiles that aren't recognized seem to be coming from the e2e tests themselves. The format of the profile names seems to be “ksa.<namespace>.<service account>”, and all the namespaces match the format used by those tests. The workloads that run as part of the e2e tests aren't managed by Sonobuoy; Sonobuoy initiates the run of the test framework, but the test workloads run independently. So it seems that it is affecting other workloads too. When you were able to run the tests, did they run successfully?

@tomikonio

tomikonio commented Jul 6, 2020

These are the logs from when Sonobuoy is started (from all 3 nodes):


2020-07-06 06:08:43.951 [INFO][116] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:08:43.966 [INFO][116] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:08:44.694 [INFO][116] calc_graph.go 392: Local endpoint updated id=WorkloadEndpoint(node=allot-k8s-worker02, orchestrator=k8s, workload=sonobuoy/sonobuoy, name=eth0)
2020-07-06 06:08:44.694 [INFO][116] int_dataplane.go 836: Received *proto.ActiveProfileUpdate update from calculation graph msg=id:<name:"kns.sonobuoy" > profile:<inbound_rules:<action:"allow" rule_id:"GP63uymSS-bedsaO" > outbound_rules:<action:"allow" rule_id:"ZrNuC7hVPUgqd8UB" > >
2020-07-06 06:08:44.694 [INFO][116] table.go 475: Queueing update of chain. chainName="cali-pri-kns.sonobuoy" ipVersion=0x4 table="filter"
2020-07-06 06:08:44.694 [INFO][116] table.go 475: Queueing update of chain. chainName="cali-pro-kns.sonobuoy" ipVersion=0x4 table="filter"
2020-07-06 06:08:44.694 [INFO][116] table.go 475: Queueing update of chain. chainName="cali-pri-kns.sonobuoy" ipVersion=0x6 table="filter"
2020-07-06 06:08:44.694 [INFO][116] table.go 475: Queueing update of chain. chainName="cali-pro-kns.sonobuoy" ipVersion=0x6 table="filter"
2020-07-06 06:08:44.694 [INFO][116] int_dataplane.go 836: Received *proto.ActiveProfileUpdate update from calculation graph msg=id:<name:"ksa.sonobuoy.sonobuoy-serviceaccount" > profile:<>
2020-07-06 06:08:44.694 [INFO][116] int_dataplane.go 836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"sonobuoy/sonobuoy" endpoint_id:"eth0" > endpoint:<state:"active" name:"cali41a3cc79a18" profile_ids:"kns.sonobuoy" profile_ids:"ksa.sonobuoy.sonobuoy-serviceaccount" ipv6_nets:"fd00:10:101:1111::1c26/128" tiers:<name:"default" ingress_policies:"default.allow-all-upgrade" egress_policies:"default.allow-all-upgrade" > >
2020-07-06 06:08:44.694 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.694 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.695 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x6 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:44.707 [INFO][116] status_combiner.go 81: Endpoint up for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} ipVersion=0x4 status="up"
2020-07-06 06:08:44.707 [INFO][116] status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} status="up"
2020-07-06 06:08:45.593 [INFO][116] calc_graph.go 392: Local endpoint updated id=WorkloadEndpoint(node=allot-k8s-worker02, orchestrator=k8s, workload=sonobuoy/sonobuoy, name=eth0)
2020-07-06 06:08:45.593 [INFO][116] int_dataplane.go 836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"sonobuoy/sonobuoy" endpoint_id:"eth0" > endpoint:<state:"active" name:"cali41a3cc79a18" profile_ids:"kns.sonobuoy" profile_ids:"ksa.sonobuoy.sonobuoy-serviceaccount" ipv6_nets:"fd00:10:101:1111::1c26/128" tiers:<name:"default" ingress_policies:"default.allow-all-upgrade" egress_policies:"default.allow-all-upgrade" > >
2020-07-06 06:08:45.593 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.594 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x6 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:08:45.604 [INFO][116] status_combiner.go 81: Endpoint up for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} ipVersion=0x4 status="up"
2020-07-06 06:08:45.604 [INFO][116] status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} status="up"


2020-07-06 06:08:44.930 [INFO][115] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:08:44.945 [INFO][115] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >

2020-07-06 06:08:42.994 [INFO][96] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:08:43.008 [INFO][96] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >

These are the logs from when Sonobuoy is deleted:

2020-07-06 06:12:49.159 [INFO][116] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:12:49.209 [WARNING][116] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-06 06:12:49.209 [INFO][116] int_dataplane.go 836: Received *proto.ActiveProfileUpdate update from calculation graph msg=id:<name:"ksa.sonobuoy.sonobuoy-serviceaccount" > profile:<inbound_rules:<action:"deny" rule_id:"72K0GtWSqVmg3zna" > outbound_rules:<action:"deny" rule_id:"g4ouJCo7BiI1jZ0N" > >
2020-07-06 06:12:49.209 [INFO][116] int_dataplane.go 836: Received *proto.ServiceAccountRemove update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" >
2020-07-06 06:12:49.222 [INFO][116] calc_graph.go 392: Local endpoint updated id=WorkloadEndpoint(node=allot-k8s-worker02, orchestrator=k8s, workload=sonobuoy/sonobuoy, name=eth0)
2020-07-06 06:12:49.222 [INFO][116] int_dataplane.go 836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"sonobuoy/sonobuoy" endpoint_id:"eth0" > endpoint:<state:"active" name:"cali41a3cc79a18" profile_ids:"kns.sonobuoy" profile_ids:"ksa.sonobuoy.sonobuoy-serviceaccount" ipv6_nets:"fd00:10:101:1111::1c26/128" tiers:<name:"default" ingress_policies:"default.allow-all-upgrade" egress_policies:"default.allow-all-upgrade" > >
2020-07-06 06:12:49.222 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.222 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] endpoint_mgr.go 497: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] endpoint_mgr.go 524: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.223 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x6 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:12:49.231 [INFO][116] status_combiner.go 81: Endpoint up for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} ipVersion=0x4 status="up"
2020-07-06 06:12:49.231 [INFO][116] status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} status="up"
2020-07-06 06:13:19.415 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=false status="down" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.415 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="down" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.415 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=false status="down" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.415 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x6 status="down" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.415 [INFO][116] status_combiner.go 78: Endpoint down for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} ipVersion=0x6 status="down"
2020-07-06 06:13:19.415 [INFO][116] status_combiner.go 78: Endpoint down for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} ipVersion=0x4 status="down"
2020-07-06 06:13:19.416 [INFO][116] status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"} status="down"
2020-07-06 06:13:19.913 [INFO][116] calc_graph.go 390: Local endpoint deleted id=WorkloadEndpoint(node=allot-k8s-worker02, orchestrator=k8s, workload=sonobuoy/sonobuoy, name=eth0)
2020-07-06 06:13:19.913 [INFO][116] int_dataplane.go 836: Received *proto.WorkloadEndpointRemove update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"sonobuoy/sonobuoy" endpoint_id:"eth0" >
2020-07-06 06:13:19.913 [INFO][116] int_dataplane.go 836: Received *proto.ActiveProfileRemove update from calculation graph msg=id:<name:"kns.sonobuoy" >
2020-07-06 06:13:19.913 [INFO][116] table.go 499: Queing deletion of chain. chainName="cali-pri-kns.sonobuoy" ipVersion=0x4 table="filter"
2020-07-06 06:13:19.913 [INFO][116] table.go 499: Queing deletion of chain. chainName="cali-pro-kns.sonobuoy" ipVersion=0x4 table="filter"
2020-07-06 06:13:19.913 [INFO][116] table.go 499: Queing deletion of chain. chainName="cali-pri-kns.sonobuoy" ipVersion=0x6 table="filter"
2020-07-06 06:13:19.913 [INFO][116] table.go 499: Queing deletion of chain. chainName="cali-pro-kns.sonobuoy" ipVersion=0x6 table="filter"
2020-07-06 06:13:19.913 [INFO][116] int_dataplane.go 836: Received *proto.ActiveProfileRemove update from calculation graph msg=id:<name:"ksa.sonobuoy.sonobuoy-serviceaccount" >
2020-07-06 06:13:19.913 [INFO][116] endpoint_mgr.go 577: Workload removed, deleting its chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.913 [INFO][116] endpoint_mgr.go 586: Workload removed, deleting old state. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=false failed=false known=false operUp=false status="" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] endpoint_mgr.go 577: Workload removed, deleting its chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] endpoint_mgr.go 586: Workload removed, deleting old state. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] endpoint_mgr.go 432: Re-evaluated workload endpoint status adminUp=false failed=false known=false operUp=false status="" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.914 [INFO][116] status_combiner.go 58: Storing endpoint status update ipVersion=0x6 status="" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}
2020-07-06 06:13:19.926 [INFO][116] status_combiner.go 86: Reporting endpoint removed. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"sonobuoy/sonobuoy", EndpointId:"eth0"}



2020-07-06 06:12:50.145 [INFO][115] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:12:50.193 [INFO][115] int_dataplane.go 836: Received *proto.ServiceAccountRemove update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" >


2020-07-06 06:12:48.201 [INFO][96] int_dataplane.go 836: Received *proto.ServiceAccountUpdate update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" > labels:<key:"component" value:"sonobuoy" >
2020-07-06 06:12:48.253 [INFO][96] int_dataplane.go 836: Received *proto.ServiceAccountRemove update from calculation graph msg=id:<namespace:"sonobuoy" name:"sonobuoy-serviceaccount" >


Hmm, it seems the warning is issued at termination.

And yes, on the first run of sonobuoy the tests were successfully completed.
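
In case it helps others debugging this, the deny-all warnings can be pulled out of every calico-node pod at once with something like the following (assuming the standard k8s-app=calico-node label on the daemonset):

kubectl logs -n kube-system -l k8s-app=calico-node | grep "drops all traffic"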

@johnray21216
Author

johnray21216 commented Jul 6, 2020

Hello @zubron! I deployed your k8s-in-cluster-config sample and this was the log message:

[root@b-ipv6-k8s-ms1 bin]# kubectl logs -n test-in-cluster-config list-pods
config.Host https://<ipv6_address>:443
found 20 API group resources
There are 1 pods in the test-in-cluster-config namespace:
list-pods

I've pulled and pushed the zubron/k8s-in-cluster-config image into my private repository.

I've created a new environment and re-ran sonobuoy. It still hits the i/o timeout issue, and the calico node logs still show the following:

kubectl logs -n kube-system calico-node-p7jvz | grep WARNING
2020-07-02 16:22:29.352 [WARNING][48] felix/int_dataplane.go 407: Can't enable XDP acceleration. error=kernel is too old (have: 3.10.0 but want at least: 4.16.0)
2020-07-02 16:22:29.416 [WARNING][48] felix/ipip_mgr.go 112: Failed to add IPIP tunnel device error=exit status 1
2020-07-02 16:22:29.416 [WARNING][48] felix/ipip_mgr.go 89: Failed configure IPIP tunnel device, retrying... error=exit status 1
2020-07-06 07:58:04.336 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-06 07:58:04.336 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-06 09:15:30.365 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="kns.sonobuoy"
2020-07-06 09:15:30.366 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="kns.sonobuoy"
2020-07-06 09:45:54.157 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-06 09:45:54.158 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"
2020-07-06 09:55:37.726 [WARNING][48] felix/active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.sonobuoy.sonobuoy-serviceaccount"

I will try your suggestion regarding checking the calico manifest. I will provide feedback tomorrow.

@zubron
Contributor

zubron commented Jul 6, 2020

I deployed your k8s-in-cluster-config sample and this was the log message:

My suspicion is that it will fail with the same timeout error on any subsequent run (deleting the resources and then retrying), just as sonobuoy fails on subsequent runs after working correctly the first time.
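
A quick way to check would be a delete-and-reapply cycle of the sample (a sketch; the manifest filename here is hypothetical):

kubectl delete -f k8s-in-cluster-config.yaml
kubectl apply -f k8s-in-cluster-config.yaml
# wait for the pod to start, then:
kubectl logs -n test-in-cluster-config list-pods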

@zubron
Contributor

zubron commented Jul 6, 2020

@tomikonio Thanks for providing the logs.

I wonder if there is some issue in Calico where, if the profile generated for the service account is set to deny all traffic upon deletion but isn't cleaned up from the data store, the same profile is reused on subsequent runs (since its name is formed from the same namespace and service account name). That profile's rules deny all traffic, so subsequent runs of sonobuoy fail.
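
If someone with calicoctl access wants to test this theory, the generated profiles should be inspectable by name between runs (a sketch; assumes calicoctl v3 is installed and pointed at the cluster's datastore):

calicoctl get profile kns.sonobuoy -o yaml
calicoctl get profile ksa.sonobuoy.sonobuoy-serviceaccount -o yaml

If the rules on those profiles show "deny" after a delete/re-run cycle, that would confirm a stale profile is being reused.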

I wonder whether it would work if you used a different namespace for Sonobuoy on subsequent runs (using -n <namespace>) without deleting the images.
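
For example (a sketch; assumes your Sonobuoy version supports the -n flag on these subcommands, and the namespace name is arbitrary):

sonobuoy run -n sonobuoy-2 --kubeconfig <your_kubeconfig>
sonobuoy status -n sonobuoy-2 --kubeconfig <your_kubeconfig>
sonobuoy delete -n sonobuoy-2 --kubeconfig <your_kubeconfig>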

I still don't understand why deleting the images has any impact, though. I wonder if re-pulling the images introduces some extra delay, or some other network activity, which allows the state to repair.

Either way, this is not something controlled by Sonobuoy, and seems to be an issue in Calico. Recreating a namespace and service account is a valid thing to do.

@johnray21216
Author

Hello @zubron! I have checked Calico and found no issues with it. We were able to confirm that we are using the default "allow all" policy in Calico. We are still running checks on our cluster environment. The unusual thing is that sonobuoy works IF we pull the image from the repository, but not when the image already exists on the node where the pod is scheduled.

We are still investigating this and would love to hear your input on it as well.

@zubron
Contributor

zubron commented Jul 10, 2020

That is strange behaviour but something that is completely outside of Sonobuoy's control. The image already being present on the node should have absolutely no impact on the behaviour of the container using that image.

You could try setting the imagePullPolicy to Always to force the image to be pulled each time using the flag --image-pull-policy Always.

Also, as we discussed yesterday, you are seeing the same behaviour with the sample program so I don't think it's anything specific to the sonobuoy image.

@johnray21216
Author

Hello @zubron, do we add that flag to sonobuoy? I've tried that and it still failed. I'll follow up on this once we know more about the issue.

@zubron
Contributor

zubron commented Jul 12, 2020

Yes, it's a flag that you use with "sonobuoy run". It sets an imagePullPolicy of Always on all the sonobuoy pods, which forces the kubelet to pull the images each time. It's the only way that sonobuoy can influence whether the image is pulled before starting the pods.
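
For example (the kubeconfig path is a placeholder):

sonobuoy run --image-pull-policy Always --kubeconfig <your_kubeconfig>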

@johnray21216
Author

johnray21216 commented Jul 23, 2020

Hello @zubron! Posting this here for my colleagues to see. We have tried the following scenarios already.

  1. We have tried modifying the IP address in the YAML file directly (via environment variables) to point to the following endpoints:

    • kubernetes-svc-clusterIP:443
    • kube-api-serverIP:6443

On the first run, BOTH endpoints work and we were able to proceed with the test. But when we ran sonobuoy consecutively (removing only the sonobuoy resources, not the images), the issue STILL persists. We have already checked our kubernetes setup and our calico configuration. I may need to open an issue with calico on this, though.
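
For reference, this is roughly how the override was done (a sketch; the address and port are placeholders, and it assumes the override is via the standard in-cluster environment variables that client-go reads):

sonobuoy gen --kubeconfig <your_kubeconfig> > sonobuoy.yaml
# then, under the sonobuoy container spec in sonobuoy.yaml, add:
#   env:
#   - name: KUBERNETES_SERVICE_HOST
#     value: "<ipv6_address>"   # kubernetes svc cluster IP or kube-apiserver IP
#   - name: KUBERNETES_SERVICE_PORT
#     value: "443"              # or 6443 for the apiserver endpoint
kubectl apply -f sonobuoy.yaml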

Is there a possibility that sonobuoy isn't able to communicate with the cluster? We are using versions 0.17.2 and 0.18.0 on our end.

@zubron
Contributor

zubron commented Jul 23, 2020

Hi @johnray21216. Given that Sonobuoy is able to connect to the API server and run the first time, nothing fundamentally changes with a second run that would prevent it from communicating.

As I have stated in our conversations on Slack, I strongly suspect that this is an issue with Calico, given the logs stating that it will drop all traffic for the service account. Yes, you may have configured Calico correctly; however, there could still be a bug there. If you haven't already, I would recommend reaching out to them on their Slack or Discourse, or opening an issue in their project. (Links here: https://www.projectcalico.org/community/)

@stale

stale bot commented Nov 4, 2020

There has not been much activity here. We'll be closing this issue if there are no follow-ups within 15 days.

@stale stale bot added the misc/wontfix label Nov 4, 2020
@stale stale bot closed this as completed Nov 19, 2020