
[Kubernetes] Investigate replacement of NODE_NAME and hostNetwork with status.hostIP #25739

Closed
ChrsMark opened this issue May 17, 2021 · 15 comments
Labels: kubernetes, Team:Integrations

Comments

@ChrsMark
Member

Currently in Metricbeat's manifests we use spec.nodeName at https://github.com/elastic/beats/blob/master/deploy/kubernetes/metricbeat-kubernetes.yaml#L172 in combination with hostNetwork to define the IP of the k8s node the Beat is running on. Most probably we could avoid the hostNetwork setting by leveraging status.hostIP from the Downward API.
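
For reference, this is roughly the relevant part of the DaemonSet spec today; treat it as a sketch rather than an exact excerpt of the linked manifest:

  spec:
    hostNetwork: true                # the setting we would like to avoid
    containers:
      - name: metricbeat
        env:
          - name: NODE_NAME          # node name via the Downward API
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName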

cc: @exekias @jsoriano

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@jsoriano
Member

Currently in Metricbeat's manifests we use spec.nodeName at https://github.com/elastic/beats/blob/master/deploy/kubernetes/metricbeat-kubernetes.yaml#L172 in combination with hostNetwork to define the IP of the k8s node the Beat is running on.

Umm, I'm not sure we are on the same page; what do you mean by defining the IP?

Let me explain my current understanding.

spec.nodeName (through NODE_NAME) and hostNetwork are used for different things.

hostNetwork: true is used so Beats can collect information from the host network (in add_host_metadata, the system module and so on).

When hostNetwork: true is used, the current logic, based on the hostname, cannot know on which host the Beat is running, so we use host: ${NODE_NAME} to tell it explicitly. The node is needed for the default scope: node, to collect information only from resources in the current node.

So if I am right, as Beats are now, they need hostNetwork to collect host network information and the node name to filter resources.
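
To make the relationship concrete, a minimal sketch of how these options appear in the config (option names as used in the manifests referenced in this issue; verify against the actual files rather than treating this as an exact excerpt):

  metricbeat.autodiscover:
    providers:
      - type: kubernetes
        host: ${NODE_NAME}     # tells the provider which node this Beat runs on
        scope: node            # default scope: only watch resources on this node

  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}     # same purpose for the metadata processor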

Most probably we could avoid the hostNetwork setting by leveraging status.hostIP from the Downward API.

Where is the host IP needed? If it is needed and Beats are getting it in tricky ways, +1 to using the Downward API, but I think hostNetwork: true would still be needed for other things.

That said, if Beats collected the host information from files mounted from the host (procfs...), hostNetwork wouldn't be needed. The same goes for NODE_NAME, if there is some other reliable way of obtaining the name of the node where the Beat pod is running.

@jsoriano
Member

Anything that reduces the dependency on hostNetwork would help with #19600.

@ChrsMark
Member Author

I'm primarily thinking of accessing the kubelet's API at

hosts: ["https://${NODE_NAME}:10250"]

Without hostNetwork: true I get the following:

curl https://${NODE_NAME}:10250
curl: (6) Could not resolve host: minikube; Unknown error

but I can reach the API if I use status.hostIP by adding:

  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

to get the following:

[root@metricbeat-zlx97 metricbeat]# curl https://${NODE_NAME}:10250
curl: (6) Could not resolve host: minikube; Unknown error
[root@metricbeat-zlx97 metricbeat]# curl https://${NODE_IP}:10250
curl: (60) Peer's certificate issuer has been marked as not trusted by the user.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

So for this specific case we are fine to remove hostNetwork. Not sure if other cases (add_host_metadata, the system module and so on) still need hostNetwork: true, or whether they can work with only NODE_NAME through spec.nodeName.
In general hostNetwork: true can be a limitation for some users, so it would be really useful if we could do away with it.
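
To illustrate what the manifest change would look like, a hedged sketch of the kubelet module entry using the NODE_IP variable from the snippet above (the bearer token and ssl lines mirror what the Beats manifests typically ship, and the curl warning above is why certificate verification would need to stay relaxed; treat the exact values as assumptions to verify):

  - module: kubernetes
    metricsets: ["node", "system", "pod", "container", "volume"]
    hosts: ["https://${NODE_IP}:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    ssl.verification_mode: "none"   # kubelet serving cert is usually not issued for the bare node IP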

@jsoriano
Member

I'm primarily thinking of accessing the kubelet's API at

hosts: ["https://${NODE_NAME}:10250"]

Oh ok, I was not thinking of this use case. Yes, for this it looks like a good idea to use the node IP. For now we would still need the node name and host network for other things.

In general hostNetwork: true can be a limitation for some users, so it would be really useful if we could do away with it.

+1

@MichaelKatsoulis
Contributor

Investigating this a bit, I have noticed the following.

  1. In Kubernetes, NODE_NAME is used to help in the discovery of the Kubernetes node in code, and as the proposed host to access the kubelet on each node (code).
  2. hostNetwork: true is required for the add_host_metadata processor. This processor is disabled by default in the proposed Kubernetes manifest. If it is enabled and hostNetwork: false, then Beats will get host info from the pod they are running in instead of the k8s node.
  3. The system module reads the information it requires from the host filesystem, as in this line for system CPU metrics. Maybe @jsoriano you have noticed a case where the network is used to get system information?
  4. spec.nodeName can be used without hostNetwork: true. The pod and the node belong to the same network and the node is reachable from inside the pod. I have tried with kind and Minikube and never encountered the name-resolution error @ChrsMark mentioned. And it wouldn't make sense for the pod to be unable to resolve the node's name.
  5. If status.hostIP is used instead of spec.nodeName, the only problem will be the default node discovery behaviour. If the node is not set in the Metricbeat configuration, as happens in the Agent case, the code falls back to the NODE_NAME env var by default (code).
  6. In the proposed Metricbeat manifest for k8s we should update the proxy host recommendation to leverage $NODE_NAME as we do with the kubelet. The reason is that localhost only works with hostNetwork: true (see the sketch after this list).
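
Regarding point 6, a hedged sketch of the change (10249 is kube-proxy's default metrics port; this assumes the metrics endpoint is actually reachable on the node address, which is exactly the sticking point discussed further below):

  - module: kubernetes
    metricsets: ["proxy"]
    # hosts: ["localhost:10249"]      # only works with hostNetwork: true
    hosts: ["${NODE_NAME}:10249"]     # reachable from the pod network, like the kubelet endpoint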

@ChrsMark
Member Author

ChrsMark commented Aug 31, 2021

Investigating this a bit, I have noticed the following.

Nice investigation! Commenting inline.

  1. In Kubernetes, NODE_NAME is used to help in the discovery of the Kubernetes node in code, and as the proposed host to access the kubelet on each node (code).
  2. hostNetwork: true is required for the add_host_metadata processor. This processor is disabled by default in the proposed Kubernetes manifest. If it is enabled and hostNetwork: false, then Beats will get host info from the pod they are running in instead of the k8s node.

We own the manifests, and if we don't have add_host_metadata in them, then I think it is safe not to have hostNetwork: true in the same manifests either. Maybe we can add a comment in the config telling users to add this setting if they aim to use add_host_metadata (sketched below), but in the general case I think we are not blocked by the processor here.
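
For example, the note in the manifest could be as simple as the following sketch (the wording is mine, not an existing comment in the manifest):

  # add_host_metadata reports info about the Kubernetes node only when the pod
  # shares the node's network; if you enable it, also set hostNetwork: true in
  # the DaemonSet spec.
  #processors:
  #  - add_host_metadata: ~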

  3. The system module reads the information it requires from the host filesystem, as in this line for system CPU metrics. Maybe @jsoriano you have noticed a case where the network is used to get system information?

I'm not aware of the system module's internals, maybe @fearful-symmetry can help here?

  4. spec.nodeName can be used without hostNetwork: true. The pod and the node belong to the same network and the node is reachable from inside the pod. I have tried with kind and Minikube and never encountered the name-resolution error @ChrsMark mentioned. And it wouldn't make sense for the pod to be unable to resolve the node's name.

You are right, I double-checked it and we can reach the kubelet's API using NODE_NAME even if we are not using hostNetwork. Not sure why it was not working in my previous comment, apologies for the confusion. Most probably I was confusing it with the HOSTNAME var we had been using in the past (before switching to NODE_NAME), which does not work if you don't use hostNetwork.

Example with hostNetwork: false:

NODE_NAME=gke-chrismark-elasticon-default-pool-d8304c2d-5eyc
HOSTNAME=metricbeat-7slrx

We changed that in #17469.

Having said this, I guess we are good to stick with NODE_NAME and just verify that it works without hostNetwork.

  5. If status.hostIP is used instead of spec.nodeName, the only problem will be the default node discovery behaviour. If the node is not set in the Metricbeat configuration, as happens in the Agent case, the code falls back to the NODE_NAME env var by default (code).

We can expose the value via the same env var or just tune the discovery code :).

  6. In the proposed Metricbeat manifest for k8s we should update the proxy host recommendation to leverage $NODE_NAME as we do with the kubelet. The reason is that localhost only works with hostNetwork: true.

I agree, feel free to open a PR for this :) (or just a separate issue).

cc-ing @exekias and @jmlrt since we had discussions around this in the past.

For reference, one issue with the usage of hostNetwork is described at elastic/helm-charts#391 (comment). Also, people with multi-tenancy needs on their k8s clusters have the same issues.

@MichaelKatsoulis
Contributor

I'm not aware of the system module's internals, maybe @fearful-symmetry can help here?

Some of the system module's metricsets, like network and socket, give different results depending on whether hostNetwork is enabled. This makes sense, as that option exposes the host's interfaces to the pod.
As a user of Kubernetes, though, I wouldn't expect to gather system metrics of the node the pod is running on this way, as the pod is ephemeral and can change too often.
We have other metricsets to gather node information.

@fearful-symmetry
Contributor

@ChrsMark system/socket uses netlink to gather data. For the most part, the system module sticks to sysfs and procfs. However, there are still a handful of interesting cases when it comes to how that stuff works with container tech. For example, a container inherits its own cgroup hierarchy from the host OS, with the behavior differing slightly between cgroups v1 and v2.

@MichaelKatsoulis
Contributor

When Metricbeat or Elastic Agent runs on Kubernetes, the manifests we propose mount /proc and /sys/fs/cgroup of the k8s node inside the pod under the /hostfs/ directory (yaml).

This is due to the system module (code).
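
For reference, the relevant mounts look roughly like this in the proposed manifest (a sketch from memory, not an exact excerpt; the container args typically also pass -system.hostfs=/hostfs so the system module reads these paths):

        volumeMounts:
          - name: proc
            mountPath: /hostfs/proc
            readOnly: true
          - name: cgroup
            mountPath: /hostfs/sys/fs/cgroup
            readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup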

The network metricset (doc), and probably socket, require the hostNetwork: true setting to get network-related info from the node.

Another problem with getting rid of that setting is elastic-package. Under the hood it connects the network of the Kubernetes node (created by kind) with the network of the Elastic Stack so that testing can take place.
This allows the Agent, for example, to connect to the stack's Kibana at kibana:5601.
If hostNetwork: false, then the pod running on the k8s node won't share the node's network, so it will not be able to connect to the Elastic Stack.
This is not an issue when using Elastic Cloud.
This could also be resolved in the elastic-package code by patching the proposed manifest to add hostNetwork: true before applying it. We already do things like this in elastic-package.
@mtojek ^^

My opinion is that we could get rid of hostNetwork in the proposed manifests and add a note to the documentation for running Agent/Metricbeat on Kubernetes saying that users who want network-related info from the node can enable this setting.
We should also update the elastic-package code to add the setting for testing.
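
The documented opt-in could look roughly like this (dnsPolicy included because a host-network pod otherwise falls back to the node's DNS configuration; this is a sketch of the note, not existing documentation):

  spec:
    template:
      spec:
        # Enable only if you need node-level network info
        # (e.g. the system module's network/socket metricsets):
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet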

@jsoriano, @exekias what do you think?

@jsoriano
Member

jsoriano commented Sep 1, 2021

My opinion is that we could get rid of hostNetwork in the proposed manifests and add a note to the documentation for running Agent/Metricbeat on Kubernetes saying that users who want network-related info from the node can enable this setting.

SGTM, maybe this network-related info could be obtained from the root filesystems, I remember some discussions about this.

We should also update the elastic-package code to add the setting for testing.

I think that elastic-package should have its own configuration (as it has for the Docker Compose runner); it shouldn't download the manifests from Beats. We should update elastic-package first so it doesn't break when this is changed in Beats.

@jsoriano
Member

jsoriano commented Sep 2, 2021

I think that elastic-package should have its own configuration (as it has for the Docker Compose runner); it shouldn't download the manifests from Beats. We should update elastic-package first so it doesn't break when this is changed in Beats.

PR open for this: elastic/elastic-package#502

@ChrsMark
Member Author

@MichaelKatsoulis after the recent findings, I think we can conclude on keeping things as they are?

@MichaelKatsoulis
Contributor

We have the problem discussed in elastic/integrations#2029: by default, the kube-proxy metrics server binds to the localhost address, so in order for the Agent/Beats to access it we need hostNetwork: true so that the pod and the node share the same network.
The only way to tackle this is to change the default settings of kube-proxy, which unfortunately is not always possible when running on managed k8s.
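
For completeness, on clusters where the kube-proxy configuration can be edited (e.g. a kubeadm-managed kube-proxy ConfigMap in kube-system), the field involved is metricsBindAddress. This is a hedged sketch of the KubeProxyConfiguration change, which as noted is often not possible on managed k8s:

  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  kind: KubeProxyConfiguration
  # default is 127.0.0.1:10249, reachable only from the host network namespace
  metricsBindAddress: 0.0.0.0:10249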

So we will keep it as is for now.

@ChrsMark
Member Author

So maybe we can close this one?
