Consider not adding host.ip metadata to k8s container metrics by default #6674

felixbarny · 2023-06-22T15:52:18Z

@martijnvg found out that in a test dataset for k8s container metrics, there were 100+ IP addresses attached as metadata.

We'll need to find out why that's the case and if these IPs make sense to add to container metrics as metadata. If these represent the IP addresses of the k8s node, it doesn't seem useful to add them as metadata to the container metrics anyway.

Tasks

[Enhanncement for host.ip and host.mac] Disabling netinfo.enabled option of add-host-metadata processor beats#36506 (labels: Team:Cloudnative-Monitoring, Team:Elastic-Agent)
Updating Elastic Manifests with NETINFO variable elastic-agent#3354 (labels: backport-skip)
Adding option to enable/disable NETINFO variable in Kibana installation steps kibana#165700 (labels: Monitoring-Cloudnative, Team:Observability)
Adding information for NETINFO environmental variable ingest-docs#463 (labels: backport-skip)
Set env variable ELASTIC_NETINFO:false in Kibana kibana#166156 (labels: Team:Fleet, backport:skip, release_note:feature, v8.11.0)

martijnvg · 2023-06-23T07:35:32Z

An example of the list of IPs that I have observed in documents:


"ip": [
              "10.100.6.1",
              "10.128.0.162",
              "169.254.123.1",
              "fe80::a2:36ff:fe8d:a721",
              "fe80::a9:6eff:fe66:57db",
              "fe80::4c3:cff:fe87:61af",
              "fe80::8af:82ff:fe10:a293",
              "fe80::c66:cbff:fed2:ff7e",
              "fe80::cbf:e1ff:fe7c:de84",
              "fe80::1490:e1ff:fe2d:c525",
              "fe80::14c1:9cff:fe36:3620",
              "fe80::18eb:57ff:febf:570d",
              "fe80::20bc:93ff:feb2:906e",
              "fe80::2480:31ff:fe41:4e64",
              "fe80::24a1:3fff:fe74:ae73",
              "fe80::2815:36ff:fe54:d2f",
              "fe80::288a:50ff:fe94:c471",
              "fe80::28cc:91ff:fef2:e4dc",
              "fe80::2c44:96ff:fead:1f17",
              "fe80::2cba:f7ff:fe8d:ba7d",
              "fe80::2cf1:deff:feea:b51d",
              "fe80::3087:4aff:fe98:35b0",
              "fe80::30ce:9aff:fe28:6329",
              "fe80::3880:8fff:fe39:bbb3",
              "fe80::3c49:d3ff:fe41:e6a5",
              "fe80::3c65:49ff:fe2a:c375",
              "fe80::4001:aff:fe80:a2",
              "fe80::40bf:fbff:feb3:88d2",
              "fe80::40de:26ff:fe7f:826",
              "fe80::4465:33ff:fe6f:2014",
              "fe80::44a1:d2ff:fe83:eecb",
              "fe80::484d:7dff:fe6c:f326",
              "fe80::48d5:cdff:fed3:207b",
              "fe80::4c6c:7bff:fefd:aa4e",
              "fe80::50b4:16ff:feaa:44ce",
              "fe80::5447:b1ff:fe53:a49f",
              "fe80::54d6:70ff:fe73:2ef6",
              "fe80::5889:feff:feca:2394",
              "fe80::6425:54ff:fee6:7942",
              "fe80::64e3:45ff:fe09:7830",
              "fe80::685b:6aff:fef3:60aa",
              "fe80::6c73:adff:fe93:6c4",
              "fe80::6c7c:6aff:fe1e:6e5b",
              "fe80::701a:25ff:fe63:7b47",
              "fe80::701c:bfff:fe92:96b5",
              "fe80::709e:c7ff:fea2:f322",
              "fe80::70b6:efff:fe31:da37",
              "fe80::749b:ffff:fead:1d26",
              "fe80::74cd:59ff:fee6:f893",
              "fe80::74cd:cbff:fea3:ef4c",
              "fe80::74d9:dcff:fe38:2278",
              "fe80::78f0:f3ff:fe7e:af53",
              "fe80::88f2:8fff:fe2f:efb8",
              "fe80::8cef:37ff:fe61:2a3e",
              "fe80::90ac:72ff:febd:ba1",
              "fe80::9820:29ff:feb3:6335",
              "fe80::988e:8cff:fe72:f5e",
              "fe80::98cd:2dff:fe17:d5cd",
              "fe80::9cf7:beff:fee5:983f",
              "fe80::a051:fbff:fe80:d76f",
              "fe80::a0b6:d0ff:fe42:e4fa",
              "fe80::a0da:1dff:fe6a:8129",
              "fe80::a42c:48ff:fe48:f80d",
              "fe80::a4bc:e8ff:fe2c:d407",
              "fe80::a88b:a3ff:feda:48b8",
              "fe80::a8a5:7bff:fe24:75d8",
              "fe80::ac33:42ff:feb2:9059",
              "fe80::b08e:7ff:fedd:9ecc",
              "fe80::b0a9:dbff:fe37:da70",
              "fe80::b0eb:ffff:feca:154f",
              "fe80::b410:b4ff:fe89:dd1c",
              "fe80::b4e6:d6ff:fe00:4334",
              "fe80::b836:37ff:fe9b:c8b8",
              "fe80::b8a2:7dff:fe1f:ab96",
              "fe80::b8f2:9aff:fe56:5623",
              "fe80::bc5d:67ff:fe09:8c7c",
              "fe80::bc76:dcff:fed9:1364",
              "fe80::bcb1:85ff:fe7b:8239",
              "fe80::c039:dff:fec6:5290",
              "fe80::c0bd:90ff:fe17:a780",
              "fe80::c44c:bbff:fef8:2d05",
              "fe80::c84f:caff:fe0b:5a44",
              "fe80::c8aa:c7ff:fee5:dda0",
              "fe80::cc3e:49ff:fe79:e547",
              "fe80::ccd9:c4ff:fea9:8dcc",
              "fe80::d01f:9dff:fe49:898f",
              "fe80::d0bb:c1ff:fe11:81f6",
              "fe80::d437:f7ff:fec7:ed52",
              "fe80::d472:63ff:fed0:ff99",
              "fe80::d4b8:11ff:fe44:cdd9",
              "fe80::d4d6:60ff:fe4a:c292",
              "fe80::d4f7:56ff:fe14:d8cb",
              "fe80::e003:83ff:fe37:51e6",
              "fe80::e086:faff:feec:ec8a",
              "fe80::e0f8:9dff:fe78:81dc",
              "fe80::e0ff:77ff:fe03:7f39",
              "fe80::e41f:93ff:fef3:fb8a",
              "fe80::e443:33ff:fe47:493a",
              "fe80::e490:6dff:fe06:1844",
              "fe80::e847:dbff:fe9f:6c5e",
              "fe80::e895:d4ff:fea0:4930",
              "fe80::e8aa:68ff:fe5b:4a",
              "fe80::ec85:79ff:fe51:d634",
              "fe80::f026:99ff:fe79:641e",
              "fe80::f40f:dbff:fe73:b43e",
              "fe80::f8a4:74ff:fe56:995",
              "fe80::fc00:54ff:fe64:b7a9",
              "fe80::fcc5:59ff:fef6:7e06",
              "fe80::fcfd:cfff:fe55:31e0"
            ],

ruflin · 2023-06-26T11:00:28Z

These IP addresses are added by Beats / Elastic Agent, AFAIK. Initially the idea was that 1 or 2 host IP addresses would be shipped, but k8s + IPv6 wreak havoc on the data we ship. What are all these IPv6 addresses? One for each container? For IPv6, should we skip all the fe80:: addresses? Which of the addresses are relevant?

@tommyers-elastic @gizas any thoughts on the above?

mlunadia · 2023-06-27T08:04:30Z

A quick search shows that these are IPv6 link-local addresses.

Link-local addresses are used for communication within a single network segment (link). They are automatically assigned to network interfaces, are only valid on that link, and are not forwarded by routers.

In the case of Kubernetes containers, fe80 IP addresses are assigned to the containers' network interfaces for intra-cluster communication. Containers within the same network segment can use these link-local addresses to communicate with each other directly without the need for routing.

AFAIK these are not critical for most Observability use cases.
@tommyers-elastic @gizas is there an easy way to estimate the complexity of skipping these? Can we also determine what the non-fe80 ip addresses are for?
cc: @bturquet
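As a rough illustration of what "skipping these" could look like, here is a minimal Python sketch that drops link-local entries from a host.ip list. It uses only the standard-library ipaddress module; the helper name drop_link_local is hypothetical and this is not the Beats implementation, which is written in Go.

```python
import ipaddress


def drop_link_local(addresses):
    """Return the addresses that are not link-local.

    Link-local covers IPv4 169.254.0.0/16 and IPv6 fe80::/10.
    """
    return [a for a in addresses if not ipaddress.ip_address(a).is_link_local]


# A few entries taken from the list posted earlier in this thread.
sample = [
    "10.100.6.1",
    "10.128.0.162",
    "169.254.123.1",
    "fe80::a2:36ff:fe8d:a721",
    "fe80::fcfd:cfff:fe55:31e0",
]

print(drop_link_local(sample))  # ['10.100.6.1', '10.128.0.162']
```

Applied to the full list above, a filter like this would leave only the two 10.x addresses, since 169.254.123.1 is an IPv4 link-local address as well.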

gizas · 2023-06-27T08:49:13Z

I did some searching today to be sure where these IPs come from: they are the node's IPs, in other words the networking of the underlying host.

The metrics from an nginx pod

Node's networking:

4: veth038d14aa@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether aa:07:91:89:6d:0f brd ff:ff:ff:ff:ff:ff link-netns cni-fefa3105-06d0-56f5-302c-f5223545f4d3
    inet 10.244.0.1/32 scope global veth038d14aa
       valid_lft forever preferred_lft forever
5: vethebc54a3e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f2:12:90:39:f7:21 brd ff:ff:ff:ff:ff:ff link-netns cni-3b66ba8b-35bc-8292-9679-60fb0dea2237
    inet 10.244.0.1/32 scope global vethebc54a3e
       valid_lft forever preferred_lft forever
6: vethc8cbf2a8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 6e:3d:06:e8:08:97 brd ff:ff:ff:ff:ff:ff link-netns cni-320ccfbc-5113-88ae-96ce-36be83b6692b
    inet 10.244.0.1/32 scope global vethc8cbf2a8
       valid_lft forever preferred_lft forever
7: vetha8e1da93@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether d2:be:13:f5:f2:9c brd ff:ff:ff:ff:ff:ff link-netns cni-d3fe6667-7f55-ae5f-978e-e2306aa7603c
    inet 10.244.0.1/32 scope global vetha8e1da93
       valid_lft forever preferred_lft forever
12: eth1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:14:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.20.0.2/16 brd 172.20.255.255 scope global eth1
       valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link
       valid_lft forever preferred_lft forever

I can see the netinfo.enabled option (see docs) and the reference in code: https://github.com/elastic/beats/blob/main/libbeat/processors/add_host_metadata/add_host_metadata.go#L198 but could not make it work and remove the host.ip values, either in Beats or by adding the processor in Agent. I will keep you posted on how to disable host.ip.

gizas · 2023-06-27T10:33:52Z

@cmacknz I see that the add_host_metadata processor is responsible for adding the host.* fields and is enabled by default (https://www.elastic.co/guide/en/fleet/current/add_host_metadata-processor.html).

I have not found a way to override it at the moment in the Agent.
Can we use something like this: https://github.com/elastic/beats/blob/main/libbeat/processors/add_observer_metadata/config.go#L36?
Asking as I am trying to connect the pieces and understand the flow.

cmacknz · 2023-06-27T12:10:43Z

The default set of processors each Beat runs when started by Agent is defined in code; the Beats don't read their own default configuration files when Agent starts them. Here is the definition for Metricbeat:

https://github.com/elastic/beats/blob/e16de717459e5a62aa376427dd25d43441b5c582/x-pack/metricbeat/cmd/root.go#L68-L80

This is equal to the set of default global processors that are enabled in the default Metricbeat configuration file:

https://github.com/elastic/beats/blob/e16de717459e5a62aa376427dd25d43441b5c582/x-pack/metricbeat/metricbeat.yml#L123-L127

The problem right now is that an agent policy has no concept of a global processor today, so there is no place in the agent policy to expose these. This is something we plan to do, but there's no date set for it yet. https://github.com/elastic/ingest-dev/issues/2442 is the tracking issue. Even if we did have this, we'd want the change to the configuration here to be conditional on whether the agent is running on Kubernetes. This might actually be easier to do in code.

If you can come up with an alternate configuration, we would only want to apply it when the agent is running on k8s. Since the default processors are defined in code, if you have a function that can accurately detect that the agent runs on Kubernetes when these processor configurations are generated you can use it to conditionally change the add_host_metadata configuration for each Beat agent can start.

You could add an option to add_host_metadata to omit host.ip entirely on Kubernetes, limit the total number of reported IPs or interfaces that it polls, etc.
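The idea of conditionally changing the default add_host_metadata configuration can be sketched as follows. This is a Python illustration only (the real defaults live in Go code in Beats). The two detection signals, the KUBERNETES_SERVICE_HOST variable and the service account path, are assumptions about what a detection function might check; the function names are hypothetical.

```python
import os

# Assumption: these are two signals commonly present inside a pod.
# The thread does not state which signals a real detection function would use.
K8S_ENV_VAR = "KUBERNETES_SERVICE_HOST"
K8S_SA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount"


def running_on_kubernetes(env=None, path_exists=os.path.exists):
    """Best-effort guess at whether the process runs inside a Kubernetes pod."""
    env = os.environ if env is None else env
    return K8S_ENV_VAR in env or path_exists(K8S_SA_PATH)


def default_processors(on_k8s):
    """Build the default global processor list as plain dicts (illustrative shape)."""
    host_metadata = {}
    if on_k8s:
        # netinfo.enabled is the existing add_host_metadata option discussed above.
        host_metadata["netinfo.enabled"] = False
    return [{"add_host_metadata": host_metadata}]


print(default_processors(on_k8s=True))
# [{'add_host_metadata': {'netinfo.enabled': False}}]
```

Passing the environment and the path check in as parameters keeps the detection testable without a cluster.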

tommyers-elastic · 2023-06-28T08:04:31Z

late to the party here but just chiming in that from where i'm standing these IPs for sure just look like noise. if there was some useful mapping from ip<->resource then maybe they would be more useful. hopefully we get something like that as part of the asset management work.

gizas · 2023-06-28T08:56:00Z

To add more to the issue, I repeated some more tests with a GKE cluster with 15 nodes, and there you can see more host.ip values added (I count something like 64).

I agree it is noise, because you can find the same information in the kubernetes.state_node entries.

I have tried to add processors inside the integration (in the module level):

Also :

- add_host_metadata:
      netinfo.enabled: false

It seems that they don't apply. I guess that global processors are applied at the Beats level, so they run last and override our configuration. Long story short, the host.ip fields remain in the event.
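The ordering effect described here can be shown with a toy model. The Python sketch below is not Beats code: the add_host_metadata function is a hypothetical stand-in, and it assumes the processor overwrites host.* on the event. It only shows why a module-level setting is lost when a global processor of the same kind runs afterwards.

```python
def add_host_metadata(event, netinfo_enabled=True):
    """Toy stand-in for the processor: (re)writes host.* on the event."""
    host = {"name": "node-1"}
    if netinfo_enabled:
        host["ip"] = ["10.128.0.162", "fe80::4001:aff:fe80:a2"]
    return {**event, "host": host}


def run_pipeline(event, module_processors, global_processors):
    """Module-level processors run first, global (Beat-level) ones run last."""
    for processor in module_processors + global_processors:
        event = processor(event)
    return event


# Module level: netinfo disabled, as in the configuration tried above.
module_level = [lambda e: add_host_metadata(e, netinfo_enabled=False)]
# Global level: the built-in default, which still has netinfo enabled.
global_level = [lambda e: add_host_metadata(e, netinfo_enabled=True)]

result = run_pipeline({"metricset": "container"}, module_level, global_level)
print("ip" in result["host"])  # True
```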

ChrsMark · 2023-06-28T09:03:02Z

Hey folks! Please have a look into elastic/elastic-agent#90. This seems to be the reason.
Long story short: Agent starts Beats with the default config files, which enable the add_host_metadata processor, see https://github.com/elastic/beats/blob/718c9232cfa183f6a866ebcfa6401eae72346f0d/metricbeat/metricbeat.yml#L124. This processor runs at the Beats level and hence after the processors that run at the module level.

We need a way to disable/tune the Beats' global level processors and that's what elastic/elastic-agent#90 is trying to address.

EDIT: see also #6674 (comment), which explains the same.

ruflin · 2023-06-28T09:43:07Z

Could we adjust the add_host_metadata processor to just not ship local ip addresses in the first place by default, no config needed?

cmacknz · 2023-06-28T09:55:26Z

> Could we adjust the add_host_metadata processor to just not ship local ip addresses in the first place by default, no config needed?

As I suggested in #6674 (comment), yes, but you need to make it conditional on detecting that the Beat is running on Kubernetes, where this information is not useful. I would think that just removing the IP fields unconditionally would be a breaking change. Technically it would be one on k8s as well, but it is highly unlikely anyone depends on these fields today.

gizas · 2023-06-29T10:29:38Z

Please find a workaround here: https://github.com/elastic/beats/blob/fixinghostips/x-pack/metricbeat/cmd/root.go#L76-L93

I am in the process of building the image and testing e2e, so I will report my findings. But let me know if the workaround is OK.
The idea is to check for Kubernetes (I check for a path, or whether a specific k8s environment variable that is commonly present is set, which identifies that I am installing in k8s). We have also introduced a new environment variable, NETINFO, that users can set specifically to bypass the add_host_metadata processor.

Not the most elegant solution, but what do you think?

ChrsMark · 2023-06-29T11:14:58Z

@gizas I think you don't need to check first for a k8s environment and then for the env var. Checking just for the env var directly should be enough.

The advantage of checking just the env var is that we don't break anything for existing users/configurations; only users who set NETINFO: false will disable the addition of this data.

I would be OK with adding this, but only as a temporary solution, which means that we will create a GH issue to keep track of this and find a better and more generic way to implement it.

To my mind we need a way to configure Beats through Agent and at the moment we are "locked" with the default configs which is really bad. @ruflin @cmacknz are there any plans to fix this? I thought that elastic/elastic-agent#90 and then https://github.com/elastic/ingest-dev/issues/2442 would address this but then #6674 (comment) mentions that https://github.com/elastic/ingest-dev/issues/2442 would not be enough. So in that case maybe we need to prioritize elastic/elastic-agent#90 directly?

gizas · 2023-06-29T11:47:27Z

Just to clarify, it is an OR, not an AND.
So the scenarios would be:

User defines nothing and we manage to identify k8s -- remove netinfo
User defines NETINFO=false -- remove netinfo
In any other scenario -- keep netinfo.enabled: true, so keep host.ip

But yes, we can change the logic. This is more to prove that it is working.
And yes, maybe it is time to raise the prioritisation discussion for the "global processor" again, as BY also has it in its list.
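The three scenarios above, and the env-var-only alternative suggested earlier, can be written down as two small decision functions. This is a Python sketch for illustration; the function names are hypothetical, and treating only the literal string "false" (case-insensitive) as the off switch is an assumption.

```python
def netinfo_enabled_or_logic(env, on_k8s):
    """Scenarios from the list above.

    - NETINFO unset and k8s detected -> netinfo disabled
    - NETINFO=false                  -> netinfo disabled
    - anything else                  -> netinfo stays enabled
    """
    value = env.get("NETINFO")
    if value is None:
        return not on_k8s
    return value.strip().lower() != "false"


def netinfo_enabled_env_only(env):
    """Alternative: ignore k8s detection, only an explicit NETINFO=false disables it."""
    return env.get("NETINFO", "true").strip().lower() != "false"


print(netinfo_enabled_or_logic({}, on_k8s=True))  # False
print(netinfo_enabled_env_only({}))               # True
```

The two functions differ only when NETINFO is unset on a detected Kubernetes host, which is exactly the case where existing users would see a behaviour change.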

ChrsMark · 2023-06-29T13:37:23Z

> User defines nothing and we manage to identify k8s -- remove netinfo

This would be a breaking change for users who run on k8s today and actually collect the data we want to skip here.

gizas · 2023-07-05T15:36:52Z

I managed to build the image and in my local cluster this works for now:
Before removal of IPs:

After Removal of IPs:

So summary of above:

So let me know what default behaviour we want to introduce in our code. I agree with @ChrsMark that if we remove host.ip by default, then this breaking change needs to be clearly documented.
How about https://github.com/elastic/ingest-dev/issues/2442? Any info regarding prioritising this?
If we agree on this fix, then it will need testing with all cloud providers, I guess, especially if we introduce the automatic k8s recognition.

cc @bturquet , @mlunadia for prioritisation

felixbarny · 2023-08-11T08:09:15Z

Using the disk usage API on both the system cpu and the kubernetes pod data stream (on edge-lite) revealed that half of the disk usage is due to the host.ip and the host.mac fields. I bet this also has a significant impact on indexing.

I think it's important that we find a solution where these fields don't have such an impact when using default configurations.

To keep the risk of breaking users at a minimum, and to make the implementation simple, I suggest we investigate the approach proposed by @cmacknz and @ruflin and remove non-interesting (local?) ip and mac addresses directly in the metadata processor go code if it detects that it's running on k8s.
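For anyone who wants to repeat the disk usage measurement mentioned above, here is a hedged Python sketch of how the share of selected fields could be computed from an analyze index disk usage response (POST /<index>/_disk_usage?run_expensive_tasks=true). The response keys used here (all_fields, fields, total_in_bytes) are how I understand that API's output and should be checked against the Elasticsearch docs. The index name and all numbers are made up for illustration; they are not the measurements from edge-lite.

```python
def field_share(disk_usage_response, index, fields):
    """Fraction of all field bytes in `index` attributed to `fields`."""
    body = disk_usage_response[index]
    total = body["all_fields"]["total_in_bytes"]
    used = sum(
        body["fields"].get(name, {}).get("total_in_bytes", 0) for name in fields
    )
    return used / total if total else 0.0


# Made-up response fragment, shaped like the API output as assumed above.
sample = {
    "metrics-example": {
        "all_fields": {"total_in_bytes": 1000},
        "fields": {
            "host.ip": {"total_in_bytes": 300},
            "host.mac": {"total_in_bytes": 200},
            "@timestamp": {"total_in_bytes": 100},
        },
    }
}

print(field_share(sample, "metrics-example", ["host.ip", "host.mac"]))  # 0.5
```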

gizas · 2023-09-05T14:03:34Z

Team, I updated the task list of the story with the latest updates. The fix is working, and for now we enable/disable netinfo only with the related environment variable NETINFO:false inside the agent pod.

I have not managed to find a way to pass a config option from the Kubernetes integration to the add_host_metadata processor. add_host_metadata is initialised even before the kubernetes processor, and this does not allow us to pass config options to it.

So for now I propose only elastic/kibana#165700 as a means to help users in managed mode. Any other ideas?

felixbarny · 2023-09-05T14:49:24Z

@gizas have you considered the proposal in my last message to change the host metadata processor to omit link-local IP addresses by default or by default when running inside a container? This would only require changes in the add_host_metadata processor.

gizas · 2023-09-06T13:05:11Z

@felixbarny thanks again for the reminder. See my last update in the PR: all tests now seem to work with changes in the add_host_metadata processor only.

## Summary This PR adds the environment variable ELASTIC_NETINFO to the managed and standalone manifests of Elastic Agent. The variable was introduced in elastic/elastic-agent#3354. The reason for setting ELASTIC_NETINFO:false by default in the manifests is related to the work done in elastic/integrations#6674.

gizas · 2023-10-27T08:41:03Z

The Kubernetes Manifests will set ELASTIC_NETINFO:false by default.
We have decided not to do elastic/kibana#165700 for now. If we think that there is an urgency we can plan accordingly

I am closing this issue for now as related work is done

felixbarny · 2023-10-27T08:48:20Z

@gizas in which cases is host.ip still added by default now?

gizas · 2023-10-27T08:51:25Z

@felixbarny in both managed and standalone manifests (https://github.com/elastic/kibana/pull/166156/files) the variable is false.

So host.ip should not be added anywhere by default.

felixbarny added the Team: Cloud Native Integrations label Jun 22, 2023

gizas self-assigned this Jul 5, 2023

mlunadia mentioned this issue Aug 11, 2023

[Meta] Review Kubernetes integration ingest defaults to what we define as sensible defaults #7364

Open

gizas mentioned this issue Sep 6, 2023

Adding information for NETINFO environmental variable elastic/ingest-docs#463

Merged

gizas mentioned this issue Sep 11, 2023

Set env variable ELASTIC_NETINFO:false in Kibana elastic/kibana#166156

Merged

gizas mentioned this issue Oct 6, 2023

Updating k8s pod and container templates with no host ips and realistic hostnames elastic/elastic-integration-corpus-generator-tool#111

Merged

gizas closed this as completed Oct 27, 2023

andrewkroh added Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team [elastic/obs-cloudnative-monitoring] and removed Team: Cloud Native Integrations labels Sep 18, 2024

cmacknz mentioned this issue Oct 31, 2024

Share host info between packages elastic/elastic-agent#5884

Closed
