Using Elastic Agent with the k8s integration on AKS causes error spam #1614

Closed · Happycoil opened this issue Oct 26, 2022 · 11 comments · Fixed by elastic/beats#33697

Labels: 8.6-candidate, bug, Team:Elastic-Agent, v8.5.0

Comments

@Happycoil

Using the default Kubernetes integration on AKS, or apparently any k8s cluster using containerd, causes a lot of spam from Filebeat with the log message `[elastic_agent.filebeat][error] Error extracting container id - source value does not contain matcher's logs_path '/var/lib/docker/containers/'`. The spam is so intense that it overloads the agents, causing them to ping-pong between healthy and unhealthy. Increasing their resource limits only intensifies the spam.

I've just reconfirmed that this issue persists: I spun up a new Elastic Cloud instance and pointed it at a very vanilla AKS cluster. This is going to happen for every single Elastic Cloud customer who deploys Elastic Agent to AKS.

This has been brought up in several issues and discuss threads over some time:
elastic/beats#27216
elastic/beats#27216 (comment)
#90
https://discuss.elastic.co/t/elastic-agent-fiebeat-error-spam/301206
https://discuss.elastic.co/t/elastic-agent-filebeat-logs-spams-error-messages-and-overflows-the-memory/289188

You can find more threads on Discuss by searching for the error message, but none of them have received an answer.

We need some kind of confirmed workaround for this.

@fludo commented Oct 26, 2022

As stated in #90, this is a blocker to using Elastic Agent on AKS: "Docker is no longer supported as of September 2022. For more information about this deprecation, see the AKS release notes."
Source: https://learn.microsoft.com/en-us/azure/aks/cluster-configuration

@Happycoil (Author)

I expect any container runtime which doesn't use the /var/lib/docker/containers path will encounter this, right? So all we need is some way of configuring log.file.path to /var/lib/containerd/io.containerd.grpc.v1.cri/containers or whatever.

Or is there a more reliable way of extracting the container ID?
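
For illustration only: a minimal sketch of how such a path override might look in standalone Filebeat, where the add_kubernetes_metadata processor's logs_path matcher accepts a custom path. The containerd path is just the guess from above, and the Fleet-managed Agent integration may not expose this setting directly.

```yaml
# Sketch, not a confirmed workaround: override the default logs_path matcher
# of add_kubernetes_metadata so it stops expecting Docker's log directory.
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      default_matchers.enabled: false   # drop the /var/lib/docker/containers/ default
      matchers:
        - logs_path:
            # Assumed containerd path taken from the comment above; adjust for your runtime.
            logs_path: "/var/lib/containerd/io.containerd.grpc.v1.cri/containers/"
```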

@fludo commented Nov 15, 2022

Any chance of getting this into 8.5.x? Otherwise we will need to investigate other solutions for AKS monitoring.

@iamjosh007 commented Nov 15, 2022

Running Elastic Agent on k8s has been a nightmare for us for the last few months. We ended up dropping all non-essential container and agent logs, but we are still seeing missed check-ins in the agent logs, agents turning unhealthy, and huge memory consumption across all agents. Ref case below.

Ref case - #1708 (comment)

@cmacknz added the v8.5.0 label on Nov 15, 2022
@rdner (Member) commented Nov 16, 2022

I'll do my best to merge a fix by the end of this week.

@fludo commented Nov 17, 2022

> I expect any container runtime which doesn't use the /var/lib/docker/containers path will encounter this, right? So all we need is some way of configuring log.file.path to /var/lib/containerd/io.containerd.grpc.v1.cri/containers or whatever.
>
> Or is there a more reliable way of extracting the container ID?

Is it really fixed? I read in elastic/beats#33697 that this is a partial fix and that proper AKS/containerd support is still required.
If we cannot exploit the full potential of Elastic Agent on AKS, we will have to go with other solutions.

@iamjosh007

This one is closed, @cmacknz - can we please have a new ticket opened to address all AKS Elastic Agent issues?

@rdner (Member) commented Nov 17, 2022

@fludo this issue is about the error spam that was caused by the wrong log level. This is fixed by elastic/beats#33697.

> If we cannot exploit the full potential of Elastic Agent on AKS, we will have to go with other solutions.

Could you elaborate on what you mean by exploiting the full potential?

I might need to explain the cause of the issue – the error spam was caused by:

  1. Filebeat detected it was running on K8s.
  2. Because of that, Filebeat enabled the add_kubernetes_metadata processor and added a default matcher to it, which scanned each incoming event and tried to match its source file path (the log.file.path key) against the predefined path /var/lib/docker/containers (roughly as in the sketch after this list).
  3. If an event, no matter which file it came from, didn't have /var/lib/docker/containers in its path, the matcher mistakenly produced an error instead of a debug message. This issue was not even AKS-specific; it would occur in any Kubernetes environment where someone decided to ingest something other than Docker container logs.
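
Roughly speaking, the default behaviour described in steps 1–2 corresponds to a processor configuration like the following sketch (exact defaults may differ between versions):

```yaml
# Approximation of what Filebeat enables when it detects Kubernetes:
# the default matcher assumes Docker's log directory layout.
processors:
  - add_kubernetes_metadata:
      matchers:
        - logs_path:
            logs_path: "/var/lib/docker/containers/"
```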

This matcher has only one purpose: to extract a container/pod ID from the filename of a container log file created by Docker (Docker typically embeds these IDs in the path, e.g. /var/lib/docker/containers/<container-id>/<container-id>-json.log).

containerd, on the other hand, does not seem to store container logs in files; by default it uses journald for that. This means that right now I don't see a simple way to ingest container logs on AKS with containerd. Even if it's possible to configure containerd to store its logs in files, the filenames would need to follow the same pattern as the files Docker creates.

This is what I meant by proper containerd support – improving our K8s integration so that it supports ingesting container logs when the container engine is containerd.

But you are still able to ingest any file you want from inside a container running on AKS; it's just that your events would not be automatically enriched with the container/pod ID.

I hope this clarifies.
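
For anyone who wants to experiment anyway, here is a speculative sketch of a standalone Filebeat setup for CRI-format log files, assuming the node actually writes container logs under /var/log/containers/ (which, per the comment above, is not guaranteed on AKS with containerd):

```yaml
# Speculative sketch, not a confirmed workaround: read CRI-format container
# logs (if the node writes them) and point the metadata matcher at the same
# path so pod/container enrichment has a chance to work.
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
```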

@Happycoil (Author)

@rdner @cmacknz this fix didn't make it into 8.5.2. Is it possible to expedite a patch?

@rdner (Member) commented Nov 23, 2022

@Happycoil it's in the 8.5 branch and will be released with the next 8.5.3 version.

@cmacknz (Member) commented Nov 23, 2022

Yes, there will be an 8.5.3 soon to make up for the short gap between 8.5.1 and 8.5.2.
