[libbeat] add_cloud_metadata processor logs confusing warnings for AWS even when running on other CSPs #35147

Closed
tommyers-elastic opened this issue Apr 20, 2023 · 6 comments
Labels
bug Team:Cloud-Monitoring Label for the Cloud Monitoring team

Comments

@tommyers-elastic
Contributor

When running filebeat on GCP for example, we get log messages at WARN level from the AWS EC2 provider, which could be confusing to the user.

e.g.

{"log.level":"warn","@timestamp":"2023-04-10T15:51:30.060Z","message":"error when check request status for getting IMDSv2 token: http request status 404. No token in the metadata request will be used.","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"httpjson-zocalo_logs","type":"httpjson"},"log":{"source":"httpjson-zocalo_logs"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":97,"file.name":"add_cloud_metadata/provider_aws_ec2.go"},"ecs.version":"1.6.0"} 
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 20, 2023
@tommyers-elastic tommyers-elastic added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Apr 20, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 20, 2023
@tommyers-elastic tommyers-elastic changed the title [libbeat] add_cloud_metadata processor logs warnings confusing warnings for AWS even when running on other CSPs [libbeat] add_cloud_metadata processor logs confusing warnings for AWS even when running on other CSPs Apr 20, 2023
@tommyers-elastic
Contributor Author

tommyers-elastic commented Apr 21, 2023

the way this processor works is kinda weird. for example, even if the token request fails, a fetcher is still added to the list.

if we are running outside a CSP, we end up with two separate errors (1 for the token, then 1 for the metadata request). the token timeout logs a warn log (e.g. read token request for getting IMDSv2 token returns empty: Put \"http://169.254.169.254/latest/api/token\": context deadline exceeded (Client.Timeout exceeded while awaiting headers). No token in the metadata request will be used.), and the metadata request logs a debug log (e.g. add_cloud_metadata: received disposition for aws after 1.140125ms. result=[provider:aws, error=failed requesting aws metadata: Get \"http://169.254.169.254/2014-02-25/dynamic/instance-identity/document\": dial tcp 169.254.169.254:80: connect: host is down, metadata={}]).

it would be preferable not to add the fetcher if the token request fails.

if we are running in a CSP other than AWS, then the token request fails with a 404, but the metadata fetch still happens, again logging a debug message.

a reasonable solution to this IMO would be to handle the 404s in the token requester as a special case, and in the case of any token request failures, avoid adding the fetcher at all. this way we log no AWS related errors at all when running on other CSPs. it still means we log errors when running outside of CSPs, but i'd say these are expected if you're running the add_cloud_metadata processor outside of a CSP.

@zmoog
Contributor

zmoog commented Apr 28, 2023

The add_cloud_metadata is a default processor that cannot be removed on a managed agent right now. So even on-prem, Agent users would see the AWS provider warning.

The warnings come from the getIMDSv2Token() function. @tommyers-elastic, IIRC, you think we should keep this log level here, right?

@zmoog
Contributor

zmoog commented Apr 28, 2023

@aspacca, I see you introduced the support for IMDSv2 in the processors: do you have thoughts on this topic? Would changing the log level from "warning" to "debug" make sense? Or treating the timeout error condition as a "we are not on AWS" hint?

@aspacca

aspacca commented May 1, 2023

@zmoog the add_cloud_metadata processor runs for every cloud provider, once enabled.
you are right: in most cases the warning is a false positive.
we also had a case where one provider's metadata fetcher was "matching" the endpoint of another provider and returned misleading metadata (it was the openstack/huawei ones, if I remember correctly)

Would changing the log level from "warning" to "debug" make sense? Or treating the timeout error condition as a "we are not on AWS" hint?

I'm fine with both: while it is expected that most of the time the warning will be a false positive, hiding it at debug level could make it more difficult to spot when it's a true positive. just minor, if we opt for the hint "we might not be on AWS"

@andrewkroh
Member

This issue appears to be related to bullet number (2) in #33058.

@aspacca

aspacca commented Feb 27, 2024

seems to be fixed with #36829

confirmed closing the issue

@aspacca aspacca closed this as completed Feb 27, 2024