
feature request: cloudwatch_logs templating failure can lead to a very large number of warn logs #599

Open
PettitWesley opened this issue Mar 21, 2023 · 8 comments

Comments

@PettitWesley
Contributor

See: fluent/fluent-bit#6918

The warning for a record_accessor failure at this line can be emitted for every single log record processed: https://github.com/fluent/fluent-bit/blob/master/plugins/out_cloudwatch_logs/cloudwatch_api.c#L1084

It'd be ideal if this was only emitted once per batch instead. This will require code changes.
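A minimal sketch of the "once per batch" idea, assuming a simplified model rather than the real plugin code: `struct record`, `stream_key`, and `process_batch` are hypothetical names, and a `NULL` `stream_key` stands in for a record whose templated key could not be resolved. Failures still fall back to a default name per record, but they are only counted inside the loop, and a single summary warning is printed after the batch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical record: only the value the template references.
 * NULL models a record where template resolution failed. */
struct record {
    const char *stream_key;
};

/* Resolve a stream name for every record in one batch. Instead of
 * warning per record, count failures and warn once at the end.
 * Returns the number of warnings emitted (0 or 1). */
static int process_batch(const struct record *recs, size_t n,
                         const char *default_name)
{
    size_t failures = 0;
    size_t first_fail = 0;

    for (size_t i = 0; i < n; i++) {
        const char *name = recs[i].stream_key;
        if (name == NULL) {
            if (failures == 0) {
                first_fail = i;   /* remember where the first failure was */
            }
            failures++;
            name = default_name;  /* fall back silently inside the loop */
        }
        /* The record would be buffered for stream `name` here. */
        (void) name;
    }

    if (failures > 0) {
        fprintf(stderr,
                "[warn] template resolution failed for %zu of %zu records "
                "in this batch (first at index %zu); using '%s'\n",
                failures, n, first_fail, default_name);
        return 1;
    }
    return 0;
}
```

With this shape, a batch of 1,000 records that all miss the templated key produces one warning line instead of 1,000, while the count and first index still give enough context to debug the template.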

@PettitWesley
Contributor Author

For 2.0 there are workarounds: fluent/fluent-bit#6918 (comment)

@PettitWesley
Contributor Author

This is a concern for the upcoming Daemon support launch, since it makes extensive use of templating, which can fail.

@Mattie112

Yes, we also had this issue, and it cost us quite a bit (we now have alerts for it). It would be great to have a solution!

@petemounce

It's not only templating failures that can have this impact; so can an access-denied error when attempting to (say) create a log stream, for example when Fluent Bit fails to create the log stream for its own logs because of some misconfiguration.

@PettitWesley
Contributor Author

The new log suppression feature in 2.0 works well for CloudWatch access-denied and other errors: it suppresses repeated occurrences of the same message from a single plugin.

For templating failures, however, the message actually comes from a core library, which I think is not covered by the log suppression feature.
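To illustrate the general idea of suppressing a repeated message from one plugin, here is a minimal sketch of interval-based deduplication. It is not Fluent Bit's implementation: `struct suppress_state` and `log_suppressed` are hypothetical names, messages are compared only on their first 255 bytes, and the caller passes the current time so the behavior is deterministic.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical per-plugin suppression state. */
struct suppress_state {
    char last_msg[256];        /* last message actually printed */
    time_t last_emit;          /* when it was printed */
    int interval_sec;          /* suppression window */
    unsigned long suppressed;  /* repeats dropped since last print */
};

/* Print msg unless it repeats the last printed message within the
 * window. Returns 1 if printed, 0 if suppressed. */
static int log_suppressed(struct suppress_state *st, const char *msg,
                          time_t now)
{
    if (strncmp(st->last_msg, msg, sizeof(st->last_msg) - 1) == 0 &&
        now - st->last_emit < st->interval_sec) {
        st->suppressed++;
        return 0;
    }
    if (st->suppressed > 0) {
        fprintf(stderr, "[warn] previous message repeated %lu times\n",
                st->suppressed);
        st->suppressed = 0;
    }
    fprintf(stderr, "%s\n", msg);
    snprintf(st->last_msg, sizeof(st->last_msg), "%s", msg);
    st->last_emit = now;
    return 1;
}
```

Because the state is per plugin, a message raised from shared core code (as with the templating warning) would only be deduplicated if that code path were also routed through such a check.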

@lexsca

lexsca commented Sep 27, 2023

We ran into this as well and racked up massive CloudWatch charges. The default values should not contain any template variables that aren't guaranteed to be present; people will probably keep tripping over this. 😐

@PettitWesley
Contributor Author

When Fluent Bit is deployed as a Kubernetes DaemonSet pod, it collects its own logs. Thus, when it emits an error message, it also collects that error message, potentially leading to a cycle of log spam in which each of its own error logs causes it to produce another error log.

Key cases:

  1. This issue: templating failures.
  2. Log stream creation failures, or API failures in general: if Fluent Bit cannot create log streams or upload logs, it emits error logs, and then more error logs about the failure to upload those error logs.

@PettitWesley
Contributor Author

See one option for workaround here: fluent/fluent-bit#6918 (comment)

PettitWesley added a commit to PettitWesley/eks-charts that referenced this issue Nov 14, 2023
PettitWesley added a commit to aws/eks-charts that referenced this issue Nov 14, 2023

5 participants
@petemounce @Mattie112 @lexsca @PettitWesley and others