
fluentbit missing logs in aws cloudwatch #1080

Open
mukshe01 opened this issue Mar 25, 2024 · 2 comments · May be fixed by #1168
Labels
bug Something isn't working

Comments

@mukshe01

Hi Team,

We are running Fluent Bit to push application logs from our Kubernetes cluster (an EKS cluster with EC2 machines as the worker nodes) to CloudWatch. Recently we observed that some log entries are missing in CloudWatch when the system is under high load.

Below is our Fluent Bit config:

fluent-bit.conf:

[SERVICE]
    HTTP_Server            On
    HTTP_Listen            0.0.0.0
    HTTP_PORT              2020
    Health_Check           On
    HC_Errors_Count        5
    HC_Retry_Failure_Count 5
    HC_Period              5
    Parsers_File           /fluent-bit/parsers/parsers.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    Parser            docker
    Docker_Mode       On
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Merge_Log           On
    Merge_Log_Key       data
    Keep_Log            On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Buffer_Size         2048k

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                us-east-1
    log_group_name        /aws/containerinsights/one-source-qa-n5p1P1d1/application-new
    log_stream_prefix     fluentbit-
    log_stream_template   $kubernetes['namespace_name'].$kubernetes['container_name']
    auto_create_group     true

We installed Fluent Bit in our k8s cluster using the Helm chart:
https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit
fluentbit appVersion: 2.31.11
helm chart version: 0.1.28

We are seeing two types of errors in the Fluent Bit logs.

2024-03-19T11:32:29.235887301Z stderr F [2024/03/19 11:32:29] [ info] [input:tail:tail.0] inode=26222862 handle rotation(): /var/log/containers/rest-api-qa-954d864f9-smkv5_participant1-qa_rest-api-c5dac2e011fe0f093560b815135fff49dfade0835e22fd71c88aed4fa4d86439.log => /var/log/pods/participant1-qa_rest-api-qa-954d864f9-smkv5_319b4e14-e50c-44c6-86ff-558547bbcb3c/rest-api/0.log.20240319-113228
2024-03-19T11:32:29.488386964Z stderr F [2024/03/19 11:32:29] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-19T11:32:49.909327531Z stderr F [2024/03/19 11:32:49] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-19T11:32:49.911154349Z stderr F [2024/03/19 11:32:49] [error] [plugins/in_tail/tail_file.c:1432 errno=2] No such file or directory
2024-03-19T11:32:49.911160979Z stderr F [2024/03/19 11:32:49] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
2024-03-19T11:32:49.911163819Z stderr F [2024/03/19 11:32:49] [error] [input:tail:tail.0] inode=26222863 cannot register file /var/log/containers/rest-api-qa-954d864f9-smkv5_participant1-qa_rest-api-c5dac2e011fe0f093560b815135fff49dfade0835e22fd71c88aed4fa4d86439.log

We also see many occurrences of the following (our memory buffer is configured via Mem_Buf_Limit) when the system is under high load:

2024-03-20T13:29:12.624465969Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.915368764Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.923306843Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.954591621Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.956495689Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:13.527593998Z stderr F [2024/03/20 13:29:13] [ info] [input] tail.0 resume (mem buf overlimit)

FYI: Kubernetes rotates container logs when they reach 10 MB, and when the system runs under high load the log rotation is very frequent.
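For context, one change we are considering (not yet validated; the storage path and the numbers below are only placeholders) is to enable filesystem buffering so the tail input can spill chunks to disk instead of pausing when the memory buffer fills up, roughly along these lines:

[SERVICE]
    # existing SERVICE keys stay as they are; these add a disk buffer location
    storage.path                /var/log/flb-storage/
    storage.sync                normal
    storage.backlog.mem_limit   50M

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    # buffer chunks on disk rather than pausing the input when memory is over limit
    storage.type      filesystem
    # keep monitoring a rotated file for a while longer before releasing it
    Rotate_Wait       30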

Could you check our config and let us know how we can avoid missing logs in CloudWatch? Please let us know if you need any more info from us.

Regards
Shekhar

@mukshe01 added the bug (Something isn't working) label Mar 25, 2024
@alanwu4321

In the ConfigMap for aws-for-fluent-bit:

I had to add auto_create_group true to the bottom of the [OUTPUT] section below and restart the pod, then it worked.

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                ap-northeast-1
    log_group_name        /aws/eks/ca-prod/aws-fluentbit-logs
    log_stream_prefix     fluentbit-
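
With that line added, the section ends up like this:

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                ap-northeast-1
    log_group_name        /aws/eks/ca-prod/aws-fluentbit-logs
    log_stream_prefix     fluentbit-
    auto_create_group     true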

@jdinsel-xealth

Have you inspected the fluent-bit containers for their use of CPU or considered increasing the resource settings in the chart?
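For reference, a minimal sketch of what raising the resources could look like in the chart's values.yaml, assuming the chart exposes the standard Kubernetes resources block (the numbers are placeholders, not a recommendation):

# values.yaml for the aws-for-fluent-bit chart (hypothetical values; tune to your workload)
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi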
