
fluentbit missing logs in aws cloudwatch #1080

Open
mukshe01 opened this issue Mar 25, 2024 · 2 comments · May be fixed by #1168
Labels
bug Something isn't working

Comments

@mukshe01

Hi Team,

We are running Fluent Bit to push application logs from our Kubernetes cluster (an EKS cluster with EC2 machines as the worker nodes) to CloudWatch. Recently we observed that some log entries are missing in CloudWatch when the system is under high load.

Below is our Fluent Bit config:

fluent-bit.conf:

[SERVICE]
    HTTP_Server            On
    HTTP_Listen            0.0.0.0
    HTTP_PORT              2020
    Health_Check           On
    HC_Errors_Count        5
    HC_Retry_Failure_Count 5
    HC_Period              5
    Parsers_File           /fluent-bit/parsers/parsers.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    Parser            docker
    Docker_Mode       On
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Merge_Log           On
    Merge_Log_Key       data
    Keep_Log            On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Buffer_Size         2048k

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                us-east-1
    log_group_name        /aws/containerinsights/one-source-qa-n5p1P1d1/application-new
    log_stream_prefix     fluentbit-
    log_stream_template   $kubernetes['namespace_name'].$kubernetes['container_name']
    auto_create_group     true

We installed Fluent Bit in our k8s cluster using the Helm chart:
https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit
fluentbit appVersion: 2.31.11
helm chart version: 0.1.28

We are seeing two types of errors in the Fluent Bit logs.

2024-03-19T11:32:29.235887301Z stderr F [2024/03/19 11:32:29] [ info] [input:tail:tail.0] inode=26222862 handle rotation(): /var/log/containers/rest-api-qa-954d864f9-smkv5_participant1-qa_rest-api-c5dac2e011fe0f093560b815135fff49dfade0835e22fd71c88aed4fa4d86439.log => /var/log/pods/participant1-qa_rest-api-qa-954d864f9-smkv5_319b4e14-e50c-44c6-86ff-558547bbcb3c/rest-api/0.log.20240319-113228
2024-03-19T11:32:29.488386964Z stderr F [2024/03/19 11:32:29] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-19T11:32:49.909327531Z stderr F [2024/03/19 11:32:49] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-19T11:32:49.911154349Z stderr F [2024/03/19 11:32:49] [error] [plugins/in_tail/tail_file.c:1432 errno=2] No such file or directory
2024-03-19T11:32:49.911160979Z stderr F [2024/03/19 11:32:49] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
2024-03-19T11:32:49.911163819Z stderr F [2024/03/19 11:32:49] [error] [input:tail:tail.0] inode=26222863 cannot register file /var/log/containers/rest-api-qa-954d864f9-smkv5_participant1-qa_rest-api-c5dac2e011fe0f093560b815135fff49dfade0835e22fd71c88aed4fa4d86439.log

We also see many occurrences of the following (our memory buffer is configured via Mem_Buf_Limit) when the system is under high load:

2024-03-20T13:29:12.624465969Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.915368764Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.923306843Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.954591621Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.956495689Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:13.527593998Z stderr F [2024/03/20 13:29:13] [ info] [input] tail.0 resume (mem buf overlimit)

FYI: Kubernetes rotates container logs when they reach 10 MB, and when the system runs under high load the log rotation is very frequent.
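For context, one change we are considering (not yet validated; the storage path and the numbers below are only placeholders) is to enable filesystem buffering so the tail input can spill chunks to disk instead of pausing when the memory buffer fills up, roughly along these lines:

[SERVICE]
    # existing SERVICE keys stay as they are; these add a disk buffer location
    storage.path                /var/log/flb-storage/
    storage.sync                normal
    storage.backlog.mem_limit   50M

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    # buffer chunks on disk rather than pausing the input when memory is over limit
    storage.type      filesystem
    # keep monitoring a rotated file for a while longer before releasing it
    Rotate_Wait       30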

Could you check our config and let us know how we can avoid missing logs in CloudWatch? Please let us know if you need any more info from us.

Regards
Shekhar

@mukshe01 added the bug (Something isn't working) label Mar 25, 2024
@alanwu4321

In the ConfigMap for aws-for-fluent-bit:

I had to add auto_create_group true to the bottom of the [OUTPUT] section below and restart the pod, then it worked.

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                ap-northeast-1
    log_group_name        /aws/eks/ca-prod/aws-fluentbit-logs
    log_stream_prefix     fluentbit-
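
With that line added, the section ends up like this:

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                ap-northeast-1
    log_group_name        /aws/eks/ca-prod/aws-fluentbit-logs
    log_stream_prefix     fluentbit-
    auto_create_group     true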

@jdinsel-xealth

Have you inspected the fluent-bit containers for their use of CPU or considered increasing the resource settings in the chart?
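For reference, a minimal sketch of what raising the resources could look like in the chart's values.yaml, assuming the chart exposes the standard Kubernetes resources block (the numbers are placeholders, not a recommendation):

# values.yaml for the aws-for-fluent-bit chart (hypothetical values; tune to your workload)
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi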
