segfault in 1.7.9 #3687
Comments
Just to confirm: we are also seeing fluent-bit pods enter a CrashLoop a few minutes after connecting to the fluentd service in the cluster. Fluent-bit helm chart version: 0.15.15.
Reverting the helm chart to version 0.15.14 (image version 1.7.8) resolves the issue. Fluent-bit config:
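For reference, a minimal sketch of the kind of in-cluster config involved here; the service name, port, and paths are illustrative assumptions, not the values from this report:

```
[SERVICE]
    Flush         1
    Log_Level     info
    Parsers_File  parsers.conf

[INPUT]
    Name          tail
    # Container logs on the node; path assumed from a typical Kubernetes deployment
    Path          /var/log/containers/*.log
    Parser        docker
    Tag           kube.*

[FILTER]
    Name          kubernetes
    Match         kube.*
    Merge_Log     On

[OUTPUT]
    Name          forward
    Match         *
    # Hypothetical in-cluster fluentd service; replace with the real service name
    Host          fluentd.logging.svc.cluster.local
    Port          24224
```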
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
/remove-lifecycle stale
We're experiencing this issue as well.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
/remove-lifecycle stale
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Bug Report
Originally reported in #3661, but it seems to be somewhat different from the getaddrinfo() issue. When starting fluent-bit, which is configured to forward logs to a fluentd server, the client segfaults after ~15s:
I have observed this on 7 different servers, all running v1.7.9. They all had quite a few log messages in the queue.
Note that some entries actually make it through to fluentd.
Configuration:
This is on EC2 HVM instances running Debian stretch.
My guess would be that once fluentd starts throttling new connection attempts, fluent-bit is unable to handle that gracefully; or that once the available bandwidth is saturated by all the queued logs being sent, the resulting delays in getaddrinfo() cause errors that are then not handled correctly.
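If the throttling/timeout theory is what is being hit, the knobs below are where I would start experimenting. This is only a sketch using standard fluent-bit output and networking options; the host and values are placeholders, not taken from this report:

```
[OUTPUT]
    Name                   forward
    Match                  *
    # Placeholder fluentd endpoint
    Host                   fluentd.example.internal
    Port                   24224
    # Retry failed chunks indefinitely instead of discarding them
    Retry_Limit            False
    # Give slow connection attempts more headroom (seconds)
    net.connect_timeout    20
    # Reuse established connections rather than reconnecting for every flush
    net.keepalive          on
```

Whatever the trigger turns out to be, a connect timeout or DNS delay should surface as a retry or an error log rather than a crash, which is why this looks like an error-handling regression in 1.7.9 rather than a configuration problem.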
Downgrading to 1.7.8 fixes the problem consistently across all 7 servers.