fluent-bit 1.8.3 occasional crashes (caught signal SIGSEGV) #3955
Comments
SIGSEGV seems to be a common topic in a lot of the issues. We're seeing exactly the same behaviour in EKS, running as a DaemonSet. The Pods just crash with exit code 139, often right after start. There is absolutely nothing in the logs.
I tried downgrading all the way to version 1.6.0 and the issue went away, so the problem seems to have been introduced somewhere between these versions.
+1, we are struggling with the same problem. Our CPUs spike to 100%. Stopping and starting the service via [...] It is completely unclear what the cause is; it happens at random times of day and load. No further information was found in the "DEBUG level" logs. We have not yet tried previous versions. We have only recently begun experiencing this issue.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I have had all kinds of different issues with fluent-bit since day one and I have had enough. I've simply switched to Filebeat OSS and everything is running perfectly smoothly, not a single issue so far.
Anything new on this? Is this issue resolved?
I sent a patch, #4197, as a fix. Could you check it?
This issue was closed because it has been stalled for 5 days with no activity.
Bug Report
Describe the bug
Fluent Bit, running as a systemd unit, occasionally crashes and cannot recover automatically. The only way to recover is to remove the DB and WAL files and then manually restart the service unit.
Steps to reproduce the problem
There is no easy way to reproduce it, because it is not clear to me what is causing the issue.
Expected behavior
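Be able to automatically recover via systemd (Restart=always) without removing DB and WAL files.
Your Environment
Version used: 1.8.3
Environment name and version: systemd 219
Server type and version: AWS EC2
Operating System and version: Amazon Linux 2
Filters and plugins: record_modifier and none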
Additional context
We have to automate restarts (remove the DB and WAL files and then restart the service) to prevent further loss of log data.
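The automated workaround described above could be sketched roughly as the shell function below. This is only an illustration, not the script actually used: the function name `reset_and_restart` is hypothetical, and it assumes the DB is the SQLite file used by the input plugin, with its write-ahead-log side files (`-wal`, `-shm`) sitting next to it.

```shell
# Hypothetical helper: remove the DB and its WAL side files, then run
# whatever restart command is passed as the remaining arguments.
reset_and_restart() {
    db_path=$1   # path to the DB file (assumption: a SQLite database)
    shift
    # Assumption: in WAL mode SQLite keeps "<db>-wal" and "<db>-shm"
    # alongside the main database file; -f ignores files that are absent.
    rm -f -- "$db_path" "${db_path}-wal" "${db_path}-shm"
    # Run the restart command exactly as given by the caller.
    "$@"
}
```

It might be invoked as `reset_and_restart /var/lib/fluent-bit/tail.db systemctl restart fluent-bit`, where both the path and the unit name are placeholders for whatever the actual installation uses. Note that removing the DB discards the stored file offsets, so some log lines may be re-read or skipped after the restart.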