[storage layer] Fluent-bit crashes on high contention if corrupted chunks exist #1950
Comments
Thanks for reporting this issue. Are you able to reproduce the problem only with a specific chunk? If so, can you provide me with that chunk? (It can be shared privately by email, Slack, or another channel if required.) |
Thanks @edsiper for looking into this. I don't have that pod or file anymore. I can verify when it pops up again. One thing I forgot to mention, on the node that was crashing consistently on start after loading 1.2 GB of storage backlog, we've increased |
Good news: I was able to reproduce the issue. Troubleshooting now. |
I am having a similar problem after enabling filesystem storage in 1.3.7
|
I've added the proper fixes on the 1.3 branch. Would you please deploy and evaluate the following test image?
If you confirm it is OK, I will proceed with a formal release. Note: that image is based on Ubuntu and is heavier than the normal one. |
Thanks for a quick fix @edsiper! Where can I get the custom release? Normally we use apt-get to install it from https://packages.fluentbit.io/ubuntu/bionic |
Ah! I thought you were using containers. Which distro?
|
Please try the following TEST PACKAGE: DISCLAIMER: THIS IS NOT AN OFFICIAL RELEASE! |
@dgrala do you have any feedback ^ ? |
Thanks @edsiper! We do run containers, but we use apt-get to build them. Thanks for the test package. Anything in particular we should watch out for? We don't have any decent test environment right now, so it will take us some time to set things up. |
A new issue we're seeing today is td-agent-bit hanging and making no progress (#1958). Not sure if it's related; we're still on the old version. |
Thanks, haven't seen crashes lately. We've updated to 1.3.8 today. |
Bug Report
Describe the bug
It might be related to corrupted files in storage, or just to a large backlog there. On start, Fluent Bit tries to load all the files and crashes soon afterwards with a segfault.
To Reproduce
In one example, we had 1.2 GB of files stuck in storage, and the crash reproduced consistently. I moved the files out of storage and ran again; it didn't crash. When I moved them back, it didn't seem to crash either. It might be hard to reproduce, but it seems to happen regularly in our prod environment.
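For anyone who wants to repeat the move-out and move-back experiment, here is a rough sketch of how it could be scripted. This is only an illustration: `move_chunks` is a hypothetical helper, not part of Fluent Bit, and the paths in the usage note are made up. It assumes the agent is stopped while files are being moved and that the destination is not inside the storage directory.

```python
import shutil
from pathlib import Path


def move_chunks(src: Path, dst: Path) -> int:
    """Move every regular file under src into dst, keeping relative paths.

    Hypothetical helper for the experiment described in this report.
    Returns the number of files moved.
    """
    moved = 0
    # Build the full listing first so moving files does not disturb iteration.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    for path in files:
        target = dst / path.relative_to(src)
        # Recreate the per-input subdirectory layout on the destination side.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved += 1
    return moved
```

Usage would be along the lines of `move_chunks(Path("/path/to/storage"), Path("/path/to/aside"))` before starting the agent, then the same call with the arguments swapped to restore the backlog. Both paths are placeholders for whatever the storage path is set to in your config.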
Expected behavior
No crashes
Screenshots
Your Environment
Problematic input (one of 3):
Environment name and version (e.g. Kubernetes? What version?):
k8s
Server type and version:
Operating System and version:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
Filters and plugins:
Additional context
Ideally we would see no crashes. It seems that after any crash, the storage files are corrupted and can't be loaded by subsequent runs.