Fluent Bit latest version (v1.8.9) throws broken pipe and connection errors on high load #4332
Comments
Hi @hossain-rayhan, I'll take a look at this, but it would help if you could contact me on the Fluent Slack server. That would probably make the reproduction and RCA process much faster. My username there is Leonardo Almiñana. Cheers.
Pinged on Slack. Let me know how I can help.
Following @edsiper's suggestion, I also tested with the latest 1.8 branch and observed similar behavior. I can confirm it still happens with the latest codebase.
@leonardo-albertovich I added reproduce steps to the issue description. Please check and let me know if it helps. I also tested with the latest 1.8 branch and am still observing the same behavior.
One more observation: it works fine when the load is 5 Mb/s. However, when the load is 8Mb/s, it starts to throw the Broken Pipe and connection errors. @leonardo-albertovich Did you get time to check the reproduce steps?
@leonardo-albertovich @edsiper this issue is blocking some AWS customers from upgrading to 1.8.x and using the multiline support feature. Can you expedite this?
I tested for 10 minutes as well. Still the same behavior. Also, I don't think it's a Firehose-specific issue, because the older Fluent Bit v1.7.5 works fine even for a load of up to 50Mb/s. Also, how do you define the debug message?
Error Logs:
This might be related to Firehose autoscaling: in the beginning there is just not enough capacity to send everything through. As for why it works with an older version, I do not know. Somehow that version sends the data differently or does not report the broken pipes. I am also seeing this; however, there is not much data loss on the other side. I captured the data in Firehose and most of it did come through...
Ok, I am seeing now that you responded to my hurried and then deleted comment. One can:
And sorry to just talk and not test more deterministically and substantially...
@matthewfala is working on this.
We are working on a solution to this issue which you can track also here: aws/aws-for-fluent-bit#288. If you would like to test the patch which may resolve your issues, please use the following image: If you try out the patch, please let us know if problems are resolved. There may be some instabilities with the patch, since it is still in testing. If you find any, please let me know and potentially post the debug logs so it can be resolved.
The proposed solution above was accepted and merged into Fluent Bit 1.9 #4869. This should resolve broken pipe and connection errors on Kinesis Streams, Firehose, and S3 plugins as well as other, non-AWS Fluent Bit plugins. The solution prioritizes internal Fluent Bit events to complete already started tasks above starting new tasks. This helps keep delay times minimal by reducing the amount of concurrent work. @PettitWesley This issue can be resolved now.
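To illustrate the idea of finishing started tasks before starting new ones, here is a toy model in Python. It is only a sketch and not Fluent Bit's actual C implementation: the names `START`, `CONTINUE`, and `peak_in_flight` are hypothetical, and each task is assumed to need exactly one follow-up event to complete.

```python
import heapq
from itertools import count

# Hypothetical priority levels for this sketch: a lower number is served first.
CONTINUE, START = 0, 1


def peak_in_flight(num_tasks, prioritize):
    """Run a toy event loop and return the peak number of in-flight tasks.

    Every task needs two events: START, then CONTINUE (which completes it).
    With prioritize=False all events share one priority, so the queue is FIFO.
    """
    seq = count()  # tie-breaker that keeps FIFO order within a priority level
    queue = []

    def push(kind, task_id):
        prio = kind if prioritize else 0
        heapq.heappush(queue, (prio, next(seq), kind, task_id))

    # All tasks are waiting to be started when the loop begins.
    for task_id in range(num_tasks):
        push(START, task_id)

    in_flight = peak = 0
    while queue:
        _, _, kind, task_id = heapq.heappop(queue)
        if kind == START:
            in_flight += 1
            peak = max(peak, in_flight)
            push(CONTINUE, task_id)  # the follow-up event for this task
        else:
            in_flight -= 1  # the task has completed
    return peak


print("FIFO peak:", peak_in_flight(5, prioritize=False))
print("Prioritized peak:", peak_in_flight(5, prioritize=True))
```

In this model the FIFO loop starts every pending task before completing any of them, while the prioritized loop completes each task before starting the next, so fewer connections are open and waiting at the same time.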
Bug Report
Describe the bug
We are running a performance test that sends logs to Amazon Kinesis Firehose using the core kinesis_firehose plugin. We are sending logs at a rate of 20Mb/sec for 600 seconds, where each record is ~1Kb. Fluent Bit versions v1.8.0 and higher throw a lot of Broken pipe and network connectivity errors. However, older versions work fine and can send all the logs to Kinesis Firehose without any issue. I can confirm v1.7.5 with an SSL error fix works fine.
v1.8.9: throws Broken Pipe and network connection errors.
v1.7.5: works fine.
To Reproduce
We are testing on an Amazon Linux 2 machine. Fluent Bit is present on that EC2, and sends logs to a Kinesis Firehose delivery stream in the same AWS account.
Fluent Bit installation: We cloned a specific branch from GitHub and built it following the README guidelines. Then we ran it with our config file. We used the following commands:
git clone https://github.com/fluent/fluent-bit.git
cd fluent-bit/build
cmake ..
make
bin/fluent-bit -c /data/fluent-bit.conf
fluent-bit.conf:
Expected behavior
Fluent Bit should send all data to Kinesis Firehose without any errors.
Your Environment
Version used: [v1.8.9, v1.7.5]
Environment name and version: Amazon EC2
Operating System and version: Amazon Linux 2
Filters and plugins: tail input plugin, kinesis_firehose output plugin.
Error Logs
Error logs from v1.8.9.
Reproduce Steps:
cd fluent-bit/build
make
bin/fluent-bit -c /home/ec2-user/fluent-bit/local-run/fluent-bit.conf
python3 generate.py > /home/ec2-user/fluent-bit/local-run/input/input.log
generate.py:
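The contents of the original generate.py are not shown here. As a hypothetical stand-in only, a load generator for this kind of test could look like the sketch below. It assumes that "20Mb/sec" means 20 MiB per second, that records are JSON lines of exactly 1024 bytes, and that output is paced once per second; the names `make_record` and `generate` are made up for illustration and are not from the original script.

```python
import json
import sys
import time

RECORD_SIZE = 1024                      # assumption: ~1 KB per record, newline included
RATE_BYTES_PER_SEC = 20 * 1024 * 1024   # assumption: "20Mb/sec" read as 20 MiB/s
DURATION_SEC = 600                      # test length stated in the report


def make_record(seq, size=RECORD_SIZE):
    """Return one ASCII JSON log line of exactly `size` bytes, newline included."""
    base = json.dumps({"seq": seq, "msg": ""})
    padding = size - len(base) - 1      # room left for the message, minus the newline
    return json.dumps({"seq": seq, "msg": "x" * padding}) + "\n"


def generate(rate=RATE_BYTES_PER_SEC, duration=DURATION_SEC,
             out=sys.stdout, sleep=time.sleep):
    """Write `rate` bytes of records per second for `duration` seconds.

    Returns the number of records written. `sleep` is injectable so the
    pacing can be disabled when only the output is of interest.
    """
    per_second = rate // RECORD_SIZE
    seq = 0
    for _ in range(duration):
        started = time.monotonic()
        for _ in range(per_second):
            out.write(make_record(seq))
            seq += 1
        out.flush()
        # Sleep for whatever is left of the current one-second window.
        sleep(max(0.0, 1.0 - (time.monotonic() - started)))
    return seq
```

Calling `generate()` at the bottom of such a script and redirecting stdout to the tailed file, as in the command above, would produce the load; whether the original script paced its output this way is unknown.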