[es_output] Generate_ID sometimes corrupts message structure during /_bulk requests (possible cause: incorrect memory buffer size) #4311
Comments
My impression is that if the header is too long (which looks like our case), it gets cut off and we get unexpected behaviour.
Can confirm the issue is related to the bulk header size. It looks like we need a mechanism to trim the bulk header string if it is longer than 165 symbols.
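For illustration, one alternative to trimming is to size the header buffer from the formatted length instead of a fixed constant. The sketch below is not the plugin's code and not the patch: the function name `format_bulk_header` and the format string `BULK_FMT` are hypothetical. It relies only on the standard behaviour of `snprintf`, which, when given a NULL buffer and size 0, returns the number of characters the full output would need.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative format only (assumption); the real plugin's format may differ. */
#define BULK_FMT "{\"create\":{\"_index\":\"%s\",\"_type\":\"%s\",\"_id\":\"%s\"}}\n"

/*
 * Hypothetical helper: build a bulk header of whatever length is needed.
 * Returns a heap-allocated, NUL-terminated string that the caller must free,
 * or NULL on error. The header length (without the NUL) is stored in *out_len.
 */
char *format_bulk_header(const char *index, const char *type,
                         const char *id, size_t *out_len)
{
    /* First pass: ask snprintf how many characters the header needs. */
    int needed = snprintf(NULL, 0, BULK_FMT, index, type, id);
    if (needed < 0) {
        return NULL;
    }

    char *buf = malloc((size_t) needed + 1);
    if (buf == NULL) {
        return NULL;
    }

    /* Second pass: format into a buffer that is guaranteed to be large enough. */
    snprintf(buf, (size_t) needed + 1, BULK_FMT, index, type, id);
    *out_len = (size_t) needed;
    return buf;
}
```

With this shape, a long index name only costs a larger allocation, and the header is never cut in the middle of a JSON string.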
We are currently using a workaround: truncating long index names with a custom Lua script.
Thank you for reporting. The length of the example is 94, and it causes this issue.
$ irb
irb(main):001:0> "k8s-v1-abcdef-website-production-abcdef-website-production-abcdef-event-bus-handler-2021.11.10".length
=> 94
We can reproduce this issue using the command below.
Signed-off-by: Takahiro Yamashita <[email protected]>
I sent a patch #4361.
Thank you @nokute78
Bug Report
Describe the bug
After migrating to Fluent Bit 1.8.x, we started to see lots of errors:
an id must be provided if version or value are set
A quick investigation showed that it might be caused by the code being adjusted to work with the ES 7.5+ API.
Exactly the same issue: #3909
To make it work, we were forced to enable the Generate_ID setting, because we are using Elasticsearch 6.4 and cannot afford to migrate to a newer version at the moment.
That solved our issue.
However, we started to notice lots of 500 errors on our load balancer, which sits in front of the Elasticsearch coordinator nodes.
Running strace revealed that messages were being corrupted.
In the following examples I removed several fields for the sake of simplicity.
Please check "_id" field. It is corrupted in both examples.
Example 1 of corrupted message:
Example 2 of corrupted message:
It is unclear what happens to the message when the code block responsible for generate_id starts processing it:
https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L489-L507
I suspect the issue is either in 'MurmurHash3_x64_128' or 'snprintf' functions caused by incorrect buffer size.
Exception in elasticsearch:
Corruption always happens at column 166.
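To illustrate how a fixed-size buffer can produce a cut-off at a constant column, here is a minimal sketch of standard `snprintf` truncation. It is not taken from es.c, and whether the plugin behaves this way is an assumption: the name `format_fixed`, the constant `HEADER_BUF_SIZE`, and the format string are all hypothetical. The key point is that `snprintf` returns the length the full output would have had, so a return value greater than or equal to the buffer size means the stored text was truncated and must not be used as the stored length.

```c
#include <stdio.h>
#include <string.h>

/* Assumed size for illustration, chosen to match the limit seen in this issue. */
#define HEADER_BUF_SIZE 165

/*
 * Hypothetical helper: format a header into a fixed-size buffer.
 * Returns the number of bytes actually stored (excluding the NUL terminator),
 * or -1 on error. *truncated is set to 1 when the output did not fit.
 */
int format_fixed(char *buf, size_t size, const char *index,
                 const char *id, int *truncated)
{
    if (size == 0) {
        return -1;
    }

    /* ret is the length the complete output would need, not what was stored. */
    int ret = snprintf(buf, size, "{\"_index\":\"%s\",\"_id\":\"%s\"}", index, id);
    if (ret < 0) {
        return -1;
    }

    *truncated = ((size_t) ret >= size);

    /* On truncation only size - 1 characters plus the NUL are in the buffer. */
    return *truncated ? (int) (size - 1) : ret;
}
```

A caller that checks `truncated` can reject or reallocate instead of sending a header whose JSON ends partway through a field.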
At the same time there is a preprocessor instruction:
Your Environment
Fluent Bit versions tried: 1.8.0 through the latest (1.8.9)
Elasticsearch cluster version: 6.4.0
Fluent Bit primarily runs on EKS (AWS managed Kubernetes service)
Fluent-bit configuration:
Please let me know if you need any other information.