fluent-bit prints garbage on shutdown #429
Hmm, looks like it's related to the Elasticsearch output plugin.
Hi.
It happens on shutdown and every 15 minutes.
Hi, I have the same problem a few minutes after restart, and part of the logs were not imported to Elasticsearch. There are no errors in the Elasticsearch logs.
Config:
Hello.
Is this garbage reproducible with the latest v0.12.13?
v0.12.14 also produces the messages.
+ exec /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf
We faced the same issue. I've built fluent-bit with debugging symbols, and here is what happened:
Core file analysis:
Core file attached: fluent-bit compiled from master with the ES index strftime patch introduced in PR #512
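A rough sketch of that kind of debug-build-plus-core-file analysis (all paths here are illustrative, and FLB_DEBUG is assumed to be the relevant CMake switch for debug symbols):

```sh
# Build fluent-bit from a git checkout with debug symbols
# (FLB_DEBUG is assumed to be the relevant CMake option)
cd fluent-bit/build
cmake -DFLB_DEBUG=On ..
make

# Allow core dumps before starting the service
ulimit -c unlimited
./bin/fluent-bit -c /path/to/fluent-bit.conf

# After a crash, open the core with the exact binary that produced it
gdb ./bin/fluent-bit core
# then inside gdb: bt full
```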
I've tried to reproduce the "garbage on exit" issue without success. Would you please provide exact steps to reproduce it?
Unfortunately I can't reproduce it reliably either; it happens with a reasonably high rate of messages (more than 2000/sec in our case) and the es output plugin. In our case it happens about once per 24 hours on average.
I can try to gather all the needed information if you tell me what you are interested in. I can provide you with the core file, or anything else you request, if possible.
We see this error with fluent-bit 0.13 from the official Docker image. I can't find any evidence that this occurs with throughput spikes; CPU, memory, and throughput have all stayed at baseline levels. That means 230 log entries per minute, coming from 5 fluent-bit instances.
I have the same issue with both v0.12.14 and v0.12.15. The garbage is a long run of repeated "took""errors" fragments (these look like pieces of Elasticsearch Bulk API responses):
"took""errors""took""errors""took""errors" … (repeated many more times)
This affects all pods.
I've tried to reproduce this problem without success. When this happens, is there any special condition happening in the remote Elasticsearch server, like a premature TCP close, a network outage, or similar?
My guess is it happens when there are too many retries when sending logs, e.g. due to a wrong time format.
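For context on that guess, here is an illustrative es output section; the host, index, and values are invented, not taken from this thread. A Time_Key_Format that doesn't match the records' timestamps is the kind of misconfiguration that could make Elasticsearch reject records and drive retries:

```
[OUTPUT]
    Name            es
    Match           *
    Host            elasticsearch.example.local
    Port            9200
    Index           fluentbit
    # If this format does not match the records' timestamps,
    # Elasticsearch may reject them, causing repeated retries.
    Time_Key_Format %Y-%m-%dT%H:%M:%S
    # Cap retries so failing chunks are eventually dropped.
    Retry_Limit     5
```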
Does it happen only on shutdown, or also at runtime?
I don't know if it's the shutdown, or whether this message results in the pod being killed, but after this message there is always a restart.
@edsiper Using
I am using fluent-bit without the Elasticsearch output, but with two http outputs configured. For instance, instead of fluent-bit-configmap.yaml:
I made HTTP web services that produce to Kafka and Elasticsearch, to rule out those output plugins' execution paths, since I could not get meaningful error messages. The logs aren't helpful, but the timing between startup and the last success is very close to the three minutes mentioned by @intelliguy:
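A minimal sketch of what such a two-http-output configuration might look like (hosts, ports, and URIs below are invented for illustration):

```
[OUTPUT]
    Name   http
    Match  *
    # Hypothetical sidecar that archives logs to S3
    Host   127.0.0.1
    Port   8080
    URI    /logs
    Format json

[OUTPUT]
    Name   http
    Match  *
    # Hypothetical bridge that produces to Kafka
    Host   127.0.0.1
    Port   8081
    URI    /produce
    Format json
```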
@StevenACoffman @intelliguy @jgsqware When connecting to Elasticsearch, are you using plain HTTP communication or HTTPS (TLS)? And which specific Elasticsearch version is each of you using?
I am not connecting to Elasticsearch at all and still get the error. I am using plain HTTP to talk to another container I added to the pod. One HTTP port is a container that writes log files to AWS S3 storage; the other processes the logs and sends them on to Kafka. Mine may be an unrelated problem, but the error message was very similar, so I thought I would mention it here in the hope that there is a common root cause and this is a clue you could use.
Is there any TLS involved or just plain HTTP?
For us, it's plain HTTP.
Plain HTTP for us, no TLS.
Update: I am finally able to reproduce the garbage without stopping the service (kill -15 PID). Work in progress.
I've found the guilty code: https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L409 and its author:
It was a debug line that was not removed. It's not easy to catch, since it does an fprintf() to the buffered stdout, so the message is only visible when the buffer is flushed by the OS. Fixed by 02d9505.
All: if you see the message amd64.c:121: crash: Assertion `0' failed somewhere after this fix (in the new images, once available), please let me know ASAP. I am not confident that error is associated with the garbage message just fixed.
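To see why such a line hides until shutdown, consider this minimal standalone C sketch (not fluent-bit's actual code):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* When stdout goes to a pipe or file (as in a container), stdio
     * fully buffers it, so this stray debug write stays invisible. */
    fprintf(stdout, "{\"took\":0,\"errors\":false}");

    sleep(30);  /* during normal operation, nothing is printed */

    /* exit() flushes stdio buffers, so the accumulated debug output
     * appears all at once -- the "garbage on shutdown" symptom. */
    return 0;
}
```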
Regarding the core file provided via Dropbox: do you have the exact fluent-bit binary that you built and that was used to generate the core file?
@edsiper In my configuration I am not using the Elasticsearch output, and with the image
Again, this may be a separate issue.
By the way, I am using this to receive the termination message:
kubectl logs -n logging fluent-bit-k9phx -c fluent-bit --previous
Awesome for the garbage log fix.
We received this message too:
/tmp/src/lib/monkey/deps/flb_libco/amd64.c:121: crash: Assertion `0' failed.
@jgsqware To clarify, you get that error message with
I've filed #557 to troubleshoot/fix the crash issue. Since the garbage problem is fixed and releases are available, I am closing this ticket:
Original issue description:
Whenever I send a SIGTERM to fluent-bit, I get this output:
Seems harmless otherwise, but it looks like some sort of buffer overrun or something.
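Reproducing is as simple as sending signal 15 to the process, e.g. (assuming a single local fluent-bit instance):

```sh
# Send SIGTERM (signal 15) to a running fluent-bit instance
kill -15 "$(pidof fluent-bit)"
```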