invalid stream_id x, could not append content to multiline context #4190
Comments
Hi @Tinche, I was just considering filing the same issue when I saw you had opened yours. I can confirm the issue and add some more context:
My preliminary conclusion: this seems to be a Fluent Bit problem in the multiline core, not on your end (to be confirmed) |
More context: I tracked the error message "invalid stream_id x, could not append content to multiline context" down to /fluent-bit/blob/master/src/multiline/flb_ml_stream.c
given this i guess
|
We too are seeing this issue on our production server after starting to use The errors always seem to start appearing on log rotation. This is how the typical log looks:
I set The same for another one:
After the errors start happening, no more logs are being processed for the pod the log files were rotated for until the fluent-bit daemon is restarted |
I am able to reproduce the issue (both with the 1.8.8 build and on master) using the following config
Configuration
~/fluent-bit.conf:
~/fluent-bit.logrotate:
The logrotate needs to create a new file (inode) on rotation to match the kubelet log rotation, so no
Steps to reproduce the error:
Notes
Added some log statements to the
The changes to the 2 methods mentioned are (just added the
|
not completely related, but I would like to add that the test ( However, I ran the script first on Ubuntu focal which doesn't have
If I changed the
The output from the tests files:
and the others
so it would seem that the config for the output is not working as expected. The docs say that when you don't specify a
It would probably also be good to add a test for the multiline parser once the normal one is working again :) |
From everything @ggermis said above, I think we can assume that there is a problem somewhere in the flb_ml_stream ecosystem. For the configuration shown by @Tinche, simply change the:
section to
With this you will lose multiline context, but fluent-bit won't hang. |
@RalfWenzel Thanks a lot, that would be very useful but it doesn't work - the json payload doesn't get parsed at all. |
@Tinche This looks to me as if we are discussing problems on different levels.
The resulting log-record in "/var/log/containers/my_pod_....log" then looks like:
This should work (at least does in my installation)
then multiline parsing (or changing the app) is required. Can you clarify (or provide some records from the original logs on the node filesystem in /var/log/containers)? |
@RalfWenzel No, it's a single line just as you correctly assumed, it's just not getting parsed.
So at the final destination (ElasticSearch) it all ends up in a single string under the |
@Tinche I think I can help with this, but it's off topic for this thread, which should focus on the problem with the multiline parsing. I assume that the "invalid stream_id..." message was gone after you deployed using the approach I proposed? |
@RalfWenzel Well, I rolled back the configuration since unstructured logs aren't useful to us. That said, I can apply your change on Monday and let it run for a few hours and see if any errors pop up. |
@Tinche Then a quick answer: your JSON is not parsed because you did not tell fluent-bit to do so. You need a filter between input and output. Try:
|
@RalfWenzel Since I'm using the Helm chart by default the filter should be active (https://github.com/fluent/helm-charts/blob/55dd89e76a914800eec6a1bee57641b24f46744f/charts/fluent-bit/values.yaml#L208), but I guess it doesn't get used? In addition, thanks a lot for your help with this issue and coming up with a workaround. :) |
I found a root cause. I added the diff below to print the old/new stream_id.

diff --git a/plugins/in_tail/tail_file.c b/plugins/in_tail/tail_file.c
index 958bd58d..59433dea 100644
--- a/plugins/in_tail/tail_file.c
+++ b/plugins/in_tail/tail_file.c
@@ -894,6 +894,7 @@ int flb_tail_file_append(char *path, struct stat *st, int mode,
             goto error;
         }
         file->ml_stream_id = stream_id;
+        flb_error("new stream_id= %lu", stream_id);
     }
 
     /* Local buffer */
diff --git a/src/multiline/flb_ml_stream.c b/src/multiline/flb_ml_stream.c
index 9220e7b9..1e6b442a 100644
--- a/src/multiline/flb_ml_stream.c
+++ b/src/multiline/flb_ml_stream.c
@@ -289,7 +289,7 @@ void flb_ml_stream_id_destroy_all(struct flb_ml *ml, uint64_t stream_id)
     struct flb_ml_group *group;
     struct flb_ml_stream *mst;
     struct flb_ml_parser_ins *parser_i;
-
+    flb_error("%s stream_id=%lu", __FUNCTION__, stream_id);
     /* groups */
     mk_list_foreach(head, &ml->groups) {
         group = mk_list_entry(head, struct flb_ml_group, _head);

Then the log indicates that these stream_ids are the same.
|
If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takahiro Yamashita <[email protected]>
I sent a patch #4197. |
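To illustrate the root cause described in this comment, here is a minimal C sketch. It assumes a stand-in FNV-1a hash; the hash Fluent Bit actually uses may differ. The names `stream_id_from_name` and `stream_id_from_name_and_inode` are hypothetical and do not exist in Fluent Bit. Mixing in the inode is only one possible way to keep the ids apart, and not necessarily what #4197 does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET_BASIS 14695981039346656037ULL
#define FNV_PRIME        1099511628211ULL

/* FNV-1a (64-bit), used here only as a stand-in hash (assumption). */
static uint64_t fnv1a64(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = data;
    uint64_t h = seed;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical: id derived from the file name only.
 * A rotated file that keeps the same path gets the same id. */
uint64_t stream_id_from_name(const char *name)
{
    assert(name != NULL);
    return fnv1a64(name, strlen(name), FNV_OFFSET_BASIS);
}

/* Hypothetical: id derived from the file name and the inode.
 * A new inode after rotation yields a different id. */
uint64_t stream_id_from_name_and_inode(const char *name, uint64_t inode)
{
    uint64_t h = stream_id_from_name(name);

    return fnv1a64(&inode, sizeof(inode), h);
}
```

With the name-only id, the file before and after rotation maps to the same value, so destroying the old file's stream id also tears down the stream of the newly created file. With the inode mixed in, two files that share a path but have different inodes get different ids.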
Confirming this looks to be fixed, thanks! |
Bug Report
Describe the bug
Since upgrading to 1.8.8 using the fluent/fluent-bit Helm chart, I see these errors in the logs of essentially every daemonset pod.
To Reproduce
I'm not actually sure what's causing this. Is there a way to make FluentBit show me some information about what's causing this exactly? I would also be ok with a way to disable multiline completely while still parsing structured logs from our services. (We log single JSON lines for the most part, and multiline isn't necessary.)
Expected behavior
FluentBit doesn't log tons of errors.
Screenshots
Your Environment
Additional context
I'm trying to discover if there's something wrong on our end emitting logs or a problem with FluentBit. Thanks in advance!