handle multiline from nested JSON strings #337

Closed
edsiper opened this issue Jul 26, 2017 · 67 comments

@edsiper
Member

edsiper commented Jul 26, 2017

there is a specific use case where an application running under Docker and generating multiline log messages ends up with logs as follows:

{"log":"2017-07-26 07:54:42.130  WARN parse organization id exception\n","stream":"stdout","time":"2017-07-25T07:54:42.131621351Z"}
{"log":"\n","stream":"stdout","time":"2017-07-25T07:54:42.13171348Z"}
{"log":"java.lang.NumberFormatException: For input string: \"PodQueryBykind\"\n","stream":"stdout","time":"2017-07-25T07:54:42.131723859Z"}

There are 3 JSON log entries, but the messages they contain together form a single multiline message. We likely need to implement a specific feature in our parsers to reduce the pain.
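
To make the request concrete, here is a minimal Python sketch of the kind of merging being asked for. It is only an illustration, not how Fluent Bit works internally: the `FIRST_LINE` timestamp pattern and the `merge_docker_lines` helper are assumptions made up for this example.

```python
import json
import re

# Hypothetical first-line pattern: a new message starts with a timestamp such
# as "2017-07-26 07:54:42". Adjust it to your application's log format.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def merge_docker_lines(raw_lines):
    """Merge Docker json-file entries whose "log" values form one message."""
    merged = []
    current = None
    for raw in raw_lines:
        entry = json.loads(raw)
        if current is None or FIRST_LINE.match(entry["log"]):
            # Start of a new logical message: flush the previous one.
            if current is not None:
                merged.append(current)
            current = dict(entry)
        else:
            # Continuation line: append its text, keep the first entry's metadata.
            current["log"] += entry["log"]
    if current is not None:
        merged.append(current)
    return merged


# The three entries from the report above, as they appear in the log file.
docker_lines = [
    r'{"log":"2017-07-26 07:54:42.130  WARN parse organization id exception\n","stream":"stdout","time":"2017-07-25T07:54:42.131621351Z"}',
    r'{"log":"\n","stream":"stdout","time":"2017-07-25T07:54:42.13171348Z"}',
    r'{"log":"java.lang.NumberFormatException: For input string: \"PodQueryBykind\"\n","stream":"stdout","time":"2017-07-25T07:54:42.131723859Z"}',
]

records = merge_docker_lines(docker_lines)
print(len(records))
print(records[0]["log"])
```

With this pattern the three entries collapse into one record that keeps the `stream` and `time` of the first entry.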

@djotanov

Any news on when this could be implemented?

@monotek

monotek commented Jan 10, 2018

You could also try to use the detect-exceptions plugin mentioned in #476

https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions

Edit: ahh... damn... the fluentd plugins don't seem compatible :-(

@gavrie
Contributor

gavrie commented May 2, 2018

It seems this issue is not being addressed. This is a showstopper for me, so I guess it's back to plain old fluentd...

@abhishek-buragadda

@gavrie Can we handle the same issue in fluentd?

@gavrie
Contributor

gavrie commented Jun 25, 2018 via email

@abhishek-buragadda

@gavrie This is a paid plugin, right? Are there any open source plugins that do the same?

@gavrie
Contributor

gavrie commented Jun 27, 2018 via email

@abhishek-buragadda

@gavrie thanks .

@flypenguin

flypenguin commented Jul 10, 2018

@edsiper just a small request for information - is this something which is being developed now (or soon)? if yes I could wait, if not I would have to look for alternatives. I think this use case is rather straightforward, so my (maybe naive ;) hope is that it is at least on the map.

@edsiper edsiper self-assigned this Jul 11, 2018
@Markbnj

Markbnj commented Jul 26, 2018

This is an issue for us as well. I spun up a fluent-bit daemonset and was happy with the performance and footprint, but I have not been able to figure out a workaround for the issue with multiline logs. It's not difficult to create a parser for the additional lines which drops the docker cruft and captures the message content. The problem is that the named field gets added to the record for each line, which creates a json record with duplicate keys. One idea comes to mind which would solve this problem neatly: an option to allow the input plugin to append to an existing field when it encounters a new field of the same name. For now we're going back to fluentd because the detect_exceptions plugin helps with some of these cases.
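
The "append to an existing field" idea can be modelled in a few lines of Python. This is only a sketch of the behaviour being proposed, not an existing Fluent Bit option: the `append_duplicates` hook and the sample record are invented for illustration.

```python
import json


def append_duplicates(pairs):
    """object_pairs_hook that concatenates string values of repeated keys."""
    record = {}
    for key, value in pairs:
        if key in record and isinstance(record[key], str) and isinstance(value, str):
            record[key] += value  # append instead of overwriting
        else:
            record[key] = value
    return record


# Hypothetical record with a duplicated "log" key, as described above.
raw = '{"log": "first line\\n", "stream": "stdout", "log": "second line\\n"}'
record = json.loads(raw, object_pairs_hook=append_duplicates)
print(record)
```

A plain `json.loads` would silently keep only the last `"log"` value; the hook keeps both, in order.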

@nikolay

nikolay commented Jul 27, 2018

@Markbnj I think, for the time being, Fluent Bit is just an experiment, a POC, as it lacks rudimentary features and cannot be used in the real world.

@breeze7086

breeze7086 commented Jul 27, 2018

Same issue for me.
The cause of this is Docker's default logging driver, json-file.

@Markbnj

Markbnj commented Jul 27, 2018

@nikolay Just FYI we're using it in the real world, as the leaf node collector for over 300 GCP instances, logging over 150 million events per day. The issues on K8S are due to docker's pretty silly logging format, which annotates each line with cruft that ranges from useless to semi-useless. Dealing with that format in a multiline logging scenario is probably beyond fluent-bit's charter, but unfortunately there are not really any easier places to deal with it when you're running on a hosted cluster.

@rawkode

rawkode commented Jul 27, 2018

@Markbnj Fluentd doesn't have this problem with Docker logs, so why does fluentbit? As "silly" or "useless" as its output may be, this appears to be a solved problem; but not with fluent-bit

@Markbnj

Markbnj commented Jul 27, 2018

@rawkode Actually fluentd has pretty much the same issue, except that it has the detect exceptions plugin, which does a pretty good job of detecting multiline exceptions. It doesn't handle all cases of multi-line container logs however.

@rawkode

rawkode commented Jul 27, 2018

So if I use a different CRI implementation, this problem goes away?

Has anyone ported the plugin to fluentbit?

@Markbnj

Markbnj commented Jul 27, 2018

@rawkode good question... I don't have experience actually running anything on CRI-O, but from looking at Google's fluentd config for Stackdriver logging it seems like you could also expect per-line additions of at least a timestamp and an implied severity (based on stream), so probably the same issue in a slightly different format.

@michiel
Contributor

michiel commented Jul 27, 2018

@Markbnj @nikolay same here, real world deployment. We are using fluent-bit as k8s node collector across various environments at a similar scale and event volume

@nikolay

nikolay commented Jul 30, 2018

@Markbnj Look, your seemingly big numbers are meaningless when even their example setup does not work on a single node. It kinda works for you with tons of hacks and compromises, but Fluent Bit, unlike Fluentd, is targeting Kubernetes, and yet it is totally defunct with it. So, if Fluentd needs a plugin, that's understandable and acceptable, but Fluent Bit needs this basic use case out of the box without the requirement for a plugin from the future... as there's no such plugin at this point. So, I repeat what I said: Fluent Bit is possibly the future, but definitely not the present! At this point, it's just a POC, which hopefully will be shaped into something workable around v1.0... but it's still just v0.13. My point, in case I wasn't clear, was that it needs a big warning sign so that people don't spin their wheels!

@nikolay

nikolay commented Jul 31, 2018

@michiel As explained, the number of nodes is irrelevant when even the "hello world" equivalent fails with a single node! Provide the versions of Fluent Bit and the types of apps running in the cluster, which would be something substantial rather than just bragging!

@Markbnj

Markbnj commented Jul 31, 2018

@nikolay our numbers aren't really big. I was just giving you a data point to consider. It doesn't seem accurate to me to suggest that fluent-bit is "targeting kubernetes" and is thus insufficient for its primary use case, although the authors can address that better than I can. Kubernetes is mentioned on one line of the readme, in the filters section. In other words, it is one potential source of logs that fluent-bit can be used to collect.

@nikolay

nikolay commented Jul 31, 2018

@Markbnj That's what Eduardo said himself during KubeCon 2017, which I attended.

@flypenguin

flypenguin commented Jul 31, 2018

IMHO pretty much the whole discussion is pointless. I really don't care if fluentbit is production or not, 0.x or not, supercool or not - it's useful to me. And getting this fixed makes it even more useful to me. what more is there to say? why even bother "warning" people who are happy with their choice so far?!

so if @nikolay wants to jump in here and troll an opinion, I personally choose to ignore him because I don't see him contributing anything remotely useful, just some strongly worded opinion about which label to attach to fluentbit, which does not help me at all and which I frankly don't care about.

EDIT: changed subject ;) - I only speak for myself.

@edsiper
Member Author

edsiper commented Jul 31, 2018

Our focus is cloud native in general which includes Kubernetes, and yes, this is a missing feature. It's good to have different opinions.

The ticket continues to be an enhancement request; if this missing feature is a blocker for your environment, you should go with Fluentd instead.

@rchench

rchench commented Aug 21, 2018

+1 for this feature.
My use case is collecting logs for Spinnaker running in Kubernetes.
The majority of the Spinnaker services are written in Java, and I need to parse multi-line Java exceptions and feed them into ES to trigger some follow-up actions.

and I like fluent-bit over fluentd as well :)

thanks.

@iwilltry42

@breeze7086, does your comment mean that the problem won't exist with any other Docker logging driver?
I guess it definitely won't exist with the fluentd driver?

@ZMMWMY

ZMMWMY commented Jun 24, 2019

@shahbour it works, but the format is so ....., you know. Anyway, thank you.

@servo1x

servo1x commented Jul 30, 2019

@stang does this work with Fluentbit 1.2, where they fix the UTF8 decoding? Or do we need to add the utf8 decoders back?

@MyOldSkyGrandfather

Any progress on this issue, dude?

@sandeepbhojwani

For multiline in fluentd we use concat. We want to move to fluentbit for performance reasons, and multiline support is the only thing missing for us. Please help :(

@albertocsm

@stang @shahbour perfect. thanks!

@drorventura

Thanks for the solution. It did work for me when parsing Java multiline log files, but for other log outputs it parses them incorrectly, with unwanted encoding.
In addition, it ignores the fluentbit.io/parser annotation.

Is there a better way to solve it other than splitting the [INPUT]?

@moliqingwa

Hi @kiich,

Super interested in how you implemented the above @stang - would you be ok to share the lua filter config and code perhaps?

Here you go:

  • gist of the helpers.lua file (called from your lua filter in fluent-bit configuration)
  • gist of the JSON.lua file, which is a slightly modified version of a Lua JSON library (original code is linked so you can see what we added)
  • and hereafter, an extract of our fluent-bit configuration:
[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Parser           docker
    Tag              kube.*
    Refresh_Interval 5
    Mem_Buf_Limit    5MB
    Skip_Long_Lines  On
    DB               /tail-db/tail-containers-state.db
    DB.Sync          Normal
    Ignore_Older     2d
    Multiline        On
    Multiline_Flush  5
    Parser_Firstline first_line
...
[FILTER]
    Name   lua
    Match  kube.*
    script /fluent-bit/etc/helpers.lua
    call   process
...
[PARSER]
    Name        first_line
    Format      regex
    Regex       ^{"log":"(?!\\u0009)(?<log>\S(?:(\\")|[^"]){9}(?:(\\")|[^"])*)"

You might want to fine tune the Regex of the parser for your specific use case.
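
For anyone tuning that regex, a small Python sketch can help check which raw lines it classifies as the start of a record. This is a port for experimentation only: Python spells named groups `(?P<log>...)` where the config above uses `(?<log>...)`, and the `is_first_line` helper and the tab-indented sample line are assumptions added for this example.

```python
import re

# Python port of the first_line regex from the config above. Python writes
# named groups as (?P<log>...) instead of (?<log>...), and the leading "{"
# is escaped here to make it an unambiguous literal.
FIRST_LINE = re.compile(
    r'^\{"log":"(?!\\u0009)(?P<log>\S(?:(\\")|[^"]){9}(?:(\\")|[^"])*)"'
)


def is_first_line(raw_line):
    """Return True if a raw Docker json-file line would start a new record."""
    return FIRST_LINE.match(raw_line) is not None


# The first two lines come from the original report. The third is a
# hypothetical stack frame whose leading tab is written as the escape
# \u0009, which is what the regex's negative lookahead targets.
start = r'{"log":"2017-07-26 07:54:42.130  WARN parse organization id exception\n","stream":"stdout","time":"2017-07-25T07:54:42.131621351Z"}'
blank = r'{"log":"\n","stream":"stdout","time":"2017-07-25T07:54:42.13171348Z"}'
frame = r'{"log":"\u0009at java.lang.Integer.parseInt(Integer.java:580)\n","stream":"stdout"}'

for name, line in [("start", start), ("blank", blank), ("frame", frame)]:
    print(name, is_first_line(line))
```

In this port, the timestamped line counts as a first line, while the near-empty line and the tab-indented frame are treated as continuations. An unindented line such as the `java.lang.NumberFormatException` entry from the original report would still count as a first line, which is one case where the regex may need tuning.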

I'm running into the same issue and have been struggling loads with fluent-bit and multi-line and splunk!

In our case, fluent-bit was the only component used to collect and ship logs straight to an elasticsearch instance and we didn't want to add more components to the stack, but you might be able to handle such a thing on the Splunk side.


@stang Hi, it seems the Lua config files are not reachable. Would you please share them with me when you have time? Thanks

@vsinghal13

vsinghal13 commented Nov 17, 2019

Our logs look like:

2019-11-17 07:14:12 +0000 [info]: create client with URL: https://100.64.0.1:443/api and apiVersion: v1
2019-11-17 07:14:13 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type events
    deploy_namespace "demo"
  </source>
  <source>
    @type prometheus
    metrics_path "/metrics"
    port 24231
  </source>
</ROOT>
2019-11-17 07:14:13 +0000 [info]: starting fluentd-1.6.3 pid=8 ruby="2.6.3"

Before turning on Multiline, each line is read as a separate line and the JSON is correct.

Eg:

{"log":"2019-11-17 07:14:12 +0000 [info]: create client with URL: https://100.64.0.1:443/api and apiVersion: v1","stream":"stdout","time":"2019-11-17T07:14:12.020572877Z"}

I am using the regex below for Parser_Firstline:

(?<log>\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2}.*)$

But when the multiline feature is turned on, \\n is added, which escapes the remaining key-value pairs, and they are treated as part of the "log" value itself:

{"log":"2019-11-17 06:53:51 +0000 [info]: create client with URL: https://100.64.0.1:443/api and apiVersion: v1\\n\",\"stream\":\"stdout\",\"time\":\"2019-11-17T06:53:51.792044138Z\"}"}

Can anyone suggest a way to resolve this?
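
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: with Multiline on, the first-line regex may be applied to the raw file line instead of the decoded "log" field, so the trailing `.*` runs past the closing quote. The Python sketch below reproduces the reported output under that assumption; the regex is ported to Python's `(?P<log>...)` spelling, and decoding the JSON envelope first is shown only as an illustration of the difference.

```python
import json
import re

# Python port of the Parser_Firstline regex above.
FIRST_LINE = re.compile(r"(?P<log>\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2}.*)$")

# Raw Docker json-file line, reconstructed from the output reported above.
raw_line = r'{"log":"2019-11-17 06:53:51 +0000 [info]: create client with URL: https://100.64.0.1:443/api and apiVersion: v1\n","stream":"stdout","time":"2019-11-17T06:53:51.792044138Z"}'

# Applied to the raw line, ".*" swallows the rest of the JSON envelope.
captured = FIRST_LINE.search(raw_line).group("log")
reencoded = json.dumps({"log": captured}, separators=(",", ":"))
print(reencoded)

# Decoding the envelope first confines the match to the message text.
message = json.loads(raw_line)["log"]
clean = FIRST_LINE.search(message.rstrip("\n")).group("log")
print(clean)
```

If that is what is happening, a first-line regex that accounts for the JSON envelope (for example, one that stops at the closing quote of "log") would avoid capturing the other keys.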

@TomaszKlosinski

TomaszKlosinski commented Dec 4, 2019

Hello, any news on this? And does filebeat actually support this?

Update: Yes, filebeat supports this and it can have multiple multiline parsers for different containers based on templating using kubernetes metadata.

@heartrobotninja
Contributor

Has there been any work done in this area yet?

@jujugrrr

@stang thank you, we followed the Lua approach and it's working well!

@salla2

salla2 commented Apr 7, 2020

@stang any suggestions for the log format below? We sometimes get non-JSON messages in our logs when we have panics. Single-line JSON log messages work well, but when we receive a panic message, each line is treated as a new record. What we want is to wrap the whole panic message into a single message. Any suggestions, please!
{"level":"info","ts":"2020-04-07T08:32:01.755-0600","logger":"cmwnext","caller":"middleware/middleware_logging.go:141","msg":"access.log","version":"463.0.0","interactionID":"f6d13193-ebca-4967-423a-07724e4e4d06","sessionID":"","userID":"","request.method":"GET","request.path":"/cmw/v1/panic","response.status.code":200,"response.status.text":"OK"}
panic: Bailing out with a panic from main. All is NOT well.
goroutine 1 [running]:
p-bitbucket.imovetv.com/hydra/cmwnext/pkg/resumes.NotGonnaResume(...)
/go/src/p-bitbucket.imovetv.com/hydra/cmwnext/pkg/resumes/resumes_handler.go:64
main.wireMeUpSomeWidespreadPanic(...)
/go/src/p-bitbucket.imovetv.com/hydra/cmwnext/cmd/cmwnext/route.go:268
main.ListenAndServe(...)
/go/src/p-bitbucket.imovetv.com/hydra/cmwnext/cmd/cmwnext/route.go:264
main.main()
/go/src/p-bitbucket.imovetv.com/hydra/cmwnext/cmd/cmwnext/main.go:77 +0x7d2
{"level":"warn","ts":"2020-04-07T08:32:16.964-0600","caller":"go-config/consul.go:136","msg":"CONSUL ACCEPTED:","address":"d-gp2-consul.imovetv.com/k8s/clusters/preview-qak8s/config/cmwnext"}
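
The desired grouping can be sketched in Python as follows. This only illustrates the logic being asked for, under the assumption that the lines are already unwrapped from any Docker envelope; the `group_mixed_lines` helper and the `"msg"` key used for the folded panic text are invented for this example.

```python
import json


def group_mixed_lines(lines):
    """Keep JSON lines as-is; fold consecutive non-JSON lines into one record."""
    records = []
    pending = []  # non-JSON lines collected since the last JSON line

    def flush():
        if pending:
            # "msg" is a hypothetical key chosen for the merged text.
            records.append({"msg": "\n".join(pending)})
            pending.clear()

    for line in lines:
        try:
            parsed = json.loads(line)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            flush()
            records.append(parsed)
        else:
            pending.append(line)
    flush()
    return records


# Shortened versions of the lines shown above.
lines = [
    '{"level":"info","msg":"access.log"}',
    "panic: Bailing out with a panic from main. All is NOT well.",
    "goroutine 1 [running]:",
    "main.main()",
    '{"level":"warn","msg":"CONSUL ACCEPTED:"}',
]

records = group_mixed_lines(lines)
print(len(records))
print(records[1]["msg"])
```

Here the three panic lines end up in a single record between the two JSON records.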

@isshwar

isshwar commented Apr 8, 2020

@stang thank you, we followed the Lua approach and it's working well!

Hi Stang, I am trying to use the Lua approach, but I am getting these error messages:

  • Parser docker - cannot be used in docker

  • [filter_lua] function process is not found

Any idea why this is showing up? Also, just to let you know, I am working with Docker logs.

Thanks
Eswar

@bilaltahirx

Any update on this thread ?

@naviat

naviat commented Sep 27, 2020

@stang thank you, we followed the Lua approach and it's working well!

Hi Stang, I am trying to use the Lua approach, but I am getting these error messages:

  • Parser docker - cannot be used in docker
  • [filter_lua] function process is not found

Any idea why this is showing up? Also, just to let you know, I am working with Docker logs.

Thanks
Eswar

My case is the same as yours! Is there any update that could help me out?

@debu99

debu99 commented Apr 25, 2021

any progress?

@edsiper
Member Author

edsiper commented Jul 20, 2021

all good now :) , thanks everyone!

Multiline Update

As part of Fluent Bit v1.8, we have released new core Multiline functionality. This big new feature lets you configure new [MULTILINE_PARSER]s that support multiple formats and auto-detection, adds a new multiline mode to the Tail plugin, and, as of v1.8.2 (to be released on July 20th, 2021), adds a new Multiline Filter.

For now, you can take a look at the following documentation resources:

Documentation pages now point to complete config examples that are available on our repository.

Thanks everyone for supporting this!

@edsiper edsiper closed this as completed Jul 20, 2021
@kiich

kiich commented Jul 20, 2021

Amazing feature! Well done team! 👏
