The container parser (at least in docker format) silently concatenates the content of every log entry that lacks a final newline, and the accumulated content only gets parsed once an entry ending in a newline arrives.
This is the usual JSON log from docker:
{"log":"This is a log\n","stream":"stdout","time":"2024-01-24T19:09:55.568450771Z"}
This is an abnormal one without a newline, which I came up with while debugging an issue that made my debugging process hell:
{"log":"This is a log","stream":"stdout","time":"2024-01-24T19:09:55.568450771Z"}
Which means that this logfile:
{"log":"This is a log","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"This is a log2","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"This is a log3","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"{\"message\":\"Starts synchronization of instances.\",\"request_id\":null}}\n","stream":"stdout","time":"2024-01-24T16:42:51.200033705Z"}
When going through this input, Filebeat will end up spitting out the three newline-less entries concatenated onto the front of the final message, as a single event.
If this is the expected behavior, which it might as well be, I would expect it to be documented here:
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_container

Onto my second issue.
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_container
Use the container parser to extract information from containers log files. It parses lines into common message lines, extracting timestamps too.
Based on what I am getting from the syslog parser here, crio logs are plain text rather than JSON.
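A representative line in the CRI logging format - timestamp, stream, a P/F (partial/full) flag, then the message (this example line is reconstructed for illustration, not taken from the original report):
2024-01-24T19:09:55.568450771Z stdout F This is a log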
Meanwhile, docker by default logs JSON, as in the examples above.
It is not immediately obvious from this that the docker logs get JSON-decoded and that log becomes message.
It's a bit odd that the parser keeps the final newline, as without it the entry doesn't even parse correctly.

Onto ticket number 3.
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#filebeat-input-filestream-ndjson
These options make it possible for Filebeat to decode logs structured as JSON messages. Filebeat processes the logs line by line, so the JSON decoding only works if there is one JSON object per message.
The decoding happens before line filtering. You can combine JSON decoding with filtering if you set the message_key option.
This can be helpful in situations where the application logs are wrapped in JSON objects, like when using Docker.
The ndjson documentation suggests that it's useful for docker logs, but for that purpose the container parser already exists, and if someone tries to use ndjson on top of parsing with container, they end up with garbage random error messages about JSON decoding.
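Sketched as a config fragment (illustrative), the anti-pattern looks like this: container has already decoded docker's log/stream/time wrapper, so the stacked ndjson parser tries to JSON-decode the plain message text and errors out.

```yaml
parsers:
  - container:
      format: docker    # already decodes {"log": ..., "stream": ..., "time": ...}
  - ndjson:             # then tries to JSON-decode the plain message -> decode errors
      target: ""
```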
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html
Fourth problem: deprecations and the container input.
The log input is deprecated, and the container input is a wrapper around log. Yet one is marked as deprecated and the other isn't.
In fact, it isn't even mentioned that one CAN migrate to filestream with the container parser (a sketch of that migration is below).
Vice versa, the filestream documentation does not mention that it is the replacement for the container input.
There is a related issue on this point -> #34393 - but it is now mostly about introducing a take_over option for the input.
Not even autodiscovery is using the container input anymore - #35984 - yet people wanting to set up new Filebeat inputs for container logs are not warned or informed at all about the options and differences.
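For reference, the undocumented migration, roughly (paths and the id are illustrative, default parser options assumed):

```yaml
# Before: the container input (a wrapper around the deprecated log input)
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log

# After: the filestream input with the container parser
filebeat.inputs:
  - type: filestream
    id: container-logs        # filestream inputs should have a unique id
    paths:
      - /var/log/containers/*.log
    parsers:
      - container: ~          # defaults: stream: all, format: auto
```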
Why it matters
All of these issues combined result in people doing double or triple JSON parsing and ending up thoroughly confused.
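Concretely, after the container parser has run, the docker wrapper is gone and the event carries a message field rather than a log field, roughly like this (abridged and illustrative):
{"@timestamp":"2024-01-23T19:09:55.568Z","stream":"stdout","message":"This is a log"}
So any later decode_json_fields or ndjson step pointed at log has nothing to decode.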
Here is an issue where OP (and many people in the comments, me included) tries to double-parse the log field, which doesn't exist at that point anymore: #20053
Here is an issue where OP... does exactly the same thing, and on top of that tries to triple-parse it with the decode_json_fields processor: elastic/ecs-logging-java#43
Here's an issue where OP attempts to triple-parse the log field again: SO64860153
There are many threads about these throughout the Filebeat forums too.
Actionable TL;DR:
1. Confirm/Deny that concatenating the log fields of entries without a newline, until one with a newline arrives, is the expected behavior of the container parser with the docker format, and document/fix the behavior
2. Document the expected log formats for the docker and cri format options in the container parser
3. Document the supported logging drivers for the docker format; it seems to only support json-file and not, for example, the local driver, as per this forum post - the filebeat.yml reference documentation is better than the website doc
4. Confirm/Deny that the final newline is kept for default docker JSON logs in the container parser, and document/fix the behavior
5. Make it very obvious that using the container parser will turn the log field of default docker JSON logs into a message field
6. Put a warning on the filestream ndjson parser that the container parser ALREADY does what the ndjson example suggests doing, as that leads people to double parsing
7. Deprecate the container input in the documentation
8. In the container input documentation, recommend the filestream input with the container parser instead of the container input
9. In the filestream documentation, document that the container filestream parser is the replacement for the container input, and mention the differences, if any
10. In the container input documentation, document that migration from the container input is currently not possible via take_over and is in the works
11. Bonus issue: fix the unescaped asterisks in the container input documentation that are wrongly bolding text