-
Notifications
You must be signed in to change notification settings - Fork 41
[journald_input] Write journald fields as attributes instead of body. #353
[journald_input] Write journald fields as attributes instead of body. #353
Conversation
The list of journald fields is defined here: https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html The following field mappings are applied: MESSAGE => body __REALTIME_TIMESTAMP => timestamp PRIORITY => severity, severity_text All other fields are stored as attributes.
Codecov Report
@@ Coverage Diff @@
## main #353 +/- ##
=======================================
- Coverage 77.1% 77.0% -0.2%
=======================================
Files 94 94
Lines 4471 4492 +21
=======================================
+ Hits 3451 3460 +9
- Misses 701 707 +6
- Partials 319 325 +6
|
str, ok := v.(string) | ||
if !ok { | ||
// attributes only supports strings at the moment | ||
// https://github.com/open-telemetry/opentelemetry-log-collection/issues/190 | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to convert common types to a string? I think all values are strings in the first place, but I think we should avoid silently dropping a field in the event it is not a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, can do. The options were to wait for #190 to be fixed (I've not looked at what is needed for this) at which point we'll be able to do this properly or convert everything to a string.
"_BOOT_ID": "c4fa36de06824d21835c05ff80c54468", | ||
"_CAP_EFFECTIVE": "0", | ||
"_TRANSPORT": "journal", | ||
"_UID": "1000", | ||
"_EXE": "/usr/lib/systemd/systemd", | ||
"_AUDIT_LOGINUID": "1000", | ||
"_PID": "13894", | ||
"_CMDLINE": "/lib/systemd/systemd --user", | ||
"_MACHINE_ID": "d777d00e7caf45fbadedceba3975520d", | ||
"_SELINUX_CONTEXT": "unconfined\n", | ||
"CODE_FUNC": "unit_log_success", | ||
"SYSLOG_IDENTIFIER": "systemd", | ||
"_HOSTNAME": "myhostname", | ||
"MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448", | ||
"_SYSTEMD_CGROUP": "/user.slice/user-1000.slice/user@1000.service/init.scope", | ||
"_SOURCE_REALTIME_TIMESTAMP": "1587047866229317", | ||
"USER_UNIT": "run-docker-netns-4f76d707d45f.mount", | ||
"SYSLOG_FACILITY": "3", | ||
"_SYSTEMD_SLICE": "user-1000.slice", | ||
"_AUDIT_SESSION": "286", | ||
"CODE_FILE": "../src/core/unit.c", | ||
"_SYSTEMD_USER_UNIT": "init.scope", | ||
"_COMM": "systemd", | ||
"USER_INVOCATION_ID": "88f7ca6bbf244dc8828fa901f9fe9be1", | ||
"CODE_LINE": "5487", | ||
"_SYSTEMD_INVOCATION_ID": "83f7fc7799064520b26eb6de1630429c", | ||
"_GID": "1000", | ||
"_SYSTEMD_UNIT": "user@1000.service", | ||
"_SYSTEMD_USER_SLICE": "-.slice", | ||
"__CURSOR": "s=b1e713b587ae4001a9ca482c4b12c005;i=1eed30;b=c4fa36de06824d21835c05ff80c54468;m=9f9d630205;t=5a369604ee333;x=16c2d4fd4fdb7c36", | ||
"__MONOTONIC_TIMESTAMP": "685540311557", | ||
"_SYSTEMD_OWNER_UID": "1000", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.
Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .
cc @djaglowski
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.
Unfortunately, I think waiting for the semantic conventions blocks most of this PR for now, as the body will have to remain structured in order to hold most of the fields. I am on board with waiting though, as this is the best way to ensure we are only moving in the right direction.
Shall we:
- Have a small PR now that just parses out severity
- Make a PR to spec that proposes semantic conventions
- Circle back to this after 2 is accepted
@gregoryfranklin, are you on board with this? Are you willing to propose the semantic conventions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .
One design option that comes to mind is something like the preset
option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.
# just the default set
- type: journald_input
# default set plus a few extra fields
- type: journald_input
fields:
- USER_INVOCATION_ID
- _SYSTEMD_USER_UNIT
# alternate predefined set plus a few extra fields
- type: journald_input
field_set: some_other_set
fields:
- USER_INVOCATION_ID
- _SYSTEMD_USER_UNIT
# only a few specific fields
- type: journald_input
field_set: none
fields:
- USER_INVOCATION_ID
- _SYSTEMD_USER_UNIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we:
Have a small PR now that just parses out severity
Make a PR to spec that proposes semantic conventions
Circle back to this after 2 is accepted
The plan looks good to me. And I think 1 should be just an additive change - we can keep the payload and do not remove PRIORITY for now.
One design option that comes to mind is something like the preset option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.
This sounds good. Maybe even just fields would be enough. But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even just
fields
would be enough.
True. If you leave it unspecified you get all that we have semantic conventions for, otherwise you get what you specify.
But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?
I suppose we could make fields
a map, where a the key is the journald field name, and the value is the name of the attribute. User can leave value empty if we have a default mapping for the field, otherwise must specify.
- type: journald_input
fields:
_BOOT_ID: # has semconv => ok, defaults to semconv (e.g. "journald.boot.id")
_COMM: "custom.comm" # no semconv, but user named it => ok
_PID: "custom.pid" # has semconv, but user overrode it => ok
__CURSOR: # no semconv => unmarshal error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. I believe that not all the fields are going to be log attributes, some of them should be resource attributes, right? In that case this configuration model needs to be adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense. We can finalize design during implementation, but something to this effect should work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with waiting for semantic conventions to be added to the spec.
I've pushed a suggestion for what this could look like. Feedback is welcome.
Some of the fields will be resource attributes, so I've added a fields
and resource_fields
config item with a default value and users can override.
For the moment, anything that does not have a semantic convention defined gets passed through unmodified. I've added a commented out list of all the fields defined in the documentation.
Do we also want to specify a list of fields that we drop by default? There may be unknown fields, which I guess should be kept if they are unknown (passed through unmodified), so we should probably only drop known fields if they don't get assigned a semantic convention.
Closing this PR, as the codebase is being merged into collector contrib. On the topic of the PR, it seems the conclusion is that this is the right direction, but it should wait for semantic conventions to be added to the spec before we add support for mapping journald fields to attributes. |
Closes: open-telemetry/opentelemetry-collector-contrib#7298
The list of journald fields is defined here
The log data model suggests that the body should contain only the log message, other fields should be set as attributes.
This change applies the following field mappings:
MESSAGE => body
PRIORITY => severity, severity_text
__REALTIME_TIMESTAMP => timestamp
All other fields are stored as attributes.