Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

[journald_input] Write journald fields as attributes instead of body. #353

Closed
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions docs/operators/journald_input.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ The `journald_input` operator reads logs from the systemd journal using the `jou

By default, `journalctl` will read from `/run/journal` or `/var/log/journal`. If either `directory` or `files` are set, `journalctl` will instead read from those.

The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the journald entry as the parsed entry's timestamp. All other fields are added to the entry's body as returned by `journalctl`.
The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the journald entry as the parsed entry's timestamp, the `PRIORITY` field of the journald entry as the parsed entry's severity, the `MESSAGE` field of the journald entry as the parsed entry's body. All other fields are added to the entry's attributes as returned by `journalctl`.

### Configuration Fields

Expand All @@ -16,7 +16,6 @@ The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the j
| `files` | | A list of journal files to read entries from. |
| `units` | | A list of units to read entries from. |
| `priority` | `info` | Filter output by message priorities or priority ranges. |
| `write_to` | `$body` | The body [field](/docs/types/field.md) written to when creating a new log entry. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. |
| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. |
| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. |
Expand Down Expand Up @@ -45,13 +44,14 @@ Output entry sample:
```json
"entry": {
"timestamp": "2020-04-16T11:05:49.516168-04:00",
"body": {
"severity": 9,
"severity_text": "info",
"body": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
"attributes": {
"CODE_FILE": "../src/core/unit.c",
"CODE_FUNC": "unit_log_success",
"CODE_LINE": "5487",
"MESSAGE": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
"MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
"PRIORITY": "6",
"SYSLOG_FACILITY": "3",
"SYSLOG_IDENTIFIER": "systemd",
"USER_INVOCATION_ID": "de9283b4fd634213a50f5abe71b4d951",
Expand Down
64 changes: 62 additions & 2 deletions operator/builtin/input/journald/journald.go
Original file line number Diff line number Diff line change
Expand Up @@ -222,12 +222,35 @@ func (operator *JournaldInput) parseJournalEntry(line []byte) (*entry.Entry, str
return nil, "", errors.New("journald field for cursor is not a string")
}

entry, err := operator.NewEntry(body)
msg, ok := body["MESSAGE"]
if !ok {
return nil, "", errors.New("journald body missing MESSAGE field")
}
delete(body, "MESSAGE")

entry, err := operator.NewEntry(msg)
if err != nil {
return nil, "", fmt.Errorf("failed to create entry: %s", err)
}

entry.Timestamp = time.Unix(0, timestampInt*1000) // in microseconds

for k, v := range body {
switch k {
case "PRIORITY":
if err := addSeverity(entry, v); err != nil {
return nil, "", err
}
default:
str, ok := v.(string)
if !ok {
// attributes only supports strings at the moment
// https://github.com/open-telemetry/opentelemetry-log-collection/issues/190
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to convert common types to a string? I think all values are strings in the first place, but I think we should avoid silently dropping a field in the event it is not a string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can do. The options were to wait for #190 to be fixed (I've not looked at what is needed for this) at which point we'll be able to do this properly or convert everything to a string.

}
entry.AddAttribute(k, str)
}
}

return entry, cursorString, nil
}

Expand All @@ -237,3 +260,40 @@ func (operator *JournaldInput) Stop() error {
operator.wg.Wait()
return nil
}

var severityMapping = [...]entry.Severity{
0: entry.Fatal,
1: entry.Error3,
2: entry.Error2,
3: entry.Error,
4: entry.Warn,
5: entry.Info2,
6: entry.Info,
7: entry.Debug,
}

var severityText = [...]string{
0: "emerg",
1: "alert",
2: "crit",
3: "err",
4: "warning",
5: "notice",
6: "info",
7: "debug",
}

func addSeverity(e *entry.Entry, sev interface{}) error {
sevInt, err := strconv.Atoi(sev.(string))
if err != nil {
return fmt.Errorf("severity field is not an int")
}

if sevInt < 0 || sevInt > 7 {
return fmt.Errorf("invalid severity '%d'", sevInt)
}

e.Severity = severityMapping[sevInt]
e.SeverityText = severityText[sevInt]
return nil
}
76 changes: 40 additions & 36 deletions operator/builtin/input/journald/journald_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,46 +72,50 @@ func TestInputJournald(t *testing.T) {
require.NoError(t, err)
defer op.Stop()

expected := map[string]interface{}{
"_BOOT_ID": "c4fa36de06824d21835c05ff80c54468",
"_CAP_EFFECTIVE": "0",
"_TRANSPORT": "journal",
"_UID": "1000",
"_EXE": "/usr/lib/systemd/systemd",
"_AUDIT_LOGINUID": "1000",
"MESSAGE": "run-docker-netns-4f76d707d45f.mount: Succeeded.",
"_PID": "13894",
"_CMDLINE": "/lib/systemd/systemd --user",
"_MACHINE_ID": "d777d00e7caf45fbadedceba3975520d",
"_SELINUX_CONTEXT": "unconfined\n",
"CODE_FUNC": "unit_log_success",
"SYSLOG_IDENTIFIER": "systemd",
"_HOSTNAME": "myhostname",
"MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
"_SYSTEMD_CGROUP": "/user.slice/user-1000.slice/[email protected]/init.scope",
"_SOURCE_REALTIME_TIMESTAMP": "1587047866229317",
"USER_UNIT": "run-docker-netns-4f76d707d45f.mount",
"SYSLOG_FACILITY": "3",
"_SYSTEMD_SLICE": "user-1000.slice",
"_AUDIT_SESSION": "286",
"CODE_FILE": "../src/core/unit.c",
"_SYSTEMD_USER_UNIT": "init.scope",
"_COMM": "systemd",
"USER_INVOCATION_ID": "88f7ca6bbf244dc8828fa901f9fe9be1",
"CODE_LINE": "5487",
"_SYSTEMD_INVOCATION_ID": "83f7fc7799064520b26eb6de1630429c",
"PRIORITY": "6",
"_GID": "1000",
"_SYSTEMD_UNIT": "[email protected]",
"_SYSTEMD_USER_SLICE": "-.slice",
"__CURSOR": "s=b1e713b587ae4001a9ca482c4b12c005;i=1eed30;b=c4fa36de06824d21835c05ff80c54468;m=9f9d630205;t=5a369604ee333;x=16c2d4fd4fdb7c36",
"__MONOTONIC_TIMESTAMP": "685540311557",
"_SYSTEMD_OWNER_UID": "1000",
expected := &entry.Entry{
Timestamp: time.UnixMicro(1587047866229555),
Severity: entry.Info,
SeverityText: "info",
Body: "run-docker-netns-4f76d707d45f.mount: Succeeded.",
Attributes: map[string]string{
"_BOOT_ID": "c4fa36de06824d21835c05ff80c54468",
"_CAP_EFFECTIVE": "0",
"_TRANSPORT": "journal",
"_UID": "1000",
"_EXE": "/usr/lib/systemd/systemd",
"_AUDIT_LOGINUID": "1000",
"_PID": "13894",
"_CMDLINE": "/lib/systemd/systemd --user",
"_MACHINE_ID": "d777d00e7caf45fbadedceba3975520d",
"_SELINUX_CONTEXT": "unconfined\n",
"CODE_FUNC": "unit_log_success",
"SYSLOG_IDENTIFIER": "systemd",
"_HOSTNAME": "myhostname",
"MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
"_SYSTEMD_CGROUP": "/user.slice/user-1000.slice/[email protected]/init.scope",
"_SOURCE_REALTIME_TIMESTAMP": "1587047866229317",
"USER_UNIT": "run-docker-netns-4f76d707d45f.mount",
"SYSLOG_FACILITY": "3",
"_SYSTEMD_SLICE": "user-1000.slice",
"_AUDIT_SESSION": "286",
"CODE_FILE": "../src/core/unit.c",
"_SYSTEMD_USER_UNIT": "init.scope",
"_COMM": "systemd",
"USER_INVOCATION_ID": "88f7ca6bbf244dc8828fa901f9fe9be1",
"CODE_LINE": "5487",
"_SYSTEMD_INVOCATION_ID": "83f7fc7799064520b26eb6de1630429c",
"_GID": "1000",
"_SYSTEMD_UNIT": "[email protected]",
"_SYSTEMD_USER_SLICE": "-.slice",
"__CURSOR": "s=b1e713b587ae4001a9ca482c4b12c005;i=1eed30;b=c4fa36de06824d21835c05ff80c54468;m=9f9d630205;t=5a369604ee333;x=16c2d4fd4fdb7c36",
"__MONOTONIC_TIMESTAMP": "685540311557",
"_SYSTEMD_OWNER_UID": "1000",
Copy link
Member

@dmitryax dmitryax Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.

Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .

cc @djaglowski

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.

Unfortunately, I think waiting for the semantic conventions blocks most of this PR for now, as the body will have to remain structured in order to hold most of the fields. I am on board with waiting though, as this is the best way to ensure we are only moving in the right direction.

Shall we:

  1. Have a small PR now that just parses out severity
  2. Make a PR to spec that proposes semantic conventions
  3. Circle back to this after 2 is accepted

@gregoryfranklin, are you on board with this? Are you willing to propose the semantic conventions?

Copy link
Member

@djaglowski djaglowski Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .

One design option that comes to mind is something like the preset option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.

# just the default set
- type: journald_input

# default set plus a few extra fields
- type: journald_input
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

# alternate predefined set plus a few extra fields
- type: journald_input
  field_set: some_other_set
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

# only a few specific fields
- type: journald_input
  field_set: none
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

Copy link
Member

@dmitryax dmitryax Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we:

Have a small PR now that just parses out severity
Make a PR to spec that proposes semantic conventions
Circle back to this after 2 is accepted

The plan looks good to me. And I think 1 should be just an additive change - we can keep the payload and do not remove PRIORITY for now.

One design option that comes to mind is something like the preset option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.

This sounds good. Maybe even just fields would be enough. But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even just fields would be enough.

True. If you leave it unspecified you get all that we have semantic conventions for, otherwise you get what you specify.

But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?

I suppose we could make fields a map, where a the key is the journald field name, and the value is the name of the attribute. User can leave value empty if we have a default mapping for the field, otherwise must specify.

- type: journald_input
  fields:
    _BOOT_ID:            # has semconv => ok, defaults to semconv (e.g. "journald.boot.id")
    _COMM: "custom.comm" # no semconv, but user named it => ok
    _PID: "custom.pid"   # has semconv, but user overrode it => ok
    __CURSOR:            # no semconv => unmarshal error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach. I believe that not all the fields are going to be log attributes, some of them should be resource attributes, right? In that case this configuration model needs to be adjusted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. We can finalize design during implementation, but something to this effect should work well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with waiting for semantic conventions to be added to the spec.

I've pushed a suggestion for what this could look like. Feedback is welcome.

Some of the fields will be resource attributes, so I've added a fields and resource_fields config item with a default value and users can override.

For the moment, anything that does not have a semantic convention defined gets passed through unmodified. I've added a commented out list of all the fields defined in the documentation.

Do we also want to specify a list of fields that we drop by default? There may be unknown fields, which I guess should be kept if they are unknown (passed through unmodified), so we should probably only drop known fields if they don't get assigned a semantic convention.

},
}

select {
case e := <-received:
require.Equal(t, expected, e.Body)
require.Equal(t, expected, e)
case <-time.After(time.Second):
require.FailNow(t, "Timed out waiting for entry to be read")
}
Expand Down