diff --git a/pkg/stanza/Makefile b/pkg/stanza/Makefile new file mode 100644 index 000000000000..ded7a36092dc --- /dev/null +++ b/pkg/stanza/Makefile @@ -0,0 +1 @@ +include ../../Makefile.Common diff --git a/pkg/stanza/doc.go b/pkg/stanza/doc.go new file mode 100644 index 000000000000..c05d13268ff3 --- /dev/null +++ b/pkg/stanza/doc.go @@ -0,0 +1,15 @@ +// Copyright The OpenTelemetry Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package stanza // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza" diff --git a/pkg/stanza/docs/operators/README.md b/pkg/stanza/docs/operators/README.md new file mode 100644 index 000000000000..70a1d5b8870b --- /dev/null +++ b/pkg/stanza/docs/operators/README.md @@ -0,0 +1,37 @@ +## Status + +This library is in the process of being transferred from the [opentelemetry-log-collection](https://github.com/open-telemetry/opentelemetry-log-collection) repository. The code is not yet being used by the collector. + +## What is an operator? +An operator is a step in a log processing pipeline. It is used to perform a single action on a log entry, such as reading lines from a file, or parsing JSON from a field. Operators are used in log receivers to interpret logs into OpenTelemetry's log data model. + +For instance, a user may read lines from a file using the `file_input` operator. 
From there, the results of this operation may be sent to a `regex_parser` operator that isolates fields based on a regex pattern. Finally, it is common to convert fields into the log data model's top-level fields, such as timestamp, severity, scope, and trace. + + +## What operators are available? + +Parsers: +- [csv_parser](/docs/operators/csv_parser.md) +- [json_parser](/docs/operators/json_parser.md) +- [regex_parser](/docs/operators/regex_parser.md) +- [syslog_parser](/docs/operators/syslog_parser.md) +- [severity_parser](/docs/operators/severity_parser.md) +- [time_parser](/docs/operators/time_parser.md) +- [trace_parser](/docs/operators/trace_parser.md) +- [uri_parser](/docs/operators/uri_parser.md) + +Transformers: +- [add](/docs/operators/add.md) +- [copy](/docs/operators/copy.md) +- [filter](/docs/operators/filter.md) +- [flatten](/docs/operators/flatten.md) +- [metadata](/docs/operators/metadata.md) +- [move](/docs/operators/move.md) +- [recombine](/docs/operators/recombine.md) +- [remove](/docs/operators/remove.md) +- [retain](/docs/operators/retain.md) +- [router](/docs/operators/router.md) + +Outputs (Useful for debugging): +- [file_output](/docs/operators/file_output.md) +- [stdout](/docs/operators/stdout.md) diff --git a/pkg/stanza/docs/operators/add.md b/pkg/stanza/docs/operators/add.md new file mode 100644 index 000000000000..10ba45ea4eb0 --- /dev/null +++ b/pkg/stanza/docs/operators/add.md @@ -0,0 +1,274 @@ +## `add` operator + +The `add` operator adds a value to an `entry`'s `body`, `attributes`, or `resource`. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `add` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `field` | required | The [field](/docs/types/field.md) to be added. | +| `value` | required | `value` is either a static value or an [expression](/docs/types/expression.md). 
If a value is specified, it will be added to each entry at the field defined by `field`. If an expression is specified, it will be evaluated for each entry and added at the field defined by `field`. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + + +### Example Configurations: + +
+Add a string to the body + +```yaml +- type: add + field: body.key2 + value: val2 +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + +
+ +
+Add a value to the body using an expression + +```yaml +- type: add + field: body.key2 + value: EXPR(body.key1 + "_suffix") +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val1_suffix" + } +} +``` + +
+ +
+Add an object to the body + +```yaml +- type: add + field: body.key2 + value: + nestedkey: nestedvalue +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": { + "nestedkey":"nestedvalue" + } + } +} +``` + +
+ +
+Add a value to attributes + +```yaml +- type: add + field: attributes.key2 + value: val2 +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { + "key2": "val2" + }, + "body": { + "key1": "val1" + } +} +``` + +
+ +
+Add a value to resource + +```yaml +- type: add + field: resource.key2 + value: val2 +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { + "key2": "val2" + }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + +
+ +Add a value to resource using an expression + +```yaml +- type: add + field: resource.key2 + value: EXPR(body.key1 + "_suffix") +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { + "key2": "val1_suffix" + }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + +
\ No newline at end of file diff --git a/pkg/stanza/docs/operators/copy.md b/pkg/stanza/docs/operators/copy.md new file mode 100644 index 000000000000..5cccf796413d --- /dev/null +++ b/pkg/stanza/docs/operators/copy.md @@ -0,0 +1,198 @@ +## `copy` operator + +The `copy` operator copies a value from one [field](/docs/types/field.md) to another. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `copy` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `from` | required | The [field](/docs/types/field.md) from which the value should be copied. | +| `to` | required | The [field](/docs/types/field.md) to which the value should be copied. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + +### Example Configurations: + +
+Copy a value from the body to resource + +```yaml +- type: copy + from: body.key + to: resource.newkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key":"value" + } +} +``` + + + +```json +{ + "resource": { + "newkey":"value" + }, + "attributes": { }, + "body": { + "key":"value" + } +} +``` + +
+ +
+ +Copy a value from the body to attributes +```yaml +- type: copy + from: body.key2 + to: attributes.newkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { + "newkey": "val2" + }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + +
+ +
+ +Copy a value from attributes to the body +```yaml +- type: copy + from: attributes.key + to: body.newkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { + "key": "newval" + }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { + "key": "newval" + }, + "body": { + "key1": "val1", + "key2": "val2", + "newkey": "newval" + } +} +``` + +
+ +
+ +Copy a value within the body +```yaml +- type: copy + from: body.obj.nested + to: body.newkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "obj": { + "nested":"nestedvalue" + } + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "obj": { + "nested":"nestedvalue" + }, + "newkey":"nestedvalue" + } +} +``` + +
\ No newline at end of file diff --git a/pkg/stanza/docs/operators/csv_parser.md b/pkg/stanza/docs/operators/csv_parser.md new file mode 100644 index 000000000000..5c475cfc43a0 --- /dev/null +++ b/pkg/stanza/docs/operators/csv_parser.md @@ -0,0 +1,209 @@ +## `csv_parser` operator + +The `csv_parser` operator parses the string-type field selected by `parse_from` with the given header values. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `csv_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `header` | required when `header_attribute` not set | A string of delimited field names | +| `header_attribute` | required when `header` not set | An attribute name to read the header field from, to support dynamic field names | +| `delimiter` | `,` | A character that will be used as a delimiter. Values `\r` and `\n` cannot be used as a delimiter. | +| `lazy_quotes` | `false` | If true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field. | +| `parse_from` | `body` | The [field](/docs/types/field.md) from which the value will be parsed. | +| `parse_to` | `body` | The [field](/docs/types/field.md) to which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `timestamp` | `nil` | An optional [timestamp](/docs/types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. | +| `severity` | `nil` | An optional [severity](/docs/types/severity.md) block which will parse a severity field before passing the entry to the output operator. 
| + +### Example Configurations + +#### Parse the field `message` with a csv parser + +Configuration: + +```yaml +- type: csv_parser + header: id,severity,message +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "body": "1,debug,Debug Message" +} +``` + + + +```json +{ + "body": { + "id": "1", + "severity": "debug", + "message": "Debug Message" + } +} +``` + +
+ +#### Parse the field `message` with a csv parser using tab delimiter + +Configuration: + +```yaml +- type: csv_parser + parse_from: body.message + header: id,severity,message + delimiter: "\t" +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "body": { + "message": "1 debug Debug Message" + } +} +``` + + + +```json +{ + "body": { + "id": "1", + "severity": "debug", + "message": "Debug Message" + } +} +``` + +
+ +#### Parse the field `message` with csv parser and also parse the timestamp + +Configuration: + +```yaml +- type: csv_parser + header: 'timestamp_field,severity,message' + timestamp: + parse_from: body.timestamp_field + layout_type: strptime + layout: '%Y-%m-%d' +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "timestamp": "", + "body": { + "message": "2021-03-17,debug,Debug Message" + } +} +``` + + + +```json +{ + "timestamp": "2021-03-17T00:00:00-00:00", + "body": { + "severity": "debug", + "message": "Debug Message" + } +} +``` + +
+ +#### Parse the field `message` using dynamic field names + +Dynamic field names are available when using file_input's `label_regex`. + +Configuration: + +```yaml +- type: file_input + include: + - ./dynamic.log + start_at: beginning + label_regex: '^#(?P<key>.*?): (?P<value>.*)' + +- type: csv_parser + delimiter: "," + header_attribute: Fields +``` + +Input File: + +``` +#Fields: "id,severity,message" +1,debug,Hello +``` + + + + + + + +
Input record Output record
+ +Entry (from file_input): + +```json +{ + "timestamp": "", + "labels": { + "fields": "id,severity,message" + }, + "record": { + "message": "1,debug,Hello" + } +} +``` + + + +```json +{ + "timestamp": "", + "labels": { + "fields": "id,severity,message" + }, + "record": { + "id": "1", + "severity": "debug", + "message": "Hello" + } +} +``` + +
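The header-to-field mapping that `csv_parser` performs can be illustrated with a short Python sketch. This is not stanza's implementation: the function name `parse_csv_body` is hypothetical, the header is assumed to always be comma-separated (as in the examples above), and raising on a field-count mismatch is an assumption rather than documented behavior.

```python
import csv
import io


def parse_csv_body(body: str, header: str, delimiter: str = ",") -> dict:
    """Map one delimited line onto the field names listed in `header`.

    Hypothetical helper for illustration only.
    """
    # Assumption: the header string is comma-separated, as in the examples.
    names = header.split(",")
    # csv.reader handles quoted fields; exactly one row is expected.
    row = next(csv.reader(io.StringIO(body), delimiter=delimiter))
    if len(row) != len(names):
        # Assumption: a mismatch is treated as an error.
        raise ValueError(f"expected {len(names)} fields, got {len(row)}")
    return dict(zip(names, row))


parsed = parse_csv_body("1,debug,Debug Message", "id,severity,message")
```

With a tab delimiter, the same call becomes `parse_csv_body("1\tdebug\tDebug Message", "id,severity,message", delimiter="\t")`.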
diff --git a/pkg/stanza/docs/operators/file_input.md b/pkg/stanza/docs/operators/file_input.md new file mode 100644 index 000000000000..5cf50d93502c --- /dev/null +++ b/pkg/stanza/docs/operators/file_input.md @@ -0,0 +1,143 @@ +## `file_input` operator + +The `file_input` operator reads logs from files. It will place the lines read into the `body` of the new entry. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `file_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `include` | required | A list of file glob patterns that match the file paths to be read. | +| `exclude` | [] | A list of file glob patterns to exclude from reading. | +| `poll_interval` | 200ms | The duration between filesystem polls. | +| `multiline` | | A `multiline` configuration block. See below for details. | +| `force_flush_period` | `500ms` | Time since the last read of data from the file, after which the currently buffered log should be sent to the pipeline. Takes [duration](../types/duration.md) as value. Zero means waiting for new data forever. | +| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. | +| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. | +| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. | +| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. | +| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. | +| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. | +| `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. 
The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to be forgotten, meaning that all files will be read from the beginning (one time). | +| `max_log_size` | `1MiB` | The maximum size of a log entry to read before failing. Protects against reading large amounts of data into memory. | +| `max_concurrent_files` | 1024 | The maximum number of log files from which logs will be read concurrently (minimum = 2). If the number of files matched in the `include` pattern exceeds half of this number, then files will be processed in batches. One batch will be processed per `poll_interval`. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. | + +Note that by default, no logs will be read unless the monitored file is actively being written to because `start_at` defaults to `end`. + +`include` and `exclude` fields use `github.com/bmatcuk/doublestar` for expression language. +For reference documentation see [here](https://github.com/bmatcuk/doublestar#patterns). + +#### `multiline` configuration + +If set, the `multiline` configuration block instructs the `file_input` operator to split log entries on a pattern other than newlines. + +The `multiline` configuration block must contain exactly one of `line_start_pattern` or `line_end_pattern`. These are regex patterns that +match either the beginning of a new log entry, or the end of a log entry. + +If using multiline, the last log may not be flushed because the operator is still waiting for more content. +To forcefully flush the last buffered log after a certain period of time, +use the `force_flush_period` option. + +Also refer to the [recombine](/docs/operators/recombine.md) operator for merging events with greater control. + +### File rotation + +When files are rotated and their new names are no longer captured by the `include` pattern (i.e. 
tailing symlink files), it could result in data loss. +To avoid the data loss, choose the move/create rotation method and set `max_concurrent_files` higher than twice the number of files to tail. + +### Supported encodings + +| Key | Description | +| --- | --- | +| `nop` | No encoding validation. Treats the file as a stream of raw bytes | +| `utf-8` | UTF-8 encoding | +| `utf-16le` | UTF-16 encoding with little-endian byte order | +| `utf-16be` | UTF-16 encoding with big-endian byte order | +| `ascii` | ASCII encoding | +| `big5` | The Big5 Chinese character encoding | + +Other less common encodings are supported on a best-effort basis. See [https://www.iana.org/assignments/character-sets/character-sets.xhtml](https://www.iana.org/assignments/character-sets/character-sets.xhtml) for other encodings available. + + +### Example Configurations + +#### Simple file input + +Configuration: +```yaml +- type: file_input + include: + - ./test.log +``` + + + + + + + +
`./test.log` Output bodies
+ +``` +log1 +log2 +log3 +``` + + + +```json +{ + "body": "log1" +}, +{ + "body": "log2" +}, +{ + "body": "log3" +} +``` + +
+ +#### Multiline file input + +Configuration: +```yaml +- type: file_input + include: + - ./test.log + multiline: + line_start_pattern: 'START ' +``` + + + + + + + +
`./test.log` Output bodies
+ +``` +START log1 +log2 +START log3 +log4 +``` + + + +```json +{ + "body": "START log1\nlog2\n" +}, +{ + "body": "START log3\nlog4\n" +} +``` + +
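The effect of `line_start_pattern` in the multiline example can be sketched in Python. This is a simplification and not stanza's implementation: the function name `split_on_line_start` is hypothetical, the pattern is tested against each line rather than against a streaming buffer, and the final entry is emitted at end of input instead of waiting for `force_flush_period`.

```python
import re


def split_on_line_start(text: str, line_start_pattern: str) -> list:
    """Group lines into entries, starting a new entry at each pattern match.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(line_start_pattern)
    entries = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A matching line closes the entry that was being accumulated.
        if pattern.search(line) and current:
            entries.append(current)
            current = ""
        current += line
    # Simplification: flush whatever remains once the input ends.
    if current:
        entries.append(current)
    return entries


bodies = split_on_line_start("START log1\nlog2\nSTART log3\nlog4\n", "START ")
```

For the `./test.log` contents above, `bodies` holds the same two strings shown in the output column.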
diff --git a/pkg/stanza/docs/operators/file_output.md b/pkg/stanza/docs/operators/file_output.md new file mode 100644 index 000000000000..a5b1fca4e5ca --- /dev/null +++ b/pkg/stanza/docs/operators/file_output.md @@ -0,0 +1,31 @@ +## `file_output` operator + +The `file_output` operator will write log entries to a file. By default, they will be written as JSON-formatted lines, but if a `format` is provided, that format will be used as a template to render each log line. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `file_output` | A unique identifier for the operator. | +| `path` | required | The file path to which entries will be written. | +| `format` | | A [go template](https://golang.org/pkg/text/template/) that will be used to render each entry into a log line. | + + +### Example Configurations + +#### Simple configuration + +Configuration: +```yaml +- type: file_output + path: /tmp/output.json +``` + +#### Custom format + +Configuration: +```yaml +- type: file_output + path: /tmp/output.log + format: "Time: {{.Timestamp}} Body: {{.Body}}\n" +``` diff --git a/pkg/stanza/docs/operators/filter.md b/pkg/stanza/docs/operators/filter.md new file mode 100644 index 000000000000..a32b71d7fb84 --- /dev/null +++ b/pkg/stanza/docs/operators/filter.md @@ -0,0 +1,38 @@ +## `filter` operator + +The `filter` operator filters incoming entries that match an expression. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `filter` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `expr` | required | Incoming entries that match this [expression](/docs/types/expression.md) will be dropped. | +| `drop_ratio` | 1.0 | The probability a matching entry is dropped (used for sampling). A value of 1.0 will drop 100% of matching entries, while a value of 0.0 will drop 0%. 
| + +### Examples + +#### Filter entries based on a regex pattern + +```yaml +- type: filter + expr: 'body.message matches "^LOG: .* END$"' + output: my_output +``` + +#### Filter entries based on a label value + +```yaml +- type: filter + expr: 'attributes.env == "production"' + output: my_output +``` + +#### Filter entries based on an environment variable + +```yaml +- type: filter + expr: 'body.message == env("MY_ENV_VARIABLE")' + output: my_output +``` diff --git a/pkg/stanza/docs/operators/flatten.md b/pkg/stanza/docs/operators/flatten.md new file mode 100644 index 000000000000..f37d6d1abd14 --- /dev/null +++ b/pkg/stanza/docs/operators/flatten.md @@ -0,0 +1,116 @@ +## `flatten` operator + +The `flatten` operator flattens a field by moving its children up to the same level as the field. +The operator only flattens a single level deep. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `flatten` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `field` | required | The [field](/docs/types/field.md) to be flattened. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + +### Example Configurations: + +
+Flatten an object to the base of the body +
+
+ +```yaml +- type: flatten + field: body.key1 +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": { + "nested1": "nestedval1", + "nested2": "nestedval2" + }, + "key2": "val2" + } +} +``` + + + +```json + { + "resource": { }, + "attributes": { }, + "body": { + "nested1": "nestedval1", + "nested2": "nestedval2", + "key2": "val2" + } + } +``` + +
+ +
+Flatten an object within another object +
+
+ +```yaml +- type: flatten + field: body.wrapper.key1 +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "wrapper": { + "key1": { + "nested1": "nestedval1", + "nested2": "nestedval2" + }, + "key2": "val2" + } + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "wrapper": { + "nested1": "nestedval1", + "nested2": "nestedval2", + "key2": "val2" + } + } +} +``` + +
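The single-level behavior of `flatten` can be shown with a small Python sketch. This is an illustration under stated assumptions, not stanza's code: the function name `flatten` and the choice to let promoted children overwrite same-named siblings are hypothetical.

```python
def flatten(parent: dict, key: str) -> dict:
    """Promote the children of parent[key] to the level of `key` itself.

    Hypothetical helper for illustration only. Only one level is
    flattened; deeper nesting is left untouched.
    """
    result = {}
    for name, value in parent.items():
        if name == key and isinstance(value, dict):
            # Replace the field with its children, in place.
            result.update(value)
        else:
            result[name] = value
    return result


body = {
    "key1": {"nested1": "nestedval1", "nested2": "nestedval2"},
    "key2": "val2",
}
flattened = flatten(body, "key1")
```

For the nested case, the same function would be applied to the `wrapper` object rather than to the body root.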
diff --git a/pkg/stanza/docs/operators/generate_input.md b/pkg/stanza/docs/operators/generate_input.md new file mode 100644 index 000000000000..1e5ec64c88ca --- /dev/null +++ b/pkg/stanza/docs/operators/generate_input.md @@ -0,0 +1,44 @@ +## `generate_input` operator + +The `generate_input` operator generates log entries with a static body. This is useful for testing pipelines, especially when +coupled with the [`rate_limit`](/docs/operators/rate_limit.md) operator. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `generate_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `entry` | | An [entry](/docs/types/entry.md) log entry to repeatedly generate. | +| `count` | 0 | The number of entries to generate before stopping. A value of 0 indicates unlimited. | +| `static` | `false` | If true, the timestamp of the entry will remain static after each invocation. | + +### Example Configurations + +#### Mock a file input + +Configuration: +```yaml +- type: generate_input + entry: + body: + message1: log1 + message2: log2 +``` + +Output bodies: +```json +{ + "body": { + "message1": "log1", + "message2": "log2" + } +}, +{ + "body": { + "message1": "log1", + "message2": "log2" + } +}, +... +``` diff --git a/pkg/stanza/docs/operators/journald_input.md b/pkg/stanza/docs/operators/journald_input.md new file mode 100644 index 000000000000..bf80c11570a1 --- /dev/null +++ b/pkg/stanza/docs/operators/journald_input.md @@ -0,0 +1,84 @@ +## `journald_input` operator + +The `journald_input` operator reads logs from the systemd journal using the `journalctl` binary, which must be in the `$PATH` of the agent. + +By default, `journalctl` will read from `/run/journal` or `/var/log/journal`. If either `directory` or `files` are set, `journalctl` will instead read from those. 
+ +The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the journald entry as the parsed entry's timestamp. All other fields are added to the entry's body as returned by `journalctl`. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `journald_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `directory` | | A directory containing journal files to read entries from. | +| `files` | | A list of journal files to read entries from. | +| `units` | | A list of units to read entries from. | +| `priority` | `info` | Filter output by message priorities or priority ranges. | +| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. 
| + +### Example Configurations +```yaml +- type: journald_input + units: + - ssh + - kubelet + priority: info +``` + +```yaml +- type: journald_input + priority: emerg..err +``` +#### Simple journald input + +Configuration: +```yaml +- type: journald_input +``` + +Output entry sample: +```json +"entry": { + "timestamp": "2020-04-16T11:05:49.516168-04:00", + "body": { + "CODE_FILE": "../src/core/unit.c", + "CODE_FUNC": "unit_log_success", + "CODE_LINE": "5487", + "MESSAGE": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.", + "MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448", + "PRIORITY": "6", + "SYSLOG_FACILITY": "3", + "SYSLOG_IDENTIFIER": "systemd", + "USER_INVOCATION_ID": "de9283b4fd634213a50f5abe71b4d951", + "USER_UNIT": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount", + "_AUDIT_LOGINUID": "1000", + "_AUDIT_SESSION": "299", + "_BOOT_ID": "c4fa36de06824d21835c05ff80c54468", + "_CAP_EFFECTIVE": "0", + "_CMDLINE": "/lib/systemd/systemd --user", + "_COMM": "systemd", + "_EXE": "/usr/lib/systemd/systemd", + "_GID": "1000", + "_HOSTNAME": "testhost", + "_MACHINE_ID": "d777d00e7caf45fbadedceba3975520d", + "_PID": "18667", + "_SELINUX_CONTEXT": "unconfined\n", + "_SOURCE_REALTIME_TIMESTAMP": "1587049549515868", + "_SYSTEMD_CGROUP": "/user.slice/user-1000.slice/user@1000.service/init.scope", + "_SYSTEMD_INVOCATION_ID": "da8b20bdc65e4f6f9ca35d6352199b56", + "_SYSTEMD_OWNER_UID": "1000", + "_SYSTEMD_SLICE": "user-1000.slice", + "_SYSTEMD_UNIT": "user@1000.service", + "_SYSTEMD_USER_SLICE": "-.slice", + "_SYSTEMD_USER_UNIT": "init.scope", + "_TRANSPORT": "journal", + "_UID": "1000", + "__CURSOR": "s=b1e713b587ae4001a9ca482c4b12c005;i=1efec9;b=c4fa36de06824d21835c05ff80c54468;m=a001b7ec5a;t=5a369c4a3cd88;x=f9717e0b5608807b", + "__MONOTONIC_TIMESTAMP": "687223598170" + } +} +``` diff --git a/pkg/stanza/docs/operators/json_parser.md 
b/pkg/stanza/docs/operators/json_parser.md new file mode 100644 index 000000000000..b3c79c6f0dfa --- /dev/null +++ b/pkg/stanza/docs/operators/json_parser.md @@ -0,0 +1,203 @@ +## `json_parser` operator + +The `json_parser` operator parses the string-type field selected by `parse_from` as JSON. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `json_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `parse_from` | `body` | The [field](/docs/types/field.md) from which the value will be parsed. | +| `parse_to` | `body` | The [field](/docs/types/field.md) to which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | +| `timestamp` | `nil` | An optional [timestamp](/docs/types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. | +| `severity` | `nil` | An optional [severity](/docs/types/severity.md) block which will parse a severity field before passing the entry to the output operator. | + + +### Example Configurations + + +#### Parse the field `message` as JSON + +Configuration: +```yaml +- type: json_parser + parse_from: body.message +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "{\"key\": \"val\"}" + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "key": "val" + } +} +``` + +
+ +#### Parse a nested field to a different field, preserving original + +Configuration: +```yaml +- type: json_parser + parse_from: body.message.embedded + parse_to: body.parsed + preserve_to: body.message.embedded +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": { + "embedded": "{\"key\": \"val\"}" + } + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "message": { + "embedded": "{\"key\": \"val\"}" + }, + "parsed": { + "key": "val" + } + } +} +``` + +
+ +#### Parse the field `message` as JSON, and parse the timestamp + +Configuration: +```yaml +- type: json_parser + parse_from: body.message + timestamp: + parse_from: body.seconds_since_epoch + layout_type: epoch + layout: s +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "{\"key\": \"val\", \"seconds_since_epoch\": 1136214245}" + } +} +``` + + + +```json +{ + "timestamp": "2006-01-02T15:04:05-07:00", + "body": { + "key": "val" + } +} +``` + +
+ +#### Parse the body only if it starts and ends with brackets + +Configuration: +```yaml +- type: json_parser + if: 'body matches "^{.*}$"' +``` + + + + + + + + + + + + +
Input body Output body
+ +```json +{ + "body": "{\"key\": \"val\"}" +} +``` + + + +```json +{ + "body": { + "key": "val" + } +} +``` + +
+ +```json +{ + "body": "notjson" +} +``` + + + +```json +{ + "body": "notjson" +} +``` + +
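The conditional parsing in the last `json_parser` example can be sketched in Python. This is illustrative only and not how stanza evaluates expressions: the function name `json_parse_if_object` is hypothetical, and the condition is hard-coded as a regex check on a string body.

```python
import json
import re


def json_parse_if_object(entry: dict) -> dict:
    """Parse entry['body'] as JSON only when it looks like a JSON object.

    Hypothetical helper for illustration only.
    """
    body = entry.get("body")
    # Mirrors the `if` condition: the body starts with "{" and ends with "}".
    if isinstance(body, str) and re.search(r"^\{.*\}$", body):
        return {**entry, "body": json.loads(body)}
    # Entries that do not satisfy the condition pass through unchanged.
    return entry


parsed = json_parse_if_object({"body": '{"key": "val"}'})
skipped = json_parse_if_object({"body": "notjson"})
```

The two calls correspond to the two input rows in the table above: the first body is replaced by the parsed object, the second is left as the original string.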
diff --git a/pkg/stanza/docs/operators/k8s_event_input.md b/pkg/stanza/docs/operators/k8s_event_input.md new file mode 100644 index 000000000000..b1fd6d07424d --- /dev/null +++ b/pkg/stanza/docs/operators/k8s_event_input.md @@ -0,0 +1,68 @@ +## `k8s_event_input` operator + +The `k8s_event_input` operator generates logs from Kubernetes events. It does this by connecting to the +Kubernetes API, and currently requires that Stanza is running inside a Kubernetes cluster. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `k8s_event_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `namespaces` | All namespaces | An array of namespaces to collect events from. | +| `discover_namespaces` | `true` | If true, the operator will regularly poll for new namespaces to include. | +| `discovery_interval` | `1m` | The interval at which the operator searches for new namespaces to follow. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. 
| + +### Example Configurations + +#### Mock a file input + +Configuration: +```yaml +- type: k8s_event_input +``` + +Output events: +```json +{ + "timestamp": "2020-08-13T17:41:44.581552468Z", + "severity": 0, + "attributes": { + "event_type": "ADDED" + }, + "body": { + "count": 1, + "eventTime": null, + "firstTimestamp": "2020-08-13T16:43:57Z", + "involvedObject": { + "apiVersion": "v1", + "fieldPath": "spec.containers{stanza}", + "kind": "Pod", + "name": "stanza-g6rzd", + "namespace": "default", + "resourceVersion": "18292818", + "uid": "47d965e6-4bb3-4c58-a089-1a8b16bf21b0" + }, + "lastTimestamp": "2020-08-13T16:43:57Z", + "message": "Pulling image \"observiq/stanza:dev\"", + "metadata": { + "creationTimestamp": "2020-08-13T16:43:57Z", + "name": "stanza-g6rzd.162ae19292cebe25", + "namespace": "default", + "resourceVersion": "29923", + "selfLink": "/api/v1/namespaces/default/events/stanza-g6rzd.162ae19292cebe25", + "uid": "d210b74b-5c58-473f-ac51-3e21f6f8e2d1" + }, + "reason": "Pulling", + "reportingComponent": "", + "reportingInstance": "", + "source": { + "component": "kubelet", + "host": "kube-master-1" + }, + "type": "Normal" + } +} +``` diff --git a/pkg/stanza/docs/operators/key_value_parser.md b/pkg/stanza/docs/operators/key_value_parser.md new file mode 100644 index 000000000000..c20ea1b2ee91 --- /dev/null +++ b/pkg/stanza/docs/operators/key_value_parser.md @@ -0,0 +1,178 @@ +## `key_value_parser` operator + +The `key_value_parser` operator parses the string-type field selected by `parse_from` into key value pairs. All values are of type string. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `key_value_parser` | A unique identifier for the operator. | +| `delimiter` | `=` | The delimiter used for splitting a value into a key value pair. | +| `pair_delimiter` | | The delimiter used for separating key value pairs, defaults to whitespace. 
| +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `parse_from` | `body` | A [field](/docs/types/field.md) that indicates the field to be parsed into key value pairs. | +| `parse_to` | `body` | A [field](/docs/types/field.md) that indicates the field to which the key value pairs will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | +| `timestamp` | `nil` | An optional [timestamp](/docs/types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. | +| `severity` | `nil` | An optional [severity](/docs/types/severity.md) block which will parse a severity field before passing the entry to the output operator. | + + +### Example Configurations + +#### Parse the field `message` into key value pairs + +Configuration: +```yaml +- type: key_value_parser + parse_from: message +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "name=stanza" + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "name": "stanza" + } +} +``` + +
+ +#### Parse the field `message` into key value pairs, using a non default delimiter + +Configuration: +```yaml +- type: key_value_parser + parse_from: message + delimiter: ":" +``` + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "name:stanza" + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "name": "stanza" + } +} +``` + +#### Parse the field `message` into key value pairs, using a non default pair delimiter + +Configuration: +```yaml +- type: key_value_parser + parse_from: message + pair_delimiter: "!" +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "name=stanza ! org=otel ! group=dev" + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "name": "stanza", + "org": "otel", + "group": "dev" + } +} +``` + +
+ +#### Parse the field `message` as key value pairs, and parse the timestamp + +Configuration: +```yaml +- type: key_value_parser + parse_from: message + timestamp: + parse_from: seconds_since_epoch + layout_type: epoch + layout: s +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "name=stanza seconds_since_epoch=1136214245" + } +} +``` + + + +```json +{ + "timestamp": "2006-01-02T15:04:05-07:00", + "body": { + "name": "stanza" + } +} +``` + +
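The `delimiter` and `pair_delimiter` options above can be read as two nested splits: first into pairs, then each pair into a key and a value. Below is a minimal Go sketch of that idea. `parseKeyValue` is a hypothetical helper written for illustration, not the operator's implementation; trimming whitespace around keys and values is an assumption inferred from the `!` example, and features such as quoted values are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValue is a hypothetical helper, not the operator's code.
// An empty pairDelimiter means "split on whitespace", matching the
// documented default for pair_delimiter.
func parseKeyValue(s, delimiter, pairDelimiter string) (map[string]string, error) {
	var pairs []string
	if pairDelimiter == "" {
		pairs = strings.Fields(s)
	} else {
		pairs = strings.Split(s, pairDelimiter)
	}

	out := make(map[string]string, len(pairs))
	for _, p := range pairs {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// Split each pair at the first occurrence of the delimiter.
		k, v, ok := strings.Cut(p, delimiter)
		if !ok {
			return nil, fmt.Errorf("pair %q has no delimiter %q", p, delimiter)
		}
		// All values stay strings, as the operator description states.
		out[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return out, nil
}

func main() {
	fields, err := parseKeyValue("name=stanza ! org=otel ! group=dev", "=", "!")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields) // prints: map[group:dev name:stanza org:otel]
}
```

Note that `seconds_since_epoch=1136214245` would also come out as the string `"1136214245"`; converting it into a timestamp is the job of the separate `timestamp` block.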
diff --git a/pkg/stanza/docs/operators/move.md b/pkg/stanza/docs/operators/move.md new file mode 100644 index 000000000000..f52cefcda53d --- /dev/null +++ b/pkg/stanza/docs/operators/move.md @@ -0,0 +1,279 @@ +## `move` operator + +The `move` operator moves (or renames) a field from one location to another. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `move` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `from` | required | The [field](/docs/types/field.md) from which the value will be moved. | +| `to` | required | The [field](/docs/types/field.md) to which the value will be moved. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + +### Example Configurations: + +Rename value +```yaml +- type: move + from: body.key1 + to: body.key3 +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key3": "val1", + "key2": "val2" + } +} +``` + +
+
+ +Move a value from the body to resource + +```yaml +- type: move + from: body.uuid + to: resource.uuid +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "uuid": "091edc50-d91a-460d-83cd-089a62937738" + } +} +``` + + + +```json +{ + "resource": { + "uuid": "091edc50-d91a-460d-83cd-089a62937738" + }, + "attributes": { }, + "body": { } +} +``` + +
+ +
+ +Move a value from the body to attributes + +```yaml +- type: move + from: body.ip + to: attributes.ip +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "ip": "8.8.8.8" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { + "ip": "8.8.8.8" + }, + "body": { } +} +``` + +
+ +
+ +Replace the body with an individual value nested within the body +```yaml +- type: move + from: body.log + to: body +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "log": "The log line" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": "The log line" +} +``` + +
+ +
+ +Remove a layer from the body +```yaml +- type: move + from: body.wrapper + to: body +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "wrapper": { + "key1": "val1", + "key2": "val2", + "key3": "val3" + } + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2", + "key3": "val3" + } +} +``` + +
+ +
+ +Merge a layer to the body +```yaml +- type: move + from: body.object + to: body +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "firstTimestamp": "2020-08-13T16:43:57Z", + "object": { + "apiVersion": "v1", + "name": "stanza-g6rzd", + "uid": "47d965e6-4bb3-4c58-a089-1a8b16bf21b0" + }, + "lastTimestamp": "2020-08-13T16:43:57Z" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "firstTimestamp": "2020-08-13T16:43:57Z", + "apiVersion": "v1", + "name": "stanza-g6rzd", + "uid": "47d965e6-4bb3-4c58-a089-1a8b16bf21b0", + "lastTimestamp": "2020-08-13T16:43:57Z" + } +} +``` + +
+ diff --git a/pkg/stanza/docs/operators/recombine.md b/pkg/stanza/docs/operators/recombine.md new file mode 100644 index 000000000000..377c77a2aea3 --- /dev/null +++ b/pkg/stanza/docs/operators/recombine.md @@ -0,0 +1,147 @@ +## `recombine` operator + +The `recombine` operator combines consecutive logs into single logs based on simple expression rules. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `recombine` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `is_first_entry` | | An [expression](/docs/types/expression.md) that returns true if the entry being processed is the first entry in a multiline series. | +| `is_last_entry` | | An [expression](/docs/types/expression.md) that returns true if the entry being processed is the last entry in a multiline series. | +| `combine_field` | required | The [field](/docs/types/field.md) from all the entries that will recombined. | +| `combine_with` | `"\n"` | The string that is put between the combined entries. This can be an empty string as well. When using special characters like `\n`, be sure to enclose the value in double quotes: `"\n"`. | +| `max_batch_size` | 1000 | The maximum number of consecutive entries that will be combined into a single entry. | +| `overwrite_with` | `oldest` | Whether to use the fields from the `oldest` or the `newest` entry for all the fields that are not combined. | +| `force_flush_period` | `5s` | Flush timeout after which entries will be flushed aborting the wait for their sub parts to be merged with. | +| `source_identifier` | `$attributes["file.path"]` | The [field](/docs/types/field.md) to separate one source of logs from others when combining them. 
| +| `max_sources` | 1000 | The maximum number of unique sources allowed concurrently to be tracked for combining separately. | + +Exactly one of `is_first_entry` and `is_last_entry` must be specified. + +NOTE: this operator is only designed to work with a single input. It does not keep track of what operator entries are coming from, so it can't combine based on source. + +### Example Configurations + +#### Recombine Kubernetes logs in the CRI format + +Kubernetes logs in the CRI format have a tag that indicates whether the log entry is part of a longer log line (P) or the final entry (F). Using this tag, we can recombine the CRI logs back into complete log lines. + +Configuration: + +```yaml +- type: file_input + include: + - ./input.log +- type: regex_parser + regex: '^(?P<timestamp>[^\s]+) (?P<stream>\w+) (?P<logtag>\w) (?P<message>.*)' +- type: recombine + combine_field: body.message + combine_with: "" + is_last_entry: "body.logtag == 'F'" + overwrite_with: "newest" +``` + +Input file: + +``` +2016-10-06T00:17:09.669794202Z stdout F Single entry log 1 +2016-10-06T00:17:10.113242941Z stdout P This is a very very long line th +2016-10-06T00:17:10.113242941Z stdout P at is really really long and spa +2016-10-06T00:17:10.113242941Z stdout F ns across multiple log entries +``` + +Output logs: + +```json +[ + { + "timestamp": "2020-12-04T13:03:38.41149-05:00", + "severity": 0, + "body": { + "message": "Single entry log 1", + "logtag": "F", + "stream": "stdout", + "timestamp": "2016-10-06T00:17:09.669794202Z" + } + }, + { + "timestamp": "2020-12-04T13:03:38.411664-05:00", + "severity": 0, + "body": { + "message": "This is a very very long line that is really really long and spans across multiple log entries", + "logtag": "F", + "stream": "stdout", + "timestamp": "2016-10-06T00:17:10.113242941Z" + } + } +] +``` + +#### Recombine stack traces into multiline logs + +Some apps output multiple log lines which are in fact a single log message. 
A common example is a stack trace: + +```console +java.lang.Exception: Stack trace + at java.lang.Thread.dumpStack(Thread.java:1336) + at Main.demo3(Main.java:15) + at Main.demo2(Main.java:12) + at Main.demo1(Main.java:9) + at Main.demo(Main.java:6) + at Main.main(Main.java:3) +``` + +To recombine such log lines into a single log message, you need a way to tell when a log message starts or ends. +In the example above, the first line differs from the other lines in not starting with whitespace. +This can be expressed with the following configuration: + +```yaml +- type: recombine + combine_field: body.message + is_first_entry: body.message matches "^[^\s]" +``` + +Given the following input file: + +``` +Log message 1 +Error: java.lang.Exception: Stack trace + at java.lang.Thread.dumpStack(Thread.java:1336) + at Main.demo3(Main.java:15) + at Main.demo2(Main.java:12) + at Main.demo1(Main.java:9) + at Main.demo(Main.java:6) + at Main.main(Main.java:3) +Another log message +``` + +The following logs will be output: + +```json +[ + { + "timestamp": "2020-12-04T13:03:38.41149-05:00", + "severity": 0, + "body": { + "message": "Log message 1" + } + }, + { + "timestamp": "2020-12-04T13:03:38.41149-05:00", + "severity": 0, + "body": { + "message": "Error: java.lang.Exception: Stack trace\n at java.lang.Thread.dumpStack(Thread.java:1336)\n at Main.demo3(Main.java:15)\n at Main.demo2(Main.java:12)\n at Main.demo1(Main.java:9)\n at Main.demo(Main.java:6)\n at Main.main(Main.java:3)" + } + }, + { + "timestamp": "2020-12-04T13:03:38.41149-05:00", + "severity": 0, + "body": { + "message": "Another log message" + } + } +] +``` diff --git a/pkg/stanza/docs/operators/regex_parser.md b/pkg/stanza/docs/operators/regex_parser.md new file mode 100644 index 000000000000..2136bf259a9a --- /dev/null +++ b/pkg/stanza/docs/operators/regex_parser.md @@ -0,0 +1,290 @@ +## `regex_parser` operator + +The `regex_parser` operator parses the string-type field selected by `parse_from` with the 
given regular expression pattern. + +#### Regex Syntax + +This operator makes use of [Go regular expressions](https://github.com/google/re2/wiki/Syntax). When writing a regex, consider using a tool such as [regex101](https://regex101.com/?flavor=golang). + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `regex_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `regex` | required | A [Go regular expression](https://github.com/google/re2/wiki/Syntax). The named capture groups will be extracted as fields in the parsed body. | +| `parse_from` | `body` | The [field](/docs/types/field.md) from which the value will be parsed. | +| `parse_to` | `body` | The [field](/docs/types/field.md) to which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | +| `timestamp` | `nil` | An optional [timestamp](/docs/types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. | +| `severity` | `nil` | An optional [severity](/docs/types/severity.md) block which will parse a severity field before passing the entry to the output operator. | + +### Example Configurations + + +#### Parse the field `message` with a regular expression + +Configuration: +```yaml +- type: regex_parser + parse_from: body.message + regex: '^Host=(?P<host>[^,]+), Type=(?P<type>.*)$' +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": "Host=127.0.0.1, Type=HTTP" + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "host": "127.0.0.1", + "type": "HTTP" + } +} +``` + +
+ +#### Parse a nested field to a different field, preserving original + +Configuration: +```yaml +- type: regex_parser + parse_from: body.message.embedded + parse_to: body.parsed + regex: '^Host=(?P<host>[^,]+), Type=(?P<type>.*)$' + preserve_to: body.message.embedded +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": { + "message": { + "embedded": "Host=127.0.0.1, Type=HTTP" + } + } +} +``` + + + +```json +{ + "timestamp": "", + "body": { + "message": { + "embedded": "Host=127.0.0.1, Type=HTTP" + }, + "parsed": { + "host": "127.0.0.1", + "type": "HTTP" + } + } +} +``` + +
+ + +#### Parse the body with a regular expression and also parse the timestamp + +Configuration: +```yaml +- type: regex_parser + regex: '^Time=(?P<timestamp_field>\d{4}-\d{2}-\d{2}), Host=(?P<host>[^,]+), Type=(?P<type>.*)$' + timestamp: + parse_from: body.timestamp_field + layout_type: strptime + layout: '%Y-%m-%d' +``` + + + + + + + +
Input body Output body
+ +```json +{ + "timestamp": "", + "body": "Time=2020-01-31, Host=127.0.0.1, Type=HTTP" +} +``` + + + +```json +{ + "timestamp": "2020-01-31T00:00:00-00:00", + "body": { + "host": "127.0.0.1", + "type": "HTTP" + } +} +``` + +
+ +#### Parse the message field only if "type" is "hostname" + +Configuration: +```yaml +- type: regex_parser + regex: '^Host=(?P<host>.*)$' + parse_from: body.message + if: 'body.type == "hostname"' +``` + + + + + + + + + + + + +
Input body Output body
+ +```json +{ + "body": { + "message": "Host=testhost", + "type": "hostname" + } +} +``` + + + +```json +{ + "body": { + "host": "testhost", + "type": "hostname" + } +} +``` + +
+ +```json +{ + "body": { + "message": "Key=value", + "type": "keypair" + } +} +``` + + + +```json +{ + "body": { + "message": "Key=value", + "type": "keypair" + } +} +``` + +
diff --git a/pkg/stanza/docs/operators/remove.md b/pkg/stanza/docs/operators/remove.md new file mode 100644 index 000000000000..be1f36fdd5a7 --- /dev/null +++ b/pkg/stanza/docs/operators/remove.md @@ -0,0 +1,266 @@ +## `remove` operator + +The `remove` operator removes a field from an entry. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `remove` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `field` | required | The [field](/docs/types/field.md) to remove. If `attributes` or `resource` is specified, all fields of that type will be removed. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + +### Example Configurations: + +
+ +Remove a value from the body +```yaml +- type: remove + field: body.key1 +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { } +} +``` + +
+ +
+ +Remove an object from the body +```yaml +- type: remove + field: body.object +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "object": { + "nestedkey": "nestedval" + }, + "key": "val" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + +
+ +
+ +Remove a value from attributes +```yaml +- type: remove + field: attributes.otherkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { + "otherkey": "val" + }, + "body": { + "key": "val" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + +
+ +
+ +Remove a value from resource +```yaml +- type: remove + field: resource.otherkey +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { + "otherkey": "val" + }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + +
+ +
+ +Remove all resource fields +```yaml +- type: remove + field: resource +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { + "key1.0": "val", + "key2.0": "val" + }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + +
+ +
+ +Remove all attributes +```yaml +- type: remove + field: attributes +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { + "key1.0": "val", + "key2.0": "val" + }, + "body": { + "key": "val" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key": "val" + } +} +``` + +
\ No newline at end of file diff --git a/pkg/stanza/docs/operators/retain.md b/pkg/stanza/docs/operators/retain.md new file mode 100644 index 000000000000..3393e6115519 --- /dev/null +++ b/pkg/stanza/docs/operators/retain.md @@ -0,0 +1,263 @@ +## `retain` operator + +The `retain` operator keeps the specified list of fields, and removes the rest. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `retain` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `fields` | required | A list of [fields](/docs/types/field.md) to be kept. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | +
+NOTE: If no fields in a group (attributes, resource, or body) are specified, that entire group will be retained. +
+ +### Example Configurations: + +
+Retain fields in the body + +```yaml +- type: retain + fields: + - body.key1 + - body.key2 +``` + + + + + + + +
Input Entry Output Entry
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2", + "key3": "val3", + "key4": "val4" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "key2": "val2" + } +} +``` + +
+ +
+Retain an object in the body + +```yaml +- type: retain + fields: + - body.object +``` + + + + + + + +
 Input Entry Output Entry 
+ +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "key1": "val1", + "object": { + "nestedkey": "val2" + } + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "object": { + "nestedkey": "val2" + } + } +} +``` + +
+ +
+Retain fields from resource + +```yaml +- type: retain + fields: + - resource.key1 + - resource.key2 +``` + + + + + + + +
 Input Entry Output Entry 
+ +```json +{ + "resource": { + "key1": "val1", + "key2": "val2", + "key3": "val3" + }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { + "key1": "val1", + "key2": "val2" + }, + "attributes": { }, + "body": { + "key1": "val1" + } +} +``` + +
+ +
+Retain fields from attributes + +```yaml +- type: retain + fields: + - attributes.key1 + - attributes.key2 +``` + + + + + + + +
 Input Entry Output Entry 
+ +```json +{ + "resource": { }, + "attributes": { + "key1": "val1", + "key2": "val2", + "key3": "val3" + }, + "body": { + "key1": "val1" + } +} +``` + + + +```json +{ + "resource": { }, + "attributes": { + "key1": "val1", + "key2": "val2" + }, + "body": { + "key1": "val1" + } +} +``` + +
+ +
+Retain fields from all sources + +```yaml +- type: retain + fields: + - resource.key1 + - attributes.key3 + - body.key5 +``` + + + + + + + +
 Input Entry Output Entry 
+ +```json +{ + "resource": { + "key1": "val1", + "key2": "val2" + }, + "attributes": { + "key3": "val3", + "key4": "val4" + }, + "body": { + "key5": "val5", + "key6": "val6" + } +} +``` + + + +```json +{ + "resource": { + "key1": "val1" + }, + "attributes": { + "key3": "val3" + }, + "body": { + "key5": "val5" + } +} +``` + +
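The rule stated in the NOTE (a group with no listed fields is retained whole) is easy to miss, so here is a minimal Go sketch of it. The `Entry` type and the `retain` function are hypothetical simplifications for illustration and are not the operator's code; the sketch only handles top-level keys such as `body.key1`, not nested paths.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a simplified, hypothetical stand-in for a log entry.
type Entry struct {
	Resource   map[string]interface{}
	Attributes map[string]interface{}
	Body       map[string]interface{}
}

// retain keeps only the listed fields in each group. A group for which
// no field is listed is passed through unchanged, as the NOTE describes.
func retain(e Entry, fields []string) Entry {
	keep := map[string]map[string]bool{
		"resource":   {},
		"attributes": {},
		"body":       {},
	}
	for _, f := range fields {
		group, key, ok := strings.Cut(f, ".")
		if !ok {
			continue
		}
		if keys, known := keep[group]; known {
			keys[key] = true
		}
	}

	filter := func(group string, in map[string]interface{}) map[string]interface{} {
		if len(keep[group]) == 0 {
			return in // no field of this group was listed: retain it whole
		}
		out := make(map[string]interface{})
		for k, v := range in {
			if keep[group][k] {
				out[k] = v
			}
		}
		return out
	}

	return Entry{
		Resource:   filter("resource", e.Resource),
		Attributes: filter("attributes", e.Attributes),
		Body:       filter("body", e.Body),
	}
}

func main() {
	e := Entry{
		Resource:   map[string]interface{}{"key1": "val1", "key2": "val2"},
		Attributes: map[string]interface{}{"key3": "val3"},
		Body:       map[string]interface{}{"key5": "val5", "key6": "val6"},
	}
	out := retain(e, []string{"resource.key1", "body.key5"})
	fmt.Println(out.Resource, out.Attributes, out.Body)
	// prints: map[key1:val1] map[key3:val3] map[key5:val5]
}
```

In the usage above no `attributes.*` field is listed, so the attributes map survives untouched while resource and body are filtered.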
\ No newline at end of file diff --git a/pkg/stanza/docs/operators/router.md b/pkg/stanza/docs/operators/router.md new file mode 100644 index 000000000000..0f2da7445669 --- /dev/null +++ b/pkg/stanza/docs/operators/router.md @@ -0,0 +1,58 @@ +## `router` operator + +The `router` operator allows logs to be routed dynamically based on their content. + +The operator is configured with a list of routes, where each route has an associated expression. +An entry sent to the router operator is forwarded to the first route in the list whose associated +expression returns `true`. + +An entry that does not match any of the routes is dropped and not processed further. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `router` | A unique identifier for the operator. | +| `routes` | required | A list of routes. See below for details. | +| `default` | | The operator(s) that will receive any entries not matched by any of the routes. | + +#### Route configuration + +| Field | Default | Description | +| --- | --- | --- | +| `output` | required | The connected operator(s) that will receive all outbound entries for this route. | +| `expr` | required | An [expression](/docs/types/expression.md) that returns a boolean. The body of the routed entry is available as `$`. | +| `attributes` | {} | A map of `key: value` pairs to add to an entry that matches the route. 
| + + +### Examples + +#### Forward entries to different parsers based on content + +```yaml +- type: router + routes: + - output: my_json_parser + expr: 'body.format == "json"' + - output: my_syslog_parser + expr: 'body.format == "syslog"' +``` + +#### Drop entries based on content + +```yaml +- type: router + routes: + - output: my_output + expr: 'body.message matches "^LOG: .* END$"' +``` + +#### Route with a default + +```yaml +- type: router + routes: + - output: my_json_parser + expr: 'body.format == "json"' + default: catchall +``` diff --git a/pkg/stanza/docs/operators/severity_parser.md b/pkg/stanza/docs/operators/severity_parser.md new file mode 100644 index 000000000000..07e6556b9734 --- /dev/null +++ b/pkg/stanza/docs/operators/severity_parser.md @@ -0,0 +1,21 @@ +## `severity_parser` operator + +The `severity_parser` operator sets the severity on an entry by parsing a value from the body. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `severity_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The `id` for the operator to send parsed entries to. | +| `parse_from` | required | The [field](/docs/types/field.md) from which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `preset` | `default` | A predefined set of values that should be interpreted at specific severity levels. | +| `mapping` | | A formatted set of values that should be interpreted as severity levels. | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. 
| + + +### Example Configurations + +Several detailed examples are available [here](/docs/types/severity.md). diff --git a/pkg/stanza/docs/operators/stdin.md b/pkg/stanza/docs/operators/stdin.md new file mode 100644 index 000000000000..0103b75174d0 --- /dev/null +++ b/pkg/stanza/docs/operators/stdin.md @@ -0,0 +1,33 @@ +## `stdin` operator + +The `stdin` operator generates entries from lines written to stdin. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `stdin` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | + +### Example Configurations + +#### Mock a file input + +Configuration: +```yaml +- type: stdin +``` + +Command: +```bash +echo "test" | stanza -c ./config.yaml +``` + +Output bodies: +```json +{ + "timestamp": "2020-11-10T11:09:56.505467-05:00", + "severity": 0, + "body": "test" +} +``` diff --git a/pkg/stanza/docs/operators/stdout.md b/pkg/stanza/docs/operators/stdout.md new file mode 100644 index 000000000000..cbd2d66860ed --- /dev/null +++ b/pkg/stanza/docs/operators/stdout.md @@ -0,0 +1,21 @@ +## `stdout` operator + +The `stdout` operator will write entries to stdout in JSON format. This is particularly useful for debugging a config file +or running one-time batch processing jobs. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `stdout` | A unique identifier for the operator. | + + +### Example Configurations + +#### Simple configuration + +Configuration: +```yaml +- id: my_stdout + type: stdout +``` diff --git a/pkg/stanza/docs/operators/syslog_input.md b/pkg/stanza/docs/operators/syslog_input.md new file mode 100644 index 000000000000..014d2ba566ab --- /dev/null +++ b/pkg/stanza/docs/operators/syslog_input.md @@ -0,0 +1,44 @@ +## `syslog_input` operator + +The `syslog_input` operator listens for syslog format logs from UDP/TCP packets. 
+ +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `syslog_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `tcp` | {} | A [tcp_input config](./tcp_input.md#configuration-fields) used to define the underlying `tcp_input` operator. | +| `udp` | {} | A [udp_input config](./udp_input.md#configuration-fields) used to define the underlying `udp_input` operator. | +| `syslog` | required | A [syslog parser config](./syslog_parser.md#configuration-fields) used to define the underlying `syslog_parser` operator. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. | + + + + + +### Example Configurations + +#### Simple + +TCP Configuration: +```yaml +- type: syslog_input + tcp: + listen_address: "0.0.0.0:54526" + syslog: + protocol: rfc5424 +``` + +UDP Configuration: + +```yaml +- type: syslog_input + udp: + listen_address: "0.0.0.0:54526" + syslog: + protocol: rfc3164 + location: UTC +``` + diff --git a/pkg/stanza/docs/operators/syslog_parser.md b/pkg/stanza/docs/operators/syslog_parser.md new file mode 100644 index 000000000000..3d441e8743bb --- /dev/null +++ b/pkg/stanza/docs/operators/syslog_parser.md @@ -0,0 +1,65 @@ +## `syslog_parser` operator + +The `syslog_parser` operator parses the string-type field selected by `parse_from` as syslog. Timestamp parsing is handled automatically by this operator. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `syslog_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `parse_from` | `body` | The [field](/docs/types/field.md) from which the value will be parsed. | +| `parse_to` | `body` | The [field](/docs/types/field.md) to which the value will be parsed. 
| +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `protocol` | required | The protocol to parse the syslog messages as. Options are `rfc3164` and `rfc5424`. | +| `location` | `UTC` | The geographic location (timezone) to use when parsing the timestamp (Syslog RFC 3164 only). The available locations depend on the local IANA Time Zone database. [This page](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) contains many examples, such as `America/New_York`. | +| `timestamp` | `nil` | An optional [timestamp](/docs/types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator | +| `severity` | `nil` | An optional [severity](/docs/types/severity.md) block which will parse a severity field before passing the entry to the output operator | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | + +### Example Configurations + + +#### Parse the field `message` as syslog + +Configuration: +```yaml +- type: syslog_parser + protocol: rfc3164 +``` + + + + + + + +
+Input body: + +```json +{ + "timestamp": "", + "body": "<34>Jan 12 06:30:00 1.2.3.4 apache_server: test message" +} +``` + +Output body: + +```json +{ + "timestamp": "2020-01-12T06:30:00Z", + "body": { + "appname": "apache_server", + "facility": 4, + "hostname": "1.2.3.4", + "message": "test message", + "msg_id": null, + "priority": 34, + "proc_id": null, + "severity": 2 + } +} +``` +
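
The `priority`, `facility`, and `severity` values in the output above are related: in both RFC 3164 and RFC 5424, the priority value is `facility * 8 + severity`. The following is a minimal Python sketch of that arithmetic only. The function name `split_priority` is made up for illustration and is not part of the operator.

```python
from typing import Tuple


def split_priority(priority: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility, severity).

    Illustrative helper only; the name is hypothetical.
    """
    # Valid PRI values range from 0 (facility 0, severity 0)
    # to 191 (facility 23, severity 7).
    if not 0 <= priority <= 191:
        raise ValueError(f"priority out of range: {priority}")
    return divmod(priority, 8)


facility, severity = split_priority(34)
print(facility, severity)  # 4 2
```

For the `<34>` prefix in the example input, this yields facility 4 and severity 2, which matches the output body shown above.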
diff --git a/pkg/stanza/docs/operators/tcp_input.md b/pkg/stanza/docs/operators/tcp_input.md new file mode 100644 index 000000000000..36b9f2094062 --- /dev/null +++ b/pkg/stanza/docs/operators/tcp_input.md @@ -0,0 +1,85 @@ +## `tcp_input` operator + +The `tcp_input` operator listens for logs on one or more TCP connections. The operator assumes that logs are newline separated. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `tcp_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `max_log_size` | `1MiB` | The maximum size of a log entry to read before failing. Protects against reading large amounts of data into memory. | +| `listen_address` | required | A listen address of the form `<ip>:<port>`. | +| `tls` | nil | An optional `TLS` configuration (see the TLS configuration section). | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. | +| `add_attributes` | false | Adds `net.*` attributes according to [semantic convention](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md#general-network-connection-attributes). | +| `multiline` | | A `multiline` configuration block. See below for details. | +| `encoding` | `utf-8` | The encoding of the data being read. See the list of supported encodings below for available options. | + +#### TLS Configuration + +The `tcp_input` operator supports TLS, which is disabled by default. +For more detail, see [opentelemetry-collector#configtls](https://github.com/open-telemetry/opentelemetry-collector/tree/main/config/configtls#tls-configuration-settings). + +| Field | Default | Description | +| --- | --- | --- | +| `cert_file` | | Path to the TLS cert to use for TLS required connections.
| +| `key_file` | | Path to the TLS key to use for TLS required connections. | +| `ca_file` | | Path to the CA cert. For a client this verifies the server certificate. For a server this verifies client certificates. If empty, the system root CA is used. | +| `client_ca_file` | | Path to the TLS cert to use by the server to verify a client certificate. (optional) | + +#### `multiline` configuration + +If set, the `multiline` configuration block instructs the `tcp_input` operator to split log entries on a pattern other than newlines. + +The `multiline` configuration block must contain exactly one of `line_start_pattern` or `line_end_pattern`. These are regex patterns that +match either the beginning of a new log entry, or the end of a log entry. + +#### Supported encodings + +| Key | Description | +| --- | --- | +| `nop` | No encoding validation. Treats the input as a stream of raw bytes | +| `utf-8` | UTF-8 encoding | +| `utf-16le` | UTF-16 encoding with little-endian byte order | +| `utf-16be` | UTF-16 encoding with big-endian byte order | +| `ascii` | ASCII encoding | +| `big5` | The Big5 Chinese character encoding | + +Other less common encodings are supported on a best-effort basis. +See [https://www.iana.org/assignments/character-sets/character-sets.xhtml](https://www.iana.org/assignments/character-sets/character-sets.xhtml) +for other encodings available.
+ +### Example Configurations + +#### Simple + +Configuration: + +```yaml +- type: tcp_input + listen_address: "0.0.0.0:54525" +``` + +Send a log: + +```bash +$ nc localhost 54525 <<EOF +heredoc> message1 +heredoc> message2 +heredoc> EOF +``` + +Generated entries: + +```json +{ + "timestamp": "2020-04-30T12:10:17.656726-04:00", + "body": "message1" +}, +{ + "timestamp": "2020-04-30T12:10:17.657143-04:00", + "body": "message2" +} +``` diff --git a/pkg/stanza/docs/operators/time_parser.md b/pkg/stanza/docs/operators/time_parser.md new file mode 100644 index 000000000000..efadfd2dc72f --- /dev/null +++ b/pkg/stanza/docs/operators/time_parser.md @@ -0,0 +1,21 @@ +## `time_parser` operator + +The `time_parser` operator sets the timestamp on an entry by parsing a value from the body. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `time_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `parse_from` | required | The [field](/docs/types/field.md) from which the value will be parsed. | +| `layout_type` | `strptime` | The type of timestamp. Valid values are `strptime`, `gotime`, and `epoch`. | +| `layout` | required | The exact layout of the timestamp to be parsed. | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | + + +### Example Configurations + +Several detailed examples are available [here](/docs/types/timestamp.md).
diff --git a/pkg/stanza/docs/operators/trace_parser.md b/pkg/stanza/docs/operators/trace_parser.md new file mode 100644 index 000000000000..5fafb8ea78e4 --- /dev/null +++ b/pkg/stanza/docs/operators/trace_parser.md @@ -0,0 +1,19 @@ +## `trace_parser` operator + +The `trace_parser` operator sets the trace on an entry by parsing a value from the body. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `trace_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The `id` for the operator to send parsed entries to. | +| `trace_id.parse_from` | `trace_id` | A [field](/docs/types/field.md) that indicates the field to be parsed as a trace ID. | +| `span_id.parse_from` | `span_id` | A [field](/docs/types/field.md) that indicates the field to be parsed as a span ID. | +| `trace_flags.parse_from` | `trace_flags` | A [field](/docs/types/field.md) that indicates the field to be parsed as trace flags. | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | + + +### Example Configurations + +Several detailed examples are available [here](/docs/types/trace.md). diff --git a/pkg/stanza/docs/operators/udp_input.md b/pkg/stanza/docs/operators/udp_input.md new file mode 100644 index 000000000000..30b84ad841b0 --- /dev/null +++ b/pkg/stanza/docs/operators/udp_input.md @@ -0,0 +1,70 @@ +## `udp_input` operator + +The `udp_input` operator listens for logs from UDP packets. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `udp_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `listen_address` | required | A listen address of the form `<ip>:<port>`. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes.
| +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. | +| `add_attributes` | false | Adds `net.*` attributes according to [semantic convention](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md#general-network-connection-attributes). | +| `multiline` | | A `multiline` configuration block. See below for details. | +| `encoding` | `utf-8` | The encoding of the data being read. See the list of supported encodings below for available options. | + +#### `multiline` configuration + +If set, the `multiline` configuration block instructs the `udp_input` operator to split log entries on a pattern other than newlines. + +**Note:** If `multiline` is not set, log entries are not split at all. Every UDP packet is treated as a single log entry. +**Note:** `multiline` detection works per UDP packet due to protocol limitations. + +The `multiline` configuration block must contain exactly one of `line_start_pattern` or `line_end_pattern`. These are regex patterns that +match either the beginning of a new log entry, or the end of a log entry. + +#### Supported encodings + +| Key | Description | +| --- | --- | +| `nop` | No encoding validation. Treats the input as a stream of raw bytes | +| `utf-8` | UTF-8 encoding | +| `utf-16le` | UTF-16 encoding with little-endian byte order | +| `utf-16be` | UTF-16 encoding with big-endian byte order | +| `ascii` | ASCII encoding | +| `big5` | The Big5 Chinese character encoding | + +Other less common encodings are supported on a best-effort basis. +See [https://www.iana.org/assignments/character-sets/character-sets.xhtml](https://www.iana.org/assignments/character-sets/character-sets.xhtml) +for other encodings available.
+ +### Example Configurations + +#### Simple + +Configuration: + +```yaml +- type: udp_input + listen_address: "0.0.0.0:54526" +``` + +Send a log: + +```bash +$ nc -u localhost 54526 <<EOF +heredoc> message1 +heredoc> message2 +heredoc> EOF +``` + +Generated entries: + +```json +{ + "timestamp": "2020-04-30T12:10:17.656726-04:00", + "body": "message1\nmessage2\n" +} +``` diff --git a/pkg/stanza/docs/operators/uri_parser.md b/pkg/stanza/docs/operators/uri_parser.md new file mode 100644 index 000000000000..6848b32a676d --- /dev/null +++ b/pkg/stanza/docs/operators/uri_parser.md @@ -0,0 +1,180 @@ +## `uri_parser` operator + +The `uri_parser` operator parses the string-type field selected by `parse_from` as a [URI](https://tools.ietf.org/html/rfc3986). + +`uri_parser` can handle: +- Absolute URI + - `https://google.com/v1/app?user_id=2&uuid=57b4dad2-063c-4965-941c-adfd4098face` +- Relative URI + - `/app?user=admin` +- Query string + - `?request=681e6fc4-3314-4ccc-933e-4f9c9f0efd24&env=stage&env=dev` + - Query string must start with a question mark + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `uri_parser` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `parse_from` | `body` | The [field](/docs/types/field.md) from which the value will be parsed. | +| `parse_to` | `body` | The [field](/docs/types/field.md) to which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](/docs/types/on_error.md). | +| `if` | | An [expression](/docs/types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers.
| + + +### Output Fields + +The following fields are returned. Empty fields are not returned. + +| Field | Type | Example | Description | +| --- | --- | --- | --- | +| scheme | `string` | `"http"` | [URI Scheme](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml). HTTP, HTTPS, FTP, etc. | +| user | `string` | `"dev"` | [Userinfo](https://tools.ietf.org/html/rfc3986#section-3.2) username. Password is always ignored. | +| host | `string` | `"golang.org"` | The [hostname](https://tools.ietf.org/html/rfc3986#section-3.2.2) such as `www.example.com`, `example.com`, `example`. A scheme is required in order to parse the `host` field. | +| port | `string` | `"8443"` | The [port](https://tools.ietf.org/html/rfc3986#section-3.2.3) the request is sent to. A scheme is required in order to parse the `port` field. | +| path | `string` | `"/v1/app"` | URI request [path](https://tools.ietf.org/html/rfc3986#section-3.3). | +| query | `map[string][]string` | `"query":{"user":["admin"]}` | Parsed URI [query string](https://tools.ietf.org/html/rfc3986#section-3.4). | + + +### Example Configurations + + +#### Parse the field `body.message` as absolute URI + +Configuration: +```yaml +- type: uri_parser + parse_from: body.message +``` + + + + + + + +
+Input body: + +```json +{ + "timestamp": "", + "body": { + "message": "https://dev:pass@google.com/app?user_id=2&token=001" + } +} +``` + +Output body: + +```json +{ + "timestamp": "", + "body": { + "host": "google.com", + "path": "/app", + "query": { + "user_id": [ + "2" + ], + "token": [ + "001" + ] + }, + "scheme": "https", + "user": "dev" + } +} +``` +
+ +#### Parse the field `body.message` as relative URI + +Configuration: +```yaml +- type: uri_parser + parse_from: body.message +``` + + + + + + + +
+Input body: + +```json +{ + "timestamp": "", + "body": { + "message": "/app?user=admin" + } +} +``` + +Output body: + +```json +{ + "timestamp": "", + "body": { + "path": "/app", + "query": { + "user": [ + "admin" + ] + } + } +} +``` +
+ +#### Parse the field `body.query` as URI query string + +Configuration: +```yaml +- type: uri_parser + parse_from: body.query +``` + + + + + + + +
+Input body: + +```json +{ + "timestamp": "", + "body": { + "query": "?request=681e6fc4-3314-4ccc-933e-4f9c9f0efd24&env=stage&env=dev" + } +} +``` + +Output body: + +```json +{ + "timestamp": "", + "body": { + "query": { + "env": [ + "stage", + "dev" + ], + "request": [ + "681e6fc4-3314-4ccc-933e-4f9c9f0efd24" + ] + } + } +} +``` +
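
To make the shape of the output fields in the three examples above easier to see, here is a rough Python sketch that produces the same structure using the standard library's `urllib.parse`. It is an analogy under stated assumptions, not the operator's implementation: the function name `parse_uri` is hypothetical, and edge-case behavior may differ from the `uri_parser` operator.

```python
from urllib.parse import parse_qs, urlsplit


def parse_uri(value: str) -> dict:
    """Approximate the uri_parser output fields for a URI string.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(value)
    fields = {
        "scheme": parts.scheme,
        "user": parts.username,  # the password is deliberately ignored
        "host": parts.hostname,
        "port": str(parts.port) if parts.port else "",
        "path": parts.path,
        # parse_qs collects repeated keys into lists, e.g. env=stage&env=dev
        "query": parse_qs(parts.query),
    }
    # Empty fields are not returned.
    return {key: val for key, val in fields.items() if val}


print(parse_uri("https://dev:pass@google.com/app?user_id=2&token=001"))
print(parse_uri("/app?user=admin"))
print(parse_uri("?env=stage&env=dev"))
```

Note how the relative URI and the bare query string produce only the `path` and `query` keys, mirroring the rule that empty fields are omitted.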
diff --git a/pkg/stanza/docs/operators/windows_eventlog_input.md b/pkg/stanza/docs/operators/windows_eventlog_input.md new file mode 100644 index 000000000000..94e791289520 --- /dev/null +++ b/pkg/stanza/docs/operators/windows_eventlog_input.md @@ -0,0 +1,54 @@ +## `windows_eventlog_input` operator + +The `windows_eventlog_input` operator reads logs from the windows event log API. + +### Configuration Fields + +| Field | Default | Description | +| --- | --- | --- | +| `id` | `windows_eventlog_input` | A unique identifier for the operator. | +| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | +| `channel` | required | The windows event log channel to monitor. | +| `max_reads` | 100 | The maximum number of bodies read into memory, before beginning a new batch. | +| `start_at` | `end` | On first startup, where to start reading logs from the API. Options are `beginning` or `end`. | +| `poll_interval` | 1s | The interval at which the channel is checked for new log entries. This check begins again after all new bodies have been read. | +| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | +| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. 
| + +### Example Configurations + +#### Simple + +Configuration: +```yaml +- type: windows_eventlog_input + channel: application +``` + +Output entry sample: +```json +{ + "timestamp": "2020-04-30T12:10:17.656726-04:00", + "severity": 30, + "body": { + "event_id": { + "qualifiers": 0, + "id": 1000 + }, + "provider": { + "name": "provider name", + "guid": "provider guid", + "event_source": "event source" + }, + "system_time": "2020-04-30T12:10:17.656726789Z", + "computer": "example computer", + "channel": "application", + "record_id": 1, + "level": "Information", + "message": "example message", + "task": "example task", + "opcode": "example opcode", + "keywords": ["example keyword"] + } +} +``` diff --git a/pkg/stanza/docs/types/bytesize.md b/pkg/stanza/docs/types/bytesize.md new file mode 100644 index 000000000000..fe1c0fac3e88 --- /dev/null +++ b/pkg/stanza/docs/types/bytesize.md @@ -0,0 +1,28 @@ +# ByteSize + +A ByteSize specifies a number of bytes in a human-readable format. See the examples for details. + + +## Examples + +### Various ways to specify 5000 bytes + +```yaml +- type: some_operator + bytes: 5000 +``` + +```yaml +- type: some_operator + bytes: 5kb +``` + +```yaml +- type: some_operator + bytes: 4.88KiB +``` + +```yaml +- type: some_operator + bytes: 5e3 +``` diff --git a/pkg/stanza/docs/types/duration.md b/pkg/stanza/docs/types/duration.md new file mode 100644 index 000000000000..9b35c90bd797 --- /dev/null +++ b/pkg/stanza/docs/types/duration.md @@ -0,0 +1,31 @@ +# Durations + +Durations are lengths of time that are specified as part of a plugin configuration using a number or string. + +If a number is specified, it will be interpreted as a number of seconds. + +If a string is specified, it will be interpreted according to Golang's [`time.ParseDuration`](https://golang.org/src/time/format.go?s=40541:40587#L1369) documentation.
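
The two interpretation rules above can be sketched in Python as follows. This is a simplified approximation of Go's `time.ParseDuration`, written only to illustrate the number-versus-string behavior: it ignores signs and some number forms that Go accepts, and the name `to_seconds` is hypothetical.

```python
import re

# Seconds per unit, following the unit suffixes accepted by Go's time.ParseDuration.
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def to_seconds(value) -> float:
    """Interpret a configured duration as a number of seconds (illustrative only)."""
    # Rule 1: a bare number is a number of seconds.
    if isinstance(value, (int, float)):
        return float(value)
    # Rule 2: a string is a sequence of <number><unit> terms, such as "1h30m".
    if not value:
        raise ValueError("empty duration")
    position, total = 0, 0.0
    while position < len(value):
        match = _TOKEN.match(value, position)
        if match is None:
            raise ValueError(f"invalid duration: {value!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        position = match.end()
    return total


print(to_seconds(60), to_seconds("1m"), to_seconds("60s"))  # 60.0 60.0 60.0
```

All of `60`, `60.0`, `"60s"`, and `"1m"` resolve to the same 60 seconds.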
+ +## Examples + +### Various ways to specify a duration of 1 minute + +```yaml +- type: some_operator + duration: 1m +``` + +```yaml +- type: some_operator + duration: 60s +``` + +```yaml +- type: some_operator + duration: 60 +``` + +```yaml +- type: some_operator + duration: 60.0 +``` diff --git a/pkg/stanza/docs/types/entry.md b/pkg/stanza/docs/types/entry.md new file mode 100644 index 000000000000..5bd407946573 --- /dev/null +++ b/pkg/stanza/docs/types/entry.md @@ -0,0 +1,39 @@ +# Entry + +Entry is the base representation of log data as it moves through a pipeline. All operators either create, modify, or consume entries. + +## Structure +| Field | Description | +| --- | --- | +| `timestamp` | The timestamp associated with the log (RFC 3339). | +| `severity` | The [severity](/docs/types/severity.md) of the log. | +| `severity_text` | The original text that was interpreted as a [severity](/docs/types/severity.md). | +| `resource` | A map of key/value pairs that describe the resource from which the log originated. | +| `attributes` | A map of key/value pairs that provide additional context to the log. This value is often used by a consumer to filter logs. | +| `body` | The contents of the log. This value is often modified and restructured in the pipeline. It may be a string, number, or object. | + + +Represented in `json` format, an entry may look like the following: + +```json +{ + "resource": { + "uuid": "11112222-3333-4444-5555-666677778888" + }, + "attributes": { + "env": "prod" + }, + "body": { + "message": "Something happened.", + "details": { + "count": 100, + "reason": "event" + } + }, + "timestamp": "2020-01-31T00:00:00-00:00", + "severity": 30, + "severity_text": "INFO" +} +``` + +Throughout the documentation, `json` format is used to represent entries. Fields are typically omitted unless relevant to the behavior being described.
diff --git a/pkg/stanza/docs/types/expression.md b/pkg/stanza/docs/types/expression.md new file mode 100644 index 000000000000..f1e232438832 --- /dev/null +++ b/pkg/stanza/docs/types/expression.md @@ -0,0 +1,24 @@ +# Expressions + +Expressions give the config flexibility by allowing dynamic business logic rules to be included in static configs. +Most notably, expressions can be used to route messages and add new fields based on the contents of the log entry +being processed. + +For reference documentation of the expression language, see [here](https://github.com/antonmedv/expr/blob/master/docs/Language-Definition.md). + +Available to the expressions are a few special variables: +- `body` contains the entry's body +- `attributes` contains the entry's attributes +- `resource` contains the entry's resource +- `timestamp` contains the entry's timestamp +- `env()` is a function that allows you to read environment variables + +## Examples + +### Add an attribute from an environment variable + +```yaml +- type: metadata + attributes: + stack: 'EXPR(env("STACK"))' +``` diff --git a/pkg/stanza/docs/types/field.md b/pkg/stanza/docs/types/field.md new file mode 100644 index 000000000000..5eacdd59d4c6 --- /dev/null +++ b/pkg/stanza/docs/types/field.md @@ -0,0 +1,104 @@ +## Fields + +A _Field_ is a reference to a value in a log [entry](/docs/types/entry.md). + +Many [operators](/docs/operators/README.md) use fields in their configurations. For example, parsers use fields to specify which value to parse and where to write a new value. + +Fields are `.`-delimited strings that select a value on the entry. + +Fields can be used to select body, resource, or attribute values. For values on the body, use the prefix `body`, such as `body.my_value`. To select an attribute, prefix your field with `attributes`, such as `attributes.my_attribute`. For resource values, use the prefix `resource`.
+ +If a field contains a dot in it, a field can alternatively use bracket syntax for traversing through a map. For example, to select the key `k8s.cluster.name` on the entry's body, you can use the field `body["k8s.cluster.name"]`. + +Body fields can be nested arbitrarily deeply, such as `body.my_value.my_nested_value`. + +If a field does not start with `resource`, `attributes`, or `body`, then `body` is assumed. For example, `my_value` is equivalent to `body.my_value`. + +## Examples + +#### Using fields with the add and remove operators. + +Config: +```yaml +- type: add + field: body.key3 + value: val3 +- type: remove + field: body.key2.nested_key1 +- type: add + field: attributes.my_attribute + value: my_attribute_value +``` + + + + + + + +
+Input entry: + +```json +{ + "timestamp": "", + "attributes": {}, + "body": { + "key1": "value1", + "key2": { + "nested_key1": "nested_value1", + "nested_key2": "nested_value2" + } + } +} +``` + +Output entry: + +```json +{ + "timestamp": "", + "attributes": { + "my_attribute": "my_attribute_value" + }, + "body": { + "key1": "value1", + "key2": { + "nested_key2": "nested_value2" + }, + "key3": "val3" + } +} +``` +
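
The field rules described above (a `.`-delimited path, with `body` assumed when no prefix is given) can be sketched in a few lines of Python. This is an illustration only: the function name `resolve` is hypothetical, and the sketch deliberately does not handle the bracket syntax for keys that contain dots.

```python
PREFIXES = ("body", "attributes", "resource")


def resolve(entry: dict, field: str):
    """Look up a dot-delimited field on an entry (illustrative, no bracket syntax)."""
    keys = field.split(".")
    # A field without a known prefix is treated as relative to the body.
    if keys[0] not in PREFIXES:
        keys.insert(0, "body")
    value = entry
    for key in keys:
        value = value[key]
    return value


entry = {
    "attributes": {"my_attribute": "my_attribute_value"},
    "body": {"key1": "value1", "key2": {"nested_key2": "nested_value2"}},
}

print(resolve(entry, "body.key2.nested_key2"))  # nested_value2
print(resolve(entry, "key1"))  # value1
```

The second lookup shows the default prefix at work: `key1` and `body.key1` refer to the same value.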
+ + +#### Using fields to refer to various values + +Given the following entry, we can use fields as follows: + +```json +{ + "resource": { + "uuid": "11112222-3333-4444-5555-666677778888" + }, + "attributes": { + "env": "prod" + }, + "body": { + "message": "Something happened.", + "details": { + "count": 100, + "reason": "event" + } + } +} +``` + +| Field | Refers to Value | +| --- | --- | +| body.message | `"Something happened."` | +| message | `"Something happened."` | +| body.details.count | `100` | +| attributes.env | `"prod"` | +| resource.uuid | `"11112222-3333-4444-5555-666677778888"` | diff --git a/pkg/stanza/docs/types/on_error.md b/pkg/stanza/docs/types/on_error.md new file mode 100644 index 000000000000..b7eed74abb2a --- /dev/null +++ b/pkg/stanza/docs/types/on_error.md @@ -0,0 +1,10 @@ +# `on_error` parameter +The `on_error` parameter determines the error handling strategy an operator should use when it fails to process an entry. There are 2 supported values: `drop` and `send`. + +Regardless of the method selected, all processing errors will be logged by the operator. + +### `drop` +In this mode, if an operator fails to process an entry, it will drop the entry altogether. This will stop the entry from being sent further down the pipeline. + +### `send` +In this mode, if an operator fails to process an entry, it will still send the entry down the pipeline. This may result in downstream operators receiving entries in an undesired format. \ No newline at end of file diff --git a/pkg/stanza/docs/types/pipeline.md b/pkg/stanza/docs/types/pipeline.md new file mode 100644 index 000000000000..b67221ff1e8c --- /dev/null +++ b/pkg/stanza/docs/types/pipeline.md @@ -0,0 +1,166 @@ +# Pipeline + +A pipeline is made up of [operators](/docs/operators/README.md). The pipeline defines how stanza should input, process, and output logs. + + +## Linear Pipelines + +Many stanza pipelines are a linear sequence of operators.
Logs flow from one operator to the next, according to the order in which they are defined. + +For example, the following pipeline will read logs from a file, parse them as `json`, and print them to `stdout`: +```yaml +pipeline: + - type: file_input + include: + - my-log.json + - type: json_parser + - type: stdout +``` + +Notice that every operator has a `type` field. The `type` of operator must always be specified. + + +## `id` and `output` + +Linear pipelines are sufficient for many use cases, but stanza is also capable of processing non-linear pipelines. In order to use non-linear pipelines, the `id` and `output` fields must be understood. Let's take a close look at these. + +Each operator in a pipeline has a unique `id`. By default, `id` will take the same value as `type`. Alternatively, you can specify an `id` for any operator. If your pipeline contains multiple operators of the same `type`, then the `id` field must be used. + +All operators (except output operators) support an `output` field. By default, the output field takes the value of the next operator's `id`. + +Let's look at how these default values work together by considering the linear pipeline shown above. The following pipeline would be exactly the same (although much more verbosely defined): + +```yaml +pipeline: + - type: file_input + id: file_input + include: + - my-log.json + output: json_parser + - type: json_parser + id: json_parser + output: stdout + - type: stdout + id: stdout +``` + +Additionally, we could accomplish the same task using custom `id` values. + +```yaml +pipeline: + - type: file_input + id: my_file + include: + - my-log.json + output: my_parser + - type: json_parser + id: my_parser + output: my_out + - type: stdout + id: my_out +``` + +We could even shuffle the order of operators, so long as we're explicitly declaring each output. This is a little counterintuitive, so it isn't recommended.
However, it is shown here to highlight the fact that operators in a pipeline are ultimately connected via `output`'s and `id`'s. + +```yaml +pipeline: + - type: stdout # 3rd operator + id: my_out + - type: json_parser # 2nd operator + id: my_parser + output: my_out + - type: file_input # 1st operator + id: my_file + include: + - my-log.json + output: my_parser +``` + +Finally, we could even remove some of the `id`'s and `output`'s, and depend on the default values. This is even less readable, so again would not be recommended. However, it is provided here to demonstrate that default values can be depended upon. + +```yaml +pipeline: + - type: json_parser # 2nd operator + - type: stdout # 3rd operator + - type: file_input # 1st operator + include: + - my-log.json + output: json_parser +``` + +## Non-Linear Pipelines + +Now that we understand how `id` and `output` work together, we can configure stanza to run more complex pipelines. Technically, the structure of a stanza pipeline is limited only in that it must be a [directed, acyclic, graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph). + +Let's consider a pipeline with two inputs and one output: +```yaml +pipeline: + - type: file_input + include: + - my-log.json + output: stdout # flow directly to stdout + + - type: windows_eventlog_input + channel: security + # implicitly flow to stdout + + - type: stdout +``` + +Here's another, where we read from two files that should be parsed differently: +```yaml +pipeline: + # Read and parse a JSON file + - type: file_input + id: file_input_one + include: + - my-log.json + - type: json_parser + output: stdout # flow directly to stdout + + # Read and parse a text file + - type: file_input + id: file_input_two + include: + - my-other-log.txt + - type: regex_parser + regex: ... # regex appropriate to file format + # implicitly flow to stdout + + # Print + - type: stdout +``` + +Finally, in some cases, you might expect multiple log formats to come from a single input. 
This solution uses the [router](/docs/operators/router.md) operator. The `router` operator allows one to define multiple "routes", each of which has an `output`. + + +```yaml +pipeline: + # Read log file + - type: file_input + include: + - my-log.txt + + # Route based on log type + - type: router + routes: + - expr: 'body startsWith "ERROR"' + output: error_parser + - expr: 'body startsWith "INFO"' + output: info_parser + + # Parse logs with format one + - type: regex_parser + id: error_parser + regex: ... # regex appropriate to parsing error logs + output: stdout # flow directly to stdout + + # Parse logs with format two + - type: regex_parser + id: info_parser + regex: ... # regex appropriate to parsing info logs + output: stdout # flow directly to stdout + + # Print + - type: stdout +``` \ No newline at end of file diff --git a/pkg/stanza/docs/types/scope_name.md b/pkg/stanza/docs/types/scope_name.md new file mode 100644 index 000000000000..7a71fac0852a --- /dev/null +++ b/pkg/stanza/docs/types/scope_name.md @@ -0,0 +1,64 @@ +## Scope Name Parsing + +A Scope Name may be parsed from a log entry in order to indicate the code from which a log was emitted. + +### `scope_name` parsing parameters + +Parser operators can parse a scope name and attach the resulting value to a log entry. + +| Field | Default | Description | +| --- | --- | --- | +| `parse_from` | required | The [field](/docs/types/field.md) from which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | + + +### How to use `scope_name` parsing + +All parser operators, such as [`regex_parser`](/docs/operators/regex_parser.md) support these fields inside of a `scope_name` block. + +If a `scope_name` block is specified, the parser operator will perform the parsing _after_ performing its other parsing actions, but _before_ passing the entry to the specified output operator. 
+ + +### Example Configurations + +#### Parse a scope_name from a string + +Configuration: +```yaml +- type: regex_parser + regexp: '^(?P<scope_name_field>\S*)\s-\s(?P<message>.*)' + scope_name: + parse_from: body.scope_name_field +``` + + + + + + + +
+Input entry: + +```json +{ + "resource": { }, + "attributes": { }, + "body": "com.example.Foo - some message", + "scope_name": "" +} +``` + +Output entry: + +```json +{ + "resource": { }, + "attributes": { }, + "body": { + "message": "some message" + }, + "scope_name": "com.example.Foo" +} +``` +
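
The example above relies on named capture groups to split the body into a scope name and a message. The following Python sketch shows how such a pattern captures the two values, assuming the groups are named `scope_name_field` and `message` as implied by `parse_from` and the output body. It demonstrates the regular expression only; the helper name `parse_body` is hypothetical and this is not the `regex_parser` implementation.

```python
import re

# Named groups become keys in the parsed result.
PATTERN = re.compile(r"^(?P<scope_name_field>\S*)\s-\s(?P<message>.*)")


def parse_body(body: str):
    """Return the named groups captured from a log body, or None if it does not match."""
    match = PATTERN.match(body)
    return match.groupdict() if match else None


print(parse_body("com.example.Foo - some message"))
# {'scope_name_field': 'com.example.Foo', 'message': 'some message'}
```

In the operator, the `scope_name` block then moves the `scope_name_field` value out of the body and onto the entry's `scope_name`, leaving only `message` behind.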
diff --git a/pkg/stanza/docs/types/severity.md b/pkg/stanza/docs/types/severity.md new file mode 100644 index 000000000000..8e29583deb1a --- /dev/null +++ b/pkg/stanza/docs/types/severity.md @@ -0,0 +1,692 @@ +## Severity Parsing + +Severity is represented as a number from 1 to 24. The meaning of these severity levels is defined in the [OpenTelemetry Logs Data Model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#field-severitynumber). + +> Note: A `default` severity level is also supported, and is used when a value cannot be mapped to any other level. + +### `severity` parsing parameters + +Parser operators can parse a severity and attach the resulting value to a log entry. + +| Field | Default | Description | +| --- | --- | --- | +| `parse_from` | required | The [field](/docs/types/field.md) from which the value will be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `preset` | `default` | A predefined set of values that should be interpreted at specific severity levels. | +| `mapping` | | A custom set of values that should be interpreted at designated severity levels. | + + +### How severity `mapping` works + +Severity parsing behavior is defined in a config file using a severity `mapping`. The general structure of the `mapping` is as follows: + +```yaml +...
+ mapping: + severity_alias: value | list of values | range | special + severity_alias: value | list of values | range | special +``` + +The following aliases are used to represent the possible severity levels: + +| Severity Number | Alias | +| --- | --- | +| 0 | `default` | +| 1 | `trace` | +| 2 | `trace2` | +| 3 | `trace3` | +| 4 | `trace4` | +| 5 | `debug` | +| 6 | `debug2` | +| 7 | `debug3` | +| 8 | `debug4` | +| 9 | `info` | +| 10 | `info2` | +| 11 | `info3` | +| 12 | `info4` | +| 13 | `warn` | +| 14 | `warn2` | +| 15 | `warn3` | +| 16 | `warn4` | +| 17 | `error` | +| 18 | `error2` | +| 19 | `error3` | +| 20 | `error4` | +| 21 | `fatal` | +| 22 | `fatal2` | +| 23 | `fatal3` | +| 24 | `fatal4` | + +The following example illustrates many of the ways in which a mapping can be configured: +```yaml +... + mapping: + + # single value to be parsed as "error" + error: oops + + # list of values to be parsed as "warn" + warn: + - hey! + - YSK + + # range of values to be parsed as "info" + info: + - min: 300 + max: 399 + + # special value representing the range 200-299, to be parsed as "debug" + debug: 2xx + + # single value to be parsed as "info3" + info3: medium + + # mix and match the above concepts + fatal: + - really serious + - min: 9001 + max: 9050 + - 5xx +``` + +### How to simplify configuration with a `preset` + +A `preset` can reduce the amount of configuration needed in the `mapping` structure by initializing the severity mapping with common values. Values specified in the more verbose `mapping` structure will then be added to the severity map. + +By default, a common `preset` is used. Alternatively, `preset: none` can be specified to start with an empty mapping. + +The following configurations are equivalent: + +```yaml +... + mapping: + error: 404 +``` + +```yaml +... + preset: default + mapping: + error: 404 +``` + +```yaml +... 
+ preset: none + mapping: + trace: trace + trace2: trace2 + trace3: trace3 + trace4: trace4 + debug: debug + debug2: debug2 + debug3: debug3 + debug4: debug4 + info: info + info2: info2 + info3: info3 + info4: info4 + warn: warn + warn2: warn2 + warn3: warn3 + warn4: warn4 + error: + - error + - 404 + error2: error2 + error3: error3 + error4: error4 + fatal: fatal + fatal2: fatal2 + fatal3: fatal3 + fatal4: fatal4 +``` + +Additional built-in presets are coming soon. + + +### How to use severity parsing + +All parser operators, such as [`regex_parser`](/docs/operators/regex_parser.md), support these fields inside of a `severity` block. + +If a severity block is specified, the parser operator will perform the severity parsing _after_ performing its other parsing actions, but _before_ passing the entry to the specified output operator. + +```yaml +- type: regex_parser + regexp: '^StatusCode=(?P<severity_field>\d{3}), Host=(?P<host>[^,]+)' + severity: + parse_from: body.severity_field + mapping: + warn: 5xx + error: 4xx + info: 3xx + debug: 2xx +``` + +--- + +As a special case, the [`severity_parser`](/docs/operators/severity_parser.md) operator supports these fields inline. This is because severity parsing is the primary purpose of the operator. +```yaml +- type: severity_parser + parse_from: body.severity_field + mapping: + warn: 5xx + error: 4xx + info: 3xx + debug: 2xx +``` + +### Example Configurations + +#### Parse a severity from a standard value + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field +``` + +Note that the default `preset` is in place, and no additional values have been specified. + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "ERROR" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +#### Parse a severity from a non-standard value + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field + mapping: + error: nooo! +``` + +Note that the default `preset` is in place, and one additional value has been specified. + + + + + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "nooo!" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "ERROR" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +#### Parse a severity from any of several non-standard values + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field + mapping: + error: + - nooo! + - nooooooo + info: HEY + debug: 1234 +``` + + + + + + + + + + + + + + + + + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "nooo!" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "nooooooo" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "hey" + } +} +``` + + + +```json +{ + "severity": "info", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 1234 + } +} +``` + + + +```json +{ + "severity": "debug", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "unknown" + } +} +``` + + + +```json +{ + "severity": "default", + "body": {} +} +``` + +
+ +#### Parse a severity from a range of values + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field + mapping: + error: + - min: 1 + max: 5 + fatal: + - min: 6 + max: 10 +``` + + + + + + + + + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 3 + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 9 + } +} +``` + + + +```json +{ + "severity": "fatal", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 12 + } +} +``` + + + +```json +{ + "severity": "default", + "body": {} +} +``` + +
+ +#### Parse a severity from an HTTP status code + +Special values are provided to represent HTTP status code ranges. + +| Value | Meaning | +| --- | --- | +| 2xx | 200 - 299 | +| 3xx | 300 - 399 | +| 4xx | 400 - 499 | +| 5xx | 500 - 599 | + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field + mapping: + warn: 5xx + error: 4xx + info: 3xx + debug: 2xx +``` + +Equivalent Configuration: +```yaml +- id: my_severity_parser + type: severity_parser + parse_from: body.severity_field + mapping: + warn: + - min: 500 + max: 599 + error: + - min: 400 + max: 499 + info: + - min: 300 + max: 399 + debug: + - min: 200 + max: 299 + output: my_next_operator +``` + + + + + + + + + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 302 + } +} +``` + + + +```json +{ + "severity": "info", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 404 + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": 200 + } +} +``` + + + +```json +{ + "severity": "debug", + "body": {} +} +``` + +
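The equivalence between the special values and explicit `min`/`max` ranges can be sketched in Go as follows. This is an illustration of the idea only, not the parser's real code: `severityRange`, `expandSpecial`, `buildRanges`, and `severityForStatus` are hypothetical names, and the sketch assumes that any value outside every range falls back to `default`, as in the examples above.

```go
package main

import "fmt"

// severityRange maps an inclusive range of numeric values to a severity alias.
type severityRange struct {
	min, max int
	alias    string
}

// expandSpecial converts a special value such as "4xx" into the inclusive
// numeric range it stands for. Only 2xx through 5xx are recognized.
func expandSpecial(special string) (lo, hi int, ok bool) {
	if len(special) != 3 || special[1:] != "xx" || special[0] < '2' || special[0] > '5' {
		return 0, 0, false
	}
	lo = int(special[0]-'0') * 100
	return lo, lo + 99, true
}

// buildRanges turns an alias-to-special-value mapping into explicit ranges.
func buildRanges(mapping map[string]string) []severityRange {
	var ranges []severityRange
	for alias, special := range mapping {
		if lo, hi, ok := expandSpecial(special); ok {
			ranges = append(ranges, severityRange{min: lo, max: hi, alias: alias})
		}
	}
	return ranges
}

// statusRanges mirrors the mapping used in the configuration above.
var statusRanges = buildRanges(map[string]string{
	"warn":  "5xx",
	"error": "4xx",
	"info":  "3xx",
	"debug": "2xx",
})

// severityForStatus returns the alias whose range contains code, or "default"
// when no range matches.
func severityForStatus(code int) string {
	for _, r := range statusRanges {
		if code >= r.min && code <= r.max {
			return r.alias
		}
	}
	return "default"
}

func main() {
	for _, code := range []int{302, 404, 200, 700} {
		fmt.Println(code, severityForStatus(code))
	}
}
```

Because the ranges do not overlap, the order in which they are checked does not affect the result in this sketch.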
+ +#### Parse a severity from a value without using the default preset + +Configuration: +```yaml +- type: severity_parser + parse_from: body.severity_field + preset: none + mapping: + error: nooo! +``` + + + + + + + + + + + +
Input entry Output entry
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "nooo!" + } +} +``` + + + +```json +{ + "severity": "error", + "body": {} +} +``` + +
+ +```json +{ + "severity": "default", + "body": { + "severity_field": "ERROR" + } +} +``` + + + +```json +{ + "severity": "default", + "body": {} +} +``` + +
diff --git a/pkg/stanza/docs/types/timestamp.md b/pkg/stanza/docs/types/timestamp.md new file mode 100644 index 000000000000..ca70b423f9f9 --- /dev/null +++ b/pkg/stanza/docs/types/timestamp.md @@ -0,0 +1,177 @@ +## `timestamp` parsing parameters + +Parser operators can parse a timestamp and attach the resulting time value to a log entry. + +| Field | Default | Description | +| --- | --- | --- | +| `parse_from` | required | The [field](/docs/types/field.md) from which the value will be parsed. | +| `layout_type` | `strptime` | The type of timestamp. Valid values are `strptime`, `gotime`, and `epoch`. | +| `layout` | required | The exact layout of the timestamp to be parsed. | +| `preserve_to` | | Preserves the unparsed value at the specified [field](/docs/types/field.md). | +| `location` | `Local` | The geographic location (timezone) to use when parsing a timestamp that does not include a timezone. The available locations depend on the local IANA Time Zone database. [This page](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) contains many examples, such as `America/New_York`. | + + +### How to specify timestamp parsing parameters + +Most parser operators, such as [`regex_parser`](/docs/operators/regex_parser.md), support these fields inside of a `timestamp` block. + +If a timestamp block is specified, the parser operator will perform the timestamp parsing _after_ performing its other parsing actions, but _before_ passing the entry to the specified output operator. + +```yaml +- type: regex_parser + regexp: '^Time=(?P<timestamp_field>\d{4}-\d{2}-\d{2}), Host=(?P<host>[^,]+)' + timestamp: + parse_from: body.timestamp_field + layout_type: strptime + layout: '%Y-%m-%d' +``` + +--- + +As a special case, the [`time_parser`](/docs/operators/time_parser.md) operator supports these fields inline. This is because time parsing is the primary purpose of the operator. 
+```yaml +- type: time_parser + parse_from: body.timestamp_field + layout_type: strptime + layout: '%Y-%m-%d' +``` + +### Example Configurations + +#### Parse a timestamp using a `strptime` layout + +The default `layout_type` is `strptime`, which uses "directives" such as `%Y` (4-digit year) and `%H` (2-digit hour). A full list of supported directives is found [here](https://github.com/observiq/ctimefmt/blob/3e07deba22cf7a753f197ef33892023052f26614/ctimefmt.go#L63). + +Configuration: +```yaml +- type: time_parser + parse_from: body.timestamp_field + layout_type: strptime + layout: '%b %e %H:%M:%S %Z %Y' +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "timestamp": "", + "body": { + "timestamp_field": "Jun 5 13:50:27 EST 2020" + } +} +``` + + + +```json +{ + "timestamp": "2020-06-05T13:50:27-05:00", + "body": {} +} +``` + +
+ +#### Parse a timestamp using a `gotime` layout + +The `gotime` layout type uses Golang's native time parsing capabilities. Golang takes an [unconventional approach](https://www.pauladamsmith.com/blog/2011/05/go_time.html) to time parsing. Finer details are well-documented [here](https://golang.org/src/time/format.go?s=25102:25148#L9). + +Configuration: +```yaml +- type: time_parser + parse_from: body.timestamp_field + layout_type: gotime + layout: Jan 2 15:04:05 MST 2006 +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "timestamp": "", + "body": { + "timestamp_field": "Jun 5 13:50:27 EST 2020" + } +} +``` + + + +```json +{ + "timestamp": "2020-06-05T13:50:27-05:00", + "body": {} +} +``` + +
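For readers unfamiliar with Go's reference-time layouts, the sketch below parses the input value from the example above with the same layout string using the standard `time` package. It is illustrative only: `parseGotime` is a hypothetical helper, and pinning `EST` to a fixed UTC-5 zone is an assumption made here so the abbreviation resolves the same way on any machine.

```go
package main

import (
	"fmt"
	"time"
)

// gotimeLayout is the layout from the configuration above, written in terms of
// Go's reference time (Mon Jan 2 15:04:05 MST 2006).
const gotimeLayout = "Jan 2 15:04:05 MST 2006"

// parseGotime parses value with gotimeLayout. The fixed EST zone is an
// assumption of this sketch, so that "EST" maps to UTC-5 regardless of the
// host's local time zone database.
func parseGotime(value string) (time.Time, error) {
	est := time.FixedZone("EST", -5*60*60)
	return time.ParseInLocation(gotimeLayout, value, est)
}

func main() {
	parsed, err := parseGotime("Jun 5 13:50:27 EST 2020")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(parsed.Format(time.RFC3339))
}
```

Each element of the layout is the reference time's own value for that component, which is why `15` stands for the hour and `2006` for the year.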
+ +#### Parse a timestamp using an `epoch` layout (and preserve the original value) + +The `epoch` layout type can consume epoch-based timestamps. The following layouts are supported: + +| Layout | Meaning | Example | `parse_from` data type support | +| --- | --- | --- | --- | +| `s` | Seconds since the epoch | 1136214245 | `string`, `int64`, `float64` | +| `ms` | Milliseconds since the epoch | 1136214245123 | `string`, `int64`, `float64` | +| `us` | Microseconds since the epoch | 1136214245123456 | `string`, `int64`, `float64` | +| `ns` | Nanoseconds since the epoch | 1136214245123456789 | `string`, `int64`, `float64`[2] | +| `s.ms` | Seconds plus milliseconds since the epoch | 1136214245.123 | `string`, `int64`[1], `float64` | +| `s.us` | Seconds plus microseconds since the epoch | 1136214245.123456 | `string`, `int64`[1], `float64` | +| `s.ns` | Seconds plus nanoseconds since the epoch | 1136214245.123456789 | `string`, `int64`[1], `float64`[2] | + +[1] Interpreted as seconds. Equivalent to using the `s` layout.
+[2] Due to floating point precision limitations, loss of up to 100ns may be expected. + + + +Configuration: +```yaml +- type: time_parser + parse_from: body.timestamp_field + layout_type: epoch + layout: s + preserve_to: body.timestamp_field +``` + + + + + + + +
Input entry Output entry
+ +```json +{ + "timestamp": "", + "body": { + "timestamp_field": 1136214245 + } +} +``` + + + +```json +{ + "timestamp": "2006-01-02T15:04:05-07:00", + "body": { + "timestamp_field": 1136214245 + } +} +``` + +
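The integer epoch layouts differ only in the unit of the incoming number. The Go sketch below shows one way the four integer layouts (`s`, `ms`, `us`, `ns`) could be converted to a `time.Time` from a string value. `parseEpoch` is a hypothetical helper, not the operator's code, and the fractional layouts (`s.ms`, `s.us`, `s.ns`) are deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseEpoch converts a string holding an integer epoch value into a time.Time.
// layout selects the unit of the value: "s", "ms", "us", or "ns".
func parseEpoch(layout, value string) (time.Time, error) {
	n, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid epoch value %q: %w", value, err)
	}
	switch layout {
	case "s":
		return time.Unix(n, 0), nil
	case "ms":
		// Split into whole seconds and the leftover milliseconds as nanoseconds.
		return time.Unix(n/1000, (n%1000)*1_000_000), nil
	case "us":
		return time.Unix(n/1_000_000, (n%1_000_000)*1000), nil
	case "ns":
		return time.Unix(0, n), nil
	}
	return time.Time{}, fmt.Errorf("unsupported epoch layout %q", layout)
}

func main() {
	parsed, err := parseEpoch("s", "1136214245")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(parsed.UTC().Format(time.RFC3339Nano))
}
```

Working on integers avoids the floating point precision loss noted in footnote [2], which only arises when the value arrives as a `float64`.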
diff --git a/pkg/stanza/docs/types/trace.md b/pkg/stanza/docs/types/trace.md new file mode 100644 index 000000000000..2c41408111c2 --- /dev/null +++ b/pkg/stanza/docs/types/trace.md @@ -0,0 +1,47 @@ +## Trace Parsing + +Trace context fields are defined in the [OpenTelemetry Logs Data Model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#trace-context-fields). + + +### `trace` parsing parameters + +Parser operators can parse a trace context and attach the resulting values to a log entry. + +| Field | Default | Description | +| --- | --- | --- | +| `trace_id.parse_from` | `trace_id` | A [field](/docs/types/field.md) that indicates the field to be parsed as a trace ID. | +| `span_id.parse_from` | `span_id` | A [field](/docs/types/field.md) that indicates the field to be parsed as a span ID. | +| `trace_flags.parse_from` | `trace_flags` | A [field](/docs/types/field.md) that indicates the field to be parsed as trace flags. | + + +### How to use trace parsing + +All parser operators, such as [`regex_parser`](/docs/operators/regex_parser.md), support these fields inside of a `trace` block. + +If a `trace` block is specified, the parser operator will perform the trace parsing _after_ performing its other parsing actions, but _before_ passing the entry to the specified output operator. + +```yaml +- type: regex_parser + regexp: '^TraceID=(?P<trace_id>\S*) SpanID=(?P<span_id>\S*) TraceFlags=(?P<trace_flags>\d*)' + trace: + trace_id: + parse_from: body.trace_id + span_id: + parse_from: body.span_id + trace_flags: + parse_from: body.trace_flags +``` + +--- + +As a special case, the [`trace_parser`](/docs/operators/trace_parser.md) operator supports these fields inline. This is because trace parsing is the primary purpose of the operator. 
+ +```yaml +- type: trace_parser + trace_id: + parse_from: body.trace_id + span_id: + parse_from: body.span_id + trace_flags: + parse_from: body.trace_flags +``` diff --git a/pkg/stanza/go.mod b/pkg/stanza/go.mod new file mode 100644 index 000000000000..2f999a048b96 --- /dev/null +++ b/pkg/stanza/go.mod @@ -0,0 +1,3 @@ +module github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza + +go 1.17 diff --git a/versions.yaml b/versions.yaml index f87d47df31c9..006b84721fd6 100644 --- a/versions.yaml +++ b/versions.yaml @@ -106,6 +106,7 @@ module-sets: - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/batchpersignal - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/experimentalmetricmetadata - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry + - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/opencensus - github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheusremotewrite