enhancement(file source): Better multi-line support #1852

Merged
merged 18 commits into from
Feb 22, 2020
3 changes: 2 additions & 1 deletion .github/CODEOWNERS
@@ -42,7 +42,8 @@
/src/sinks/tcp.rs @lukesteensen

/src/sources/docker.rs @LucioFranco
-/src/sources/file.rs @LucioFranco
+/src/sources/file/mod.rs @LucioFranco
+/src/sources/file/line_agg.rs @MOZGIII
/src/sources/journald.rs @bruceg
/src/sources/kafka.rs @a-rodin
/src/sources/stdin.rs @bruceg
86 changes: 84 additions & 2 deletions .meta/sources/file.toml
@@ -138,7 +138,7 @@ fingerprint. This is helpful if all files share a common header.\

[sources.file.options.message_start_indicator]
type = "string"
-category = "Multi-line"
+category = "Multi-line (deprecated)"
examples = ["^(INFO|ERROR)"]
description = """\
When present, Vector will aggregate multiple lines into a single event, using \
@@ -149,7 +149,7 @@ a regular expression, so remember to anchor as appropriate.\

[sources.file.options.multi_line_timeout]
type = "int"
-category = "Multi-line"
+category = "Multi-line (deprecated)"
default = 1000
unit = "milliseconds"
description = """\
@@ -178,6 +178,88 @@ Instead of balancing read capacity fairly across all watched files, prioritize \
draining the oldest files before moving on to read data from younger files.\
"""

[sources.file.options.multiline]
type = "table"
category = "Multiline"
common = true
required = false
description = """\
Multiline parsing configuration. \
If not speicified, multiline parsing is disabled.\

Spelling issue. Should be "specified."

@rcollette That spelling error has been addressed in later code. Thanks for letting us know though!

"""

[sources.file.options.multiline.children.start_pattern]
type = "string"
category = "Multiline"
examples = ["^[^\\s]", "\\\\$", "^(INFO|ERROR) ", "[^;]$"]
common = true
required = true
description = """\
Start pattern to look for as a beginning of the message.\
"""

[sources.file.options.multiline.children.condition_pattern]
type = "string"
category = "Multiline"
examples = ["^[\\s]+", "\\\\$", "^(INFO|ERROR) ", ";$"]
common = true
required = true
description = """\
Condition pattern to look for. Exact behavior is configured via `mode`.\
"""

[sources.file.options.multiline.children.mode]
type = "string"
category = "Multiline"
examples = ["continue_through", "continue_past", "halt_before", "halt_with"]
common = true
required = true
description = """\
Mode of operation, specifies how the condition pattern is interpreted.\
"""

[sources.file.options.multiline.children.mode.enum]
continue_through = """\
All consecutive lines matching this pattern are included in the group. \
The first line (the line that matched the start pattern) does not need \
to match the `ContinueThrough` pattern. \
This is useful in cases such as a Java stack trace, where some indicator \
in the line (such as leading whitespace) indicates that it is an \
extension of the preceding line.\
"""
continue_past = """\
All consecutive lines matching this pattern, plus one additional line, \
are included in the group. \
This is useful in cases where a log message ends with a continuation \
marker, such as a backslash, indicating that the following line is part \
of the same message.\
"""
halt_before = """\
All consecutive lines not matching this pattern are included in the \
group. \
This is useful where a log line contains a marker indicating that it \
begins a new message.\
"""
halt_with = """\
All consecutive lines, up to and including the first line matching this \
pattern, are included in the group. \
This is useful where a log line ends with a termination marker, such as \
a semicolon.\
"""

[sources.file.options.multiline.children.timeout_ms]
type = "int"
category = "Multiline"
examples = [1000, 600000]
unit = "milliseconds"
common = true
required = true
description = """\
The maximum time to wait for the continuation. Once this timeout is \
reached, the buffered message is guaranteed to be flushed, even if \
incomplete.\
"""

[sources.file.output.log.fields.file]
type = "string"
examples = ["/var/log/nginx.log"]
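
The four `mode` values documented in the hunk above are easier to compare when run against the same kind of input. The following is a minimal Python sketch written only from those descriptions; it is not Vector's implementation. The function name `aggregate`, the newline join, the pass-through of lines that do not match `start_pattern`, and the re-check of a group-closing line against `start_pattern` are all assumptions made for illustration, and timeout handling is left out.

```python
import re
from typing import Iterable, Iterator, List

MODES = {"continue_through", "continue_past", "halt_before", "halt_with"}


def aggregate(lines: Iterable[str], start_pattern: str,
              condition_pattern: str, mode: str) -> Iterator[str]:
    """Group lines into messages following the four documented modes.

    Hypothetical helper: joins grouped lines with a newline (an assumption)
    and ignores timeouts entirely.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    start = re.compile(start_pattern)
    condition = re.compile(condition_pattern)
    buffer: List[str] = []  # lines of the message currently being assembled

    def flush() -> str:
        message = "\n".join(buffer)
        buffer.clear()
        return message

    for line in lines:
        if buffer:
            matched = condition.search(line) is not None
            if mode == "continue_through":
                if matched:  # matching lines extend the group
                    buffer.append(line)
                    continue
                yield flush()  # non-matching line closes the group
            elif mode == "continue_past":
                buffer.append(line)  # the line is always included
                if not matched:  # first non-matching line is the "one additional line"
                    yield flush()
                continue
            elif mode == "halt_before":
                if not matched:  # non-matching lines extend the group
                    buffer.append(line)
                    continue
                yield flush()  # matching line closes the group and is not part of it
            else:  # halt_with
                buffer.append(line)  # the line is always included
                if matched:  # first matching line terminates the group
                    yield flush()
                continue
        # No open group here: either nothing was buffered, or a group was just closed.
        if start.search(line):
            buffer.append(line)
        else:
            yield line  # assumption: lines outside any group pass through unchanged
    if buffer:
        yield flush()  # end of input: emit whatever is still buffered
```

With the Java stack trace case from the `continue_through` description, `list(aggregate(["ERROR boom", "  at a()", "  at b()", "INFO ok"], r"^[^\s]", r"^[\s]+", "continue_through"))` yields two messages: the error line joined with its two indented frames, and `"INFO ok"` on its own.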
44 changes: 43 additions & 1 deletion config/vector.spec.toml
@@ -233,7 +233,7 @@ dns_servers = ["0.0.0.0:53"]
host_key = "host"

#
-# Multi-line
+# Multi-line (deprecated)
#

# When present, Vector will aggregate multiple lines into a single event, using
@@ -314,6 +314,48 @@ dns_servers = ["0.0.0.0:53"]
strategy = "checksum"
strategy = "device_and_inode"

#
# Multiline
#

[sources.file.multiline]
# Condition pattern to look for. Exact behavior is configured via `mode`.
#
# * required
# * type: string
condition_pattern = "^[\\s]+"
condition_pattern = "\\\\$"
condition_pattern = "^(INFO|ERROR) "
condition_pattern = ";$"

# Mode of operation, specifies how the condition pattern is interpreted.
#
# * required
# * type: string
# * enum: "continue_through", "continue_past", "halt_before", and "halt_with"
mode = "continue_through"
mode = "continue_past"
mode = "halt_before"
mode = "halt_with"

# Start pattern to look for as a beginning of the message.
#
# * required
# * type: string
start_pattern = "^[^\\s]"
start_pattern = "\\\\$"
start_pattern = "^(INFO|ERROR) "
start_pattern = "[^;]$"

# The maximum time to wait for the continuation. Once this timeout is reached,
# the buffered message is guaranteed to be flushed, even if incomplete.
#
# * required
# * type: int
# * unit: milliseconds
timeout_ms = 1000
timeout_ms = 600000
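
The `timeout_ms` guarantee described above (a buffered message is flushed once the timeout is reached, even if incomplete) can be illustrated with a small Python sketch. This is not Vector's code: the class name `TimedBuffer`, the explicit `now_ms` arguments, and the choice to start the clock at the first buffered line rather than reset it on every line are assumptions made to keep the example deterministic.

```python
from typing import List, Optional


class TimedBuffer:
    """Holds the lines of one incomplete message and flushes them on timeout."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.lines: List[str] = []
        self.deadline_ms: Optional[int] = None  # None while nothing is buffered

    def push(self, line: str, now_ms: int) -> None:
        # Assumption: the clock starts at the first buffered line
        # and is not reset by later lines.
        if self.deadline_ms is None:
            self.deadline_ms = now_ms + self.timeout_ms
        self.lines.append(line)

    def poll(self, now_ms: int) -> Optional[str]:
        """Return the buffered message once the deadline has passed, else None."""
        if self.deadline_ms is None or now_ms < self.deadline_ms:
            return None
        message = "\n".join(self.lines)
        self.lines = []
        self.deadline_ms = None
        return message
```

A caller would `push` each continuation line as it arrives and call `poll` periodically; with `timeout_ms = 1000`, a message whose first line arrived at time 0 is returned by the first `poll` at or after 1000 ms, whether or not its closing line was ever seen.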

# Ingests data through log records from journald and outputs `log` events.
[sources.journald]
# The component type. This is a required field that tells Vector which