Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #17253 to 7.7: Ignore trailing spaces in CEF messages #17284

Merged
merged 2 commits into from
Apr 3, 2020

Conversation

adriansr
Copy link
Contributor

@adriansr adriansr commented Mar 27, 2020

Cherry-pick of PR #17253 to 7.7 branch. Original message:

What does this PR do?

This patch updates the Ragel state machine to skip trailing space at the end of CEF messages.

Some CEF exporters, Check Point for example, have been observed to add a trailing space to CEF messages:

"CEF:0:| [...] src=127.0.0.1 "

Currently, this space character is interpreted as part of the last field's value, which can cause decoding errors if the value is an integer or an IP address:

"error": {
    "message": "sourceAddress: value is not a valid IP address"
  }

For maximizing compatibility, we also want to ignore other kinds of white-space characters (newline, carriage return, tab). For example we can get a trailing newline when processing CEF messages from UDP input instead of syslog, which removes newlines.

Trailing space in other extensions' values is preserved, as the CEF standard permits (but discourages) it's use in non-final extensions.

Why is it important?

We've observed users experiencing this problem, trying to fix it (unsuccessfully) with a processor:

- dissect:
      when:
        equals:
          event.module: "cef"
      tokenizer: "%{sourceAddress} "
      field: "cef.extensions.sourceAddress"
      target_prefix: "cef.extensions"

Why a draft?

While the current solution is complete, I'm not satisfied of the changes that introduces to the Ragel SM definition.

Originally I wanted to get something more elegant like this to work:

extensions = (extension " ")* final_extension space*

So that we can keep the extension machine as-is, and add a specialized final_extension machine that will disallow trailing space. However, I didn't manage to get this kind of pattern to work. It requires rewriting a lot of the capture actions to allow for the necessary backtracking, and I wasn't confident enough to get that right without introducing more problems than I was solving, or dedicating too many hours to this fix, due to my limited experience with ragel.

The current solution works by accident. I decided to try a different approach starting by disallowing trailing space in all extensions, and found out that it works as I wanted and captures white-space as value in all extensions but the last. This is a side effect of how an extension value is captured differently in extension_key vs extension_eof. The former captures the previous value up to mark-1, that is the start of the current key minus the space separator. The later captures up to extValueEnd, which is not incremented for trailing space.

The current machine definition is an ugly mix of " " and space usage, because we want to have the space character as the only separator, but trim al trailing space: [\t\v\f\n\r ]

This patch updates the ragel state machine to skip trailing spaces
at the end of CEF messages.

Some CEF exporters, Check Point for example, have been observed to add a
trailing space to CEF messages:

> "CEF:0:| [...] src=127.0.0.1 "

Currently, this space character is interpreted as part of the last
field's value, which can cause decoding errors if the value is an
integer or an IP address.

For maximizing compatibility, we also want to ignore other kinds of
space characters (new line, carriage return, tab). For example we
can get a trailing newline when processing CEF messages from UDP 
input instead of syslog, which removes newlines.

Spaces in non-final extensions are preserved, as the CEF standard
permits (but discourages) it's use in non-final extensions.

(cherry picked from commit a1ec55a)
@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

@adriansr adriansr merged commit a6ce4a9 into elastic:7.7 Apr 3, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
This patch updates the ragel state machine to skip trailing spaces
at the end of CEF messages.

Some CEF exporters, Check Point for example, have been observed to add a
trailing space to CEF messages:

> "CEF:0:| [...] src=127.0.0.1 "

Currently, this space character is interpreted as part of the last
field's value, which can cause decoding errors if the value is an
integer or an IP address.

For maximizing compatibility, we also want to ignore other kinds of
space characters (new line, carriage return, tab). For example we
can get a trailing newline when processing CEF messages from UDP 
input instead of syslog, which removes newlines.

Spaces in non-final extensions are preserved, as the CEF standard
permits (but discourages) it's use in non-final extensions.

(cherry picked from commit ee5d5bd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants