Cherry-pick #17253 to 7.x: Ignore trailing spaces in CEF messages #17283
Cherry-pick of PR #17253 to 7.x branch. Original message:
What does this PR do?
This patch updates the Ragel state machine to skip trailing space at the end of CEF messages.
Some CEF exporters, Check Point for example, have been observed to add a trailing space to CEF messages:
Currently, this space character is interpreted as part of the last field's value, which can cause decoding errors if the value is an integer or an IP address:
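As a rough illustration of this failure mode (not the decoder's actual code path), the Go standard library parsers reject values that carry a trailing space. The helper names `parsePort` and `parseAddr` are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// parsePort is a hypothetical helper standing in for integer field decoding.
func parsePort(v string) (int, error) {
	return strconv.Atoi(v)
}

// parseAddr is a hypothetical helper standing in for IP field decoding.
func parseAddr(v string) (net.IP, error) {
	ip := net.ParseIP(v)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP address %q", v)
	}
	return ip, nil
}

func main() {
	// A trailing space captured as part of the value breaks both parsers.
	_, portErr := parsePort("443 ")
	_, addrErr := parseAddr("10.0.0.1 ")
	fmt.Println("port error:", portErr)
	fmt.Println("addr error:", addrErr)

	// The same values without the trailing space decode fine.
	port, _ := parsePort("443")
	addr, _ := parseAddr("10.0.0.1")
	fmt.Println(port, addr)
}
```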
To maximize compatibility, we also want to ignore other kinds of white-space characters (newline, carriage return, tab). For example, a trailing newline can appear when CEF messages are received through the UDP input instead of the syslog input, which strips newlines.
Trailing space in other extensions' values is preserved, as the CEF standard permits (but discourages) its use in non-final extensions.
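The intended behavior can be approximated outside the state machine with a plain string trim. This is only a sketch of the observable result, not how the patch implements it: the function name `trimTrailingSpace` and the sample message are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailingSpace is a hypothetical helper: it removes white-space
// ([\t\v\f\n\r ]) from the end of the message only. White-space that
// precedes a later key stays untouched.
func trimTrailingSpace(msg string) string {
	return strings.TrimRight(msg, "\t\v\f\n\r ")
}

func main() {
	// Made-up extension list: "msg" has a trailing space before the next
	// key, and the final "dpt" value is followed by a space and a newline.
	in := "msg=hello  dpt=80 \n"
	fmt.Printf("%q\n", trimTrailingSpace(in))
}
```

Only the white-space after the final value is dropped; the double space after `hello` is left in place, so the first of the two can still be treated as part of the `msg` value.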
Why is it important?
We've observed users experiencing this problem, trying to fix it (unsuccessfully) with a processor:
Why a draft?
While the current solution is complete, I'm not satisfied with the changes it introduces to the Ragel state machine definition.
Originally I wanted to get something more elegant like this to work:
So that we can keep the `extension` machine as-is, and add a specialized `final_extension` machine that will disallow trailing space. However, I didn't manage to get this kind of pattern to work. It requires rewriting a lot of the capture actions to allow for the necessary backtracking, and I wasn't confident enough to get that right without introducing more problems than I was solving, or dedicating too many hours to this fix, due to my limited experience with Ragel.
The current solution works by accident. I decided to try a different approach, starting by disallowing trailing space in all extensions, and found out that it works as I wanted: it captures white-space as part of the value in all extensions but the last. This is a side effect of how an extension value is captured differently in `extension_key` vs `extension_eof`. The former captures the previous value up to `mark-1`, that is, the start of the current key minus the space separator. The latter captures up to `extValueEnd`, which is not incremented for trailing space.
The current machine definition is an ugly mix of `" "` and `space` usage, because we want to have the space character as the only separator, but trim all trailing space: `[\t\v\f\n\r ]`