-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INGEST: Extend KV Processor (#31789) #32232
Conversation
Pinging @elastic/es-core-infra |
Added more capabilities supported by LS to the KV processor: * Stripping of brackets and quotes from values (`include_brackets` in corresponding LS filter) * Adding key prefixes * Trimming specified chars from keys and values Refactored the way the filter is configured to avoid conditionals during execution. Refactored Tests a little to not have to add more redundant getters for new parameters. Closes elastic#31786
c63c53c
to
0416679
Compare
@rjernst @talevy this should be good to review. |
@original-brownbear I think all the changes here are good ... but I don't think it resolves #31786 If I understand correctly the #31786 is to treat anything in quotes for the value as a single entity. For example:
should result in |
@jakelandis yea, the example you're making is still not fixed, but that won't work in LS either (unless you're doing a bunch of Regex trickery). e.g.
{
"docs": [
{
"doc": {
"_index": "_index",
"_type": "_type",
"_id": "_id",
"_source": {
"a": "foo",
"b": "bar baz",
"c": "lol",
"message": "a='foo' b='bar baz' c='lol'"
},
"_ingest": {
"timestamp": "2018-07-20T15:55:56.635799Z"
}
}
}
]
} I think that's what this comment #31786 (comment) is basically asking for after we suggest that the tricky mix of quotes and no quotes is better handled by Grok. |
@@ -1732,6 +1732,10 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED` | |||
| `include_keys` | no | `null` | List of keys to filter and insert into document. Defaults to including all keys | |||
| `exclude_keys` | no | `null` | List of keys to exclude from document | |||
| `ignore_missing` | no | `false` | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document | |||
| `prefix` | no | `null` | Prefix to be added to extracted keys | |||
| `trim_key` | no | `null` | String of characters to trim from extracted keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you want to call out these are regex character classes ?
edit: clarified question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakelandis I'd rather not talk about regex patterns here I think. If I do that instead of simply saying I'm trimming all the chars in the given string, then that's probably not really true if you look at the code (I'm building a pattern from the string, arbitrary patterns won't work here).
@original-brownbear - fair points. I think we could change the I did some light testing and the code looks good. So LGTM, but may want to wait for one more given my experience on this code base. Also, can you leave #31786 open ? This change helps, but does not solve the original intent of that issue, and I think (given a bit more time) that a solution to the original issue is feasible. |
@jakelandis thanks, sure I can leave it open, would be neat if we found an elegant solution here that isn't just a tricky regex :) |
@rjernst could you take a look here? :) |
I defer to @talevy on this one, as he is much more familiar with the logstash side this PR intends to mimic. |
Cool! I think it is great to go through the shortcomings of some of our existing processors. This comment (#32232 (comment)) causes me to wonder a few things. If this isn't fixing the issue, why are we claiming it closes the issue? I have been away from KV use-cases in Logstash for a while, but I remember its specific behavior was built for very specific application log-lines in mind. Documenting this would be helpful. The reason it lacked parity in the first place was because the case for doing this extra regex magic did not feel right in a processor that simply is called "Key-Value". It felt like it was doing a lot more. I think it is totally reasonable to re-evaluate this and bring in the parity, but I think it would be helpful to bring in more examples as to why it does all this. If it is doing these extra things because a handful of specific log lines do these things, it may be best to provide examples of how to use grok+kv together to achieve proper parsing, rather than to bake in heuristics into this processor. Let me know your thoughts! |
@talevy I think I would agree with @jakelandis here that there should be an easy and clean way to achieve this kind of parsing in KV. Probably better to have this discussion in the related issue though :), this PR is really more about getting feature parity with LS. I changed the PR description accordingly to "relates" instead of "fixes" :) |
thanks @original-brownbear. that helps. I think that is a good plan |
@talevy sweet :) Would you still review this one as a step 1 towards that? |
yup! @original-brownbear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good!
In addition to successful tests of trimming keys/values, and strip_brackets. Should we have tests for those when set and do not match the input keys/values? So, when their patterns do not match
| `prefix` | no | `null` | Prefix to be added to extracted keys | ||
| `trim_key` | no | `null` | String of characters to trim from extracted keys | ||
| `trim_value` | no | `null` | String of characters to trim from extracted values | ||
| `strip_brackets` | no | `false` | If `true` strip brackets `()`, `<>`, `[]` as well as quotes `'` and `"` from extracted values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be renamed to be more general than just brackets. maybe strip_surrounding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@talevy yea I agree, brackets isn't a good term here, I went with it because that's what LS uses (see https://github.com/logstash-plugins/logstash-filter-kv/blob/master/lib/logstash/filters/kv.rb#L290). surrounding
sounds kinda strange too, not that I have a better suggestion ... maybe strip_delimiters
but then that's somewhat weird too since we're already splitting twice elsewhere :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is part of the hidden magic that I didn't love in that filter. But I agree, I cannot think of a better word, and having a separate property just for quotes feels like too much variation
if (prefix != null) { | ||
config.put("prefix", prefix); | ||
} | ||
return FACTORY.create(null, randomAlphaOfLength(10), config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should have additional explicit tests of the factory as we do in KeyValueProcessorFactoryTests for these new config options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
@talevy when patterns don't match nothing happens, I think that's kinda redundant to test isn't it? (Especially since this already happens implicitly in those tests that mix a few quote/bracket kinds) |
@original-brownbear true. I guess I was looking for something more explicit just in case there was different behavior when none match. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@talevy thanks! |
I have POC [1] for supporting quotes (lifted some code from my dissect implementation). Reg-ex's got real ugly real quick. I will work on getting it cleaned up and better tested and submit a PR. |
* INGEST: Extend KV Processor (elastic#31789) Added more capabilities supported by LS to the KV processor: * Stripping of brackets and quotes from values (`include_brackets` in corresponding LS filter) * Adding key prefixes * Trimming specified chars from keys and values Refactored the way the filter is configured to avoid conditionals during execution. Refactored Tests a little to not have to add more redundant getters for new parameters. Relates elastic#31786 * Add documentation
* es/master: (23 commits) Switch full-cluster-restart to new style Requests (#32140) [DOCS] Clarified that you must remove X-Pack plugin when upgrading from pre-6.3. (#32016) Remove BouncyCastle dependency from runtime (#32193) INGEST: Extend KV Processor (#31789) (#32232) INGEST: Make a few Processors callable by Painless (#32170) Add region ISO code to GeoIP Ingest plugin (#31669) [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236) Make sure that field aliases count towards the total fields limit. (#32222) Switch rolling restart to new style Requests (#32147) muting failing test for internal auto date histogram to avoid failure before fix is merged MINOR: Remove unused `IndexDynamicSettings` (#32237) Fix multi level nested sort (#32204) Enhance Parent circuit breaker error message (#32056) [ML] Use default request durability for .ml-state index (#32233) Remove indices stats timeout from monitoring docs Rename ranking evaluation response section (#32166) Dependencies: Upgrade to joda time 2.10 (#32160) Remove aliases resolution limitations when security is enabled (#31952) Ensure that field aliases cannot be used in multi-fields. (#32219) TESTS: Check for Netty resource leaks (#31861) ...
* INGEST: Extend KV Processor (#31789) Added more capabilities supported by LS to the KV processor: * Stripping of brackets and quotes from values (`include_brackets` in corresponding LS filter) * Adding key prefixes * Trimming specified chars from keys and values Refactored the way the filter is configured to avoid conditionals during execution. Refactored Tests a little to not have to add more redundant getters for new parameters. Relates #31786 * Add documentation
Added more capabilities supported by LS to the KV processor:
include_brackets
in corresponding LS filter)Refactored the way the filter is configured to avoid conditionals during execution.
Refactored Tests a little to not have to add more redundant getters for new parameters.
relates to #31786