KV Ingest Processor splitting on whitespace in message #31786
Reading the docs about the kv processor, I think what's happening here is the expected behavior:
The KV processor may be too simple for what you need to achieve, given that some of your values are enclosed in double quotes (when they contain spaces) and some are not. If the order of your fields doesn't change, the grok processor could be of better use here; for example, you can use the quoted-string pattern to match values enclosed in double quotes.
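To make that concrete (this snippet is only an illustration, not from the original comment), a grok processor using the built-in QUOTEDSTRING pattern could look roughly like the following; the keys level and msg and the target field names log.level / log.msg are assumptions:
{
  "grok": {
    "field": "message",
    "patterns": [
      "level=%{WORD:log.level} msg=%{QUOTEDSTRING:log.msg}"
    ]
  }
}
Note that QUOTEDSTRING captures the value including the surrounding double quotes, so a follow-up step would still be needed to strip them.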
Pinging @elastic/es-core-infra
Thanks for the response @dliappis. Unfortunately, the fields aren't consistent across all applications; they're just all using the logfmt format. It would be great if the kv processor were extended a bit further to match the capabilities of the Logstash one.
@jakelandis I'll handle this unless you already started somehow? :)
@original-brownbear - all yours
Added more capabilities supported by LS to the KV processor (INGEST: Extend KV Processor, #31789):
* Stripping of brackets and quotes from values (`include_brackets` in the corresponding LS filter)
* Adding key prefixes
* Trimming specified chars from keys and values
Refactored the way the filter is configured to avoid conditionals during execution, and refactored the tests a little to avoid adding more redundant getters for the new parameters. Also added documentation. Closes #31786
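As a rough illustration of how those options surface on the ingest KV processor (this snippet is not from the PR; prefix, trim_value and strip_brackets are the documented option names, while the field values and the app. prefix are invented):
{
  "kv": {
    "field": "message",
    "field_split": " ",
    "value_split": "=",
    "prefix": "app.",
    "trim_value": "\"",
    "strip_brackets": true
  }
}
Note that trim_value only strips the quote characters after the split has already happened; on its own it does not keep a quoted value containing spaces together, which is why the regex-based field_split discussed later in the thread still matters.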
@jakelandis assigning you here since you wanted to experiment some more with this :)
Hi @original-brownbear, any idea when this will be pushed / released into mainstream? I am currently writing a Filebeat module to dissect log messages sent from a Fortigate firewall: elastic/beats#13245. Here is an ingest pipeline simulate sample:
POST /_ingest/pipeline/_simulate
{
"pipeline": {
"description": "_description",
"processors": [
{
"kv": {
"field_split": " ",
"value_split": "=",
"field": "message",
"target_field": "fortinet.message",
"ignore_failure": true,
"exclude_keys":[
"srccountry",
"dstcountry"
],
"trim_value": "\""
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "date=\"2019-10-06\" time=\"19:09:27\" devname=\"FGT-2\" devid=\"FG101E4J17OOO702\" logid=\"0001000014\" type=\"traffic\" subtype=\"local\" level=\"notice\" vd=\"root\" eventtime=\"1570381767687891019\" tz=\"+0200\" srcip=\"81.8.45.152\" srcport=\"43688\" srcintf=\"vlan78\" srcintfrole=\"wan\" dstip=\"212.188.109.206\" dstport=\"63390\" dstintf=\"root\" dstintfrole=\"undefined\" sessionid=\"12336602\" proto=\"6\" action=\"deny\" policyid=\"0\" policytype=\"local-in-policy\" service=\"tcp/63390\" dstcountry=\"Austria\" srccountry=\"Russian Federation\" trandisp=\"noop\" duration=\"0\" sentbyte=\"0\" rcvdbyte=\"0\" sentpkt=\"0\" appcat=\"unscanned\" crscore=\"5\" craction=\"262144\" crlevel=\"low\" mastersrcmac=\"e0:5f:b9:65:b5:01\" srcmac=\"e0:5f:b9:65:b5:01\" srcserver=\"0\""
}
}
]
}
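One thing to note about this pipeline: with field_split set to a single space, multi-word quoted values such as srccountry="Russian Federation" still get split mid-value (presumably why srccountry and dstcountry are excluded above). A lookahead-based field_split, like the one suggested further down the thread, keeps them intact; a sketch of just the kv block with that change:
{
  "kv": {
    "field": "message",
    "field_split": "\\s(?![^=]+?(\\s|$))",
    "value_split": "=",
    "target_field": "fortinet.message",
    "ignore_failure": true,
    "trim_value": "\""
  }
}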
Hi @jakelandis @original-brownbear, do you happen to have any update on when this will be released? I would really need it to finish my Fortinet module.
@philippkahr, I succeeded in parsing Fortigate (CEF format enabled) with the following configuration, after some headache and regex magic. The kv filter:
The double quote can also be excluded after modifying the RegExp @rverchere provided. Thanks a lot.
Hi, I've enhanced my KV filter with the following parameters (for the CEF format):
Hi, I'm stuck with a similar issue. I tried @rverchere's regex patterns, but I'm still struggling with spaces inside file paths used as values.
Field split regex -
Got it working -
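The exact regexes mentioned in the last few comments aren't reproduced above. As a general sketch of the approach they describe, splitting only on whitespace that sits right before the next key=, a lookahead-based field_split along these lines is common; the key-name character class is an assumption and should be adjusted to the actual key names:
{
  "kv": {
    "field": "message",
    "field_split": "\\s+(?=[A-Za-z0-9_.]+=)",
    "value_split": "="
  }
}
The reply further down uses the same idea with a negative lookahead instead.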
Thank you for this example. It works with my kv use case as well.
Another alternative is to use the dissect processor, which supports k/v pairing too, but you need to know the shape of the message ahead of time. In the example below, I would need to know that there are 4 k/v pairs and that the second and 4th ones do not have quotes. Not ideal, but possibly helpful in some scenarios.
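The example referred to above isn't preserved here, so the following is a hypothetical reconstruction of the kind of dissect pattern being described, using the processor's * and & key modifiers for k/v pairing on a made-up four-pair message whose second and fourth values are unquoted:
{
  "dissect": {
    "field": "message",
    "pattern": "%{*k1}=\"%{&k1}\" %{*k2}=%{&k2} %{*k3}=\"%{&k3}\" %{*k4}=%{&k4}"
  }
}
Against an invented sample line such as msg="Schedule refresh" level=info host="web frontend" port=8080, this yields msg, level, host and port fields, with the quotes excluded from the quoted values because they sit outside the captures.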
We have been unable to get any of these solutions to work. Our log uses = for the key/value separator and space as the delimiter, but there are sometimes spaces in the values. A field may also contain other special characters like " \ / & and others. Also, the number of fields in a log is variable, so I don't think the dissect option would work. Any advice?
Given you have no = characters inside your values, something like this works:
"kv": {
"field": "message",
"field_split": "\\s(?![^=]+?(\\s|$))",
"value_split": "=",
"target_field": "log",
"ignore_missing": true,
"strip_brackets": true,
"ignore_failure": true
}
The negative lookahead means: don't split on whitespace that is followed by a run of non-= characters ending in whitespace or end of line, i.e. text that is still part of the current value. Test it with your own log lines at regex101.com.
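To check the behaviour, a _simulate call like the one below can be used; the logfmt sample line is made up for illustration, with only msg containing spaces:
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "kv": {
          "field": "message",
          "field_split": "\\s(?![^=]+?(\\s|$))",
          "value_split": "=",
          "target_field": "log",
          "trim_value": "\""
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "ts=2018-07-03T10:15:00Z level=info msg=\"Schedule refresh starting\" component=scheduler"
      }
    }
  ]
}
With this field_split, the whitespace inside the quoted msg value is not treated as a field boundary, and trim_value then strips the surrounding quotes.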
I'm trying to parse some logs in the logfmt format using the KV processor, example log line below:
The processor being used is:
Due to the whitespace in the value of the msg key, the logs are being incorrectly split midway through the message, resulting in the msg field being `"Schedule`.
There's a similar issue open for the Logstash equivalent plugin here: logstash-plugins/logstash-filter-kv#9
It's my understanding that quoted values as above should be treated and parsed as a single value, and the quotes should then be stripped from the resulting field value.
If this isn't the case, it would be good to expose these options, as the kv processor is a lot less versatile without them.
Cheers,
Mike