-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
include_fields processor does not work with dotted keys #5916
Conversation
I would have preferred "modifying jolokia fetcher to convert dot keys to objects" if it weren't for the data being problematic as you have shown. We could write separate functions to Get/Put that work on keys containing dots. The For example, with
|
I did some testing on the ES side an the following even with dots will no index in newer versions of Elasticsearch:
The reason is that Elasticsearch does the "dedotting" and converstion to an object on its side. That means if someone wants to index the above it is required on our side to replace |
Here is a more general issue to discuss this: #5942 |
I opened the PR here which dedots the keys and replaces them with |
@andrewkroh I'm undecided on what we should do with this PR. The trade off is that the initial document and the created document are not 100% the same as |
This should prevent name collisions and allow filtering on these fields. The feature is enabled by default and cannot be disable as otherwise ingestion could stop as Elasticsearch is returning an error. Currently not config option is provided as this should not be disabled. * Introduces DeDot function in common package * Adds tests to jmx for new functionality * Use new function also for docker labels Related to elastic#5942 Closes elastic#5916
This should prevent name collisions and allow filtering on these fields. The feature is enabled by default and cannot be disable as otherwise ingestion could stop as Elasticsearch is returning an error. Currently not config option is provided as this should not be disabled. * Introduces DeDot function in common package * Adds tests to jmx for new functionality * Use new function also for docker labels Related to #5942 Closes #5916
Reopening this PR as I think the automatically closing was unintentional. |
Related: #2921 |
Maybe should have a way to escape the dots? Or could offer a way to specify keys similar to Logstash like |
2991124
to
8099af3
Compare
Also looking at #2921 the solution in this PR would work for these cases. The only issue with this solution is that the document looks different before and after the processing. But as from a ES perspective If we go with the above change, dot notation can be used in all processors as the |
8099af3
to
cbf4dcb
Compare
The include_fields processor does not work in case the key contains a dot. See tests added here for this case. There are different solutions here: * Modify `GetValue` in common.MapStr to also look for keys with dots inside instead of directly walking the tree * Modifying jolokia fetcher to convert dot keys to objects The first change would be possible with adding ``` --- a/libbeat/common/mapstr.go +++ b/libbeat/common/mapstr.go @@ -296,6 +296,10 @@ func walkMap(key string, data MapStr, op mapStrOperation) (interface{}, error) { var err error keyParts := strings.Split(key, ".") + if v, exists := data[key]; exists { + return v, nil + } + ``` But that does not fully solve the problem. As in the processor the following code is used, the fields which are added after filtering to the filtered fields, will have a nesting without the docs (see Put method) ``` for _, field := range f.Fields { v, err := event.GetValue(field) if err == nil { _, err = filtered.Put(field, v) } // Ignore ErrKeyNotFound errors if err != nil && errors.Cause(err) != common.ErrKeyNotFound { errs = append(errs, err.Error()) } } ``` This Put would also cause other issues as we see in the event below. If it's convert to an object `classes` and the namespaces `classes.loaded` would conflict because one is an object and the other one isn't. I would have expected that this already causes an issue on the Elasticsearch side before filtering. ``` "jolokia":{ "metrics":{ "Metrics":{ "atomikos.nbTransactions":0.000000, "classes":18857.000000, "classes.loaded":19127.000000, "classes.unloaded":270.000000, "counter.servo.discoveryclient-httpclient_createnew":13542.000000, "counter.servo.discoveryclient-httpclient_delete":13534.000000, "uptime":406037194.000000 }, "java":{ "Threading":{ "DaemonThreadCount":37.000000, "ThreadCount":57.000000 } } } } ``` An alternative solution to the above is replacing the `.` in the names with `_` as we do for example for docker labels and other places with a similar issue. The downside of this is that filtering on fields is not straight forward as the names change between fetching from jolokia and in ES. See https://discuss.elastic.co/t/include-fields-problem/109809
cbf4dcb
to
dbc4763
Compare
@andrewkroh Working on #6292 I realised this feature is also needed there. Ok to get this change in? |
Benchmark old to new walkMap Recursive WalkMap ``` BenchmarkWalkMap/Get-4 500000 3032 ns/op BenchmarkWalkMap/Put-4 300000 4131 ns/op BenchmarkWalkMap/HasKey-4 300000 4080 ns/op BenchmarkWalkMap/Delete-4 300000 4353 ns/op ``` Non recursive WalkMap ``` BenchmarkWalkMap/Get-4 500000 2890 ns/op BenchmarkWalkMap/Put-4 300000 4426 ns/op BenchmarkWalkMap/HasKey-4 300000 4025 ns/op BenchmarkWalkMap/Delete-4 300000 4568 ns/op ``` I did multiple runs and there was no clear winner. They seem to be very similar in execution time.
I'm closing this in favor of #6406 |
Currently if a dotted key is passed to walkmap it is split up in it's part and the map is traversed for each part. In case an event has a dotted key inside walkMap does not return the expected value. For the Event `{"a.b": 1}` the method `GetValue("a.b")` has previously returned `ErrKeyNotExists` but now returns 0. With this change all processors which rely on Get/Put functionalities of MapStr can now also work with events containing dotted keys. **Changes** * The walkMap function was rewritten to a recursive function. This made adding the new functionality easier. * Add tests for GetValue to make possible usages more clear * Add Benchmark tests for walkMap use cases **Limitations** When using `walkMap` with a Key, either the full key has to be dotted or the last part. `GetValue("a.b.c")` on an event `{"a.b":{"c":17}}` does not work as only the first part of the key or the full key are checked against the map. Benchmark old to new walkMap: ``` go test -bench=BenchmarkWalkMap -run=BenchmarkWalkMap ``` Recursive: ``` BenchmarkWalkMap/Get-4 1000000 1274 ns/op BenchmarkWalkMap/Put-4 1000000 1277 ns/op BenchmarkWalkMap/HasKey-4 2000000 880 ns/op BenchmarkWalkMap/HasKeyFirst-4 20000000 111 ns/op BenchmarkWalkMap/Delete-4 1000000 1313 ns/op ``` Iterative: ``` BenchmarkWalkMap/Get-4 500000 2608 ns/op BenchmarkWalkMap/Put-4 1000000 1140 ns/op BenchmarkWalkMap/HasKey-4 3000000 411 ns/op BenchmarkWalkMap/HasKeyFirst-4 20000000 85.9 ns/op BenchmarkWalkMap/Delete-4 1000000 1008 ns/op ``` The benchmark show a difference in the code before and after. The `GetValue` function has become quite a bit more efficient. The `HasKey` is now less efficient. I still need to investigate in detail where the differences come from but the assumption is it comes from the additional functionality. This change was initially triggered by elastic#5916 but is split out into a separate PR because it has usage in multiple places.
Currently if a dotted key is passed to walkmap it is split up in it's part and the map is traversed for each part. In case an event has a dotted key inside walkMap does not return the expected value. For the Event `{"a.b": 1}` the method `GetValue("a.b")` has previously returned `ErrKeyNotExists` but now returns 0. With this change all processors which rely on Get/Put functionalities of MapStr can now also work with events containing dotted keys. **Changes** * The walkMap function was rewritten to a recursive function. This made adding the new functionality easier. * Add tests for GetValue to make possible usages more clear * Add Benchmark tests for walkMap use cases **Limitations** When using `walkMap` with a Key, either the full key has to be dotted or the last part. `GetValue("a.b.c")` on an event `{"a.b":{"c":17}}` does not work as only the first part of the key or the full key are checked against the map. Benchmark old to new walkMap: ``` go test -bench=BenchmarkWalkMap -run=BenchmarkWalkMap ``` Recursive ``` BenchmarkWalkMap/Get-4 1000000 1334 ns/op 432 B/op 7 allocs/op BenchmarkWalkMap/Put-4 1000000 1273 ns/op 1120 B/op 10 allocs/op BenchmarkWalkMap/HasKey-4 2000000 690 ns/op 160 B/op 5 allocs/op BenchmarkWalkMap/HasKeyFirst-4 50000000 28.8 ns/op 0 B/op 0 allocs/op BenchmarkWalkMap/Delete-4 1000000 1431 ns/op 1120 B/op 10 allocs/op ``` Iterative ``` BenchmarkWalkMap/Get-4 500000 2577 ns/op 824 B/op 13 allocs/op BenchmarkWalkMap/Put-4 1000000 1052 ns/op 1072 B/op 8 allocs/op BenchmarkWalkMap/HasKey-4 5000000 381 ns/op 96 B/op 2 allocs/op BenchmarkWalkMap/HasKeyFirst-4 20000000 89.6 ns/op 16 B/op 1 allocs/op BenchmarkWalkMap/Delete-4 1000000 1083 ns/op 1072 B/op 8 allocs/op ``` The benchmark show a difference in the code before and after. The `GetValue` function has become quite a bit more efficient. The `HasKey` is now less efficient. I still need to investigate in detail where the differences come from but the assumption is it comes from the additional functionality. This change was initially triggered by #5916 but is split out into a separate PR because it has usage in multiple places.
The include_fields processor does not work in case the key contains a dot. See tests added here for this case.
There are different solutions here:
GetValue
in common.MapStr to also look for keys with dots inside instead of directly walking the treeThe first change would be possible with adding
But that does not fully solve the problem. As in the processor the following code is used, the fields which are added after filtering to the filtered fields, will have a nesting without the docs (see Put method)
This Put would also cause other issues as we see in the event below. If it's convert to an object
classes
and the namespacesclasses.loaded
would conflict because one is an object and the other one isn't. I would have expected that this already causes an issue on the Elasticsearch side before filtering.An alternative solution to the above is replacing the
.
in the names with_
as we do for example for docker labels and other places with a similar issue. The downside of this is that filtering on fields is not straight forward as the names change between fetching from jolokia and in ES.See https://discuss.elastic.co/t/include-fields-problem/109809