WIP: Support unmapped fields in search 'fields' option #64651

cbuescher · 2020-11-05T14:08:04Z

Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values.
This change adds implementation and tests for this addition.

Closes #63690

Opening an early WIP for feedback

cbuescher · 2020-11-05T14:15:03Z

@jtibshirani maybe you can give some quick feedback on whether this is going in the right direction. I left out any details around enabling the fetching of unmapped fields (whether by default or through a flag, or what the API looks like in detail). There are also some decisions I took regarding edge cases that we might want to discuss again:

- null values are not retrieved from unmapped _source fields, which also seems to be the current behaviour for mapped fields
- field values that are not retrieved from mapped fields by the current mechanism but that are present in the _source (e.g. because they are malformed) are not added by this secondary lookup either

I'm still going to look at whether the XContentMapValues#filter functionality that is currently also used for source filtering can be used instead of the implementation here. I'm not sure that it would be more performant but will take a look.

cbuescher · 2020-11-05T15:42:43Z

The failing test above shows another interesting edge case with "flattened" fields. The current mechanism assumes that we cannot retrieve the internals of a "flattened" field's content. With the ability to look up from "_source", these can be returned, though. See the wildcard test case for details. When searching for "flat*" we return both "flattened" and "flattened.some_field" in this case. This could be seen as more of a feature than a bug, though.

jtibshirani

@cbuescher this looks like a nice start to me. I added some early thoughts on the implementation.

jtibshirani · 2020-11-05T23:08:53Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsContext.java


-    public FetchFieldsContext(List<FieldAndFormat> fields) {
+    public FetchFieldsContext(List<FieldAndFormat> fields, boolean includeUnmapped) {


Something to think about: if we end up making this configurable through a flag like include_unmapped, we could introduce a top-level parameter as we do here, or instead support the flag alongside each field pattern (so it'd be part of FieldAndFormat).
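To make the two placements concrete, here is a minimal sketch, assuming simplified stand-in classes rather than the real FetchFieldsContext and FieldAndFormat. All class and method names below are hypothetical and exist only for illustration.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not the actual Elasticsearch classes.
final class IncludeUnmappedPlacement {

    // Option A: a single top-level flag that applies to every requested pattern.
    static final class TopLevel {
        final List<String> patterns;
        final boolean includeUnmapped;

        TopLevel(List<String> patterns, boolean includeUnmapped) {
            this.patterns = patterns;
            this.includeUnmapped = includeUnmapped;
        }

        boolean includesUnmapped(String pattern) {
            return patterns.contains(pattern) && includeUnmapped;
        }
    }

    // Option B: the flag travels with each field pattern.
    static final class FieldSpec {
        final String pattern;
        final boolean includeUnmapped;

        FieldSpec(String pattern, boolean includeUnmapped) {
            this.pattern = pattern;
            this.includeUnmapped = includeUnmapped;
        }
    }

    static final class PerField {
        final List<FieldSpec> fields;

        PerField(List<FieldSpec> fields) {
            this.fields = fields;
        }

        boolean includesUnmapped(String pattern) {
            for (FieldSpec field : fields) {
                if (field.pattern.equals(pattern)) {
                    return field.includeUnmapped;
                }
            }
            return false; // pattern was not requested at all
        }
    }
}
```

With option A, one request either includes unmapped values for all patterns or for none. Option B allows mixing, at the cost of a more verbose request body.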

jtibshirani · 2020-11-05T23:48:16Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FieldFetcher.java

        }
+        Function<Map<String, ?>, Map<String, Object>> filter = XContentMapValues.filter(


Re-using some source filtering logic seems to be a good fit. A couple ideas:

Perhaps we shouldn't add the concrete field names as exclusions. The automata used for filtering have limits on the number of states, and I think we could run into these limits if the field patterns resolve to a large number of concrete fields. Here's an example exception from source filtering: No support for max_determinized_states in _source: includes #53739.

A slightly different approach would be to filter and collect the field entries at the same time, instead of doing two passes. This would involve writing a similar method to XContentMapValues#filter, but that flattens and collects the key-value pairs too (maybe something like XContentMapValues#filterAndFlatten?) I'm not sure this will turn out more cleanly, but wanted to mention the idea.

Thanks for the feedback, I'll need to better understand how XContentMapValues#filter works to see if and how this could work.
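As a rough illustration of the "filter and flatten in one pass" idea, here is a minimal sketch. It assumes simple wildcard patterns translated to java.util.regex instead of the Lucene automata that XContentMapValues#filter uses, it treats lists as plain leaf values, and it skips null values. The class name and the signature are assumptions; only the name filterAndFlatten comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual XContentMapValues implementation.
final class FilterAndFlattenSketch {

    // Translate a wildcard pattern such as "user.*" into a regex; only '*' is special.
    static Pattern compile(String wildcard) {
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Walk the source once, keeping leaf values whose dotted path matches an include pattern.
    static Map<String, Object> filterAndFlatten(Map<String, ?> source, List<String> includes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String include : includes) {
            patterns.add(compile(include));
        }
        Map<String, Object> result = new LinkedHashMap<>();
        collect("", source, patterns, result);
        return result;
    }

    private static void collect(String prefix, Map<?, ?> node, List<Pattern> patterns, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String path = prefix + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                collect(path + ".", (Map<?, ?>) value, patterns, out); // descend into objects
            } else if (value != null && matchesAny(path, patterns)) {
                out.put(path, value); // flattened key, e.g. "user.name"
            }
        }
    }

    private static boolean matchesAny(String path, List<Pattern> patterns) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

Because the sketch matches each full path against every pattern, it avoids building one large automaton from concrete field names, but it makes no claim about relative performance.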

jtibshirani · 2020-11-05T23:51:36Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FieldFetcher.java

+            if (value instanceof Map) {
+                collectAllPaths(currentPath + ".", (Map<String, Object>) value, documentFields);
+            } else {
+                DocumentField f;


I think this might miss lists of objects? Looking at XContentMapValues#filter, there's a special recursive case for lists.

I will take a look, but I thought I had a test for this (essentially array values?)

Ah, do you mean walking e.g. Lists of objects to their leaf values? e.g.

{ "foo" : [ { "f1" : "value1" }, { "f2" : "value2" } ] }

How does the current "fields" lookup work for this, e.g. what does the path look like for the "f1" value? Is it "foo.1.f1"? Maybe I'm just confused here.

Sorry for the confusion, I read the logic too quickly before. We won't miss anything, but the result structure could be surprising.

For mapped fields, the 'fields' option always flattens arrays of objects. For example, given a document like

"object": [{ "field": "value1"}, {"field": "value2" }]

a request for "fields": ["object.field"] will return

"fields": { "object.field": ["value1", "value2"]}

With this logic, for unmapped fields we'll return

"fields": { "object": [{"field": "value1"}, {"field": "value2"} ]}

Perhaps this behavior is inconsistent; I think we'll still want to flatten arrays of objects (especially if we already flatten objects when they're not in an array?). I'm definitely up for discussing this though, it's a question we noted on the original issue.

I understand now and totally agree. I think I ran into this working on improvements on Friday anyway, should be part of my next update here.
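The flattening behavior described above for arrays of objects can be sketched as follows. This is a minimal illustration that works on a plain parsed _source map and collects every non-null leaf value under its dotted path; the class and method names are hypothetical and it is not the FieldFetcher code under review.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: flattens objects and arrays of objects into dotted paths.
final class ArrayFlatteningSketch {

    static Map<String, List<Object>> flatten(Map<String, ?> source) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            walk(entry.getKey(), entry.getValue(), out);
        }
        return out;
    }

    private static void walk(String path, Object value, Map<String, List<Object>> out) {
        if (value instanceof Map) {
            // Objects extend the path with the child key.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                walk(path + "." + entry.getKey(), entry.getValue(), out);
            }
        } else if (value instanceof List) {
            // Arrays do not add a path segment, so their elements share the parent path.
            for (Object element : (List<?>) value) {
                walk(path, element, out);
            }
        } else if (value != null) {
            out.computeIfAbsent(path, key -> new ArrayList<>()).add(value);
        }
    }
}
```

For the document from the example, "object": [{ "field": "value1"}, {"field": "value2" }], this yields the single entry "object.field" with the values "value1" and "value2", which matches the shape returned for mapped fields.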

cbuescher · 2020-11-09T16:00:12Z

x-pack/plugin/src/test/resources/rest-api-spec/test/flattened/10_basic.yml

@@ -154,5 +154,6 @@

  - match:  { hits.total.value: 1 }
  - length: { hits.hits: 1 }
-  - length: { hits.hits.0.fields: 1 }
+  - length: { hits.hits.0.fields: 2 }


@jtibshirani maybe you have an opinion on this. It seems like we currently treat all substructure under a mapped flattened field as a "leaf" object, regardless of what it contains. With the current implementation of fetching unmapped fields we'd be able to descend into those objects and return them. Or the other way round: if we don't do that, we somehow have to mark the mapped "flattened" nodes in "_source" and stop walking the tree there later when retrieving the unmapped values. Not sure what users would typically expect here, so I went with adapting the expectations in the test for now. Happy to discuss though.

This approach seems okay to me for now -- in some sense the sub-fields of the flattened field are in fact 'unmapped'. However I agree the behavior of flattened fields is generally tricky, I've made a note on the meta issue (#60985) to think through how to best handle flattened data.

jtibshirani · 2020-11-10T01:46:29Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FieldFetcher.java

        return documentFields;
    }

+    private void collect(Map<String, DocumentField> documentFields, Map<String, Object> source, String parentPath, int lastState) {


This turned out to be a bit complex, I'm not sure my idea to do it all in one pass was a good one :) Maybe we could start by re-using the source filtering logic, optimizing later if we see a performance reason?

After looking at what the source filtering code does, I actually find this solution less complicated. We are doing less work here than in the source filtering code, where whole sub-trees of the source can be filtered out and the exclusion logic is more complicated. These two methods that pull the data out of the source map, while keeping track of where they are in the source tree via the inclusion pattern automaton, are not recursive but not that long, and they are implementation details that we don't need to expose. On the other hand, using the source filtering code, which was originally written for another purpose, might lead to problems when that code needs to be altered in the future. Having our "own" logic here doesn't seem like too much overhead to me tbh.
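For readers unfamiliar with the approach, tracking a matcher state while walking the source tree can be sketched like this. It is a minimal illustration and makes several assumptions: a hand-rolled wildcard matcher (a BitSet of pattern positions) stands in for the Lucene automaton and the int lastState of the collect method under review, only a single include pattern is supported, and the walk is written recursively for brevity. All names are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a tiny wildcard matcher whose state is carried down the source tree.
final class PathStateSketch {
    private final String pattern; // supports '*' as "any sequence of characters"

    PathStateSketch(String pattern) {
        this.pattern = pattern;
    }

    Map<String, Object> collect(Map<String, ?> source) {
        Map<String, Object> out = new LinkedHashMap<>();
        BitSet start = new BitSet();
        start.set(0);
        collect(source, "", closure(start), out);
        return out;
    }

    private void collect(Map<?, ?> node, String parentPath, BitSet parentState, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = String.valueOf(entry.getKey());
            BitSet state = step(parentState, key);
            if (state.isEmpty()) {
                continue; // the pattern cannot match this key or anything below it
            }
            Object value = entry.getValue();
            if (value instanceof Map) {
                BitSet childState = step(state, ".");
                if (!childState.isEmpty()) {
                    collect((Map<?, ?>) value, parentPath + key + ".", childState, out);
                }
            } else if (value != null && state.get(pattern.length())) {
                out.put(parentPath + key, value); // accepting state reached on a leaf
            }
        }
    }

    // Advance the set of active pattern positions over the given text.
    private BitSet step(BitSet states, String text) {
        BitSet current = states;
        for (int k = 0; k < text.length() && !current.isEmpty(); k++) {
            char c = text.charAt(k);
            BitSet next = new BitSet();
            for (int i = current.nextSetBit(0); i >= 0; i = current.nextSetBit(i + 1)) {
                if (i < pattern.length()) {
                    char p = pattern.charAt(i);
                    if (p == '*') {
                        next.set(i);     // '*' consumes the character and stays active
                    } else if (p == c) {
                        next.set(i + 1); // literal match advances by one
                    }
                }
            }
            current = closure(next);
        }
        return current;
    }

    // A '*' may match the empty string, so the position after it is active as well.
    private BitSet closure(BitSet states) {
        for (int i = states.nextSetBit(0); i >= 0 && i < pattern.length(); i = states.nextSetBit(i + 1)) {
            if (pattern.charAt(i) == '*') {
                states.set(i + 1);
            }
        }
        return states;
    }
}
```

The point of carrying the state is that a key such as "username" is rejected for the pattern "user.*" as soon as its characters stop matching, without building the full path of every leaf first.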

cbuescher · 2020-11-23T20:09:58Z

@jtibshirani I'm going to close this draft in favour of a cleaned-up, squashed version that includes additional docs and some more tests at #65386. Hope this is fine by you. Thanks for the reviews so far here.

cbuescher added the WIP, :Search Foundations/Mapping, and v7.10.1 labels Nov 5, 2020

cbuescher requested a review from jtibshirani November 5, 2020 14:08

Alternative impl using source map filtering

33e1a8b

jtibshirani reviewed Nov 5, 2020

cbuescher changed the title ~~Support unmapped fields in search 'fields' option~~ WIP: Support unmapped fields in search 'fields' option Nov 6, 2020

cbuescher self-assigned this Nov 6, 2020

Christoph Büscher added 5 commits November 6, 2020 16:18

Adding bug url

9f39f91

Add other collect implementation

f50bdc2

iter

c94b191

Merge branch 'master' into fix-63690

9d6fd43

Adapt flattend field yml

976fde1

cbuescher commented Nov 9, 2020

jtibshirani reviewed Nov 10, 2020

Christoph Büscher added 5 commits November 10, 2020 17:45

Add include_unmapped option

822fb02

Merge branch 'master' into fix-63690

9511279

iter flag

7ec4341

fix test

4d74f2d

Make flag optional

4b5232b

cbuescher mentioned this pull request Nov 19, 2020

Support unmapped fields in search 'fields' option. #63690

Closed

cbuescher closed this Nov 23, 2020
