Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Support unmapped fields in search 'fields' option #64651

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -295,3 +295,69 @@ setup:
- is_true: hits.hits.0._id
- match: { hits.hits.0.fields.count: [2] }
- is_false: hits.hits.0.fields.count_without_dv
---
Test unmapped field:
- skip:
version: ' - 7.99.99'
reason: support isn't yet backported
- do:
indices.create:
index: test
body:
mappings:
dynamic: false
properties:
f1:
type: keyword
f2:
type: object
enabled: false
f3:
type: object
- do:
index:
index: test
id: 1
refresh: true
body:
f1: some text
f2:
a: foo
b: bar
f3:
c: baz
f4: some other text
- do:
search:
index: test
body:
fields:
- f1
- f4
- match:
hits.hits.0.fields.f1:
- some text
- match:
hits.hits.0.fields.f4:
- some other text
- do:
search:
index: test
body:
fields:
- f*
- match:
hits.hits.0.fields.f1:
- some text
- match:
hits.hits.0.fields.f2\.a:
- foo
- match:
hits.hits.0.fields.f2\.b:
- bar
- match:
hits.hits.0.fields.f3\.c:
- baz
- match:
hits.hits.0.fields.f4:
- some other text
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,7 @@ protected void setupInnerHitsContext(QueryShardContext queryShardContext,
innerHitsContext.docValuesContext(docValuesContext);
}
if (innerHitBuilder.getFetchFields() != null) {
FetchFieldsContext fieldsContext = new FetchFieldsContext(innerHitBuilder.getFetchFields());
FetchFieldsContext fieldsContext = new FetchFieldsContext(innerHitBuilder.getFetchFields(), false);
innerHitsContext.fetchFieldsContext(fieldsContext);
}
if (innerHitBuilder.getScriptFields() != null) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -973,7 +973,8 @@ private void parseSource(DefaultSearchContext context, SearchSourceBuilder sourc
context.docValuesContext(docValuesContext);
}
if (source.fetchFields() != null) {
FetchFieldsContext fetchFieldsContext = new FetchFieldsContext(source.fetchFields());
// TODO make "includeUnmapped configurable?
FetchFieldsContext fetchFieldsContext = new FetchFieldsContext(source.fetchFields(), true);
context.fetchFieldsContext(fetchFieldsContext);
}
if (source.highlighter() != null) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ public Aggregator createInternal(SearchContext searchContext,
subSearchContext.docValuesContext(docValuesContext);
}
if (fetchFields != null) {
FetchFieldsContext fieldsContext = new FetchFieldsContext(fetchFields);
FetchFieldsContext fieldsContext = new FetchFieldsContext(fetchFields, true);
subSearchContext.fetchFieldsContext(fieldsContext);
}
for (ScriptFieldsContext.ScriptField field : scriptFields) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,18 @@
*/
public class FetchFieldsContext {
private final List<FieldAndFormat> fields;
private final boolean includeUnmapped;

public FetchFieldsContext(List<FieldAndFormat> fields) {
public FetchFieldsContext(List<FieldAndFormat> fields, boolean includeUnmapped) {
Copy link
Contributor

@jtibshirani jtibshirani Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about: if we end up making this configurable through a flag like include_unmapped, we could introduce a top-level parameter as we do here, or instead support the flag alongside each field pattern (so it'd be part of FieldAndFormat).

this.fields = fields;
this.includeUnmapped = includeUnmapped;
}

public List<FieldAndFormat> fields() {
return fields;
}

public boolean includeUnmapped() {
return includeUnmapped;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,13 @@ public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) {
"in the mappings for index [" + fetchContext.getIndexName() + "]");
}

FieldFetcher fieldFetcher = FieldFetcher.create(fetchContext.getQueryShardContext(), searchLookup, fetchFieldsContext.fields());
FieldFetcher fieldFetcher = FieldFetcher.create(
fetchContext.getQueryShardContext(),
searchLookup,
fetchFieldsContext.fields(),
fetchFieldsContext.includeUnmapped()
);

return new FetchSubPhaseProcessor() {
@Override
public void setNextReader(LeafReaderContext readerContext) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@

import org.apache.lucene.index.LeafReaderContext;
import org.elasticsearch.common.document.DocumentField;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.ValueFetcher;
import org.elasticsearch.index.query.QueryShardContext;
Expand All @@ -30,10 +31,12 @@
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
* A helper class to {@link FetchFieldsPhase} that's initialized with a list of field patterns to fetch.
Expand All @@ -42,32 +45,53 @@
public class FieldFetcher {
public static FieldFetcher create(QueryShardContext context,
SearchLookup searchLookup,
Collection<FieldAndFormat> fieldAndFormats) {
Collection<FieldAndFormat> fieldAndFormats,
boolean includeUnmapped) {

List<FieldContext> fieldContexts = new ArrayList<>();
List<String> originalPattern = new ArrayList<>();
List<String> excludeForUnmappedFetch = new ArrayList<>();

for (FieldAndFormat fieldAndFormat : fieldAndFormats) {
String fieldPattern = fieldAndFormat.field;
String format = fieldAndFormat.format;

Collection<String> concreteFields = context.simpleMatchToIndexNames(fieldPattern);
originalPattern.add(fieldAndFormat.field);
for (String field : concreteFields) {
MappedFieldType ft = context.getFieldType(field);
if (ft == null || context.isMetadataField(field)) {
continue;
}
ValueFetcher valueFetcher = ft.valueFetcher(context, searchLookup, format);
excludeForUnmappedFetch.add(field);
fieldContexts.add(new FieldContext(field, valueFetcher));
}
if (fieldPattern.charAt(fieldPattern.length() - 1) != '*') {
// not a prefix pattern, exclude potential sub-fields when fetching unmapped fields
excludeForUnmappedFetch.add(fieldPattern + ".*");
}
}
Function<Map<String, ?>, Map<String, Object>> filter = XContentMapValues.filter(
Copy link
Contributor

@jtibshirani jtibshirani Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-using some source filtering logic seems to be a good fit. A couple ideas:

  • Perhaps we shouldn't add the concrete field names as exclusions. The automata used for filtering have limits on the number of states, and I think we could run into these limits if the field patterns resolve to a large number of concrete fields. Here's an example exception from source filtering: No support for max_determinized_states in _source: includes #53739.
  • A slightly different approach would be to filter and collect the field entries at the same time, instead of doing two passes. This would involve writing a similar method to XContentMapValues#filter, but that flattens and collects the key-value pairs too (maybe something like XContentMapValues#filterAndFlatten?) I'm not sure this will turn out more cleanly, but wanted to mention the idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I'll need to better understand at how XContentMapValues#filter works to see if and how this could work.

originalPattern.toArray(new String[originalPattern.size()]),
excludeForUnmappedFetch.toArray(new String[excludeForUnmappedFetch.size()])
);

return new FieldFetcher(fieldContexts);
return new FieldFetcher(fieldContexts, includeUnmapped, filter);
}

private final List<FieldContext> fieldContexts;

private FieldFetcher(List<FieldContext> fieldContexts) {
private final boolean includeUnmapped;
private final Function<Map<String, ?>, Map<String, Object>> filter;

private FieldFetcher(
List<FieldContext> fieldContexts,
boolean includeUnmapped,
Function<Map<String, ?>, Map<String, Object>> filter
) {
this.fieldContexts = fieldContexts;
this.includeUnmapped = includeUnmapped;
this.filter = filter;
}

public Map<String, DocumentField> fetch(SourceLookup sourceLookup, Set<String> ignoredFields) throws IOException {
Expand All @@ -85,9 +109,37 @@ public Map<String, DocumentField> fetch(SourceLookup sourceLookup, Set<String> i
documentFields.put(field, new DocumentField(field, parsedValues));
}
}
if (includeUnmapped) {
Map<String, Object> unmappedFieldsToAdd = filter.apply(sourceLookup.loadSourceIfNeeded());
collectLeafValues(unmappedFieldsToAdd, documentFields);
}
return documentFields;
}

static void collectLeafValues(Map<String, Object> map, Map<String, DocumentField> documentFields) {
collectAllPaths("", map, documentFields);
}

private static void collectAllPaths(String prefix, Map<String, Object> source, Map<String, DocumentField> documentFields) {
for (String key : source.keySet()) {
Object value = source.get(key);
String currentPath = prefix + key;
if (value instanceof Map) {
collectAllPaths(currentPath + ".", (Map<String, Object>) value, documentFields);
} else {
DocumentField f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might miss lists of objects? Looking at XContentMapValues#filter, there's a special recursive case for lists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look but though I have a test for this (essentially array values?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, do you mean walking e.g. Lists of objects to their leaf values? e.g.

{
    "foo" : [ { "f1" : "value1" }, { "f2" : "value2" } ]
}

How does the current "fields" lookup work for this, e.g. what does the path look like for the "f1" value= "foo.1.f1" ? Maybe I'm just confused here.

Copy link
Contributor

@jtibshirani jtibshirani Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion, I read the logic too quickly before. We won't miss anything, but the result structure could be surprising.

For mapped fields, the 'fields' option always flattens arrays of objects. For example, given a document like

"object": [{ "field": "value1"}, {"field": "value2" }]

a request for "fields": ["object.field"] will return

"fields": { "object.field": ["value1", "value2"]}

With this logic, for unmapped fields we'll return

"fields": { "object": [{"field": "value1"}, {"field": "value2"} ]}

Perhaps this behavior is inconsistent, I think we'll still want to flatten arrays of objects (especially if we already flatten objects when they're not in array?) I'm definitely up for discussing this though, it's a question we noted on the original issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now and totally agree. I think I ran into this working on improvements on Friday anyway, should be part of my next update here.

if (value instanceof List) {
f = new DocumentField(currentPath, (List) value);
} else {
f = new DocumentField(currentPath, Collections.singletonList(value));
}
if (f.getValue() != null) {
documentFields.put(currentPath, f);
}
}
}
}

public void setNextReader(LeafReaderContext readerContext) {
for (FieldContext field : fieldContexts) {
field.valueFetcher.setNextReader(readerContext);
Expand Down
Loading