Skip to content

Commit

Permalink
Avoid attempting to load the same empty field twice in fetch phase (e…
Browse files Browse the repository at this point in the history
…lastic#107551)

During the fetch phase, there's a number of stored fields that are requested explicitly or loaded by default. That information is included in `StoredFieldsSpec` that each fetch sub phase exposes.

We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider.` The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. What happens though for stored fields that don't have a value for a specific doc, is that they are treated like any other field that was not requested, and loaded again, although they will not be found, which causes overhead.

This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time).

This is an existing issue, that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, that relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document providing a `SingleFieldVisitor` that will never find a value.

Another existing issue that this PR fixes is for the `FetchFieldsPhase` to extend the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`).

Yet another existing issues that this PR fixes is that `_id` has been until now always loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead as the field has already been loaded, and should not be loaded once again when explicitly requested.
  • Loading branch information
javanna authored Apr 17, 2024
1 parent 53d012b commit 223e7f8
Show file tree
Hide file tree
Showing 11 changed files with 141 additions and 19 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/107551.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 107551
summary: Avoid attempting to load the same empty field twice in fetch phase
area: Search
type: bug
issues: []
4 changes: 2 additions & 2 deletions docs/reference/search/profile.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ The API returns the following result:
"load_source_count": 5
},
"debug": {
"stored_fields": ["_id", "_routing", "_source"]
"stored_fields": ["_id", "_ignored", "_routing", "_source"]
},
"children": [
{
Expand Down Expand Up @@ -1051,7 +1051,7 @@ And here is the fetch profile:
"load_source_count": 5
},
"debug": {
"stored_fields": ["_id", "_routing", "_source"]
"stored_fields": ["_id", "_ignored", "_routing", "_source"]
},
"children": [
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ profile fetch:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 4 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- gt: { profile.shards.0.fetch.children.0.breakdown.next_reader_count: 0 }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ fetch fields:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 2 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- gt: { profile.shards.0.fetch.children.0.breakdown.next_reader_count: 0 }
Expand Down Expand Up @@ -74,7 +74,7 @@ fetch source:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 3 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- match: { profile.shards.0.fetch.children.1.type: FetchSourcePhase }
Expand Down Expand Up @@ -139,7 +139,7 @@ fetch nested source:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 4 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- match: { profile.shards.0.fetch.children.1.type: FetchSourcePhase }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -168,4 +168,18 @@ public void testInvalid() {
assertThat(exc.getMessage(), equalTo("cannot combine _none_ with other fields"));
}
}

public void testFetchId() {
assertAcked(prepareCreate("test"));
ensureGreen();

prepareIndex("test").setId("1").setSource("field", "value").get();
refresh();

assertResponse(prepareSearch("test").addFetchField("_id"), response -> {
assertEquals(1, response.getHits().getHits().length);
assertEquals("1", response.getHits().getAt(0).getId());
assertEquals("1", response.getHits().getAt(0).field("_id").getValue());
});
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -112,9 +112,13 @@ private SearchHits buildSearchHits(SearchContext context, int[] docIdsToLoad, Pr
context.getSearchExecutionContext().setLookupProviders(sourceProvider, ctx -> fieldLookupProvider);

List<FetchSubPhaseProcessor> processors = getProcessors(context.shardTarget(), fetchContext, profiler);

StoredFieldsSpec storedFieldsSpec = StoredFieldsSpec.build(processors, FetchSubPhaseProcessor::storedFieldsSpec);
storedFieldsSpec = storedFieldsSpec.merge(new StoredFieldsSpec(false, false, sourceLoader.requiredStoredFields()));
// Ideally the required stored fields would be provided as constructor argument a few lines above, but that requires moving
// the getProcessors call to before the setLookupProviders call, which causes weird issues in InnerHitsPhase.
// setLookupProviders resets the SearchLookup used throughout the rest of the fetch phase, which StoredValueFetchers rely on
// to retrieve stored fields, and InnerHitsPhase is the last sub-fetch phase and re-runs the entire fetch phase.
fieldLookupProvider.setPreloadedStoredFieldNames(storedFieldsSpec.requiredStoredFields());

StoredFieldLoader storedFieldLoader = profiler.storedFields(StoredFieldLoader.fromSpec(storedFieldsSpec));
IdLoader idLoader = context.newIdLoader();
Expand Down Expand Up @@ -164,7 +168,7 @@ protected SearchHit nextDoc(int doc) throws IOException {
leafIdLoader
);
sourceProvider.source = hit.source();
fieldLookupProvider.storedFields = hit.loadedFields();
fieldLookupProvider.setPreloadedStoredFieldValues(hit.hit().getId(), hit.loadedFields());
for (FetchSubPhaseProcessor processor : processors) {
processor.process(hit);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,16 @@
package org.elasticsearch.search.fetch;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.util.SetOnce;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.search.lookup.FieldLookup;
import org.elasticsearch.search.lookup.LeafFieldLookupProvider;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/**
Expand All @@ -26,15 +30,22 @@
*/
class PreloadedFieldLookupProvider implements LeafFieldLookupProvider {

Map<String, List<Object>> storedFields;
LeafFieldLookupProvider backUpLoader;
Supplier<LeafFieldLookupProvider> loaderSupplier;
private final SetOnce<Set<String>> preloadedStoredFieldNames = new SetOnce<>();
private Map<String, List<Object>> preloadedStoredFieldValues;
private String id;
private LeafFieldLookupProvider backUpLoader;
private Supplier<LeafFieldLookupProvider> loaderSupplier;

@Override
public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOException {
String field = fieldLookup.fieldType().name();
if (storedFields.containsKey(field)) {
fieldLookup.setValues(storedFields.get(field));

if (field.equals(IdFieldMapper.NAME)) {
fieldLookup.setValues(Collections.singletonList(id));
return;
}
if (preloadedStoredFieldNames.get().contains(field)) {
fieldLookup.setValues(preloadedStoredFieldValues.get(field));
return;
}
// stored field not preloaded, go and get it directly
Expand All @@ -44,8 +55,26 @@ public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOExcep
backUpLoader.populateFieldLookup(fieldLookup, doc);
}

void setPreloadedStoredFieldNames(Set<String> preloadedStoredFieldNames) {
this.preloadedStoredFieldNames.set(preloadedStoredFieldNames);
}

void setPreloadedStoredFieldValues(String id, Map<String, List<Object>> preloadedStoredFieldValues) {
assert preloadedStoredFieldNames.get().containsAll(preloadedStoredFieldValues.keySet())
: "Provided stored field that was not expected to be preloaded? "
+ preloadedStoredFieldValues.keySet()
+ " - "
+ preloadedStoredFieldNames;
this.preloadedStoredFieldValues = preloadedStoredFieldValues;
this.id = id;
}

void setNextReader(LeafReaderContext ctx) {
backUpLoader = null;
loaderSupplier = () -> LeafFieldLookupProvider.fromStoredFields().apply(ctx);
}

LeafFieldLookupProvider getBackUpLoader() {
return backUpLoader;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ public final class FetchFieldsPhase implements FetchSubPhase {
private static final List<FieldAndFormat> DEFAULT_METADATA_FIELDS = List.of(
new FieldAndFormat(IgnoredFieldMapper.NAME, null),
new FieldAndFormat(RoutingFieldMapper.NAME, null),
// will only be fetched when mapped (older archived indices)
new FieldAndFormat(LegacyTypeFieldMapper.NAME, null)
);

Expand Down Expand Up @@ -95,9 +96,9 @@ public void setNextReader(LeafReaderContext readerContext) {
@Override
public StoredFieldsSpec storedFieldsSpec() {
if (fieldFetcher != null) {
return fieldFetcher.storedFieldsSpec();
return metadataFieldFetcher.storedFieldsSpec().merge(fieldFetcher.storedFieldsSpec());
}
return StoredFieldsSpec.NO_REQUIREMENTS;
return metadataFieldFetcher.storedFieldsSpec();
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,9 @@ public MappedFieldType fieldType() {
*/
public void setValues(List<Object> values) {
assert valuesLoaded == false : "Call clear() before calling setValues()";
values.stream().map(fieldType::valueForDisplay).forEach(this.values::add);
if (values != null) {
values.stream().map(fieldType::valueForDisplay).forEach(this.values::add);
}
this.valuesLoaded = true;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,13 @@
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.search.lookup.FieldLookup;
import org.elasticsearch.test.ESTestCase;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
Expand All @@ -30,7 +32,16 @@ public class PreloadedFieldLookupProviderTests extends ESTestCase {

public void testFallback() throws IOException {
PreloadedFieldLookupProvider lookup = new PreloadedFieldLookupProvider();
lookup.storedFields = Map.of("foo", List.of("bar"));
lookup.setPreloadedStoredFieldNames(Collections.singleton("foo"));
lookup.setPreloadedStoredFieldValues("id", Map.of("foo", List.of("bar")));

MappedFieldType idFieldType = mock(MappedFieldType.class);
when(idFieldType.name()).thenReturn(IdFieldMapper.NAME);
when(idFieldType.valueForDisplay(any())).then(invocation -> (invocation.getArguments()[0]));
FieldLookup idFieldLookup = new FieldLookup(idFieldType);
lookup.populateFieldLookup(idFieldLookup, 0);
assertEquals("id", idFieldLookup.getValue());
assertNull(lookup.getBackUpLoader()); // fallback didn't get used because 'foo' is in the list

MappedFieldType fieldType = mock(MappedFieldType.class);
when(fieldType.name()).thenReturn("foo");
Expand All @@ -39,7 +50,7 @@ public void testFallback() throws IOException {

lookup.populateFieldLookup(fieldLookup, 0);
assertEquals("BAR", fieldLookup.getValue());
assertNull(lookup.backUpLoader); // fallback didn't get used because 'foo' is in the list
assertNull(lookup.getBackUpLoader()); // fallback didn't get used because 'foo' is in the list

MappedFieldType unloadedFieldType = mock(MappedFieldType.class);
when(unloadedFieldType.name()).thenReturn("unloaded");
Expand All @@ -56,5 +67,4 @@ public void testFallback() throws IOException {
lookup.populateFieldLookup(unloadedFieldLookup, 0);
assertEquals("VALUE", unloadedFieldLookup.getValue());
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -19,13 +19,17 @@
import org.elasticsearch.index.mapper.DocValueFetcher;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.NestedLookup;
import org.elasticsearch.index.mapper.StoredValueFetcher;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.support.CoreValuesSourceType;
import org.elasticsearch.search.fetch.FetchContext;
import org.elasticsearch.search.fetch.FetchSubPhase;
import org.elasticsearch.search.fetch.FetchSubPhaseProcessor;
import org.elasticsearch.search.fetch.StoredFieldsContext;
import org.elasticsearch.search.fetch.StoredFieldsSpec;
import org.elasticsearch.search.lookup.SearchLookup;
import org.elasticsearch.search.lookup.Source;
import org.elasticsearch.test.ESTestCase;

Expand All @@ -35,6 +39,7 @@
import java.util.Set;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

Expand Down Expand Up @@ -90,6 +95,58 @@ public void testDocValueFetcher() throws IOException {

reader.close();
dir.close();
}

public void testStoredFieldsSpec() {
StoredFieldsContext storedFieldsContext = StoredFieldsContext.fromList(List.of("stored", "_metadata"));
FetchFieldsContext ffc = new FetchFieldsContext(List.of(new FieldAndFormat("field", null)));

SearchLookup searchLookup = mock(SearchLookup.class);

SearchExecutionContext sec = mock(SearchExecutionContext.class);
when(sec.isMetadataField(any())).then(invocation -> invocation.getArguments()[0].toString().startsWith("_"));

MappedFieldType routingFt = mock(MappedFieldType.class);
when(routingFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_routing"));
when(sec.getFieldType(eq("_routing"))).thenReturn(routingFt);

// this would normally not be mapped -> getMatchingFieldsNames would not resolve it (unless for older archive indices)
MappedFieldType typeFt = mock(MappedFieldType.class);
when(typeFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_type"));
when(sec.getFieldType(eq("_type"))).thenReturn(typeFt);

MappedFieldType ignoredFt = mock(MappedFieldType.class);
when(ignoredFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_ignored"));
when(sec.getFieldType(eq("_ignored"))).thenReturn(ignoredFt);

// Ideally we would test that explicitly requested stored fields are included in stored fields spec, but isStored is final hence it
// can't be mocked. In reality, _metadata would be included but stored would not.
MappedFieldType storedFt = mock(MappedFieldType.class);
when(sec.getFieldType(eq("stored"))).thenReturn(storedFt);
MappedFieldType metadataFt = mock(MappedFieldType.class);
when(sec.getFieldType(eq("_metadata"))).thenReturn(metadataFt);

MappedFieldType fieldType = mock(MappedFieldType.class);
when(fieldType.valueFetcher(any(), any())).thenReturn(
new DocValueFetcher(
DocValueFormat.RAW,
new SortedNumericIndexFieldData("field", IndexNumericFieldData.NumericType.LONG, CoreValuesSourceType.NUMERIC, null, false)
)
);
when(sec.getFieldType(eq("field"))).thenReturn(fieldType);

when(sec.getMatchingFieldNames(any())).then(invocation -> Set.of(invocation.getArguments()[0]));
when(sec.nestedLookup()).thenReturn(NestedLookup.EMPTY);
FetchContext fetchContext = mock(FetchContext.class);
when(fetchContext.fetchFieldsContext()).thenReturn(ffc);
when(fetchContext.storedFieldsContext()).thenReturn(storedFieldsContext);
when(fetchContext.getSearchExecutionContext()).thenReturn(sec);
FetchFieldsPhase fetchFieldsPhase = new FetchFieldsPhase();
FetchSubPhaseProcessor processor = fetchFieldsPhase.getProcessor(fetchContext);
StoredFieldsSpec storedFieldsSpec = processor.storedFieldsSpec();
assertEquals(3, storedFieldsSpec.requiredStoredFields().size());
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_routing"));
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_ignored"));
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_type"));
}
}

0 comments on commit 223e7f8

Please sign in to comment.