-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fetch fields support for runtime fields #60775
Add fetch fields support for runtime fields #60775
Conversation
This adds support to the `fields` fetch phase for runtime fields. To do so it reworks the mechanism that the fetch phase uses to "prepare" to fetch fields which is much more compatible with runtime fields. Rather than implement fetching directly by running the script I chose to implement fetching from doc values, which in turn runs the script. This allowed me to reuse the doc values fetching code from the doc values fetching phase which bought me a few tests "automatically".
Pinging @elastic/es-search (:Search/Search) |
if (sourceValue == null) { | ||
return List.of(); | ||
} | ||
public abstract ValueFetcher valueFetcher(SearchLookup lookup, @Nullable String format); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to go "leaf by leaf" here because that is the kind of iterations that you'd need to read doc values runtime fields. Another option would be to make fetch take both the LeafReaderContext
and the docId
and have the doc values and runtime fields based implementations build their own leaf readers. That feels worse to me but it is certainly an option.
DocValueFormat dvFormat = fieldType().docValueFormat(format, null); | ||
IndexFieldData<?> fd = lookup.doc().getForField(fieldType()); | ||
return ctx -> fd.load(ctx).buildFetcher(dvFormat); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to implement fetching runtime fields by pointing them at the doc values. This gave me a convenient excuse to remove most of the instanceof
s in FetchDocValuesPhase
while implementing my feature. As a bonus all of the tests for fetching doc values cover this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).
I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do imagine a day where we can "fake out" doc values from _source
, but I think we'd do that by building a "funny SearchLookup".
@@ -367,4 +403,105 @@ protected boolean forbidPrivateIndexSettings() { | |||
return true; | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel like a great place for these test methods. We do have a FieldMapperTestCase
but we don't use it consistently.
modules/parent-join/src/main/java/org/elasticsearch/join/mapper/ParentJoinFieldMapper.java
Outdated
Show resolved
Hide resolved
@@ -912,4 +917,42 @@ private String getRandomWildcardPattern() { | |||
} | |||
return sb.toString(); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one doesn't extend from ESSingleNodeTestCase
! We probably should stop all of them from extending from it, but that is a job for another day.
protected Object parseSourceValue(Object value, String format) { | ||
throw new UnsupportedOperationException(); | ||
public ValueFetcher valueFetcher(SearchLookup lookup, String format) { | ||
return docValuesFetcher(lookup, format); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all that, this is the entire implementation! Everything else is shared and it all just works by virtue of supporting doc values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add any unit tests for this because we have unit tests for doc values. I did add a few integration tests just to make sure everything is plugged in.
@elasticmachine update branch |
8968ca3
to
41fa35b
Compare
@jtibshirani I've merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master
for easier maintainability?
x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/10_keyword.yml
Outdated
Show resolved
Hide resolved
x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/30_double.yml
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/RangeFieldMapper.java
Show resolved
Hide resolved
DocValueFormat dvFormat = fieldType().docValueFormat(format, null); | ||
IndexFieldData<?> fd = lookup.doc().getForField(fieldType()); | ||
return ctx -> fd.load(ctx).buildFetcher(dvFormat); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).
I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.
@@ -283,13 +282,15 @@ public void parse(ParseContext context) throws IOException { | |||
protected abstract void parseCreateField(ParseContext context) throws IOException; | |||
|
|||
/** | |||
* Given access to a document's _source, return this field's values. | |||
* Build a {@linkplain ValueFetcher} to fetch values for the fields fetch api. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting worried that FieldMapper
is growing too large and hard to understand. (This is partially my fault for adding _source retrieval logic to it in FieldMapper#lookupValues
!) For me the extensive use of anonymous classes and function references also makes it harder to follow.
Some ideas:
- We could instantiate some of these anonymous classes, even though they're quite simple. For example there could be a concrete
SourceValueFetcher
class. ValueFetcher
could live in its own file. Maybe it could contain static factory methods to create source and doc values fetchers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to move ValueFetcher
to its own file and move the implementations over. After I do that we can see how we feel about it.
List<?> values = fieldMapper.lookupValues(sourceLookup, context.format); | ||
parsedValues.addAll(values); | ||
for (LeafValueFetcher fetcher : context.leafFetchers) { | ||
parsedValues.addAll(fetcher.fetch(hitContext.docId() - hitContext.readerContext().docBase)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that HitContext.docId
is already the leaf reader doc ID? It's passed in as subDocId
:
elasticsearch/server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java
Line 242 in 9059cfb
hitContext.reset(hit, subReaderContext, subDocId, context.searcher()); |
It feels like we should have test coverage for this, maybe
FieldValueRetrieverTests
would be a good place?
server/src/test/java/org/elasticsearch/search/fetch/subphase/FieldValueRetrieverTests.java
Show resolved
Hide resolved
@@ -96,6 +97,11 @@ public void execute(SearchContext context) { | |||
FieldsVisitor fieldsVisitor = createStoredFieldsVisitor(context, storedToRequestedFields); | |||
|
|||
try { | |||
SearchLookup lookup = new SearchLookup(context.mapperService(), context.getQueryShardContext()::getForField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue here is that the runtime scripts don't have access to the shared _source. It seems like the SearchLookup
that's passed to fielddataBuilder
will always be the one on QueryShardContext
?
It actually seems confusing that SearchLookup
lets you perform getForField
to get field data, but then this method uses a potentially different SearchLookup
to create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! I'll dig into it.
I'd hoped to land this in the branch and then cherry-pick it over to master, dropping out all of the "in the branch" stuff. |
docMap = new DocLookup(mapperService, fieldDataLookup); | ||
public SearchLookup( | ||
MapperService mapperService, | ||
BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supplier<SearchLookup>
is a little weird here but it is compatible with #60318 and I think it fits well with how QueryShardContext
uses it.
@jtibshirani I pushed some updates for all of the things you mentioned. I did move |
And I seem to have broken some tests. One moment! |
All better now. I hope. |
run elasticsearch-ci/1 |
This adds support to the
fields
fetch phase for runtime fields. To doso it reworks the mechanism that the fetch phase uses to "prepare" to
fetch fields which is much more compatible with runtime fields.
Rather than implement fetching directly by running the script I chose to
implement fetching from doc values, which in turn runs the script. This
allowed me to reuse the doc values fetching code from the doc values
fetching phase which bought me a few tests "automatically".