Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fetch fields support for runtime fields #60775

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Aug 5, 2020

This adds support to the fields fetch phase for runtime fields. To do
so it reworks the mechanism that the fetch phase uses to "prepare" to
fetch fields which is much more compatible with runtime fields.

Rather than implement fetching directly by running the script I chose to
implement fetching from doc values, which in turn runs the script. This
allowed me to reuse the doc values fetching code from the doc values
fetching phase which bought me a few tests "automatically".

nik9000 added 4 commits August 5, 2020 13:31
This adds support to the `fields` fetch phase for runtime fields. To do
so it reworks the mechanism that the fetch phase uses to "prepare" to
fetch fields which is much more compatible with runtime fields.

Rather than implement fetching directly by running the script I chose to
implement fetching from doc values, which in turn runs the script. This
allowed me to reuse the doc values fetching code from the doc values
fetching phase which bought me a few tests "automatically".
@nik9000 nik9000 marked this pull request as ready for review August 5, 2020 21:43
@nik9000 nik9000 requested a review from jtibshirani August 5, 2020 21:43
@nik9000 nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Aug 5, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Aug 5, 2020
if (sourceValue == null) {
return List.of();
}
public abstract ValueFetcher valueFetcher(SearchLookup lookup, @Nullable String format);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go "leaf by leaf" here because that is the kind of iterations that you'd need to read doc values runtime fields. Another option would be to make fetch take both the LeafReaderContext and the docId and have the doc values and runtime fields based implementations build their own leaf readers. That feels worse to me but it is certainly an option.

DocValueFormat dvFormat = fieldType().docValueFormat(format, null);
IndexFieldData<?> fd = lookup.doc().getForField(fieldType());
return ctx -> fd.load(ctx).buildFetcher(dvFormat);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to implement fetching runtime fields by pointing them at the doc values. This gave me a convenient excuse to remove most of the instanceofs in FetchDocValuesPhase while implementing my feature. As a bonus all of the tests for fetching doc values cover this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).

I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do imagine a day where we can "fake out" doc values from _source, but I think we'd do that by building a "funny SearchLookup".

@@ -367,4 +403,105 @@ protected boolean forbidPrivateIndexSettings() {
return true;
}

/**
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like a great place for these test methods. We do have a FieldMapperTestCase but we don't use it consistently.

@@ -912,4 +917,42 @@ private String getRandomWildcardPattern() {
}
return sb.toString();
}

/**
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't extend from ESSingleNodeTestCase! We probably should stop all of them from extending from it, but that is a job for another day.

protected Object parseSourceValue(Object value, String format) {
throw new UnsupportedOperationException();
public ValueFetcher valueFetcher(SearchLookup lookup, String format) {
return docValuesFetcher(lookup, format);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After all that, this is the entire implementation! Everything else is shared and it all just works by virtue of supporting doc values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add any unit tests for this because we have unit tests for doc values. I did add a few integration tests just to make sure everything is plugged in.

@nik9000
Copy link
Member Author

nik9000 commented Aug 6, 2020

@elasticmachine update branch

@javanna javanna mentioned this pull request Aug 6, 2020
30 tasks
@nik9000 nik9000 force-pushed the runtime_fields_fetch_fields branch from 8968ca3 to 41fa35b Compare August 11, 2020 16:57
@nik9000
Copy link
Member Author

nik9000 commented Aug 11, 2020

@jtibshirani I've merged feature/runtime_fields and dropped the skip for the fields fetch tests.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master for easier maintainability?

DocValueFormat dvFormat = fieldType().docValueFormat(format, null);
IndexFieldData<?> fd = lookup.doc().getForField(fieldType());
return ctx -> fd.load(ctx).buildFetcher(dvFormat);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).

I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.

@@ -283,13 +282,15 @@ public void parse(ParseContext context) throws IOException {
protected abstract void parseCreateField(ParseContext context) throws IOException;

/**
* Given access to a document's _source, return this field's values.
* Build a {@linkplain ValueFetcher} to fetch values for the fields fetch api.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting worried that FieldMapper is growing too large and hard to understand. (This is partially my fault for adding _source retrieval logic to it in FieldMapper#lookupValues!) For me the extensive use of anonymous classes and function references also makes it harder to follow.

Some ideas:

  • We could instantiate some of these anonymous classes, even though they're quite simple. For example there could be a concrete SourceValueFetcher class.
  • ValueFetcher could live in its own file. Maybe it could contain static factory methods to create source and doc values fetchers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to move ValueFetcher to its own file and move the implementations over. After I do that we can see how we feel about it.

List<?> values = fieldMapper.lookupValues(sourceLookup, context.format);
parsedValues.addAll(values);
for (LeafValueFetcher fetcher : context.leafFetchers) {
parsedValues.addAll(fetcher.fetch(hitContext.docId() - hitContext.readerContext().docBase));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that HitContext.docId is already the leaf reader doc ID? It's passed in as subDocId:

hitContext.reset(hit, subReaderContext, subDocId, context.searcher());

It feels like we should have test coverage for this, maybe FieldValueRetrieverTests would be a good place?

@@ -96,6 +97,11 @@ public void execute(SearchContext context) {
FieldsVisitor fieldsVisitor = createStoredFieldsVisitor(context, storedToRequestedFields);

try {
SearchLookup lookup = new SearchLookup(context.mapperService(), context.getQueryShardContext()::getForField);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue here is that the runtime scripts don't have access to the shared _source. It seems like the SearchLookup that's passed to fielddataBuilder will always be the one on QueryShardContext?

It actually seems confusing that SearchLookup lets you perform getForField to get field data, but then this method uses a potentially different SearchLookup to create it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I'll dig into it.

@nik9000
Copy link
Member Author

nik9000 commented Aug 12, 2020

I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master for easier maintainability?

I'd hoped to land this in the branch and then cherry-pick it over to master, dropping out all of the "in the branch" stuff.

docMap = new DocLookup(mapperService, fieldDataLookup);
public SearchLookup(
MapperService mapperService,
BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supplier<SearchLookup> is a little weird here but it is compatible with #60318 and I think it fits well with how QueryShardContext uses it.

@nik9000 nik9000 requested a review from jtibshirani August 12, 2020 17:18
@nik9000
Copy link
Member Author

nik9000 commented Aug 12, 2020

@jtibshirani I pushed some updates for all of the things you mentioned. I did move ValueFetcher but didn't make concrete subclasses for it. Please have another look and let me know if you think I should. I'm pretty comfortable with the function parameters but there are certainly other ways to do it. I might just be stuck in a funny mindset.

@nik9000
Copy link
Member Author

nik9000 commented Aug 12, 2020

And I seem to have broken some tests. One moment!

@nik9000
Copy link
Member Author

nik9000 commented Aug 12, 2020

And I seem to have broken some tests. One moment!

All better now. I hope.

@nik9000
Copy link
Member Author

nik9000 commented Aug 12, 2020

run elasticsearch-ci/1

@jtibshirani
Copy link
Contributor

@nik9000 and I caught up through a separate channel -- we're going to hold off on moving this PR forward until I've addressed #61033, since that may affect the FieldMapper API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants