Add fetch fields support for runtime fields #60775

nik9000 · 2020-08-05T17:34:01Z

This adds support to the `fields` fetch phase for runtime fields. To do
so, it reworks the mechanism that the fetch phase uses to "prepare" to
fetch fields so that it is much more compatible with runtime fields.

Rather than implement fetching directly by running the script, I chose to
implement fetching from doc values, which in turn runs the script. This
allowed me to reuse the doc values fetching code from the doc values
fetch phase, which bought me a few tests "automatically".

elasticmachine · 2020-08-05T21:43:14Z

Pinging @elastic/es-search (:Search/Search)

nik9000 · 2020-08-05T23:54:42Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

-        if (sourceValue == null) {
-            return List.of();
-        }
+    public abstract ValueFetcher valueFetcher(SearchLookup lookup, @Nullable String format);


I decided to go "leaf by leaf" here because that is the kind of iteration you'd need to read doc values and runtime fields. Another option would be to make fetch take both the LeafReaderContext and the docId and have the doc values and runtime fields based implementations build their own leaf readers. That feels worse to me, but it is certainly an option.
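
A minimal sketch of the "leaf by leaf" shape described above, assuming simplified stand-ins rather than the real classes: `Leaf` is a hypothetical replacement for `LeafReaderContext`, and the method name `leaf(...)` and the `upperCasing()` fetcher are made up for illustration. The point is only that per-leaf setup happens once and each document is then addressed by its leaf-relative doc ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LeafByLeafSketch {
    /** Hypothetical stand-in for one index segment (a "leaf"). */
    public static final class Leaf {
        final String[] values; // values[i] belongs to leaf-relative doc id i

        public Leaf(String... values) {
            this.values = values;
        }
    }

    /** Reads the values of one document of a single leaf. */
    public interface LeafValueFetcher {
        List<Object> fetch(int leafDocId);
    }

    /** Prepared once per request, then specialised once per leaf. */
    public interface ValueFetcher {
        LeafValueFetcher leaf(Leaf leaf);
    }

    /** Illustrative fetcher: returns the stored value in upper case. */
    public static ValueFetcher upperCasing() {
        return leaf -> docId -> List.of(leaf.values[docId].toUpperCase(Locale.ROOT));
    }

    /** Walks every leaf, building the leaf fetcher once and reusing it for each doc. */
    public static List<Object> fetchAll(ValueFetcher fetcher, List<Leaf> leaves) {
        List<Object> out = new ArrayList<>();
        for (Leaf leaf : leaves) {
            LeafValueFetcher leafFetcher = fetcher.leaf(leaf);
            for (int docId = 0; docId < leaf.values.length; docId++) {
                out.addAll(leafFetcher.fetch(docId));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = List.of(new Leaf("a", "b"), new Leaf("c"));
        System.out.println(fetchAll(upperCasing(), leaves)); // prints [A, B, C]
    }
}
```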

nik9000 · 2020-08-05T23:56:44Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

+        DocValueFormat dvFormat = fieldType().docValueFormat(format, null);
+        IndexFieldData<?> fd = lookup.doc().getForField(fieldType());
+        return ctx -> fd.load(ctx).buildFetcher(dvFormat);
+    }


I decided to implement fetching runtime fields by pointing them at the doc values. This gave me a convenient excuse to remove most of the instanceofs in FetchDocValuesPhase while implementing my feature. As a bonus all of the tests for fetching doc values cover this.

I can see why this was convenient. It also opens up the possibility of retrieving doc values in the fetch fields phase (even outside of runtime fields).

I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example, if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming; it seems unlikely such an optimization will be important.

I do imagine a day where we can "fake out" doc values from _source, but I think we'd do that by building a "funny SearchLookup".

nik9000 · 2020-08-05T23:57:42Z

test/framework/src/main/java/org/elasticsearch/test/ESSingleNodeTestCase.java

@@ -367,4 +403,105 @@ protected boolean forbidPrivateIndexSettings() {
        return true;
    }

+    /**


This doesn't feel like a great place for these test methods. We do have a FieldMapperTestCase but we don't use it consistently.

modules/parent-join/src/main/java/org/elasticsearch/join/mapper/ParentJoinFieldMapper.java

nik9000 · 2020-08-06T00:01:32Z

...wildcard/src/test/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapperTests.java

@@ -912,4 +917,42 @@ private String getRandomWildcardPattern() {
        }
        return sb.toString();
    }
+
+    /**


This one doesn't extend from ESSingleNodeTestCase! We probably should stop all of them from extending from it, but that is a job for another day.

nik9000 · 2020-08-06T00:02:43Z

...lds/src/main/java/org/elasticsearch/xpack/runtimefields/mapper/RuntimeScriptFieldMapper.java

-    protected Object parseSourceValue(Object value, String format) {
-        throw new UnsupportedOperationException();
+    public ValueFetcher valueFetcher(SearchLookup lookup, String format) {
+        return docValuesFetcher(lookup, format);
    }


After all that, this is the entire implementation! Everything else is shared and it all just works by virtue of supporting doc values.

I didn't add any unit tests for this because we have unit tests for doc values. I did add a few integration tests just to make sure everything is plugged in.
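
To illustrate why supporting doc values is enough, here is a rough sketch under simplified assumptions. The `DocValues` interface and the `columnar` and `scripted` helpers are hypothetical and are not the Elasticsearch API. The shared fetch code only sees a doc-values view, so a field whose doc values are computed by a script is fetched by exactly the same code path as an indexed field.

```java
import java.util.List;
import java.util.function.IntFunction;

public class RuntimeFieldFetchSketch {
    /** Hypothetical minimal doc-values view: all values for a leaf-relative doc id. */
    public interface DocValues {
        List<Object> get(int docId);
    }

    /** Shared fetch code: it only knows about DocValues, not where they come from. */
    public static IntFunction<List<Object>> docValuesFetcher(DocValues docValues) {
        return docValues::get;
    }

    /** Doc values for an indexed field: read from an in-memory column. */
    public static DocValues columnar(long[] column) {
        return docId -> List.<Object>of(column[docId]);
    }

    /** Doc values for a runtime field: the script runs on every read. */
    public static DocValues scripted(IntFunction<Object> script) {
        return docId -> List.of(script.apply(docId));
    }

    public static void main(String[] args) {
        IntFunction<List<Object>> indexed = docValuesFetcher(columnar(new long[] { 10, 20 }));
        IntFunction<List<Object>> runtime = docValuesFetcher(scripted(docId -> "doc-" + docId));
        System.out.println(indexed.apply(1)); // prints [20]
        System.out.println(runtime.apply(2)); // prints [doc-2]
    }
}
```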

nik9000 · 2020-08-06T18:29:06Z

@elasticmachine update branch

nik9000 · 2020-08-11T16:58:22Z

@jtibshirani I've merged feature/runtime_fields and dropped the skip for the fields fetch tests.

jtibshirani

I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master for easier maintainability?

x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/10_keyword.yml

x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/30_double.yml

server/src/main/java/org/elasticsearch/index/mapper/RangeFieldMapper.java

jtibshirani · 2020-08-11T19:21:15Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

+        DocValueFormat dvFormat = fieldType().docValueFormat(format, null);
+        IndexFieldData<?> fd = lookup.doc().getForField(fieldType());
+        return ctx -> fd.load(ctx).buildFetcher(dvFormat);
+    }


I can see why this was convenient. It also opens up the possibility of retrieving doc values in the fetch fields phase (even outside of runtime fields).

I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example, if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming; it seems unlikely such an optimization will be important.

jtibshirani · 2020-08-11T19:29:10Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

@@ -283,13 +282,15 @@ public void parse(ParseContext context) throws IOException {
    protected abstract void parseCreateField(ParseContext context) throws IOException;

    /**
-     * Given access to a document's _source, return this field's values.
+     * Build a {@linkplain ValueFetcher} to fetch values for the fields fetch api.


I'm getting worried that FieldMapper is growing too large and hard to understand. (This is partially my fault for adding _source retrieval logic to it in FieldMapper#lookupValues!) For me the extensive use of anonymous classes and function references also makes it harder to follow.

Some ideas:

- We could turn some of these anonymous classes into named classes, even though they're quite simple. For example there could be a concrete SourceValueFetcher class.
- ValueFetcher could live in its own file. Maybe it could contain static factory methods to create source and doc values fetchers?

I'd be happy to move ValueFetcher to its own file and move the implementations over. After I do that we can see how we feel about it.
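
One possible shape for that suggestion, as a sketch only: the `fetch(Map)` signature, the `fromSource` factory and the `parser` parameter are assumptions made for illustration and are not taken from the PR. It pairs a named `SourceValueFetcher` class with a static factory on the interface, and keeps the "missing value yields an empty list" behaviour of the removed code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ValueFetcherFactoriesSketch {
    public interface ValueFetcher {
        List<Object> fetch(Map<String, Object> source);

        /** Static factory so callers never name the concrete class. */
        static ValueFetcher fromSource(String field, Function<Object, Object> parser) {
            return new SourceValueFetcher(field, parser);
        }
    }

    /** Named class instead of an anonymous one, so it shows up clearly in stack traces and searches. */
    public static final class SourceValueFetcher implements ValueFetcher {
        private final String field;
        private final Function<Object, Object> parser;

        SourceValueFetcher(String field, Function<Object, Object> parser) {
            this.field = field;
            this.parser = parser;
        }

        @Override
        public List<Object> fetch(Map<String, Object> source) {
            Object raw = source.get(field);
            if (raw == null) {
                return List.of(); // a missing field yields no values
            }
            // Treat a single value and a list of values uniformly.
            List<?> rawValues = raw instanceof List ? (List<?>) raw : List.of(raw);
            List<Object> parsed = new ArrayList<>();
            for (Object value : rawValues) {
                parsed.add(parser.apply(value));
            }
            return parsed;
        }
    }

    public static void main(String[] args) {
        ValueFetcher fetcher = ValueFetcher.fromSource("n", Object::toString);
        System.out.println(fetcher.fetch(Map.of("n", List.of(1, 2)))); // prints [1, 2]
        System.out.println(fetcher.fetch(Map.of()));                   // prints []
    }
}
```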

jtibshirani · 2020-08-11T21:17:05Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FieldValueRetriever.java

-                List<?> values = fieldMapper.lookupValues(sourceLookup, context.format);
-                parsedValues.addAll(values);
+            for (LeafValueFetcher fetcher : context.leafFetchers) {
+                parsedValues.addAll(fetcher.fetch(hitContext.docId() - hitContext.readerContext().docBase));


I think that HitContext.docId is already the leaf reader doc ID? It's passed in as subDocId:

elasticsearch/server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

Line 242 in 9059cfb

hitContext.reset(hit, subReaderContext, subDocId, context.searcher());

It feels like we should have test coverage for this, maybe FieldValueRetrieverTests would be a good place?
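
The arithmetic behind this concern can be sketched with plain integers. The numbers and the helper name `toLeafDocId` are made up for illustration. Subtracting the leaf's `docBase` is only correct for a shard-wide doc ID, so applying it to an ID that is already leaf-relative yields a wrong (here negative) ID.

```java
public class DocIdRebaseSketch {
    /** Converts a top-level (shard-wide) doc id into a leaf-relative one. */
    public static int toLeafDocId(int globalDocId, int leafDocBase) {
        return globalDocId - leafDocBase;
    }

    public static void main(String[] args) {
        int docBase = 100;     // hypothetical: the leaf's first doc is global doc 100
        int globalDocId = 103; // the hit's shard-wide doc id

        int leafDocId = toLeafDocId(globalDocId, docBase);
        System.out.println(leafDocId); // prints 3

        // Rebasing an id that is already leaf-relative is the bug being discussed.
        System.out.println(toLeafDocId(leafDocId, docBase)); // prints -97
    }
}
```

A test with more than one segment would catch this, since with a single leaf `docBase` is 0 and the double subtraction is invisible.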

server/src/main/java/org/elasticsearch/search/lookup/SourceLookup.java

server/src/test/java/org/elasticsearch/search/fetch/subphase/FieldValueRetrieverTests.java

jtibshirani · 2020-08-11T22:46:14Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

@@ -96,6 +97,11 @@ public void execute(SearchContext context) {
        FieldsVisitor fieldsVisitor = createStoredFieldsVisitor(context, storedToRequestedFields);

        try {
+            SearchLookup lookup = new SearchLookup(context.mapperService(), context.getQueryShardContext()::getForField);


One issue here is that the runtime scripts don't have access to the shared _source. It seems like the SearchLookup that's passed to fielddataBuilder will always be the one on QueryShardContext?

It actually seems confusing that SearchLookup lets you perform getForField to get field data, but then this method uses a potentially different SearchLookup to create it.

Good call! I'll dig into it.

nik9000 · 2020-08-12T13:19:53Z

> I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master for easier maintainability?

I'd hoped to land this in the branch and then cherry-pick it over to master, dropping out all of the "in the branch" stuff.

nik9000 · 2020-08-12T15:08:49Z

server/src/main/java/org/elasticsearch/search/lookup/SearchLookup.java

-        docMap = new DocLookup(mapperService, fieldDataLookup);
+    public SearchLookup(
+        MapperService mapperService,
+        BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup


Supplier<SearchLookup> is a little weird here but it is compatible with #60318 and I think it fits well with how QueryShardContext uses it.
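
A rough sketch of why a `Supplier` fits here, using simplified hypothetical types (`Lookup` and `FieldData` below are stand-ins, with `String` in place of `MappedFieldType`). The lookup hands a lazy reference to itself to the field data builder, so field data built for a runtime field sees the same lookup instance, and therefore the same shared state, that requested it.

```java
import java.util.function.BiFunction;
import java.util.function.Supplier;

public class LookupSupplierSketch {
    /** Hypothetical stand-in for IndexFieldData. */
    public interface FieldData {
        String describe();
    }

    /** Hypothetical stand-in for SearchLookup. */
    public static final class Lookup {
        private final BiFunction<String, Supplier<Lookup>, FieldData> fieldDataLookup;

        public Lookup(BiFunction<String, Supplier<Lookup>, FieldData> fieldDataLookup) {
            this.fieldDataLookup = fieldDataLookup;
        }

        /** Passes a lazy reference to this lookup so the builder can use it later. */
        public FieldData getForField(String field) {
            return fieldDataLookup.apply(field, () -> this);
        }
    }

    public static void main(String[] args) {
        Lookup lookup = new Lookup(
            (field, self) -> () -> field + " built with lookup " + System.identityHashCode(self.get()));
        System.out.println(lookup.getForField("my_runtime_field").describe());
        System.out.println("requesting lookup " + System.identityHashCode(lookup));
    }
}
```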

nik9000 · 2020-08-12T17:20:05Z

@jtibshirani I pushed some updates for all of the things you mentioned. I did move ValueFetcher but didn't make concrete subclasses for it. Please have another look and let me know if you think I should. I'm pretty comfortable with the function parameters but there are certainly other ways to do it. I might just be stuck in a funny mindset.

nik9000 · 2020-08-12T17:21:19Z

And I seem to have broken some tests. One moment!

nik9000 · 2020-08-12T17:30:05Z

> And I seem to have broken some tests. One moment!

All better now. I hope.

nik9000 · 2020-08-12T18:40:53Z

run elasticsearch-ci/1

jtibshirani · 2020-08-12T20:29:23Z

@nik9000 and I caught up through a separate channel -- we're going to hold off on moving this PR forward until I've addressed #61033, since that may affect the FieldMapper API.

nik9000 added 4 commits August 5, 2020 13:31

Update test names

a83baab

Fix tests

6409abe

Merge branch 'feature/runtime_fields' into runtime_fields_fetch_fields

311f660

nik9000 marked this pull request as ready for review August 5, 2020 21:43

nik9000 requested a review from jtibshirani August 5, 2020 21:43

nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Aug 5, 2020

elasticmachine added the Team:Search Meta label for search team label Aug 5, 2020

Fix javadoc

686b1f0

javanna mentioned this pull request Aug 6, 2020

Add support for runtime fields #59332

Closed

30 tasks

Merge branch 'feature/runtime_fields' into runtime_fields_fetch_fields

41fa35b

nik9000 force-pushed the runtime_fields_fetch_fields branch from 8968ca3 to 41fa35b Compare August 11, 2020 16:57

nik9000 added 4 commits August 12, 2020 09:26

Rename methods

2c6a6f7

Simpler test

1a38dc1

Move lookup ref

9484eb0

checkstyle

45b316c

nik9000 added 2 commits August 12, 2020 11:27

The doc id already is in the leaf context

c868104

Move ValueFetcher

e2ee7af

nik9000 requested a review from jtibshirani August 12, 2020 17:18

oops wrong method

9cb7416

weltenwort mentioned this pull request Aug 13, 2020

[Logs UI] Support mappings-based runtime fields in the Logs UI elastic/kibana#74937

Closed

javanna closed this Sep 3, 2020
