Follow Up for Add source fallback for keyword fields #88040

jdconrad · 2022-06-24T23:18:00Z

Original PR here (#87765). This PR adds source fallback through a MappedType#scriptDatafieldBuilder method, along with the additional plumbing required for the new scripting fields API to use it. This change is based on the design discussion results outlined here (#80504 (comment)).

Note that the scriptDatafieldBuilder method returns a Tuple<Boolean, IndexFieldData.Builder> where the Boolean value indicates whether or not we used a fallback to retrieve values. This is required for the old-style script access through doc['field'] to maintain its current user-facing values. The LeafDocLookup cache reflects this: additional plumbing has been added so that the new-style and old-style APIs can share the cached values retrieved from doc values, but only the new-style API will retrieve values from source.
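To illustrate the shared-cache behavior described above, here is a minimal sketch. It is not the code in this PR: LoadedValuesSketch and SharedLookupSketch are hypothetical stand-ins for the Tuple return value and the LeafDocLookup cache, and plain strings stand in for real field data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for Tuple<Boolean, IndexFieldData.Builder>: the flag
// records whether the values came from source fallback rather than doc values.
final class LoadedValuesSketch {
    final boolean usedSourceFallback;
    final List<String> values;

    LoadedValuesSketch(boolean usedSourceFallback, List<String> values) {
        this.usedSourceFallback = usedSourceFallback;
        this.values = values;
    }
}

// Hypothetical stand-in for the per-leaf lookup cache shared by both script APIs.
final class SharedLookupSketch {
    private final Map<String, LoadedValuesSketch> cache = new HashMap<>();
    private final Function<String, LoadedValuesSketch> loader;

    SharedLookupSketch(Function<String, LoadedValuesSketch> loader) {
        this.loader = loader;
    }

    // New-style access: accepts values from doc values or from source.
    List<String> field(String name) {
        return cache.computeIfAbsent(name, loader).values;
    }

    // Old-style doc['field'] access: shares the cache, but refuses fallback
    // values so that its user-facing behavior stays the same.
    List<String> doc(String name) {
        LoadedValuesSketch loaded = cache.computeIfAbsent(name, loader);
        if (loaded.usedSourceFallback) {
            throw new IllegalArgumentException("no doc values for field [" + name + "]");
        }
        return loaded.values;
    }
}
```

In this sketch both access styles hit the same cache entry, and only the flag decides whether the old-style path is allowed to see the values.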

Please take a closer look at the changes in IndexFieldService related to caching, as I wasn't sure whether we should be caching any values retrieved from source fallback via runtime fields. Please also look at the cyclic runtime field checking: I think the new path covers it correctly, but I may have missed some corner cases.

elasticmachine · 2022-06-24T23:18:04Z

Pinging @elastic/es-search (Team:Search)

elasticmachine · 2022-06-24T23:18:04Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticmachine · 2022-06-25T21:34:33Z

Pinging @elastic/clients-team (Team:Clients)

elasticsearchmachine · 2022-06-27T14:06:25Z

Hi @jdconrad, I've created a changelog YAML for you.

jdconrad · 2022-06-27T20:51:29Z

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

jdconrad · 2022-07-07T17:35:49Z

After discussions with the search team and @javanna I'm going to update this PR to prototype value fetchers instead of runtime fields to do source fallback. If we can make the plumbing work, value fetchers seem to cover all the use cases we want including taking into account options on fields whereas runtime fields do not. This is a requirement to reach parity with the existing doc values.

jdconrad · 2022-07-12T01:12:04Z

I have made the following updates based on discussions with the search team members:

The keyword field type now uses a ValueFetcher to do source fallback instead of a runtime field. The majority of this new code is contained in KeywordValueFetcherIndexFieldData. This is an example of how we could apply value fetchers to other fields required for the new scripting API as well. There is probably opportunity to extract out code, but I didn't want to over-complicate anything until it was looked at.
I removed the now extraneous code from runtime fields. This gets rid of source paths having to be plumbed into them.
I removed the Tuple return from scriptFielddataBuilder and replaced it with a marker interface on the KeywordValueFetcherDocValues. Since we are no longer using runtime fields, it's much simpler to add a marker interface to any of the new doc values generated from a value fetcher and use an instanceof check against it in LeafDocLookup to differentiate between the old doc style and the new scripting fields style.
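
The marker-interface approach from the last item can be sketched roughly as follows. All names here (SourceFallbackMarker, DocValuesSketch, MarkerLookupSketch and the two implementations) are hypothetical and only illustrate the instanceof check; they are not the classes in this PR.

```java
import java.util.List;

// Hypothetical marker: tags doc values that a value fetcher built from source.
interface SourceFallbackMarker {}

// Hypothetical minimal view of doc values for one document.
interface DocValuesSketch {
    List<String> values();
}

// Values backed by real doc values: no marker.
final class IndexedValuesSketch implements DocValuesSketch {
    private final List<String> values;

    IndexedValuesSketch(List<String> values) {
        this.values = values;
    }

    public List<String> values() {
        return values;
    }
}

// Values emulated from source: carries the marker.
final class SourceValuesSketch implements DocValuesSketch, SourceFallbackMarker {
    private final List<String> values;

    SourceValuesSketch(List<String> values) {
        this.values = values;
    }

    public List<String> values() {
        return values;
    }
}

final class MarkerLookupSketch {
    // Old doc style: an instanceof check rejects values that came from source.
    static List<String> doc(DocValuesSketch docValues) {
        if (docValues instanceof SourceFallbackMarker) {
            throw new IllegalArgumentException("field has no doc values");
        }
        return docValues.values();
    }

    // New scripting fields style: either kind of values is accepted.
    static List<String> field(DocValuesSketch docValues) {
        return docValues.values();
    }
}
```

Compared with a Tuple return, the marker travels with the values themselves, so nothing extra has to be passed through the builder method.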

@javanna Would you please take a look and let me know what you think when you have a bit of time? Thanks!

jdconrad · 2022-07-12T18:15:21Z

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

javanna

I left some high-level comments, and asked @romseygeek to have a better look and get in touch. Thanks for iterating on this!

javanna · 2022-07-12T15:14:23Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

-                )
+            return new KeywordValueFetcherIndexFieldData.Builder(
+                name(),
+                new SourceValueFetcher(searchLookup.get().sourcePaths(name()), nullValue) {


the value fetcher creation already exists in the mapper, right? is it an option to reuse the existing code for it or are there differences?

The ones currently created for source depend on receiving an existing SearchExecutionContext. I would definitely be open to an alternative for this, but it didn't seem correct to try to pass the SearchExecutionContext through to scripting and then through to a mapped type.

javanna · 2022-07-12T15:18:27Z

server/src/main/java/org/elasticsearch/index/fielddata/KeywordValueFetcherIndexFieldData.java

+        @Override
+        public BytesRef nextValue() throws IOException {
+            assert iterator.hasNext();
+            return new BytesRef(iterator.next().toString());


I guess that we stumble upon the lack of typing of ValueFetchers here. Calling toString is an ok shortcut here, not sure if it should be integrated in the fetchers directly, especially for other types. Also, for the runtime field types these casts are already available in the mapped field type definition. Not too sure if we want to reuse that, or maybe it makes sense to make value fetchers typed and one day move runtime fields to use value fetchers too? I need to better understand pros and cons.

I'm definitely open to improvement here. I just wasn't sure of the best way to get the required BytesRef for this specific type of doc values. I like the idea you mentioned here of possibly re-using value fetchers for gathering source from runtime fields, but I imagine that could be a small project on its own. I'll take a look for the cast within the mapped field type to see if that could work.
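
For readers following this thread, here is a rough sketch of the toString shortcut under discussion: untyped values from a fetcher are converted to strings, de-duplicated, and sorted before being iterated like doc values. SourceKeywordValuesSketch is a hypothetical class, and String is used in place of Lucene's BytesRef to keep the example self-contained, so the sort order here is Java's String ordering rather than BytesRef byte ordering.

```java
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical emulation of sorted keyword doc values built from fetched values.
final class SourceKeywordValuesSketch {
    private final SortedSet<String> sorted = new TreeSet<>();
    private Iterator<String> iterator = sorted.iterator();

    // Load the untyped values fetched for one document. The fetcher is untyped,
    // so every value goes through toString() before it is stored.
    void setValues(List<Object> fetched) {
        sorted.clear();
        for (Object value : fetched) {
            sorted.add(value.toString());
        }
        iterator = sorted.iterator();
    }

    // Number of distinct values for the current document.
    int docValueCount() {
        return sorted.size();
    }

    // Callers are expected to call this at most docValueCount() times.
    String nextValue() {
        assert iterator.hasNext();
        return iterator.next();
    }
}
```

A typed value fetcher would move the conversion out of this class, which is the trade-off raised in the comment above.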

javanna · 2022-07-13T12:27:51Z

server/src/main/java/org/elasticsearch/index/fielddata/KeywordValueFetcherIndexFieldData.java

+import java.util.SortedSet;
+import java.util.TreeSet;
+
+public class KeywordValueFetcherIndexFieldData


this fielddata impl is very similar to that of the keyword runtime field. I wonder if it's worth going through the effort of sharing code between the two. Also, maybe runtime fields that load from source could be changed to not rely on a script then. Would it be acceptable for runtime fields that load from _source to go through value fetchers instead of directly to SourceLookup like they do today?

There is likely some common ground here. I agree we should look for a way to share this if possible.

We could probably re-name and re-use BinaryScriptFieldData as the common base class for both KeywordValueFetcherIndexFieldData and StringScriptFieldData.

I also like the idea of routing runtime fields using source through this path instead. One caveat is that value fetchers work on source paths and parent fields as opposed to a single user-provided path, so we would have to work out if that's an okay change for the user.

javanna · 2022-07-13T12:29:50Z

server/src/main/java/org/elasticsearch/index/fielddata/KeywordValueFetcherIndexFieldData.java

+
+    public static class Builder implements IndexFieldData.Builder {
+        private final String name;
+        private final ValueFetcher valueFetcher;


One doubt I have around using value fetchers: we have been talking about allowing value fetchers to load from doc_values or stored fields. Should we make the fielddata that's based on value fetcher depend directly on the source value fetcher then? Would it make any sense to ever load from doc_values through this fielddata impl? Would like to gather feedback about this.

That's a good question. This seems like a good topic for a discussion w/ the rest of the search team.

jdconrad · 2022-07-13T16:48:04Z

I spoke with @romseygeek, and I'm going to add a couple more examples of how this would work for a numeric type (like long) and a structured type (like GeoPoint) to see if this is a good path to continue forward with.

jdconrad · 2022-07-18T18:11:38Z

@romseygeek I added source fallback for integer and geo point type fields. I also abstracted out the source value fetcher index field data code to try to remove some of the boilerplate. I only added what was strictly necessary for scripting for now, but I believe the code would be easily extendable to other features that want to take advantage of it by making different types of emulated doc values available. I did not do any of the other numeric type fields, but they would follow basically the same pattern.

Would you please take another pass when you have a bit of time? Thanks!

Also after considering this code for a while I realize it was really only designed to have doc values from Lucene as a direct source. With the addition of runtime fields and now possibly source fallback, I do wonder if there's a way we could abstract out the source of the doc values and pass it into the index field data so index field data can be shared when the source is actual doc values and when the source is emulated doc values instead of needing a separate one for each. But this is a discussion for a different time.

jdconrad · 2022-07-21T16:43:29Z

After speaking with @romseygeek I'm going to produce another WIP PR where instead of having two fielddataBuilder methods (one for scripting and one for search), we're going to have a single method that has options on it to see if that reduces plumbing overhead.
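
As a rough illustration of the single-method idea, the sketch below passes an option that says who is asking for field data. OperationSketch and FieldTypeSketch are hypothetical names, and the returned string is only a label for where values would be loaded from; the real design lives in the follow-up PR.

```java
// Hypothetical option describing who is asking for field data.
enum OperationSketch {
    SEARCH,
    SCRIPT
}

// Hypothetical field type with one builder-style method instead of two.
final class FieldTypeSketch {
    private final String name;
    private final boolean hasDocValues;

    FieldTypeSketch(String name, boolean hasDocValues) {
        this.name = name;
        this.hasDocValues = hasDocValues;
    }

    // Single entry point: returns a label for where values would be loaded from.
    String fielddataSource(OperationSketch operation) {
        if (hasDocValues) {
            return "doc_values";
        }
        if (operation == OperationSketch.SCRIPT) {
            // Only scripting falls back to source in this sketch.
            return "_source";
        }
        throw new IllegalArgumentException("field [" + name + "] has no doc values");
    }
}
```

With one method, callers pass the operation instead of choosing between two builder methods, which is where the reduction in plumbing would come from.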

jdconrad · 2022-07-25T18:26:15Z

Closing this in favor of #88735

jdconrad added 4 commits June 24, 2022 15:58

Update SearchLookup to include source paths for each field.

9227535

Add source fallback for keyword field.

ad63384

Update some tests for keyword source fallback.

d110c7d

separate script source fallback into its own method

892b27a

jdconrad added the >enhancement, WIP, :Search Foundations/Mapping, and :Core/Infra/Scripting labels Jun 24, 2022

jdconrad requested review from jpountz, javanna and romseygeek June 24, 2022 23:18

elasticmachine added the Team:Search and Team:Core/Infra labels Jun 24, 2022

elasticsearchmachine added the v8.4.0 label Jun 24, 2022

sethmlarson added the Team:Clients label Jun 25, 2022

jdconrad removed the WIP label Jun 27, 2022

jdconrad added 2 commits June 27, 2022 07:06

Update docs/changelog/88040.yaml

30fc69b

Merge branch 'master' into sourcefallback

4c522c4

jdconrad removed the :Core/Infra/Scripting label Jun 27, 2022

elasticmachine removed the Team:Core/Infra label Jun 27, 2022

update change log

27f7d80

jdconrad added the Team:Core/Infra label Jun 27, 2022

Merge branch 'master' into sourcefallback

47e8955

Merge branch 'master' into sourcefallback

d105a93

jdconrad added the WIP label Jul 7, 2022

jdconrad added 6 commits July 7, 2022 13:55

Delete docs/changelog/88040.yaml

60d4d0e

Merge branch 'master' into sourcefallback

661d2c1

Merge branch 'master' into sourcefallback

1d0ffd1

fix tests with master merge

39cde22

add keyword source fallback for scripting using value fetcher

0693ac0

replace tuple return with marker interface

9062de7

Merge branch 'master' into sourcefallback

c56371e

Merge branch 'master' into sourcefallback

6a3aa33

javanna reviewed Jul 13, 2022

jdconrad added 5 commits July 14, 2022 09:51

Merge branch 'master' into sourcefallback

6a03351

Split off value fetcher index field data into base classes.

170b9af

Add integer field to source fallback w/ value fetcher strategy.

50beaea

Add script source fallback for geopoint field.

c91f730

Merge branch 'master' into sourcefallback

a8854c3

Merge branch 'master' into sourcefallback

38df037

jdconrad mentioned this pull request Jul 22, 2022

Add source fallback for keyword fields using operation #88735

Merged

Merge branch 'master' into sourcefallback

c0ab030

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:06

jdconrad closed this Jul 25, 2022
