Follow Up for Add source fallback for keyword fields #88040

Closed
wants to merge 24 commits into from


jdconrad
Contributor

Original PR here (#87765). This PR adds source fallback through a MappedFieldType#scriptFielddataBuilder method, along with the additional plumbing the new scripting fields API needs to use it. This change is based on the outcome of the design discussion outlined here (#80504 (comment)).

Note that the scriptFielddataBuilder method returns a Tuple<Boolean, IndexFieldData.Builder>, where the Boolean indicates whether a fallback was used to retrieve the values. This is required so that old-style script access through doc['field'] keeps returning the same user-facing values as before. The LeafDocLookup cache reflects this: additional plumbing lets the new-style and old-style APIs share cached values retrieved from doc values, while only the new-style API retrieves values from source.
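A minimal sketch of the contract described above, under stated assumptions: the types below are hypothetical stand-ins for `IndexFieldData.Builder`, the Tuple, and `LeafDocLookup`, not Elasticsearch's API. A boolean flag marks fallback data, both access styles share one cache, and the old-style path is assumed to reject fallback data (modeled here as an exception) so that `doc['field']` behaves as it did before.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for the PR's types; not the real Elasticsearch API.
final class TupleFlagSketch {
    // Plays the role of Tuple<Boolean, IndexFieldData.Builder>: the flag is true
    // when values are emulated from _source rather than read from doc values.
    record FieldData(boolean fromSourceFallback, Supplier<List<String>> values) {}

    private final Map<String, FieldData> available;              // what each field can offer
    private final Map<String, FieldData> cache = new HashMap<>(); // shared by both access styles

    TupleFlagSketch(Map<String, FieldData> available) {
        this.available = available;
    }

    private FieldData lookup(String field) {
        FieldData data = cache.computeIfAbsent(field, available::get);
        if (data == null) {
            throw new IllegalArgumentException("no field found for [" + field + "]");
        }
        return data;
    }

    // Old style, doc['field']: refuses fallback data (an assumption of this
    // sketch) so that existing user-facing behavior is unchanged.
    List<String> docAccess(String field) {
        FieldData data = lookup(field);
        if (data.fromSourceFallback()) {
            throw new IllegalArgumentException("[" + field + "] has no doc values");
        }
        return data.values().get();
    }

    // New style, field API: accepts doc values or the _source fallback.
    List<String> fieldAccess(String field) {
        return lookup(field).values().get();
    }
}
```

Because the flag travels with the cached entry, whichever style touches a field first, the other style still sees the same entry and applies its own rule.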

Please take a closer look at the caching-related changes in IndexFieldDataService, as I wasn't sure whether we should cache any values retrieved from source fallback via runtime fields. Please also pay attention to cyclic runtime field checking: I think the new path covers it correctly, but I may have missed some corner cases.
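For the cyclic-dependency question, one common technique is to keep the chain of fields currently being resolved and fail as soon as a field appears in it twice. The sketch below assumes each field's dependencies are known up front through a hypothetical `dependencies` map; it illustrates the corner cases worth testing (direct cycles, indirect cycles, and diamonds that are not cycles) rather than the check Elasticsearch actually performs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cycle check over runtime field dependencies; not Elasticsearch's implementation.
final class CycleCheckSketch {
    private final Map<String, List<String>> dependencies; // field -> fields it reads

    CycleCheckSketch(Map<String, List<String>> dependencies) {
        this.dependencies = dependencies;
    }

    void resolve(String field) {
        resolve(field, new LinkedHashSet<>());
    }

    // Depth-first resolution; `chain` holds the fields currently being resolved, in order.
    private void resolve(String field, Set<String> chain) {
        if (!chain.add(field)) {
            List<String> path = new ArrayList<>(chain);
            path.add(field);
            throw new IllegalArgumentException("Cyclic dependency detected: " + String.join(" -> ", path));
        }
        for (String dependency : dependencies.getOrDefault(field, List.of())) {
            resolve(dependency, chain);
        }
        chain.remove(field); // finished: other branches may depend on this field too
    }
}
```

Removing a field from the chain once its dependencies resolve is what lets a diamond (a reads b and c, both of which read d) pass while a -> b -> a fails.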

@jdconrad jdconrad added >enhancement WIP :Search Foundations/Mapping Index mappings, including merging and defining field types :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Jun 24, 2022
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Core/Infra Meta label for core/infra team labels Jun 24, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jun 25, 2022
@elasticmachine
Collaborator

Pinging @elastic/clients-team (Team:Clients)

@jdconrad jdconrad removed the WIP label Jun 27, 2022
@elasticsearchmachine
Collaborator

Hi @jdconrad, I've created a changelog YAML for you.

@jdconrad jdconrad removed the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Jun 27, 2022
@elasticmachine elasticmachine removed the Team:Core/Infra Meta label for core/infra team label Jun 27, 2022
@jdconrad jdconrad added the Team:Core/Infra Meta label for core/infra team label Jun 27, 2022
@jdconrad
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

@jdconrad
Contributor Author

jdconrad commented Jul 7, 2022

After discussions with the search team and @javanna, I'm going to update this PR to prototype source fallback using value fetchers instead of runtime fields. If we can make the plumbing work, value fetchers seem to cover all the use cases we want, including honoring options set on the field mapping, which runtime fields do not. This is required to reach parity with the existing doc values.
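To make "honoring options set on the field mapping" concrete, here is a minimal sketch assuming a keyword field whose mapping sets `null_value` and `ignore_above`: a source-based fetcher applies those options, so the values it returns line up with what would have been indexed, whereas returning raw `_source` values would pass them through unchanged. `KeywordFetchSketch` is a hypothetical class, not Elasticsearch's `SourceValueFetcher`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Elasticsearch's SourceValueFetcher.
final class KeywordFetchSketch {
    private final String nullValue; // the mapping's null_value, or null if unset
    private final int ignoreAbove;  // the mapping's ignore_above

    KeywordFetchSketch(String nullValue, int ignoreAbove) {
        this.nullValue = nullValue;
        this.ignoreAbove = ignoreAbove;
    }

    // rawValues are the values found in _source for the field, nulls included.
    List<String> fetch(List<?> rawValues) {
        List<String> out = new ArrayList<>();
        for (Object raw : rawValues) {
            Object value = raw == null ? nullValue : raw; // apply null_value
            if (value == null) {
                continue; // explicit null and no null_value: nothing to return
            }
            String keyword = value.toString();
            if (keyword.length() > ignoreAbove) { // length in UTF-16 chars here
                continue; // too long: skipped, just as ignore_above skips indexing
            }
            out.add(keyword);
        }
        return out;
    }
}
```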

@jdconrad jdconrad added the WIP label Jul 7, 2022
@jdconrad
Contributor Author

I have made the following updates based on discussions with members of the search team:

  1. The keyword field type now uses a ValueFetcher instead of a runtime field for source fallback. Most of the new code lives in KeywordValueFetcherIndexFieldData, which is an example of how we could apply value fetchers to the other fields the new scripting API requires. There is probably an opportunity to extract shared code, but I didn't want to over-complicate anything until it had been reviewed.
  2. I removed the now-extraneous code from runtime fields, so source paths no longer have to be plumbed into them.
  3. I removed the Tuple return from scriptFielddataBuilder and instead carry this information with a marker interface on KeywordValueFetcherDocValues. Since we no longer use runtime fields, it's much simpler to add a marker interface to any of the new doc values generated from a value fetcher and use an instanceof check against it in LeafDocLookup to distinguish old-style doc access from the new scripting fields API.
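A minimal sketch of the marker-interface approach in item 3, with hypothetical names (`SourceFallbackDocValues`, `docAccess`, `fieldAccess`) standing in for the PR's `KeywordValueFetcherDocValues` and `LeafDocLookup` logic. The marker carries no methods; it only lets an `instanceof` check tell emulated values apart from real doc values. As an assumption, the old-style path is modeled as rejecting emulated values with an exception.

```java
import java.util.List;

// Hypothetical names throughout; a sketch, not the PR's code.
final class MarkerSketch {
    // Common doc values abstraction for real and emulated implementations.
    interface DocValuesView {
        List<String> values(int docId);
    }

    // Marker only: flags doc values that were emulated from _source.
    interface SourceFallbackDocValues {}

    record IndexedValues(List<String> data) implements DocValuesView {
        public List<String> values(int docId) {
            return data;
        }
    }

    record SourceFallbackValues(List<String> data) implements DocValuesView, SourceFallbackDocValues {
        public List<String> values(int docId) {
            return data;
        }
    }

    // Old-style doc['field'] access: the instanceof check replaces the Tuple flag.
    static List<String> docAccess(DocValuesView docValues, int docId) {
        if (docValues instanceof SourceFallbackDocValues) {
            throw new IllegalArgumentException("doc values are not available for this field");
        }
        return docValues.values(docId);
    }

    // New-style field API: accepts either kind.
    static List<String> fieldAccess(DocValuesView docValues, int docId) {
        return docValues.values(docId);
    }
}
```

Compared with the Tuple, the information now travels with the doc values object itself, so no extra return value has to be threaded through the builder.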

@javanna Would you please take a look and let me know what you think when you have a bit of time? Thanks!

@jdconrad
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

Member

@javanna javanna left a comment

I left some high-level comments and asked @romseygeek to take a closer look and get in touch. Thanks for iterating on this!

)
return new KeywordValueFetcherIndexFieldData.Builder(
name(),
new SourceValueFetcher(searchLookup.get().sourcePaths(name()), nullValue) {
Member

The value fetcher creation already exists in the mapper, right? Is it an option to reuse the existing code for it, or are there differences?

Contributor Author

The ones currently created for source depend on receiving an existing SearchExecutionContext. I'm definitely open to an alternative, but it didn't seem right to pass the SearchExecutionContext through to scripting and then on to a mapped field type.

@Override
public BytesRef nextValue() throws IOException {
assert iterator.hasNext();
return new BytesRef(iterator.next().toString());
Member

I guess we're running into the lack of typing of ValueFetchers here. Calling toString is an OK shortcut, but I'm not sure whether it should be integrated into the fetchers directly, especially for other types. Also, for the runtime field types these casts are already available in the mapped field type definition. I'm not sure whether we want to reuse that, or whether it makes sense to make value fetchers typed and one day move runtime fields to use value fetchers too. I need to better understand the pros and cons.

Contributor Author

I'm definitely open to improvement here. I just wasn't sure of the best way to get the required BytesRef for this specific type of doc values. I like your idea of possibly reusing value fetchers to gather source values for runtime fields, but I imagine that could be a small project of its own. I'll look at the cast within the mapped field type to see if that could work.
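One way to read the typed-value-fetcher idea from this thread, as a hedged sketch: a hypothetical `TypedValueFetcher<T>` converts raw `_source` values to the field's type once, so fielddata code no longer needs `toString()` shortcuts or casts. To stay self-contained, the keyword fetcher produces UTF-8 `byte[]` as a stand-in for Lucene's `BytesRef`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical API; Elasticsearch's ValueFetcher is not typed like this.
final class TypedFetcherSketch {
    // Converts raw _source values to T in one place.
    interface TypedValueFetcher<T> {
        List<T> fetch(List<?> rawValues);

        static <T> TypedValueFetcher<T> of(Function<Object, T> parser) {
            return rawValues -> {
                List<T> out = new ArrayList<>(rawValues.size());
                for (Object value : rawValues) {
                    out.add(parser.apply(value));
                }
                return out;
            };
        }
    }

    // Keyword: UTF-8 bytes, standing in for BytesRef.
    static final TypedValueFetcher<byte[]> KEYWORD =
        TypedValueFetcher.of(v -> v.toString().getBytes(StandardCharsets.UTF_8));

    // Long: accepts JSON numbers or numeric strings.
    static final TypedValueFetcher<Long> LONG =
        TypedValueFetcher.of(v -> v instanceof Number n ? n.longValue() : Long.parseLong(v.toString()));
}
```

The design choice here is that the per-type conversion lives next to the fetcher rather than in each fielddata implementation, which is the trade-off the thread is weighing against reusing the runtime field casts.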

import java.util.SortedSet;
import java.util.TreeSet;

public class KeywordValueFetcherIndexFieldData
Member

This fielddata impl is very similar to that of the keyword runtime field. I wonder if it's worth the effort of sharing code between the two. Also, maybe runtime fields that load from source could then be changed to not rely on a script. Would it be acceptable for runtime fields that load from _source to go through value fetchers instead of directly to SourceLookup like they do today?

Contributor Author

There is likely some common ground here. I agree we should look for a way to share this if possible.

We could probably rename and reuse BinaryScriptFieldData as the common base class for both KeywordValueFetcherIndexFieldData and StringScriptFieldData.

I also like the idea of routing runtime fields that use source through this path instead. One caveat is that value fetchers work on source paths and parent fields, as opposed to a single user-provided path, so we would have to work out whether that's an acceptable change for users.
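A minimal sketch of the shared-base-class idea, using hypothetical names rather than `BinaryScriptFieldData` itself: subclasses only decide where a document's raw values come from, and the base applies keyword doc-values semantics (sorted and de-duplicated, in line with the `SortedSet`/`TreeSet` imports in the diff context of this thread). String ordering stands in for `BytesRef` byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.IntFunction;

// Hypothetical names; a sketch of a shared base, not the PR's code.
final class SharedBaseSketch {
    // Subclasses supply raw values; the base turns them into keyword-style doc values.
    abstract static class BinaryFieldDataBase {
        protected abstract List<String> rawValues(int docId);

        final SortedSet<String> docValues(int docId) {
            // Sorted, no duplicates; String order stands in for BytesRef order.
            return new TreeSet<>(rawValues(docId));
        }
    }

    // Values read from _source (the KeywordValueFetcherIndexFieldData role).
    static final class SourceBacked extends BinaryFieldDataBase {
        private final Map<Integer, List<?>> source; // docId -> values found in _source

        SourceBacked(Map<Integer, List<?>> source) {
            this.source = source;
        }

        @Override
        protected List<String> rawValues(int docId) {
            List<String> out = new ArrayList<>();
            for (Object value : source.getOrDefault(docId, List.of())) {
                out.add(value.toString());
            }
            return out;
        }
    }

    // Values emitted by a runtime field script (the StringScriptFieldData role).
    static final class ScriptBacked extends BinaryFieldDataBase {
        private final IntFunction<List<String>> script; // docId -> emitted values

        ScriptBacked(IntFunction<List<String>> script) {
            this.script = script;
        }

        @Override
        protected List<String> rawValues(int docId) {
            return script.apply(docId);
        }
    }
}
```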


public static class Builder implements IndexFieldData.Builder {
private final String name;
private final ValueFetcher valueFetcher;
Member

One doubt I have around using value fetchers: we have been talking about allowing value fetchers to load from doc_values or stored fields. Should we then make the fielddata that's based on value fetchers depend directly on the source value fetcher? Would it ever make sense to load from doc_values through this fielddata impl? I'd like to gather feedback on this.

Contributor Author

@jdconrad jdconrad Jul 13, 2022

That's a good question. This seems like a good topic to discuss with the rest of the search team.

@jdconrad
Contributor Author

I spoke with @romseygeek, and I'm going to add a couple more examples of how this would work for a numeric type (like long) and a structured type (like GeoPoint) to see whether this is a good path forward.

@jdconrad
Contributor Author

@romseygeek I added source fallback for integer and geo point fields. I also abstracted out the source value fetcher index field data code to remove some of the boilerplate. For now I only added what was strictly necessary for scripting, but I believe the code could easily be extended to other features that want to take advantage of it by making different types of emulated doc values available. I did not do the other numeric field types, but they would follow basically the same pattern.
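A hedged sketch of what "one abstraction, different emulated doc values" could look like: a hypothetical generic `SourceFieldData<T>` takes a type-specific parser and an ordering, and integer and geo point support become two small instances. The geo point parser handles only the `{"lat": .., "lon": ..}` object and `"lat,lon"` string formats, and its lat/lon ordering is chosen for determinism; real `geo_point` doc values use an encoded numeric form that this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names; illustrates the pattern, not the PR's classes.
final class EmulatedDocValuesSketch {
    record GeoPoint(double lat, double lon) {}

    static final class SourceFieldData<T> {
        private final Function<Object, T> parser;  // type-specific parsing of a _source value
        private final Comparator<? super T> order; // ordering of the emulated doc values

        SourceFieldData(Function<Object, T> parser, Comparator<? super T> order) {
            this.parser = parser;
            this.order = order;
        }

        // Parse every _source value, then sort; duplicates are kept.
        List<T> docValues(List<?> sourceValues) {
            List<T> out = new ArrayList<>(sourceValues.size());
            for (Object value : sourceValues) {
                out.add(parser.apply(value));
            }
            out.sort(order);
            return out;
        }
    }

    static final SourceFieldData<Integer> INTEGERS = new SourceFieldData<Integer>(
        v -> v instanceof Number n ? n.intValue() : Integer.parseInt(v.toString()),
        Comparator.<Integer>naturalOrder());

    static final SourceFieldData<GeoPoint> GEO_POINTS = new SourceFieldData<GeoPoint>(
        EmulatedDocValuesSketch::parseGeoPoint,
        Comparator.comparingDouble(GeoPoint::lat).thenComparingDouble(GeoPoint::lon));

    // Only two geo_point source formats: {"lat": .., "lon": ..} and "lat,lon".
    static GeoPoint parseGeoPoint(Object v) {
        if (v instanceof Map<?, ?> m) {
            return new GeoPoint(((Number) m.get("lat")).doubleValue(), ((Number) m.get("lon")).doubleValue());
        }
        String[] parts = v.toString().split(",");
        return new GeoPoint(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }
}
```

Adding another numeric type in this shape means supplying one more parser and comparator, which matches the "same pattern" expectation above.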

Would you please take another pass when you have a bit of time? Thanks!

Also, after considering this code for a while, I realize it was really only designed to use doc values from Lucene as a direct source. With the addition of runtime fields, and now possibly source fallback, I wonder if there's a way we could abstract out the source of the doc values and pass it into the index field data. That way a single index field data implementation could be shared whether the values come from actual doc values or from emulated doc values, instead of needing a separate one for each. But this is a discussion for a different time.
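The idea of passing the source of the doc values into the index field data can be sketched with composition instead of one class per source. All names are hypothetical: `ValuesSource` is the pluggable part, and a single `LongFieldData` behaves the same whether values were indexed ahead of time or are parsed from `_source` on each read.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical names throughout; a sketch of the idea, not a proposal for real APIs.
final class PluggableSourceSketch {
    // Where a document's values come from; implementations return them sorted.
    interface ValuesSource {
        long[] values(int docId);
    }

    // Values indexed ahead of time (stand-in for Lucene doc values), assumed already sorted.
    record IndexedValues(Map<Integer, long[]> indexed) implements ValuesSource {
        public long[] values(int docId) {
            return indexed.getOrDefault(docId, new long[0]);
        }
    }

    // Values emulated from _source: parsed and sorted on every read.
    record SourceValues(Map<Integer, List<String>> source) implements ValuesSource {
        public long[] values(int docId) {
            return source.getOrDefault(docId, List.of()).stream()
                .mapToLong(Long::parseLong)
                .sorted()
                .toArray();
        }
    }

    // A single field data implementation, shared by both kinds of source.
    record LongFieldData(ValuesSource source) {
        OptionalLong min(int docId) {
            return Arrays.stream(source.values(docId)).min();
        }
    }
}
```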

@jdconrad
Contributor Author

After speaking with @romseygeek, I'm going to produce another WIP PR where, instead of having two fielddataBuilder methods (one for scripting and one for search), we have a single method that takes options, to see if that reduces the plumbing overhead.
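A minimal sketch of the single-method shape, assuming a hypothetical `FielddataOperation` option that tells the field type who is asking: search keeps requiring doc values, while scripting may fall back to `_source`. The names are illustrative and not taken from the follow-up PR.

```java
// Hypothetical names; illustrates "one method with options", not the follow-up PR's code.
final class SingleBuilderSketch {
    // Who is asking for field data.
    enum FielddataOperation { SEARCH, SCRIPT }

    // Stand-in for IndexFieldData.Builder; only describes what it would build.
    interface FieldDataBuilder {
        String describe();
    }

    record KeywordFieldType(String name, boolean hasDocValues) {
        // A single entry point replaces separate search and script builder methods.
        FieldDataBuilder fielddataBuilder(FielddataOperation operation) {
            if (hasDocValues) {
                return () -> "doc values for [" + name + "]";
            }
            if (operation == FielddataOperation.SCRIPT) {
                return () -> "_source fallback for [" + name + "]"; // scripting only
            }
            throw new IllegalArgumentException("field [" + name + "] has no doc values");
        }
    }
}
```

The option keeps the decision inside the field type, so callers pass one extra argument instead of the mapping layer exposing a second builder method.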

@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:06
@jdconrad
Contributor Author

Closing this in favor of #88735

@jdconrad jdconrad closed this Jul 25, 2022
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Clients Meta label for clients team Team:Core/Infra Meta label for core/infra team Team:Search Meta label for search team v8.4.0 WIP
5 participants