-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow Up for Add source fallback for keyword fields #88040
Conversation
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Pinging @elastic/clients-team (Team:Clients) |
Hi @jdconrad, I've created a changelog YAML for you. |
@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample |
After discussions with the search team and @javanna I'm going to update this PR to prototype value fetchers instead of runtime fields to do source fallback. If we can make the plumbing work, value fetchers seem to cover all the use cases we want including taking into account options on fields whereas runtime fields do not. This is a requirement to reach parity with the existing doc values. |
I have made the following updates based on discussions with the search team members:
@javanna Would you please take a look and let me know what you think when you have a bit of time? Thanks! |
@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some high-level comments, and asked @romseygeek to have a better look and get in touch. Thanks for iterating on this!
) | ||
return new KeywordValueFetcherIndexFieldData.Builder( | ||
name(), | ||
new SourceValueFetcher(searchLookup.get().sourcePaths(name()), nullValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value fetcher creation already exists in the mapper, right? is it an option to reuse the existing code for it or are there differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ones currently created for source depend on receiving an existing SearchExecutionContext
. I would definitely be open an an alternative for this, but it didn't seem correct to try to pass the SearchExecutionContext
through to scripting and then through to a mapped type.
@Override | ||
public BytesRef nextValue() throws IOException { | ||
assert iterator.hasNext(); | ||
return new BytesRef(iterator.next().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that we stumble upon the lack of typing of ValueFetchers here. Calling toString is an ok shortcut here, not sure if it should be integrated in the fetchers directly, especially for other types. Also, for the runtime field types these casts are already available in the mapped field type definition. Not too sure if we want to reuse that, or maybe it makes sense to make value fetchers typed and one day move runtime fields to use value fetchers too? I need to better understand pros and cons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely open to improvement here. I just wasn't sure of the best way to get the required BytesRef for this specific type of doc values. I like the idea you mentioned here of possibly re-using value fetchers for gathering source from runtime fields, but I imagine that could be a small project on its own. I'll take a look for the cast within the mapped field type to see if that could work.
import java.util.SortedSet; | ||
import java.util.TreeSet; | ||
|
||
public class KeywordValueFetcherIndexFieldData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fielddata impl is very similar to that of the keyword runtime field. I wonder if it's worth going through the effort of sharing code between the two. Also, maybe runtime fields that load from source could be changed to not rely on a script then. Would it be acceptable for runtime fields that load from _source to go through value fetchers instead of directly to SourceLookup like they do today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is likely some common ground here. I agree we should look for a way to share this if possible.
We could probably re-name and re-use BinaryScriptFieldData
as the common base class for both KeywordValueFetcherIndexFieldData
and StringScriptFieldData
.
I also like the idea of routing runtime fields using source through this path instead. One caveat is the value fetchers work on source paths and parent fields as opposed to a single user-provided path so we would have to workout if that's an okay change for the user.
|
||
public static class Builder implements IndexFieldData.Builder { | ||
private final String name; | ||
private final ValueFetcher valueFetcher; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One doubt I have around using value fetchers: we have been talking about allowing value fetchers to load from doc_values or stored fields. Should we make the fielddata that's based on value fetcher depend directly on the source value fetcher then? Would it make any sense to ever load from doc_values through this fielddata impl? Would like to gather feedback about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. This seems like a good topic for a discussion w/ the rest of the search team.
I spoke with @romseygeek, and I'm going to add a couple more examples of how this would work for a numeric type (like long) and a structured type (like GeoPoint) to see if this is a good path to continue forward with. |
@romseygeek I added source fallback for integer and geo point type fields. I also abstracted out the source value fetcher index field data code to try to remove some of the boiler plate. I only added what was strictly necessary for scripting for now, but I believe the code would be easily extendable to other features that wanted to take advantage by making different types of emulated doc values available. I did not do any of the other numeric type fields, but they would follow basically the same pattern. Would you please take another pass when you have a bit of time? Thanks! Also after considering this code for a while I realize it was really only designed to have doc values from Lucene as a direct source. With the addition of runtime fields and now possibly source fallback, I do wonder if there's a way we could abstract out the source of the doc values and pass it into the index field data so index field data can be shared when the source is actual doc values and when the source is emulated doc values instead of needing a separate one for each. But this is a discussion for a different time. |
After speaking with @romseygeek I'm going to produce another WIP PR where instead of having two fielddataBuilder methods (one for scripting and one for search), we're going to have a single method that has options on it to see if that reduces plumbing overhead. |
Closing this in favor of #88735 |
Original PR here (#87765). This PR adds updates to add source fallback through a
MappedType#scriptDatafieldBuilder
method along with all the required additional plumbing for the new scripting fields API to use. This change is based on the design discussion results outlined here (#80504 (comment))Note that the
scriptDatafieldBuilder
method returns aTuple<Boolean, IndexFieldData.Builder>
where theBoolean
value is based on whether or not we used a fallback to retrieve values. This is required for the old-style script access throughdoc['field']
to maintain its current user-facing values. TheLeafDocLookup
cache is a reflection of this where additional plumbing has been added so that the new-style and old-style api can share the cached values retrieved from doc values, but only the new-style api will retrieve values from source.I would please request that a closer look be given to the changes in
IndexFieldService
related to caching as I wasn't sure if we should be caching any values retrieved from source fallback via runtime fields. I also would please request that attention be given to cyclic runtime field checking as while I think the new path covers it correctly, I wasn't sure if I missed some corner cases.