-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement fields fetch for runtime fields #61995
Implement fields fetch for runtime fields #61995
Conversation
This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to elastic#59332
@@ -299,6 +299,18 @@ public SearchLookup lookup() { | |||
return this.lookup; | |||
} | |||
|
|||
private SearchLookup fetchLookup = null; | |||
|
|||
public SearchLookup fetchLookup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the most controversial part of this change. Previously we were building the SourceLookup in the fetch phase and letting things that needed doc values at fetch time use the standard search lookup. This starts us down a road to removing that. I think we like this because we don't want things like the explain
sub-phase to modify the rest of the fetch phases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just create a SearchLookup
within FetchPhase
? I think we could store it on HitContext
instead of storing SourceLookup
directly. That would make its scope and lifetime clearer. It also feels weird to me to put this on QueryShardContext
, since it's not meant to be used during the query phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm. Let me dig into this again! I did it this way when I was building things really early and needed something in a much more "global" place to do the building.
I was trying not to modify HitContext
so much - maybe it can be something we pass in to the phases when we build them. I'll take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can create SearchLookup
just for the FetchPhase
. I'll push a change that does that. Sadly, we still have to delegate to QueryShardContext
to create it because it is the thing with all of the right dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I need to read this discussion because I have been asking questions around this as part of my review :)
server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
}; | ||
} | ||
|
||
private abstract static class DocValueField { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the "brains" of this stuff has moved into the LeafFieldData
implementations.
public ValueFetcher valueFetcher(MapperService mapperService, String format) { | ||
throw new UnsupportedOperationException(); | ||
public ValueFetcher valueFetcher(MapperService mapperService, SearchLookup lookup, String format) { | ||
return new DocValueFetcher(fieldType().docValueFormat(format, null), () -> lookup.doc().getForField(fieldType())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that everything else is in the LeafFieldData
implemenations, this is all we need in runtime fields!
Pinging @elastic/es-search (:Search/Search) |
ooooh! This is broken for inner hits. I'll see what I can find! |
Nested works now! New error having to do with date_nanos. Looks like I screwed something up there. I'll track it down! |
And now data nanos! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a first round of comments. (I'm still thinking through the details around the SearchLookup
changes.)
server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchDocValuesPhase.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/DocValueFetcher.java
Outdated
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/index/mapper/MapperTestCase.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsContext.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now read the full change and left some comments !
public FieldValueRetriever fieldValueRetriever() { | ||
return fieldValueRetriever; | ||
public FieldValueRetriever fieldValueRetriever(String indexName, MapperService mapperService, SearchLookup searchLookup) { | ||
if (mapperService.documentMapper().sourceMapper().enabled() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, it feels more natural to do this in FetchFieldsContext
rather than this factory method ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inherited its location from where it was before - I dunno where it feels better. Would you prefer I move it into the factory method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure if you don't mind I think it's nicer to create it directly in FetchFieldsContext
.
test/framework/src/main/java/org/elasticsearch/index/mapper/MapperTestCase.java
Outdated
Show resolved
Hide resolved
@@ -299,6 +299,18 @@ public SearchLookup lookup() { | |||
return this.lookup; | |||
} | |||
|
|||
private SearchLookup fetchLookup = null; | |||
|
|||
public SearchLookup fetchLookup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just create a SearchLookup
within FetchPhase
? I think we could store it on HitContext
instead of storing SourceLookup
directly. That would make its scope and lifetime clearer. It also feels weird to me to put this on QueryShardContext
, since it's not meant to be used during the query phase.
@@ -42,5 +44,10 @@ | |||
* @param lookup a lookup structure over the document's source. | |||
* @return a list a standardized field values. | |||
*/ | |||
List<Object> fetchValues(SourceLookup lookup); | |||
List<Object> fetchValues(SourceLookup lookup) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this idea, but it feels natural to only pass the doc ID here:
List<Object> fetchValues(SourceLookup lookup) throws IOException;
Then SourceValueFetcher
could hang onto the SearchLookup
and use that to access SourceLookup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that too! I didn't want to do it as part if this PR because it'd make it huge, but I think its a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - this doesn't look like it'll make the PR much bigger. I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a refactor for this too.
@jtibshirani and I talked about this some over video today and settled on two things:
|
And I pushed changes to do those two things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks again for the many iterations!
For context, @nik9000 and I decided that this approach is best for the initial PR, but the SearchLookup
sharing logic is still a bit confusing. We're thinking through some follow-up refactors to help solidify + clarify it.
I just realized that we should also remove the REST test exclusions in
|
Oh! I thought I had. I'll do it before merging. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@Override | ||
public List<Object> fetchValues(int docId) throws IOException { | ||
if (false == leaf.advanceExact(docId)) { | ||
return List.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping?
} | ||
return this.lookup; | ||
} | ||
|
||
/** | ||
* Build a lookup customized for the fetch phase. Use {@link #lookup()} | ||
* in other phases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we open an issue to track finding an alternative for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like @romseygeek is working on this now.
This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to elastic#59332
In elastic#61995 I moved the `docvalue_field` fetch code into a place where I could share it with the fancy new `fields` fetch API. Specifically, runtime fields can use it all that doc values code now. But I broke `scaled_floats` by switching them how they are fetched from `double` to `string`. This adds the override you need to switch them back.
In elastic#61995 I moved the `docvalue_field` fetch code into a place where I could share it with the fancy new `fields` fetch API. Specifically, runtime fields can use it all that doc values code now. But I broke `scaled_floats` by switching them how they are fetched from `double` to `string`. This adds the override you need to switch them back.
This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332
In #61995 I moved the `docvalue_field` fetch code into a place where I could share it with the fancy new `fields` fetch API. Specifically, runtime fields can use it all that doc values code now. But I broke `scaled_floats` by switching them how they are fetched from `double` to `string`. This adds the override you need to switch them back.
This implements the
fields
API in_search
for runtime fields usingdoc values. Most of that implementation is stolen from the
docvalue_fields
fetch sub-phase, just moved into the same API that thefields
API uses. At this point thedocvalue_fields
fetch phase lookslike a special case of the
fields
API.While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of
instanceof
s to amethod on
LeafFieldData
so we can be much more flexible with what isreturned and we're not forced to extend certain classes just to make the
fetch phase happy.
Relates to #59332