Clarify SourceLookup sharing across fetch subphases. #60179
Pinging @elastic/es-search (:Search/Search)
@@ -65,10 +65,6 @@ public void hitsExecute(SearchContext context, SearchHit[] hits) throws IOExcept
}
innerHits.docIdsToLoad(docIdsToLoad, 0, docIdsToLoad.length);
innerHits.setId(hit.getId());
innerHits.lookup().source().setSource(context.lookup().source().internalSourceRef());
We set the root document's _source here, but when processing inner hits we didn't use it and instead reloaded the _source. So this is safe to remove for now.
In a follow-up PR, I will fix this to make sure the _source is loaded only once for the root doc + inner hits.
+1, that's something we spotted some time ago: #32818 (comment) and maybe #56210, but it was never implemented.
@jtibshirani Thanks for the PR, I like the idea. I had an initial look, and will look more thoroughly tomorrow.
...ava/org/elasticsearch/search/aggregations/bucket/terms/SignificantTextAggregatorFactory.java
This saved _source actually never appears to be used.
server/src/main/java/org/elasticsearch/search/lookup/SearchLookup.java
I caught up with @nik9000 offline about the refactor and wanted to add more context as to its benefits:
However I now see a downside to this refactor. There used to be only one way to share a I'm curious as to what others think. I still like the refactor and feel that it moves us in a better direction. |
We don't have such an example at the moment, right? That's also something we can tackle in a follow-up: currently we use the lookup of the query shard context to compile the scripted field, but we could refactor this part to move to the new hit context. Hits are now sorted by doc ID in fetch sub-phases, so it's also a good chance to get rid of the
I like this change. I left one comment regarding a possible hack to ensure that scripts and fetch sub phase can still share the _source but I don't feel strongly about it. We can solve the double load more nicely in a follow up if needed.
SourceLookup sourceLookup = context.lookup().source();
sourceLookup.setSegmentAndDocument(subReaderContext, subDocId);
if (fieldsVisitor.source() != null) {
sourceLookup.setSource(fieldsVisitor.source());
I wonder if we can still assign the context.lookup().source() with the current _source so that _script can share the source with other fetch sub-phases. That would only work for sub-phases that use a script and override hitExecute, so the new FetchFieldsPhase would benefit from this hack? We can then think about how we could clean up the context for scripts in a follow-up.
I don't think FetchFieldsPhase would benefit currently, because it doesn't support scripts? I like the plan of considering it in a follow-up, for example as part of converting ScriptFieldsPhase to use hitExecute.
Right, I was thinking preemptively of runtime scripted field but +1 to consider this in a follow up.
...java/org/elasticsearch/search/fetch/subphase/highlight/SourceScoreOrderFragmentsBuilder.java
...ain/java/org/elasticsearch/search/fetch/subphase/highlight/SourceSimpleFragmentsBuilder.java
They are not closely related to this PR and it's clearer to keep them separate.
In addition to restoring the shared Here is the final summary of changes:
@jtibshirani Thanks Julie, this looks like a very good change to me!
LGTM2
Thanks everyone, the review feedback was really helpful.
This PR fixes a regression where fvh fragments could be loaded from the wrong document _source. Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from _source. The lookup should be positioned to load from the current hit document. However, since `FragmentsBuilder` are cached and shared across hits, the lookup is never updated to load from the new documents. This means we accidentally load _source from a different document. The regression was introduced in #60179, which started storing `SourceLookup` on `FragmentsBuilder`. Fixes #65533.
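To make the failure mode in that commit message concrete, here is a minimal sketch of how a cached builder can end up reading through a lookup positioned on an earlier hit. All classes here (`ToySourceLookup`, `ToyFragmentsBuilder`, `StaleLookupDemo`) are hypothetical stand-ins, not the Elasticsearch or Lucene types, and passing the lookup per call is just one possible remedy, not necessarily the one the fix used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for SourceLookup: positioned on one doc at a time.
final class ToySourceLookup {
    private final Map<Integer, String> sourceByDoc;
    private int docId = -1;

    ToySourceLookup(Map<Integer, String> sourceByDoc) {
        this.sourceByDoc = sourceByDoc;
    }

    void setDocument(int docId) {
        this.docId = docId;
    }

    String source() {
        return sourceByDoc.get(docId);
    }
}

// Hypothetical stand-in for a FragmentsBuilder that captures a lookup when it is created.
final class ToyFragmentsBuilder {
    private final ToySourceLookup capturedLookup;

    ToyFragmentsBuilder(ToySourceLookup capturedLookup) {
        this.capturedLookup = capturedLookup;
    }

    // Problematic path: reads through the lookup captured at construction time.
    String fragmentFromCapturedLookup() {
        return capturedLookup.source();
    }

    // Safer path: the caller supplies the lookup for the hit being highlighted.
    String fragmentFrom(ToySourceLookup currentHitLookup) {
        return currentHitLookup.source();
    }
}

final class StaleLookupDemo {
    static List<String> highlight(Map<Integer, String> sourceByDoc, int[] docIds, boolean useCapturedLookup) {
        List<String> fragments = new ArrayList<>();
        ToyFragmentsBuilder cachedBuilder = null; // cached and reused across hits
        for (int docId : docIds) {
            // Each hit gets its own lookup, positioned on that hit's document.
            ToySourceLookup hitLookup = new ToySourceLookup(sourceByDoc);
            hitLookup.setDocument(docId);
            if (cachedBuilder == null) {
                // Built once, so it captures the first hit's lookup only.
                cachedBuilder = new ToyFragmentsBuilder(hitLookup);
            }
            fragments.add(useCapturedLookup
                ? cachedBuilder.fragmentFromCapturedLookup()
                : cachedBuilder.fragmentFrom(hitLookup));
        }
        return fragments;
    }

    public static void main(String[] args) {
        Map<Integer, String> sourceByDoc = new HashMap<>();
        sourceByDoc.put(1, "first doc");
        sourceByDoc.put(2, "second doc");
        System.out.println(highlight(sourceByDoc, new int[] {1, 2}, true));  // [first doc, first doc]
        System.out.println(highlight(sourceByDoc, new int[] {1, 2}, false)); // [first doc, second doc]
    }
}
```

In the first call the second hit gets the first document's text, which mirrors the "wrong document _source" symptom; in the second call the builder is still cached, but it no longer holds per-hit state.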
The `SourceLookup` class provides access to the _source for a particular document, specified through `SourceLookup#setSegmentAndDocument`. Previously the search context contained a single `SourceLookup` that was shared between different fetch subphases. It was hard to reason about its state: is `SourceLookup` set to the expected document? Is the _source already loaded and available?

Instead of using a global source lookup, the fetch hit context now provides access to a lookup that is set to load from the hit document.

This refactor closes #31000, since the same `SourceLookup` is no longer shared between the 'fetch _source phase' and script execution.