Introduce incremental reduction of TopDocs #23946
This commit adds support for incremental top N reduction if the number of expected shards in the search request is high enough. The changes here also clean up more code in SearchPhaseController to make the separation between values that are the same on each search result and values that are per response. The reduced search phase result doesn't hold an arbitrary result to obtain values like `from`, `size` or sort values which is now cleanly encapsulated.
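The idea of incremental top N reduction described above can be sketched roughly as follows. This is a simplified illustration, not the code in SearchPhaseController: the class `IncrementalTopN`, its `bufferSize` parameter, and the use of plain score lists in place of Lucene `TopDocs` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an incremental top-N reducer; names are illustrative only. */
public class IncrementalTopN {
    private final int topN;        // number of hits to keep, playing the role of `size`
    private final int bufferSize;  // shard results buffered before a partial reduce (assumed knob)
    private final List<List<Float>> buffer = new ArrayList<>();
    private int numReducePhases = 0;

    public IncrementalTopN(int topN, int bufferSize) {
        if (topN < 1 || bufferSize < 2) {
            throw new IllegalArgumentException("topN must be >= 1 and bufferSize >= 2");
        }
        this.topN = topN;
        this.bufferSize = bufferSize;
    }

    /** Consumes one shard's scores; when the buffer is full, it is first reduced to one partial result. */
    public void consume(List<Float> shardScores) {
        if (buffer.size() == bufferSize) {
            List<Float> partial = merge(buffer);
            buffer.clear();
            buffer.add(partial);
            numReducePhases++;
        }
        buffer.add(shardScores);
    }

    /** Final reduce over whatever is still buffered. */
    public List<Float> reduce() {
        numReducePhases++;
        return merge(buffer);
    }

    public int getNumReducePhases() {
        return numReducePhases;
    }

    // Merges all buffered results and keeps only the topN highest scores.
    private List<Float> merge(List<List<Float>> results) {
        List<Float> all = new ArrayList<>();
        for (List<Float> result : results) {
            all.addAll(result);
        }
        all.sort(Collections.reverseOrder());
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```

Keeping only the top N of each partial merge is safe for top docs, because a hit that falls outside the top N of a subset cannot be in the top N of the whole. The memory held on the coordinating node is then bounded by the buffer size rather than by the number of shards.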
try {
    // the search context should inherit the default timeout
    assertThat(contextWithDefaultTimeout.timeout(), equalTo(TimeValue.timeValueSeconds(5)));
} finally {
These tests were annoyingly slow because they waited for timeouts while shards were still locked. This change shaved 10 seconds off the test.
And that's how you incrementally introduced the incremental reduction of search results!
Looks great!
* @param bufferdAggs a list of pre-collected / buffered aggregations. if this list is non-null all aggregations have been consumed
* @param bufferedAggs a list of pre-collected / buffered aggregations. if this list is non-null all aggregations have been consumed
* from all non-null query results.
* @param bufferedAggs a list of pre-collected / buffered top docs. if this list is non-null all top docs have been consumed
nit: bufferedTopDocs
// the top docs sort fields used to sort the score docs, <code>null</code> if the results are not sorted
final SortField[] sortField;
// <code>true</code> iff the result score docs is sorted
final boolean isSorted;
nit: hasFields ? score docs are always sorted ;)
I renamed it to `isSortedByField` since this is really what it is :)
I left some comments, mostly about readability, but I like the change in general. When a request hits many shards, I think most shard results would be empty, so I'm wondering whether we should optimize for that case (in another PR).
if (size != -1) {
    final ScoreDoc[] mergedScoreDocs = mergeTopDocs(topDocs, size, ignoreFrom ? 0 : from);
    final boolean hasNoHits = groupedCompletionSuggestions.isEmpty() && topDocs.isEmpty();
    if (hasNoHits == false) {
Can we avoid the double negation, which makes things a bit harder to read, by calling the var `hasHits`?
if (results.isEmpty()) {
    return EMPTY_DOCS;
    return null;
out of curiosity, why was it an issue to return EMPTY_DOCS?
Because the return value is again `TopDocs` and not `ScoreDoc[]`, so `private static final ScoreDoc[] EMPTY_DOCS = new ScoreDoc[0];` wouldn't cut it.
List<InternalAggregations> bufferdAggs, int numReducePhases) {
List<InternalAggregations> bufferedAggs,
List<TopDocs> bufferedTopDocs, TopDocsStats topDocsStats, int numReducePhases,
boolean isScrollRequest) {
can you fix the indentation here? all parameters do not seem to start on the same column
@@ -204,23 +209,35 @@ private static long optionalSum(long left, long right) {
}
}
}
return scoreDocs;
final boolean isSorted;
Is this what is called `isSortedByField` elsewhere?
static class SortedTopDocs {
    static final SortedTopDocs EMPTY = new SortedTopDocs(EMPTY_DOCS, false, null);
    final ScoreDoc[] scoreDocs;
    final boolean sorted;
is it what is called isSortedByField elsewhere?
@jpountz I pushed a new commit
Can you elaborate on this a bit? I am not sure I am following.
I was just thinking about the fact that adding more reductions can make results less accurate, e.g. for
Oh, I see what you mean: today, when we get a result, we don't check whether it had any hits at all, and if it had none we could just skip it (not buffer it). Is that what you mean? That is low-hanging fruit, I guess...
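The "skip it (not buffer it)" idea could look roughly like the sketch below. It is only an illustration of the suggestion, not something this PR implements: `SkipEmptyResults`, `bufferNonEmpty`, and the use of plain score lists in place of per-shard query results are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing how empty shard results could be kept out of the buffer. */
public class SkipEmptyResults {

    /** Returns only the shard results that contain at least one hit, preserving their order. */
    public static List<List<Float>> bufferNonEmpty(List<List<Float>> shardResults) {
        List<List<Float>> buffered = new ArrayList<>();
        for (List<Float> result : shardResults) {
            // an empty result contributes nothing to the top N, so it is not buffered
            if (result.isEmpty() == false) {
                buffered.add(result);
            }
        }
        return buffered;
    }
}
```

A real implementation would presumably still need to account for per-shard metadata carried by empty results; the sketch ignores that.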
Yes. I'm wondering whether there might be issues with the
final boolean hasTopDocs = source == null || source.size() != 0;

if (isScrollRequest == false && (hasAggs || hasTopDocs)) {
    // no incremental reduce if scroll is used - we only hit a single shard or sometimes more...
@s1monw Would you mind explaining why we cannot use incremental reduce if scroll is used? This confuses me.