-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Cancellable DirectoryReader #52822
Conversation
Implement a Cancellable DirectoryReader that wraps the original DirectoryReader so that when a search task is cancelled the DirectoryReaders also stop their work fast. This is usuful for expensive operations like wilcard/prefix queries where the DirectoryReaders can spend lots of time and consume resources, as previously their work wouldn't stop even though the original search task was cancelled (e.g. because of timeout or dropped client connection).
Pinging @elastic/es-search (:Search/Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions but this looks like it's on the right track to me.
@@ -708,6 +708,8 @@ private DefaultSearchContext createSearchContext(SearchRewriteContext rewriteCon | |||
searcher, clusterService, indexService, indexShard, bigArrays, threadPool::relativeTimeInMillis, timeout, fetchPhase); | |||
success = true; | |||
return searchContext; | |||
} catch (IOException e) { | |||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this leniency looks dangerous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I simply propagate the exception? or any other suggestion?
} | ||
cancellable.get().run(); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation feels a bit awkward, I'd rather like to fork ExitableDirectoryReader entirely to not inherit from its QueryTimeout abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that in the 1st approach but this means we have to copy the whole ExitablePointValues
to wrap the point values and therefore the ExitableIntersectVisitor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind copying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to copy
IndexSearcher searcher = new IndexSearcher(reader); | ||
PercentileRanks ranks = search(searcher, new MatchAllDocsQuery(), aggBuilder, fieldType); | ||
Percentile rank = ranks.iterator().next(); | ||
assertEquals(Double.NaN, rank.getPercent(), 0d); | ||
assertEquals(0.5, rank.getValue(), 0d); | ||
assertFalse(AggregationInspectionHelper.hasValue(((InternalTDigestPercentileRanks)ranks))); | ||
} finally { | ||
unmappedIndexWriter.close(); | ||
directory.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use try-with-resources?
public ContextIndexSearcher(IndexReader reader, Similarity similarity, | ||
QueryCache queryCache, QueryCachingPolicy queryCachingPolicy, | ||
boolean shouldWrap) throws IOException { | ||
super(shouldWrap? new CancellableIndexReader((DirectoryReader) reader, new Cancellable()) : reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CancellableIndexReader shouldn't have any overhead, so it might be simpler to wrap all the time here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This boolean was added together with a second constructor because of the AggregatorTestCase and hundreds of tests that derive from that. If we wrap the IndexReader we get:
java.lang.AssertionError: The top-reader used to create Weight is not the same as the current reader's top-reader (org.apache.lucene.index.CompositeReaderContext@382edaaa
which I tried to fix by changing the AggregatorTestCase to receive IndexReader and not IndexSearcher as an argument. and all the tests to use the derived IndexSearcher from the context created. But even with this there were a few more tests failing that didn't manage to fix, so after discussion with @jimczi we decided to make this workaround for the moment and address the issue in a separate PR afterwards.
I can add a TODO though to not miss it.
/** | ||
* Wraps an {@link IndexReader} with a cancellation Runnable task. | ||
*/ | ||
private static class CancellableIndexReader extends FilterDirectoryReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it CancellableDirectoryReader
if it extends DirectoryReader.
@jpountz @jimczi I had to fork the ExitableDirectoryReader because of this: https://github.com/elastic/elasticsearch/pull/52822/files#diff-913ad694a7d744ee93ae1dac48d67b0eR415 The CompletionTerms is a final class and the code in CompletionWeight here: https://github.com/apache/lucene-solr/blob/master/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionWeight.java#L76 expects CompletionTerms, so if they are wrapped into ExitableTerms the if leads to an IllegalArguementException. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forking sounds totally fine to me.
Can you explain why we need to unset the timeout at some point? I'm guessing that it is to allow aggregations to build their results without errors? A comment would help clarify.
Also I'm not sure why some tests need to become integration tests? These tests used to be integration tests that we migrated to unit test.
@@ -310,6 +297,8 @@ static boolean executeInternal(SearchContext searchContext) throws QueryPhaseExe | |||
queryResult.nodeQueueSize(rExecutor.getCurrentQueueSize()); | |||
queryResult.serviceTimeEWMA((long) rExecutor.getTaskExecutionEWMA()); | |||
} | |||
// Search phase has finished, no longer need to check for timeout | |||
searcher.unsetCheckTimeout(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we don't experience a timeout during the aggregation phase. (there were integ tests that were failing because of this). I changed the comment to mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this on the cancellable object directly. The searcher shouldn't be able to unset this way. See my comment on the context searcher.
Thx!
@jimczi Could you please help explaining? |
We can probably fix these tests rather than moving them again into an integration tests. We've made some workaround in the AggregatorTestCase that are now hitting us badly. For instance we allow to use a MultiReader instead of a DirectoryReader so a lot of new tests are using this shortcut even though we always expect a DirectoryReader when used outside of tests. We also have a unit test that simulates the mixing of an unmapped field with a mapped one but the base class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me overall. I left some comments to simplify the logic in the ContextIndexSearcher and to move the cancellation impl in the QueryPhase directly.
void unsetCheckTimeout(); | ||
} | ||
|
||
public static class CancellableImpl implements Cancellable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation should live on the QueryPhase imo. This way you don't need to expose the checkDirReaderCancelled
and checkDirReaderCancelled
in the searcher.
|
||
boolean isEnabled(); | ||
void checkCancelled(); | ||
default void checkDirReaderCancelled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions should not be part of the interface, checkCancelled should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that much either, I exposed the checkDirReaderCancelled() so that I we can have tests that differentiate between exit in the searcher and exit from the dirReader: https://github.com/elastic/elasticsearch/pull/52822/files#diff-2abce48f3a52657ce3740afade6d5f8fR132
But as discussed I could just unit test the Cancellable reader instead.
/** | ||
* iFace which implements the query timeout / cancellation logic | ||
*/ | ||
public interface Cancellable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call QueryCancellable ?
@@ -77,10 +89,27 @@ | |||
|
|||
private AggregatedDfs aggregatedDfs; | |||
private QueryProfiler profiler; | |||
private Runnable checkCancelled; | |||
private Holder<Cancellable> cancellable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to use a custom Holder
? Now that cancellable directory reader is forked, you should be able to set the cancellable lazily ? So instead of passing the Cancellable
here you'd set it in ContextIndexSearcher#setCancellable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we need a reference to the cancellable (the holder actually) during construction of the Reader: https://github.com/elastic/elasticsearch/pull/52822/files#diff-913ad694a7d744ee93ae1dac48d67b0eR403
So that later on with the setCancellable
we can set it to the holder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alreanatively I changed so that the ContextIndexSearcher doesn't know of the holder, but then a cast to CancellableDirectoryReader
is required to call the setCancellable.
} | ||
cancellable.get().run(); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to copy
@@ -310,6 +297,8 @@ static boolean executeInternal(SearchContext searchContext) throws QueryPhaseExe | |||
queryResult.nodeQueueSize(rExecutor.getCurrentQueueSize()); | |||
queryResult.serviceTimeEWMA((long) rExecutor.getTaskExecutionEWMA()); | |||
} | |||
// Search phase has finished, no longer need to check for timeout | |||
searcher.unsetCheckTimeout(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this on the cancellable object directly. The searcher shouldn't be able to unset this way. See my comment on the context searcher.
@@ -611,14 +611,8 @@ public void testCacheAggregation() throws IOException { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this change ? It should work without this modification so I'd like to keep this for a different pr since the issue is not related to the exitable directory reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
@@ -920,14 +920,8 @@ public void testCacheAggregation() throws IOException { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this shouldn't be needed with the extra constructor.
@@ -22,7 +22,6 @@ | |||
import org.apache.lucene.document.Document; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional to @jimczi 's.
public void setCheckCancelled(Runnable checkCancelled) { | ||
this.checkCancelled = checkCancelled; | ||
public void setCancellable(Cancellable cancellable) { | ||
this.cancellable.set(cancellable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check that the argument is not null since we seem to assume it's never null in some places?
@@ -244,7 +270,7 @@ public Scorer scorer(LeafReaderContext context) throws IOException { | |||
public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { | |||
BulkScorer in = weight.bulkScorer(context); | |||
if (in != null) { | |||
return new CancellableBulkScorer(in, checkCancelled); | |||
return new CancellableBulkScorer(in, () -> cancellable.get().checkCancelled()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do the get()
call once for all instead of forcing the bulk scorer to do it every time it wants to check whether the request has been cancelled?
return new CancellableBulkScorer(in, () -> cancellable.get().checkCancelled()); | |
Runnable checkCancelled = cancellable.get(); | |
return new CancellableBulkScorer(in, checkCancelled); |
|
||
@Override | ||
protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) { | ||
return in; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's throw an UnsupportedOperationException? (this is only used when asking a DirectoryReader to take into account some new changes in a directory, which should never happen with this impl)
} | ||
|
||
private void checkAndThrowWithSampling() { | ||
if (calls++ % MAX_CALLS_BEFORE_QUERY_TIMEOUT_CHECK == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to reduce the overhead to a minimum, we could make MAX_CALLS_BEFORE_QUERY_TIMEOUT_CHECK
a power of two minus 1 (e.g. (1 << 4) - 1
, ie. 15
, and use a mask instead of a remainder: (calls++ & MAX_CALLS_BEFORE_QUERY_TIMEOUT_CHECK) == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain why power of two - 1 is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masks are typically cheaper than remainders. And a%b
is the same as a & (b-1)
when a is positive and b is a power of two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the iterations @matriv, LGTM!
server/src/main/java/org/elasticsearch/search/internal/ContextIndexSearcher.java
Show resolved
Hide resolved
} | ||
|
||
public void setProfiler(QueryProfiler profiler) { | ||
this.profiler = profiler; | ||
} | ||
|
||
/** | ||
* Set a {@link Runnable} that will be run on a regular basis while | ||
* collecting documents. | ||
* Add a {@link Runnable} that will be run on a regular basis while fetching document from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/while accessing/while fetching document/ this is confusing since we don't check the cancellation when visiting the stored fields ?
@jimczi Thx you too for you patience and all the help and guidance you provided. |
Implement an Exitable DirectoryReader that wraps the original DirectoryReader so that when a search task is cancelled the DirectoryReaders also stop their work fast. This is usuful for expensive operations like wilcard/prefix queries where the DirectoryReaders can spend lots of time and consume resources, as previously their work wouldn't stop even though the original search task was cancelled (e.g. because of timeout or dropped client connection). (cherry picked from commit 67acaf6)
With ExitableDirectoryReader in place, check for query cancellation during QueryPhase#preProcess where the query rewriting takes place. Follows: elastic#52822
With ExitableDirectoryReader in place, check for query cancellation during QueryPhase#preProcess where the query rewriting takes place. Follows: #52822
Benchmarking showed that the effect of the ExitableDirectoryReader is reduced considerably when checking every 4095 docs. Moreover, set the cancellable task before calling QueryPhase.preProcess() and make sure we don't wrap with an ExitableDirectoryReader at all when lowLevelCancellation() is set to false to avoid completely any performance impact. Follows: elastic#52822 Follows: elastic#53166 Follows: elastic#53496
Benchmarking showed that the effect of the ExitableDirectoryReader is reduced considerably when checking every 8191 docs. Moreover, set the cancellable task before calling QueryPhase#preProcess() and make sure we don't wrap with an ExitableDirectoryReader at all when lowLevelCancellation is set to false to avoid completely any performance impact. Follows: #52822 Follows: #53166 Follows: #53496
Benchmarking showed that the effect of the ExitableDirectoryReader is reduced considerably when checking every 8191 docs. Moreover, set the cancellable task before calling QueryPhase#preProcess() and make sure we don't wrap with an ExitableDirectoryReader at all when lowLevelCancellation is set to false to avoid completely any performance impact. Follows: #52822 Follows: #53166 Follows: #53496 (cherry picked from commit cdc377e)
Implement a Cancellable DirectoryReader that wraps the original
DirectoryReader so that when a search task is cancelled the
DirectoryReaders also stop their work fast. This is usuful for
expensive operations like wilcard/prefix queries where the
DirectoryReaders can spend lots of time and consume resources,
as previously their work wouldn't stop even though the original
search task was cancelled (e.g. because of timeout or dropped client
connection).