Skip shard refreshes if shard is search idle
#27500
Today we refresh automatically in the background, by default every second. This default behavior has a significant impact on indexing performance if the refreshes are not needed. This change introduces the notion of a shard being `search idle`, a state a shard transitions to after (by default) `30s` without any access to an external searcher. Once a shard is search idle, all scheduled refreshes are skipped unless refresh listeners are registered. If a search happens on a `search idle` shard, the search request _parks_ on a refresh listener and is executed once the next scheduled refresh occurs. This also moves the shard into the `non-idle` state immediately. This behavior only applies if no explicit refresh interval is set.
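The state transitions described above can be modeled roughly as follows. This is a minimal sketch, not the `IndexShard` code from this PR: the class `SearchIdleSketch`, its fields, and the explicit `nowMillis` clock parameter are all hypothetical, chosen only to make the idle/park/refresh cycle visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, simplified model of the search-idle logic; not the actual IndexShard code. */
public class SearchIdleSketch {
    private final long idleAfterMillis;    // stands in for the 30s default
    private final boolean explicitRefresh; // true if a refresh interval was set explicitly
    private long lastSearcherAccessMillis;
    private final List<Consumer<Boolean>> pendingListeners = new ArrayList<>();

    public SearchIdleSketch(long idleAfterMillis, boolean explicitRefresh, long nowMillis) {
        this.idleAfterMillis = idleAfterMillis;
        this.explicitRefresh = explicitRefresh;
        this.lastSearcherAccessMillis = nowMillis;
    }

    public synchronized boolean isSearchIdle(long nowMillis) {
        return nowMillis - lastSearcherAccessMillis >= idleAfterMillis;
    }

    /** A search arrives: park it if the shard was idle, otherwise run it right away. */
    public synchronized void onSearch(long nowMillis, Consumer<Boolean> listener) {
        boolean wasIdle = isSearchIdle(nowMillis);
        lastSearcherAccessMillis = nowMillis; // the shard becomes non-idle immediately
        if (wasIdle && explicitRefresh == false) {
            pendingListeners.add(listener); // executed on the next scheduled refresh
        } else {
            listener.accept(false); // no refresh had to happen first
        }
    }

    /** Scheduled refresh tick: returns true if a refresh ran, false if it was skipped. */
    public synchronized boolean scheduledRefresh(long nowMillis) {
        if (pendingListeners.isEmpty() && isSearchIdle(nowMillis) && explicitRefresh == false) {
            return false; // search idle and nobody is waiting: skip
        }
        List<Consumer<Boolean>> toNotify = new ArrayList<>(pendingListeners);
        pendingListeners.clear();
        toNotify.forEach(l -> l.accept(true)); // release parked searches
        return true;
    }
}
```

In this model a parked search both forces the next scheduled refresh and resets the idle timer, which mirrors the two effects the description attributes to a search on an idle shard.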
Not very familiar with this part of the code base but what I understand makes sense to me. You will probably want to add docs as well.
    } while (pendingRefreshLocation.compareAndSet(location, lastWriteLocation) == false);
}

public void awaitPendingRefresh(Consumer<Boolean> listener) {
Please add javadocs.
if (refreshListeners.refreshNeeded() == false // if we have a listener that is waiting for a refresh we need to force it
    && isSearchIdle() && indexSettings.isExplicitRefresh() == false) {
    // lets skip this refresh since we are search idle and
    // don't necessarily need to refresh. the next search execute cause a
nit: truncated comment
I can't fully comment to the integration with the search system but the IndexShard approach is nice. I left some comments for discussion.
protected void shardOperation(GetRequest request, ShardId shardId, ActionListener<GetResponse> listener) throws IOException {
    IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
    IndexShard indexShard = indexService.getShard(shardId.id());
    indexShard.awaitPendingRefresh(b -> {
I'm on the fence as to whether we should only do this on non-realtime get. Real time gets don't really relate to refresh cycles (they force a refresh if needed). They are already "efficient" in the sense that they only refresh if they need to (i.e., there's a pending doc change in the version map).
@@ -97,6 +100,19 @@ protected void doExecute(Request request, ActionListener<Response> listener) {

protected abstract Response shardOperation(Request request, ShardId shardId) throws IOException;

protected void shardOperation(Request request, ShardId shardId, ActionListener<Response> listener) throws IOException {
maybe call this asyncShardOperation to avoid confusion? also, can you please java doc the fact that this is still called on the networking thread?
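One way the suggested split between a synchronous and an asynchronous entry point could look is sketched below. Everything here is an assumption for illustration: `AsyncShardOperationSketch`, its nested `Listener` interface, and the simplified signatures (no `ShardId`) are hypothetical stand-ins, not the classes touched by this PR.

```java
import java.io.IOException;

public class AsyncShardOperationSketch {
    /** Minimal stand-in for a response listener; hypothetical, not Elasticsearch's ActionListener. */
    public interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    public abstract static class SingleShardAction<Request, Response> {
        /** Synchronous variant that concrete actions implement. */
        protected abstract Response shardOperation(Request request) throws IOException;

        /**
         * Asynchronous variant. This default runs the synchronous operation on the
         * calling thread and reports the outcome through the listener; a subclass
         * could override it to defer the work, for example until a pending refresh.
         */
        protected void asyncShardOperation(Request request, Listener<Response> listener) {
            final Response response;
            try {
                response = shardOperation(request);
            } catch (Exception e) {
                listener.onFailure(e);
                return;
            }
            listener.onResponse(response);
        }
    }
}
```

Giving the two methods distinct names makes it obvious at each call site which one may complete later, and the javadoc is the natural place to state which thread the default runs on.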
protected void shardOperation(TermVectorsRequest request, ShardId shardId, ActionListener<TermVectorsResponse> listener) throws IOException {
    IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
    IndexShard indexShard = indexService.getShard(shardId.id());
    indexShard.awaitPendingRefresh(b -> {
same comment here w.r.t real time requests.
// once we change the refresh interval we schedule yet another refresh
// to ensure we are in a clean and predictable state.
// it doesn't matter if we move from or to <code>-1</code> in both cases we want
// docs to become visible immediately
I guess the important part here is that we need to flush any pending search requests which are waiting for the next refresh. If so, can you add it to the comment? It's not trivial to figure out.
Also, if that's the case, maybe we're better off doing it in IndexShard#onSettingsChanged. Then all the corresponding code is contained in the same class.
PS I wonder if we do the right thing with RefreshListeners - when people stop refreshing we should do a refresh and release all pending listeners (and refuse to add new ones). This is, of course, not related to your PR.
yeah so I wonder if we should rather build it into the refresh listener so that it forces a refresh if refreshes are disabled
I'm not sure people setting wait_for will want an immediately executed refresh just because refreshes are set to -1. I'm +1 to solving it in the refresh listeners - out of scope for this PR of course.
private void setRefreshPending() {
    Engine engine = getEngine();
    if (isSearchIdle()) {
        acquireSearcher("setRefreshPending").close(); // move the shard into non-search idle
why is this needed?
it's a bug. I removed it and added a test
IndexShard shard = indexService.getShard(0);
boolean hasRefreshed = shard.scheduledRefresh();
if (randomTimeValue == TimeValue.ZERO) {
    // with ZERO we are guaranteed to see the doc since we will wait for a refresh in the background
I'm not sure what you mean by see the doc? I guess triggering a refresh due to it?
    assertFalse(shard.isSearchIdle());
}
assertHitCount(client().prepareSearch().get(), 1);
for (int i = 1; i < numDocs; i++) {
what does this extra indexing buy us?
that's just how the test works?
we do nothing after the indexing is done nor during the indexing?
assertFalse(shard.scheduledRefresh());
assertTrue(shard.isSearchIdle());
CountDownLatch refreshLatch = new CountDownLatch(1);
client().admin().indices().prepareRefresh().execute(ActionListener.wrap(refreshLatch::countDown)); // async on purpose
this is just to speed up the test, right? if so, can you add a comment? if not please tell me what I miss ;)
just to not spawn a thread, it happens concurrently?
    t.join();
}

public void testPendingRefreshWithIntervalChange() throws InterruptedException {
+1
settings = Settings.builder().put(settings).put(IndexSettings.INDEX_SEARCH_IDLE_AFTER.getKey(), TimeValue.timeValueMillis(10))
    .build();
scopedSettings.applySettings(settings);
while (primary.isSearchIdle() == false) {
lol. assert busy?
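The alternative hinted at here is to replace the open-ended `while` loop with a bounded poll. The reviewer presumably means the test framework's `assertBusy` helper; the standalone version below is only a sketch of the same idea, with a hypothetical class `BusyWait` and method `awaitTrue`, and it makes no claim about how the framework helper is implemented.

```java
import java.util.function.BooleanSupplier;

public class BusyWait {
    /**
     * Polls the condition with a growing sleep between attempts until it holds
     * or the timeout elapses. Returns true if the condition was observed to hold.
     */
    public static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        final long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadlineNanos >= 0) {
                return false; // timed out instead of spinning forever
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // back off, capped at 100ms
        }
    }
}
```

With such a helper the loop in the test would become something like `assertTrue(BusyWait.awaitTrue(primary::isSearchIdle, 10_000))`, so a shard that never goes idle fails the test rather than hanging it.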
@bleskes I pushed changes. I also ran benchmarks and the results are promising,
especially since the indices created this way seem to be much more efficient (better compressed, fewer merges, etc.)
I like the reduced refresh time, total bytes written and segments memory usage!
LGTM. Thanks @s1monw .
PS - did you purposely skip over #27500 (comment)? I'm fine if so, but want to make sure it was not a mistake.
I did. I think it's not easy to implement since I don't have a history in that method w.r.t. the refresh interval.
* master:
  Skip shard refreshes if shard is `search idle` (#27500)
  Remove workaround in translog rest test (#27530)
  inner_hits: Return an empty _source for nested inner hit when filtering on a field that doesn't exist.
  percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
  Dedup translog operations by reading in reverse (#27268)
  Ensure logging is configured for CLI commands
  Ensure `doc_stats` are changing even if refresh is disabled (#27505)
  Fix classes that can exit
  Revert "Adjust CombinedDeletionPolicy for multiple commits (#27456)"
  Transpose expected and actual, and remove duplicate info from message. (#27515)
  [DOCS] Fixed broken link in breaking changes
Once a shard goes inactive we want the shard to be refreshed if the refresh interval is default since we might hold on to unnecessary segments and in the inactive case we stopped indexing and can release old segments. Relates to elastic#27500
…read The change in elastic#27500 introduces this regression that causes `_get` and `_term_vector` actions to run on the network thread if the realtime flag is set. This fixes the issue by delegating to the super method forking on the corresponding threadpool.
With this change, we will always return true for can_match requests on idle search shards; otherwise, some shards will never get refreshed if all search requests perform the can_match phase (i.e., total shards > pre_filter_shard_size). Relates #27500 Relates #50043 Co-authored-by: Nhat Nguyen <[email protected]>