-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Soft limit added for max number of search contexts #29252
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
1 similar comment
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
289c092
to
3587714
Compare
3587714
to
2ee128d
Compare
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @diwasjoshi ! I left some comments regarding formatting, naming and when we should check the search contexts size. The default limit might be too low since it applies to all search (not only scroll) so I'd set something like 500 or 1000. Though this might be too big for scrolls so maybe we should apply this limit to scrolls only ?
@@ -227,10 +229,10 @@ private void setDefaultSearchTimeout(TimeValue defaultSearchTimeout) { | |||
private void setDefaultAllowPartialSearchResults(boolean defaultAllowPartialSearchResults) { | |||
this.defaultAllowPartialSearchResults = defaultAllowPartialSearchResults; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this change is not needed
this::setDefaultAllowPartialSearchResults); | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is not needed
@@ -161,7 +163,7 @@ | |||
private volatile TimeValue defaultSearchTimeout; | |||
|
|||
private volatile boolean defaultAllowPartialSearchResults; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please keep the formatting
@@ -198,10 +200,10 @@ public SearchService(ClusterService clusterService, IndicesService indicesServic | |||
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_SEARCH_TIMEOUT_SETTING, this::setDefaultSearchTimeout); | |||
|
|||
defaultAllowPartialSearchResults = DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS.get(settings); | |||
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS, | |||
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be added in the SearchService.
* maximum of 100 is defensive to prevent generating too many search contexts. | ||
*/ | ||
public static final Setting<Integer> MAX_SEARCH_CONTEXT_SETTING = | ||
Setting.intSetting("node.max_search_context", 100, 0, Property.NodeScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move this setting to SearchService. Also I would name it search.max_open_context
or something similar, it is a per node setting but it is related to search so this would be consistent with the other search setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100 is too low in my opinion as this could be lower than the number of threads in the search thread pool on nodes that have many CPUs.
@@ -566,6 +568,14 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc | |||
|
|||
final SearchContext createContext(ShardSearchRequest request) throws IOException { | |||
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout); | |||
System.out.println("----activecontexts----" + activeContexts.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this or use the logger instead.
@@ -566,6 +568,14 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc | |||
|
|||
final SearchContext createContext(ShardSearchRequest request) throws IOException { | |||
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout); | |||
System.out.println("----activecontexts----" + activeContexts.size()); | |||
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check the size before creating the search context otherwise you need to close it. Though this method is not synchronized so you may not get an accurate count on the concurrent map. Not sure if it's an issue though, the alternative would be to synchronize the addition of a new search context in the map and the check of the size but that would add contention in the search service. @s1monw any opinion on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling ConcurrentHashMap#size()
can be quite expensive IMO. I think we should keep track of the open ctx in a counter instead of using the map. I don't think being a little off here makes a difference. I think we don't need to add any sychronization changes here.
@@ -409,7 +447,7 @@ public void testCanMatch() throws IOException { | |||
Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f, allowPartialSearchResults))); | |||
|
|||
assertTrue(service.canMatch(new ShardSearchLocalRequest(indexShard.shardId(), 1, SearchType.QUERY_THEN_FETCH, | |||
new SearchSourceBuilder(), Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f, | |||
new SearchSourceBuilder(), Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not needed
c371373
to
4aad80a
Compare
4aad80a
to
044d013
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some additional comments to Jim's.
try (SearchContext context = service.createAndPutContext(new ShardSearchLocalRequest(indexShard.shardId(), 1, SearchType.DEFAULT, | ||
new SearchSourceBuilder(), new String[0], false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1.0f, true))) { | ||
assertNotNull(context); | ||
} catch (IllegalStateException ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use expectThrows
to check for the exception/message?
* maximum of 100 is defensive to prevent generating too many search contexts. | ||
*/ | ||
public static final Setting<Integer> MAX_SEARCH_CONTEXT_SETTING = | ||
Setting.intSetting("node.max_search_context", 100, 0, Property.NodeScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100 is too low in my opinion as this could be lower than the number of threads in the search thread pool on nodes that have many CPUs.
@@ -566,6 +568,13 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc | |||
|
|||
final SearchContext createContext(ShardSearchRequest request) throws IOException { | |||
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout); | |||
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we rather add this check at the beginning of createAndPutContext
? This method only creates a new context, it doesn't add it to the list of active contexts?
@@ -566,6 +568,13 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc | |||
|
|||
final SearchContext createContext(ShardSearchRequest request) throws IOException { | |||
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout); | |||
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) { | |||
throw new IllegalStateException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ElasticsearchException would allow to return a HTTP status code that indicates that the request should be retried later, this would be better than a 500/Internal Error
?
We discussed with @jpountz and agreed that we should discuss the value for the default limit. |
ee3d06d
to
9e84a7a
Compare
@@ -547,6 +557,13 @@ private SearchContext findContext(long id, TransportRequest request) throws Sear | |||
} | |||
|
|||
final SearchContext createAndPutContext(ShardSearchRequest request) throws IOException { | |||
if (numActiveContexts >= MAX_OPEN_CONTEXT.get(settings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function be called by multiple threads at the same time? And if so, should numActiveContexts
be volatile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can @mayya-sharipova but if we restrict the counting to scroll
queries let's use an AtomicInteger. The overhead should be limited since we create the search context for a scroll query only on the initial request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @diwasjoshi . We discussed internally and decided that this setting should only be applied to scroll
query (request.scroll() != null
). This way it is easier to choose a default value (we agreed on 500
being a good start) and we don't mix queries in the same counter. Would you mind changing this pr to restrict the counting to scroll
queries ?
@diwasjoshi are you still interested in working on this change? |
@colings86 sorry I couldn't work on this, I will work on the review changes this week. |
@diwasjoshi feels like you were close to finish this. Are you still intending to work on this issue? |
Since there hasn't been any activity on this PR for a while I am inclined to close it for now. @diwasjoshi if you are still interested in working on this and have time to update the PR with the latest master branch and address the review comments then please reopen this PR |
This change introduces a soft limit on the number of search contexts that can be made per node. The setting can be changed per index using the node.max_search_context setting.
Relates to 25244
Checklist