Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soft limit added for max number of search contexts #29252

Closed

Conversation

diwasjoshi
Copy link
Contributor

This change introduces a soft limit on the number of search contexts that can be made per node. The setting can be changed per index using the node.max_search_context setting.
Relates to 25244

Checklist

  • Have you signed the contributor license agreement? Yes
  • Have you followed the contributor guidelines? Yes
  • If submitting code, have you built your formula locally prior to submission with gradle check? Yes. Tests failing. Review needed before proceeding.
  • If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed. Yes
  • If submitting code, have you checked that your submission is for an OS that we support? Yes
  • If you are submitting this code for a class then read our policy for that. NA

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@diwasjoshi diwasjoshi force-pushed the fix-softLimitSearchContexts branch 2 times, most recently from 289c092 to 3587714 Compare March 26, 2018 21:11
@diwasjoshi diwasjoshi force-pushed the fix-softLimitSearchContexts branch from 3587714 to 2ee128d Compare March 26, 2018 21:12
@jimczi jimczi added the :Search/Search Search-related issues that do not fall into other categories label Mar 29, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @diwasjoshi ! I left some comments regarding formatting, naming and when we should check the search contexts size. The default limit might be too low since it applies to all search (not only scroll) so I'd set something like 500 or 1000. Though this might be too big for scrolls so maybe we should apply this limit to scrolls only ?

@@ -227,10 +229,10 @@ private void setDefaultSearchTimeout(TimeValue defaultSearchTimeout) {
private void setDefaultAllowPartialSearchResults(boolean defaultAllowPartialSearchResults) {
this.defaultAllowPartialSearchResults = defaultAllowPartialSearchResults;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this change is not needed

this::setDefaultAllowPartialSearchResults);



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is not needed

@@ -161,7 +163,7 @@
private volatile TimeValue defaultSearchTimeout;

private volatile boolean defaultAllowPartialSearchResults;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please keep the formatting

@@ -198,10 +200,10 @@ public SearchService(ClusterService clusterService, IndicesService indicesServic
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_SEARCH_TIMEOUT_SETTING, this::setDefaultSearchTimeout);

defaultAllowPartialSearchResults = DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS.get(settings);
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS,
clusterService.getClusterSettings().addSettingsUpdateConsumer(DEFAULT_ALLOW_PARTIAL_SEARCH_RESULTS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be added in the SearchService.

* maximum of 100 is defensive to prevent generating too many search contexts.
*/
public static final Setting<Integer> MAX_SEARCH_CONTEXT_SETTING =
Setting.intSetting("node.max_search_context", 100, 0, Property.NodeScope);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this setting to SearchService. Also I would name it search.max_open_context or something similar, it is a per node setting but it is related to search so this would be consistent with the other search setting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 is too low in my opinion as this could be lower than the number of threads in the search thread pool on nodes that have many CPUs.

@@ -566,6 +568,14 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc

final SearchContext createContext(ShardSearchRequest request) throws IOException {
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout);
System.out.println("----activecontexts----" + activeContexts.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this or use the logger instead.

@@ -566,6 +568,14 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc

final SearchContext createContext(ShardSearchRequest request) throws IOException {
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout);
System.out.println("----activecontexts----" + activeContexts.size());
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check the size before creating the search context otherwise you need to close it. Though this method is not synchronized so you may not get an accurate count on the concurrent map. Not sure if it's an issue though, the alternative would be to synchronize the addition of a new search context in the map and the check of the size but that would add contention in the search service. @s1monw any opinion on this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling ConcurrentHashMap#size() can be quite expensive IMO. I think we should keep track of the open ctx in a counter instead of using the map. I don't think being a little off here makes a difference. I think we don't need to add any sychronization changes here.

@@ -409,7 +447,7 @@ public void testCanMatch() throws IOException {
Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f, allowPartialSearchResults)));

assertTrue(service.canMatch(new ShardSearchLocalRequest(indexShard.shardId(), 1, SearchType.QUERY_THEN_FETCH,
new SearchSourceBuilder(), Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f,
new SearchSourceBuilder(), Strings.EMPTY_ARRAY, false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1f,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not needed

@diwasjoshi diwasjoshi force-pushed the fix-softLimitSearchContexts branch 6 times, most recently from c371373 to 4aad80a Compare April 1, 2018 21:00
@diwasjoshi diwasjoshi force-pushed the fix-softLimitSearchContexts branch from 4aad80a to 044d013 Compare April 1, 2018 21:01
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some additional comments to Jim's.

try (SearchContext context = service.createAndPutContext(new ShardSearchLocalRequest(indexShard.shardId(), 1, SearchType.DEFAULT,
new SearchSourceBuilder(), new String[0], false, new AliasFilter(null, Strings.EMPTY_ARRAY), 1.0f, true))) {
assertNotNull(context);
} catch (IllegalStateException ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use expectThrows to check for the exception/message?

* maximum of 100 is defensive to prevent generating too many search contexts.
*/
public static final Setting<Integer> MAX_SEARCH_CONTEXT_SETTING =
Setting.intSetting("node.max_search_context", 100, 0, Property.NodeScope);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 is too low in my opinion as this could be lower than the number of threads in the search thread pool on nodes that have many CPUs.

@@ -566,6 +568,13 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc

final SearchContext createContext(ShardSearchRequest request) throws IOException {
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout);
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we rather add this check at the beginning of createAndPutContext? This method only creates a new context, it doesn't add it to the list of active contexts?

@@ -566,6 +568,13 @@ final SearchContext createAndPutContext(ShardSearchRequest request) throws IOExc

final SearchContext createContext(ShardSearchRequest request) throws IOException {
final DefaultSearchContext context = createSearchContext(request, defaultSearchTimeout);
if (activeContexts.size() >= Node.MAX_SEARCH_CONTEXT_SETTING.get(settings)) {
throw new IllegalStateException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ElasticsearchException would allow to return a HTTP status code that indicates that the request should be retried later, this would be better than a 500/Internal Error?

@jimczi jimczi added the discuss label Apr 9, 2018
@jimczi
Copy link
Contributor

jimczi commented Apr 9, 2018

We discussed with @jpountz and agreed that we should discuss the value for the default limit. 100 is too low and we need to decide if scrolls should be included in the limit. We'll discuss this internally in our search meeting and I'll update this issue when we reached an agreement.

@diwasjoshi diwasjoshi force-pushed the fix-softLimitSearchContexts branch from ee3d06d to 9e84a7a Compare April 16, 2018 13:37
@diwasjoshi
Copy link
Contributor Author

@jimczi @jpountz I have fixed all review changes, including the change suggested by @s1monw for making a new counter. I will wait for update on default limit from your side.

@@ -547,6 +557,13 @@ private SearchContext findContext(long id, TransportRequest request) throws Sear
}

final SearchContext createAndPutContext(ShardSearchRequest request) throws IOException {
if (numActiveContexts >= MAX_OPEN_CONTEXT.get(settings)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function be called by multiple threads at the same time? And if so, should numActiveContexts be volatile?

Copy link
Contributor

@jimczi jimczi Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can @mayya-sharipova but if we restrict the counting to scroll queries let's use an AtomicInteger. The overhead should be limited since we create the search context for a scroll query only on the initial request.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @diwasjoshi . We discussed internally and decided that this setting should only be applied to scroll query (request.scroll() != null). This way it is easier to choose a default value (we agreed on 500 being a good start) and we don't mix queries in the same counter. Would you mind changing this pr to restrict the counting to scroll queries ?

@colings86
Copy link
Contributor

@diwasjoshi are you still interested in working on this change?

@diwasjoshi
Copy link
Contributor Author

@colings86 sorry I couldn't work on this, I will work on the review changes this week.

@pgomulka
Copy link
Contributor

pgomulka commented Oct 1, 2018

@diwasjoshi feels like you were close to finish this. Are you still intending to work on this issue?

@colings86
Copy link
Contributor

Since there hasn't been any activity on this PR for a while I am inclined to close it for now. @diwasjoshi if you are still interested in working on this and have time to update the PR with the latest master branch and address the review comments then please reopen this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants