-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to one shard #30539
Default to one shard #30539
Conversation
This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard with the shrink API in place they no longer need to resort only to reindexing. Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).
Pinging @elastic/es-core-infra |
FYI @elastic/es-clients for the feature skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this was much simpler (fewer changes needed) than I would have anticipated!
This commit fixes a few issues in the REST client tests that arose from moving to one shard. Moving to one shard causes a reordering of calculations which impacts floating-point arithmetic. It can also impact scoring and thus the ranking of docs.
This commit fixes a SQL test with security that was expecting five shards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left one comment LGTM otherwise
@@ -367,7 +367,7 @@ public ClusterState execute(ClusterState currentState) throws Exception { | |||
// now, put the request settings, so they override templates | |||
indexSettingsBuilder.put(request.settings()); | |||
if (indexSettingsBuilder.get(SETTING_NUMBER_OF_SHARDS) == null) { | |||
indexSettingsBuilder.put(SETTING_NUMBER_OF_SHARDS, settings.getAsInt(SETTING_NUMBER_OF_SHARDS, 5)); | |||
indexSettingsBuilder.put(SETTING_NUMBER_OF_SHARDS, settings.getAsInt(SETTING_NUMBER_OF_SHARDS, 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be 5 if the index.version.created is < 7.0 I mean we can still have a mixed cluster here no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eacba7f
to
2e89d98
Compare
Ship it |
Let us just hard code the test to always use one shard, it is easier that way.
c91f0b9
to
e5b0f49
Compare
With this commit we set the number of shards explicitly to the default of five shards in order to keep results consistent. Relates elastic/elasticsearch#30539
* master: Deprecate not copy settings and explicitly disallow (elastic#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (elastic#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (elastic#30535) SQL: Use request flavored methods in tests (elastic#30345)
* master: Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (elastic#30530)
Let's make sure we blog about this to explain the reasoning. |
* master: Default to one shard (#30539) Unmute IndexUpgradeIT tests Forbid expensive query parts in ranking evaluation (#30151) Docs: Update HighLevelRestClient migration docs (#30544) Clients: Switch to new performRequest (#30543) [TEST] Fix typo in MovAvgIT test Add missing dependencies on testClasses (#30527) [TEST] Mute ML test that needs updating to following ml-cpp changes Document woes between auto-expand-replicas and allocation filtering (#30531) Moved tokenizers to analysis common module (#30538) Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (#30530) Deprecate not copy settings and explicitly disallow (#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (#30535) SQL: Use request flavored methods in tests (#30345) Suppress hdfsFixture if there are spaces in the path (#30302) Delete temporary blobs before creating index file (#30528) Watcher: Remove TriggerEngine.getJobCount() (#30395) [ML] Fix wire BWC for JobUpdate (#30512) Use simpler write-once semantics for FS repository (#30435) Derive max composite buffers from max content len Use simpler write-once semantics for HDFS repository (#30439) SQL: Improve correctness of SYS COLUMNS & TYPES (#30418) Mute two tests in FlushIT with @AwaitsFix. Fix incorrect template name in test case Build: Remove legacy bwc files from xpack (#30485) Mute UnicastZenPingTests#testSimplePings with @AwaitsFix. Security: cleanup code in file stores (#30348) Security: fix TokenMetaData equals and hashcode (#30347) Mute two tests from SmokeTestWatcherWithSecurityClientYamlTestSuiteIT. Mute SharedClusterSnapshotRestoreIT#testSnapshotSucceedsAfterSnapshotFailure with @AwaitsFix. SQL: Improve compatibility with MS query (#30516) SQL: Fix parsing of dates with milliseconds (#30419)
* es/ccr: (37 commits) Default to one shard (#30539) Unmute IndexUpgradeIT tests Forbid expensive query parts in ranking evaluation (#30151) Docs: Update HighLevelRestClient migration docs (#30544) Clients: Switch to new performRequest (#30543) [TEST] Fix typo in MovAvgIT test Add missing dependencies on testClasses (#30527) [TEST] Mute ML test that needs updating to following ml-cpp changes Document woes between auto-expand-replicas and allocation filtering (#30531) Moved tokenizers to analysis common module (#30538) Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (#30530) Deprecate not copy settings and explicitly disallow (#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (#30535) SQL: Use request flavored methods in tests (#30345) Suppress hdfsFixture if there are spaces in the path (#30302) ...
* es/ccr: (37 commits) Default to one shard (#30539) Unmute IndexUpgradeIT tests Forbid expensive query parts in ranking evaluation (#30151) Docs: Update HighLevelRestClient migration docs (#30544) Clients: Switch to new performRequest (#30543) [TEST] Fix typo in MovAvgIT test Add missing dependencies on testClasses (#30527) [TEST] Mute ML test that needs updating to following ml-cpp changes Document woes between auto-expand-replicas and allocation filtering (#30531) Moved tokenizers to analysis common module (#30538) Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (#30530) Deprecate not copy settings and explicitly disallow (#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (#30535) SQL: Use request flavored methods in tests (#30345) Suppress hdfsFixture if there are spaces in the path (#30302) ...
* es/ccr: (37 commits) Default to one shard (elastic#30539) Unmute IndexUpgradeIT tests Forbid expensive query parts in ranking evaluation (elastic#30151) Docs: Update HighLevelRestClient migration docs (elastic#30544) Clients: Switch to new performRequest (elastic#30543) [TEST] Fix typo in MovAvgIT test Add missing dependencies on testClasses (elastic#30527) [TEST] Mute ML test that needs updating to following ml-cpp changes Document woes between auto-expand-replicas and allocation filtering (elastic#30531) Moved tokenizers to analysis common module (elastic#30538) Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (elastic#30530) Deprecate not copy settings and explicitly disallow (elastic#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (elastic#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (elastic#30535) SQL: Use request flavored methods in tests (elastic#30345) Suppress hdfsFixture if there are spaces in the path (elastic#30302) ...
* es/ccr: (37 commits) Default to one shard (elastic#30539) Unmute IndexUpgradeIT tests Forbid expensive query parts in ranking evaluation (elastic#30151) Docs: Update HighLevelRestClient migration docs (elastic#30544) Clients: Switch to new performRequest (elastic#30543) [TEST] Fix typo in MovAvgIT test Add missing dependencies on testClasses (elastic#30527) [TEST] Mute ML test that needs updating to following ml-cpp changes Document woes between auto-expand-replicas and allocation filtering (elastic#30531) Moved tokenizers to analysis common module (elastic#30538) Adjust copy settings versions Mute ShrinkIndexIT suite SQL: SYS TABLES ordered according to *DBC specs (elastic#30530) Deprecate not copy settings and explicitly disallow (elastic#30404) [ML] Improve state persistence log message Build: Add mavenPlugin cluster configuration method (elastic#30541) Re-enable FlushIT tests Bump Gradle heap to 2 GB (elastic#30535) SQL: Use request flavored methods in tests (elastic#30345) Suppress hdfsFixture if there are spaces in the path (elastic#30302) ...
Update the default number of primary shards to match doc update work done in 4852f34 for PR elastic#30539.
Update the default number of primary shards to match doc update work done in #30539.
With this commit we set the number of shards explicitly to the default of five shards in order to keep results consistent. Relates elastic/elasticsearch#30539 Relates #44
With this commit we set the number of shards explicitly to the default of five shards in order to keep results consistent. Relates elastic/elasticsearch#30539 Relates #44
With this commit we set the number of shards explicitly to the default of five shards in order to keep results consistent. Relates elastic/elasticsearch#30539 Relates #44
Overall this is a welcome change. However I was curious about why the decision for only a single shard instead of two? We are largely unaffected here as we default to two shards in all of our index templates, and then adjust based on the customer's environment and load. We default to two because even on a single node, with a single mount point, all of our benchmarking tests show a 5-10% boost on indexing performance with two shards over only one. Admittedly all of our use-cases are time-series oriented and heavily ingest-biased, so we are always looking to maximize ingest/index performance. Certainly most query-centric use-cases will add replicas, not shards. Really I am just curious. I would have picked two as default, but either is much better than five. |
Align with the new Elasticsearch defaults[1] for number of shards for the indices used in the metrics store. [1] elastic/elasticsearch#30539
@robcowart I am so sorry for the slow reply here. Thanks so much for such a thoughtful question. Briefly, we think that the benefits of keeping shard counts as low as possible will benefit more users than shipping with a default configuration that will be beneficial to high-throughout use-cases. We aim to scalable in this regard, but we have a lot of users for which a single shard will suffice to absorb their write traffic. For users that need to scale, we have many knobs they can tune which include increasing their default shard count as well as splitting indices that were created with a single shard. We encourage such users to do performance testing to ensure that the tradeoff or doubling or more their shard count is worth the performance gain. |
This commit adds a migration note regarding the default number of shards changing from five to one. Relates #30539
This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard with the split API in place they no longer need to resort only to reindexing.
Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).