Add cluster-wide shard limit warnings #34021
Conversation
In a future major version, we will be introducing a soft limit on the number of shards in a cluster based on the number of nodes in the cluster. This limit will be configurable and will be checked on operations which create or open shards; such operations will issue a warning if they would take the cluster over the limit. There is an option to enable strict enforcement of the limit, which turns the warnings into errors. In a future release, the option will be removed and strict enforcement will be the default (and only) behavior.
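As a rough illustration of the check described above (a sketch only; the class, method, and messages below are invented for this summary and are not the PR's actual code), the cluster-wide limit is the per-node limit multiplied by the number of nodes, and an over-limit operation either produces an error or a deprecation warning depending on whether strict enforcement is enabled:

```java
import java.util.Optional;

// Illustrative sketch only: names, messages, and the System.err stand-in for a
// deprecation logger are invented and do not reflect the PR's actual implementation.
final class ShardLimitSketch {

    static Optional<String> checkShardLimit(int shardsToOpen, int currentOpenShards,
                                            int nodeCount, int maxShardsPerNode,
                                            boolean enforceLimit) {
        int clusterLimit = maxShardsPerNode * nodeCount;
        if (currentOpenShards + shardsToOpen <= clusterLimit) {
            return Optional.empty(); // operation stays within the soft limit
        }
        String message = "this action would open [" + shardsToOpen + "] shards, but the cluster "
            + "already has [" + currentOpenShards + "] of a maximum of [" + clusterLimit + "] shards open";
        if (enforceLimit) {
            return Optional.of(message); // strict enforcement: the caller rejects the operation
        }
        System.err.println("DEPRECATION: " + message); // soft limit: warn and allow the operation
        return Optional.empty();
    }
}
```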
Pinging @elastic/es-core-infra
I left some documentation comments. The code seems good to me, but it would be worth having someone more familiar with this area of the code take a look.
NOTE: `cluster.shards.enforce_max_per_node` cannot be set to `false`, as this
setting will be removed in 7.0 and the limit will always be enforced. To return
to the default behavior for your Elasticsearch version, set this setting to
`"default"`.
Do we need to have a `"default"` value for the setting here? There are two reasons why I am concerned about having this:
- It means the setting accepts values of different types (boolean and String), which we have tried to avoid and remove instances of in other APIs.
- Users who set the setting to `"default"` explicitly are going to need to make a subsequent change to their settings in 8.0 (I presume?) to remove the setting, which will no longer be valid.

Instead could we maybe have the default behaviour enabled if the setting is not set, meaning that users who want to maintain the default behaviour through the version changes don't end up defining this setting and so don't need to make any setting changes at all?
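A tiny sketch of the alternative suggested here, in plain Java with invented names rather than Elasticsearch's real settings infrastructure: if an absent key simply means "use this version's default", the value can stay a plain boolean and no `"default"` sentinel string is needed.

```java
import java.util.Map;

// Hypothetical sketch, not the PR's code: absence of the key means "default for this version".
final class EnforceSettingSketch {

    static boolean resolveEnforcement(Map<String, String> settings, boolean versionDefault) {
        String raw = settings.get("cluster.shards.enforce_max_per_node"); // key from the docs under review
        if (raw == null) {
            return versionDefault; // not set: no "default" string value required
        }
        return Boolean.parseBoolean(raw); // set explicitly: an ordinary boolean
    }
}
```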
I did this based on a conversation with @jasontedor a while ago, where he made very similar comments, but I think I misunderstood what he was suggesting at the time. I'll reevaluate this setting and change it as appropriate.
Just talked with @jasontedor again - this setting is going to go away and become a system property, which can only be unset or `true`.
==== Cluster Shard Limit

In a Elasticsearch 7.0 and later, there will be a soft cap on the number of
Should we call it a "soft limit" to be in line with the terminology on similar settings elsewhere?
Yes, I'll change all the instances of "cap" to "limit" - thanks!
If the cluster is already over the cap, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
or fail until either the cap is increased as described below, or some indices
are closed or deleted to bring the number of shards below the cap.
As above I wonder if we should use "limit" instead of "cap"?
@jasontedor Please review this. 🙏
This is looking good, I left some comments, mostly about changing the keys.
In a Elasticsearch 7.0 and later, there will be a soft limit on the number of
shards in a cluster, based on the number of nodes in the cluster. This is
intended to prevent operations which may unintentionally destabilize the
cluster. Until 7.0, actions which would result in the cluster going over the
`Until` -> `Prior to`
cluster. Until 7.0, actions which would result in the cluster going over the
limit will issue a deprecation warning.

NOTE: You can set the system property `es.enforce.shard_limit` to `true` to opt
I don't think we need to namespace this under `enforce`, so `es.enforce.shard_limit` -> `es.enforce_shard_limit`. And perhaps it should more closely reflect the name of the setting, so `es.enforce.shard_limit` -> `es.enforce_max_shards_per_node`.
If the cluster is already over the limit, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
until either the limit is increased as described below, or some indices
are closed or deleted to bring the number of shards below the limit.
Would you link to the sections of the documentation relevant to closing, and separately deleting an index?
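For context while those links are added, getting back under the limit uses the ordinary close-index and delete-index APIs; here is a rough sketch using only the JDK's HTTP client, assuming an unsecured local node and a placeholder index name:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch: close or delete an index to reduce the number of open shards.
// Assumes an unsecured node at localhost:9200 and a placeholder index name.
public class ReduceShardsExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Close an index so its shards no longer count as open shards
        // (the docs above say closing brings the shard count back under the limit).
        HttpRequest close = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/my-old-index/_close"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        System.out.println(client.send(close, HttpResponse.BodyHandlers.ofString()).body());

        // Or delete the index outright.
        HttpRequest delete = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/my-old-index"))
            .DELETE()
            .build();
        System.out.println(client.send(delete, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```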
The limit defaults to 1,000 shards per node, and be dynamically adjusted using
the following property:

`cluster.shards.max_per_node`::
I am doubting whether this needs to be in a `shards` namespace. How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.
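As a usage sketch of adjusting the limit dynamically (using the reviewer's suggested key, which may not match what was finally merged, and a placeholder unsecured local node), the cluster settings update API can raise the per-node value:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: raise the per-node shard limit cluster-wide via the cluster settings API.
// Assumes an unsecured node at localhost:9200; the setting key follows the
// reviewer's suggestion and may differ from the name that was finally merged.
public class RaiseShardLimitExample {
    public static void main(String[] args) throws Exception {
        String body = "{ \"persistent\": { \"cluster.max_shards_per_node\": 1500 } }";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // expect an acknowledgement on success
    }
}
```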
==== Cluster-wide shard soft limit

Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.shards.max_per_node` cluster
How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.
@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
    Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
static {
    final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
`es.enforce_shard_limit` -> `es.enforce_max_shards_per_node`.
private static final boolean ENFORCE_SHARD_LIMIT;
static {
    final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
    final String enforceShardLimitSetting = System.getProperty(ENFORCE_SHARD_LIMIT_KEY);
`enforceShardLimitSetting` -> `enforceMaxShardsPerNode`
@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
    Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
`ENFORCE_SHARD_LIMIT` -> `ENFORCE_MAX_SHARDS_PER_NODE`.
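A hedged sketch of how a static block like the partially quoted one might resolve a property that may only be unset or `true`, using the reviewer's suggested names; the real PR code and error message may differ:

```java
// Sketch of resolving a system property that may only be unset or exactly "true";
// mirrors the shape of the partially quoted static block, with illustrative details.
final class EnforcementFlagSketch {
    static final boolean ENFORCE_MAX_SHARDS_PER_NODE; // reviewer-suggested constant name

    static {
        final String key = "es.enforce_max_shards_per_node"; // reviewer-suggested property name
        final String value = System.getProperty(key);
        if (value == null) {
            ENFORCE_MAX_SHARDS_PER_NODE = false;    // unset: keep the soft, warning-only behavior
        } else if ("true".equals(value)) {
            ENFORCE_MAX_SHARDS_PER_NODE = true;     // opted in to strict enforcement
        } else {
            throw new IllegalArgumentException(
                "[" + key + "] may only be unset or set to [true] but was [" + value + "]");
        }
    }
}
```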
/**
 * Checks to see if an operation can be performed without taking the cluster
 * over the cluster-wide shard limit. Adds a deprecation warning or returns
 * an error message as appropriate
You don't have to wrap these so narrowly; we can use the full 140-column line length here.
Thanks! I've addressed your comments; can you re-review, @jasontedor?
LGTM.
Thanks for the reviews!
Frozen indices (partial searchable snapshots) require less heap per shard and the limit can therefore be raised for those. We pick 3000 frozen shards per frozen data node, since we think 2000 is reasonable to use in production. Relates elastic#71042 and elastic#34021
In a future major version, we will be introducing a soft limit on the number of shards in a cluster based on the number of nodes in the cluster. This is intended to prevent operations which may unintentionally destabilize the cluster.

This limit is configurable and is checked on operations which create or open shards; such operations issue a warning if they would take the cluster over the configured limit.

There is an option to enable strict enforcement of the limit, which turns the warnings into errors. In a future release, the option will be removed and strict enforcement will be the default (and only) behavior.

This PR will be followed by a 7.0-only PR which removes the enforcement option and deprecation warnings and always enforces the limit.
Relates to #20705.
This is take 2 of #32856