[DOCS] Merges list of discovery and cluster formation settings (#36909)
lcawl authored Dec 21, 2018
1 parent c8a8391 commit 33e9cf3
Showing 10 changed files with 217 additions and 195 deletions.
24 changes: 7 additions & 17 deletions docs/reference/modules/discovery.asciidoc
@@ -40,22 +40,15 @@ module. This module is divided into the following sections:
Cluster state publishing is the process by which the elected master node
updates the cluster state on all the other nodes in the cluster.

<<no-master-block>>::

The no-master block is put in place when there is no known elected master,
and can be configured to determine which operations should be rejected when
it is in place.

Advanced settings::

There are settings that allow advanced users to influence the
<<master-election-settings,master election>> and
<<fault-detection-settings,fault detection>> processes.

<<modules-discovery-quorums>>::

This section describes the detailed design behind the master election and
auto-reconfiguration logic.

<<modules-discovery-settings,Settings>>::

There are settings that enable users to influence the discovery, cluster
formation, master election and fault detection processes.

include::discovery/discovery.asciidoc[]

@@ -65,11 +58,8 @@ include::discovery/adding-removing-nodes.asciidoc[]

include::discovery/publishing.asciidoc[]

include::discovery/no-master-block.asciidoc[]

include::discovery/master-election.asciidoc[]
include::discovery/quorums.asciidoc[]

include::discovery/fault-detection.asciidoc[]

include::discovery/quorums.asciidoc[]

include::discovery/discovery-settings.asciidoc[]
31 changes: 16 additions & 15 deletions docs/reference/modules/discovery/adding-removing-nodes.asciidoc
@@ -14,9 +14,15 @@ desirable to add or remove some master-eligible nodes to or from a cluster.

==== Adding master-eligible nodes

If you wish to add some master-eligible nodes to your cluster, simply configure
the new nodes to find the existing cluster and start them up. Elasticsearch will
add the new nodes to the voting configuration if it is appropriate to do so.
If you wish to add some nodes to your cluster, simply configure the new nodes
to find the existing cluster and start them up. Elasticsearch adds the new nodes
to the voting configuration if it is appropriate to do so.

During master election or when joining an existing formed cluster, a node
sends a join request to the master in order to be officially added to the
cluster. You can use the `cluster.join.timeout` setting to configure how long a
node waits after sending a request to join a cluster. Its default value is `60s`.
See <<modules-discovery-settings>>.
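
As an illustration, the join timeout could be raised in the `elasticsearch.yml`
file of a node that joins over a slow link. This is a minimal sketch: the value
`90s` is an arbitrary assumption chosen for the example, not a recommendation.

[source,yml]
----------------------------------------------------------------
# Illustrative value only; see the settings list for the default.
cluster.join.timeout: 90s
----------------------------------------------------------------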

==== Removing master-eligible nodes

@@ -93,18 +99,13 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_excl
--------------------------------------------------
// CONSOLE

This list is limited in size by the following setting:

`cluster.max_voting_config_exclusions`::

Sets a limits on the number of voting configuration exclusions at any one
time. Defaults to `10`.

Since voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration exclusions
in normal operation.
This list is limited in size by the `cluster.max_voting_config_exclusions`
setting, which defaults to `10`. See <<modules-discovery-settings>>. Since
voting configuration exclusions are persistent and limited in number, they must
be cleaned up. Normally an exclusion is added when performing some maintenance
on the cluster, and the exclusions should be cleaned up when the maintenance is
complete. Clusters should have no voting configuration exclusions in normal
operation.

If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and removed
18 changes: 6 additions & 12 deletions docs/reference/modules/discovery/bootstrapping.asciidoc
@@ -7,19 +7,13 @@ more of the master-eligible nodes in the cluster. This is known as _cluster
bootstrapping_. This is only required the very first time the cluster starts
up: nodes that have already joined a cluster store this information in their
data folder and freshly-started nodes that are joining an existing cluster
obtain this information from the cluster's elected master. This information is
given using this setting:
obtain this information from the cluster's elected master.

`cluster.initial_master_nodes`::

Sets a list of the <<node.name,node names>> or transport addresses of the
initial set of master-eligible nodes in a brand-new cluster. By default
this list is empty, meaning that this node expects to join a cluster that
has already been bootstrapped.

This setting can be given on the command line or in the `elasticsearch.yml`
configuration file when starting up a master-eligible node. Once the cluster
has formed this setting is no longer required and is ignored. It need not be set
The initial set of master-eligible nodes is defined in the
<<initial_master_nodes,`cluster.initial_master_nodes` setting>>. When you
start a master-eligible node, you can provide this setting on the command line
or in the `elasticsearch.yml` file. After the cluster has formed, this setting
is no longer required and is ignored. It need not be set
on master-ineligible nodes, nor on master-eligible nodes that are started to
join an existing cluster. Note that master-eligible nodes should use storage
that persists across restarts. If they do not, and
160 changes: 160 additions & 0 deletions docs/reference/modules/discovery/discovery-settings.asciidoc
@@ -0,0 +1,160 @@
[[modules-discovery-settings]]
=== Discovery and cluster formation settings

Discovery and cluster formation are affected by the following settings:

[[master-election-settings]]`cluster.election.back_off_time`::

    Sets the amount by which the upper bound on the wait before an election
    increases after each election failure. Note that this is _linear_ backoff.
    This defaults to `100ms`.

`cluster.election.duration`::

Sets how long each election is allowed to take before a node considers it to
have failed and schedules a retry. This defaults to `500ms`.

`cluster.election.initial_timeout`::

Sets the upper bound on how long a node will wait initially, or after the
elected master fails, before attempting its first election. This defaults
to `100ms`.


`cluster.election.max_timeout`::

    Sets the maximum upper bound on how long a node will wait before attempting
    a first election, so that a network partition that lasts for a long time
    does not result in excessively sparse elections. This defaults to `10s`.

[[fault-detection-settings]]`cluster.fault_detection.follower_check.interval`::

Sets how long the elected master waits between follower checks to each
other node in the cluster. Defaults to `1s`.

`cluster.fault_detection.follower_check.timeout`::

Sets how long the elected master waits for a response to a follower check
before considering it to have failed. Defaults to `30s`.

`cluster.fault_detection.follower_check.retry_count`::

Sets how many consecutive follower check failures must occur to each node
before the elected master considers that node to be faulty and removes it
from the cluster. Defaults to `3`.

`cluster.fault_detection.leader_check.interval`::

Sets how long each node waits between checks of the elected master.
Defaults to `1s`.

`cluster.fault_detection.leader_check.timeout`::

Sets how long each node waits for a response to a leader check from the
elected master before considering it to have failed. Defaults to `30s`.

`cluster.fault_detection.leader_check.retry_count`::

Sets how many consecutive leader check failures must occur before a node
considers the elected master to be faulty and attempts to find or elect a
new master. Defaults to `3`.

`cluster.follower_lag.timeout`::

Sets how long the master node waits to receive acknowledgements for cluster
state updates from lagging nodes. The default value is `90s`. If a node does
not successfully apply the cluster state update within this period of time,
it is considered to have failed and is removed from the cluster. See
<<cluster-state-publishing>>.

`cluster.initial_master_nodes`::

Sets a list of the <<node.name,node names>> or transport addresses of the
initial set of master-eligible nodes in a brand-new cluster. By default
this list is empty, meaning that this node expects to join a cluster that
has already been bootstrapped. See <<initial_master_nodes>>.

`cluster.join.timeout`::

Sets how long a node will wait after sending a request to join a cluster
before it considers the request to have failed and retries. Defaults to
`60s`.

`cluster.max_voting_config_exclusions`::

Sets a limit on the number of voting configuration exclusions at any one
time. The default value is `10`. See
<<modules-discovery-adding-removing-nodes>>.

`cluster.publish.timeout`::

Sets how long the master node waits for each cluster state update to be
completely published to all nodes. The default value is `30s`. If this
period of time elapses, the cluster state change is rejected. See
<<cluster-state-publishing>>.

`discovery.cluster_formation_warning_timeout`::

Sets how long a node will try to form a cluster before logging a warning
    that the cluster did not form. Defaults to `10s`. If a cluster has not
    formed after `discovery.cluster_formation_warning_timeout` has elapsed, then
    the node will log a warning message that starts with the phrase
    `master not discovered` and describes the current state of the discovery
    process.

`discovery.find_peers_interval`::

Sets how long a node will wait before attempting another discovery round.
Defaults to `1s`.

`discovery.probe.connect_timeout`::

Sets how long to wait when attempting to connect to each address. Defaults
to `3s`.

`discovery.probe.handshake_timeout`::

Sets how long to wait when attempting to identify the remote node via a
handshake. Defaults to `1s`.

`discovery.request_peers_timeout`::
Sets how long a node will wait after asking its peers again before
considering the request to have failed. Defaults to `3s`.

`discovery.zen.hosts_provider`::
Specifies which type of <<built-in-hosts-providers,hosts provider>> provides
the list of seed nodes. By default, it is the
<<settings-based-hosts-provider,settings-based hosts provider>>.

[[no-master-block]]`discovery.zen.no_master_block`::
Specifies which operations are rejected when there is no active master in a
cluster. This setting has two valid values:
+
--
`all`::: All operations on the node (both read and write operations) are rejected.
This also applies to cluster state read or write API operations, such as the get
index settings, put mapping and cluster state APIs.

`write`::: (default) Write operations are rejected. Read operations succeed,
based on the last known cluster configuration. This situation may result in
partial reads of stale data as this node may be isolated from the rest of the
cluster.

[NOTE]
===============================
* The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
(for example, cluster stats, node info, and node stats APIs). Requests to these
APIs are not blocked and can run on any available node.
* For the cluster to be fully operational, it must have an active master.
===============================
--

`discovery.zen.ping.unicast.hosts`::

Provides a list of master-eligible nodes in the cluster. The list contains
either an array of hosts or a comma-delimited string. Each value has the
format `host:port` or `host`, where `port` defaults to the setting `transport.profiles.default.port`. Note that IPv6 hosts must be bracketed.
The default value is `127.0.0.1, [::1]`. See <<unicast.hosts>>.

`discovery.zen.ping.unicast.hosts.resolve_timeout`::

Sets the amount of time to wait for DNS lookups on each round of discovery. This is specified as a <<time-units, time unit>> and defaults to `5s`.
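
As a rough illustration of how several of these settings might appear together,
the following `elasticsearch.yml` fragment sketches a brand-new three-node
cluster. The host addresses and node names are hypothetical placeholders, and
the remaining values restate the defaults listed above. It is not a recommended
configuration.

[source,yml]
----------------------------------------------------------------
# Hypothetical seed hosts. When the port is omitted, it falls back to
# transport.profiles.default.port.
discovery.zen.ping.unicast.hosts:
  - 192.168.1.10:9300
  - 192.168.1.11

# Hypothetical node names. Only needed the first time the cluster starts.
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c

# Reject only write operations while there is no active master (the default).
discovery.zen.no_master_block: write

# Election timing, shown at the documented defaults.
cluster.election.initial_timeout: 100ms
cluster.election.back_off_time: 100ms
cluster.election.max_timeout: 10s
----------------------------------------------------------------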
41 changes: 3 additions & 38 deletions docs/reference/modules/discovery/discovery.asciidoc
@@ -82,9 +82,10 @@ gives a convenient mechanism for an Elasticsearch instance that is run in a
Docker container to be dynamically supplied with a list of IP addresses to
connect to when those IP addresses may not be known at node startup.

To enable file-based discovery, configure the `file` hosts provider as follows:
To enable file-based discovery, configure the `file` hosts provider as follows
in the `elasticsearch.yml` file:

[source,txt]
[source,yml]
----------------------------------------------------------------
discovery.zen.hosts_provider: file
----------------------------------------------------------------
@@ -150,39 +151,3 @@ a hosts provider that uses the Azure Classic API find a list of seed nodes.

The {plugins}/discovery-gce.html[GCE discovery plugin] adds a hosts provider
that uses the GCE API to find a list of seed nodes.

[float]
==== Discovery settings

The discovery process is controlled by the following settings.

`discovery.find_peers_interval`::

Sets how long a node will wait before attempting another discovery round.
Defaults to `1s`.

`discovery.request_peers_timeout`::

Sets how long a node will wait after asking its peers again before
considering the request to have failed. Defaults to `3s`.

`discovery.probe.connect_timeout`::

Sets how long to wait when attempting to connect to each address. Defaults
to `3s`.

`discovery.probe.handshake_timeout`::

Sets how long to wait when attempting to identify the remote node via a
handshake. Defaults to `1s`.

`discovery.cluster_formation_warning_timeout`::

Sets how long a node will try to form a cluster before logging a warning
that the cluster did not form. Defaults to `10s`.

If a cluster has not formed after `discovery.cluster_formation_warning_timeout`
has elapsed then the node will log a warning message that starts with the phrase
`master not discovered` which describes the current state of the discovery
process.

65 changes: 16 additions & 49 deletions docs/reference/modules/discovery/fault-detection.asciidoc
@@ -1,52 +1,19 @@
[[fault-detection-settings]]
=== Cluster fault detection settings
[[cluster-fault-detection]]
=== Cluster fault detection

An elected master periodically checks each of the nodes in the cluster in order
to ensure that they are still connected and healthy, and in turn each node in
the cluster periodically checks the health of the elected master. These checks
The elected master periodically checks each of the nodes in the cluster to
ensure that they are still connected and healthy. Each node in the cluster also periodically checks the health of the elected master. These checks
are known respectively as _follower checks_ and _leader checks_.

Elasticsearch allows for these checks occasionally to fail or timeout without
taking any action, and will only consider a node to be truly faulty after a
number of consecutive checks have failed. The following settings control the
behaviour of fault detection.

`cluster.fault_detection.follower_check.interval`::

Sets how long the elected master waits between follower checks to each
other node in the cluster. Defaults to `1s`.

`cluster.fault_detection.follower_check.timeout`::

Sets how long the elected master waits for a response to a follower check
before considering it to have failed. Defaults to `30s`.

`cluster.fault_detection.follower_check.retry_count`::

Sets how many consecutive follower check failures must occur to each node
before the elected master considers that node to be faulty and removes it
from the cluster. Defaults to `3`.

`cluster.fault_detection.leader_check.interval`::

Sets how long each node waits between checks of the elected master.
Defaults to `1s`.

`cluster.fault_detection.leader_check.timeout`::

Sets how long each node waits for a response to a leader check from the
elected master before considering it to have failed. Defaults to `30s`.

`cluster.fault_detection.leader_check.retry_count`::

Sets how many consecutive leader check failures must occur before a node
considers the elected master to be faulty and attempts to find or elect a
new master. Defaults to `3`.

If the elected master detects that a node has disconnected then this is treated
as an immediate failure, bypassing the timeouts and retries listed above, and
the master attempts to remove the node from the cluster. Similarly, if a node
detects that the elected master has disconnected then this is treated as an
immediate failure, bypassing the timeouts and retries listed above, and the
follower restarts its discovery phase to try and find or elect a new master.

Elasticsearch allows these checks to occasionally fail or time out without
taking any action. It considers a node to be faulty only after a number of
consecutive checks have failed. You can control fault detection behavior with
<<modules-discovery-settings,`cluster.fault_detection.*` settings>>.

If the elected master detects that a node has disconnected, however, this
situation is treated as an immediate failure. The master bypasses the timeout
and retry setting values and attempts to remove the node from the cluster.
Similarly, if a node detects that the elected master has disconnected, this
situation is treated as an immediate failure. The node bypasses the timeout and
retry settings and restarts its discovery phase to try to find or elect a new
master.
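
For example, a cluster on an unreliable network might tolerate more failed
checks before it acts. The following `elasticsearch.yml` fragment is only a
sketch: the setting names come from the
<<modules-discovery-settings,settings list>>, while the raised `retry_count`
values are illustrative assumptions, not recommendations.

[source,yml]
----------------------------------------------------------------
# Checks that the elected master sends to every other node.
cluster.fault_detection.follower_check.interval: 1s
cluster.fault_detection.follower_check.timeout: 30s
# Illustrative value; the documented default is 3.
cluster.fault_detection.follower_check.retry_count: 5

# Checks that every node sends to the elected master.
cluster.fault_detection.leader_check.interval: 1s
cluster.fault_detection.leader_check.timeout: 30s
# Illustrative value; the documented default is 3.
cluster.fault_detection.leader_check.retry_count: 5
----------------------------------------------------------------

These values apply only to checks that fail or time out. A detected
disconnection is still treated as an immediate failure, as described above.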