Add nuance around stretched clusters #77360

Merged
49 changes: 36 additions & 13 deletions docs/reference/high-availability/cluster-design.asciidoc
@@ -230,24 +230,47 @@ The cluster will be resilient to the loss of any node as long as:
[[high-availability-cluster-design-large-clusters]]
=== Resilience in larger clusters

-It is not unusual for nodes to share some common infrastructure, such as a power
-supply or network router. If so, you should plan for the failure of this
+It is not unusual for nodes to share some common infrastructure, such as network
+interconnects or a power supply. If so, you should plan for the failure of this
[Review thread]
@mjmbischoff (Contributor), Sep 7, 2021:

Suggested change:
-interconnects or a power supply. If so, you should plan for the failure of this
+interconnects, power supply or, in the case of virtualization, physical hosts. If so, you should plan for the failure of this

Contributor Author:

I'm not sure about this. I mean it's correct, but it does make the sentence much more complicated. Is it worth the extra words? Do we need to clarify that nodes on the same physical host share infrastructure like power and network? Seems kinda obvious to me, but this is a genuine question; I'm not the one on the front line for this kind of thing.
[End review thread]

infrastructure and ensure that such a failure would not affect too many of your
nodes. It is common practice to group all the nodes sharing some infrastructure
into _zones_ and to plan for the failure of any whole zone at once.
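
For illustration (not part of this diff): zone membership is typically expressed through {es} shard allocation awareness. A minimal `elasticsearch.yml` sketch for one node, where the attribute name `zone` and value `zone1` are placeholders:

[source,yaml]
----
# Tag this node with the zone it lives in; "zone" and "zone1" are placeholders.
node.attr.zone: zone1

# Ask the allocator to spread each shard's copies across different zones so
# that the loss of a whole zone cannot take out every copy of a shard.
cluster.routing.allocation.awareness.attributes: zone
----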

-Your cluster’s zones should all be contained within a single data centre. {es}
-expects its node-to-node connections to be reliable and have low latency and
-high bandwidth. Connections between data centres typically do not meet these
-expectations. Although {es} will behave correctly on an unreliable or slow
-network, it will not necessarily behave optimally. It may take a considerable
-length of time for a cluster to fully recover from a network partition since it
-must resynchronize any missing data and rebalance the cluster once the
-partition heals. If you want your data to be available in multiple data centres,
-deploy a separate cluster in each data centre and use
-<<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
+{es} expects its node-to-node connections to be reliable and to have low
+latency and adequate bandwidth. Many of the tasks that {es} performs require
+multiple round-trips between nodes. This means that a slow or unreliable
+interconnect may have a significant effect on the performance and stability of
+your cluster. A few milliseconds of latency added to each round-trip can
+quickly accumulate into a noticeable performance penalty. {es} will
+automatically recover from a network partition as quickly as it can but your
+cluster may be partly unavailable during a partition and will need to spend
+time and resources to resynchronize any missing data and rebalance itself once
+the partition heals. Recovering from a failure may involve copying a large
+amount of data between nodes so the recovery time is often determined by the
+available bandwidth.

[Review thread]
Contributor:

It seems like we're talking around reallocation and shard recovery here. Is there a reason we don't just directly mention and xref those two concepts?

Contributor Author:

I didn't think we had any docs on those topics, at least not concept-level ones that would be suitable for linking from here. If you have some in mind then sure we can add links.

Contributor:

I think you're right. We have some setting reference:

However, I don't think those are great links to use here. I've opened #77515 to track this gap and add those docs.

This looks fine to me in the meantime.
[End review thread]
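
For context (not part of this diff): the rate at which missing data is copied during recovery is capped by a throttle, so extra bandwidth only helps up to the configured limit. A minimal `elasticsearch.yml` sketch; the `100mb` value is illustrative:

[source,yaml]
----
# Shard recoveries copy data between nodes at a rate capped by this setting
# (the default has long been 40mb per second). On a fast interconnect you
# might raise it so recovery time is limited by the network, not the throttle.
indices.recovery.max_bytes_per_sec: 100mb
----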

+If you have divided your cluster into zones then typically the network
+connections within each zone are of higher quality than the connections between
+the zones. You must make sure that the network connections between zones are of
+sufficiently high quality. You will see the best results by locating all your
+zones within a single data center with each zone having its own independent
+power supply and other supporting infrastructure. You can also _stretch_ your
+cluster across nearby data centers as long as the network interconnection
+between each pair of data centers is good enough.

[Review thread]
Contributor:

Suggested change:
-sufficiently high quality. You will see the best results by locating all your
+sufficiently high quality. You will see the highest performance by locating all your

Contributor Author:

I'd rather be slightly more vague here: it's not just about performance, reliability is also a big deal.
[End review thread]
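
As an aside (again not part of this diff), a two-zone stretched deployment like the one described above is often paired with forced awareness, so that losing one zone does not cause every replica to crowd into the surviving zone. A sketch with placeholder zone names:

[source,yaml]
----
# Spread shard copies across zones. With "force" configured, replicas that
# belong in a lost zone are not reallocated onto the surviving zone; they
# stay unassigned until that zone returns. Zone names are placeholders.
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
----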

+[[high-availability-cluster-design-min-network-perf]] There is no specific
+minimum network performance required to run a healthy {es} cluster. In theory a
+cluster will work correctly even if the round-trip latency between nodes is
+several hundred milliseconds. In practice if your network is that slow then the
+cluster performance will be very poor. In addition, slow networks are often
+unreliable enough to cause network partitions that will lead to periods of
+unavailability.

[Review thread]
Contributor:

Suggested change:
-cluster performance will be very poor. In addition, slow networks are often
+cluster performance will likely be at unacceptable levels. In addition, slow networks are often

Contributor Author:

I started with something like that but then I figured we'd have to dive into what "unacceptable" means and how you'd determine what is or isn't acceptable. I saw someone running a very stretched cluster over satellite links once. Its performance was terrible in an absolute sense, and yet it was still acceptable to them. There's certainly a place for that sort of discussion but it's not here.
[End review thread]

+If you want your data to be available in multiple data centers that are further
+apart or not well connected, deploy a separate cluster in each data center and
+use <<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
clusters together. These features are designed to perform well even if the
-cluster-to-cluster connections are less reliable or slower than the network
+cluster-to-cluster connections are less reliable or performant than the network
within each cluster.
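
For illustration (not part of this diff): linking two clusters starts by registering one as a remote of the other. A minimal `elasticsearch.yml` sketch, where the alias `dc2` and the seed address are placeholders:

[source,yaml]
----
# On the local cluster, register the other data center's cluster as a remote.
# The alias "dc2" and the seed host are placeholders.
cluster.remote.dc2.seeds: ["dc2.example.com:9300"]
----

A cross-cluster search can then address the remote cluster's indices with the alias prefix, for example `GET /dc2:my-index/_search`.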

After losing a whole zone's worth of nodes, a properly-designed cluster may be