
Add nuance around stretched clusters #77360

Merged
46 changes: 33 additions & 13 deletions docs/reference/high-availability/cluster-design.asciidoc
@@ -230,24 +230,44 @@ The cluster will be resilient to the loss of any node as long as:
[[high-availability-cluster-design-large-clusters]]
=== Resilience in larger clusters

It is not unusual for nodes to share some common infrastructure, such as a power
supply or network router. If so, you should plan for the failure of this
It is not unusual for nodes to share some common infrastructure, such as network
interconnects or a power supply. If so, you should plan for the failure of this
@mjmbischoff (Contributor), Sep 7, 2021:

Suggested change
interconnects or a power supply. If so, you should plan for the failure of this
interconnects, power supply or, in the case of virtualization, physical hosts. If so, you should plan for the failure of this

Contributor Author:

I'm not sure about this. I mean it's correct but it does make the sentence much more complicated. Is it worth the extra words? Do we need to clarify that nodes on the same physical host share infrastructure like power and network? Seems kinda obvious to me but this is a genuine question, I'm not the one on the front line for this kind of thing.

infrastructure and ensure that such a failure would not affect too many of your
nodes. It is common practice to group all the nodes sharing some infrastructure
into _zones_ and to plan for the failure of any whole zone at once.
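The zone grouping described here maps onto {es}'s shard allocation awareness settings. As a minimal sketch (the attribute name `zone` and the value `zone-a` are illustrative choices, not required names), each node's `elasticsearch.yml` tags the node with its zone and enables awareness of that attribute:

```yaml
# elasticsearch.yml on a node in zone "zone-a" (names are illustrative)
node.attr.zone: zone-a

# Spread copies of each shard across different values of the "zone"
# attribute, so losing one whole zone cannot take out every copy of a shard.
cluster.routing.allocation.awareness.attributes: zone
```

Nodes in other zones carry the same settings with their own `node.attr.zone` value.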

Your cluster’s zones should all be contained within a single data centre. {es}
expects its node-to-node connections to be reliable and have low latency and
high bandwidth. Connections between data centres typically do not meet these
expectations. Although {es} will behave correctly on an unreliable or slow
network, it will not necessarily behave optimally. It may take a considerable
length of time for a cluster to fully recover from a network partition since it
must resynchronize any missing data and rebalance the cluster once the
partition heals. If you want your data to be available in multiple data centres,
deploy a separate cluster in each data centre and use
<<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
{es} expects its node-to-node connections to be reliable and have low latency
Contributor:

Suggested change
{es} expects its node-to-node connections to be reliable and have low latency
{es} expects its node-to-node connections to be reliable, have low latency

and good bandwidth. Many of the tasks that {es} performs require multiple
Contributor:

I fully understand why you say 'good bandwidth'; at the same time, customers have varying notions of 'good' here. For some, a dedicated, non-shared 1Gbit link is deemed good; others have 10, 25, 40 or 100Gbit with dual NICs in a LAG, and depending on their use-case either could be right. Problems arise when their notion of 'good' differs from what they actually need.

I guess we can get away with 'enough' as in, enough bandwidth

Contributor Author:

"Enough bandwidth" feels awkward to me, how about "adequate bandwidth"? See b0fae80.

round-trips between nodes. This means that a slow or unreliable interconnect
may have a significant effect on the performance and stability of your cluster.
A few milliseconds of latency added to each round-trip can quickly accumulate
into a noticeable performance penalty. {es} will automatically recover from a
network partition as quickly as it can but your cluster may be partly
unavailable during a partition and will need to spend time and resources to
resynchronize any missing data and rebalance itself once a partition heals.
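The latency-accumulation point above can be made concrete with back-of-the-envelope arithmetic. This is an illustrative sketch, not an {es} API; the round-trip count is an assumed figure:

```python
# Illustrative arithmetic: how a small amount of added per-hop latency
# accumulates when one operation needs several node-to-node round-trips.

def added_latency_ms(round_trips: int, extra_ms_per_round_trip: float) -> float:
    """Extra wall-clock time added to one operation by slower links."""
    return round_trips * extra_ms_per_round_trip

# If an operation needs 6 round-trips (an assumed figure) and each
# inter-zone hop adds 5 ms, the operation slows down by 30 ms.
print(added_latency_ms(6, 5.0))  # 30.0
```

Multiplied across the many operations involved in indexing and search, this is how "a few milliseconds" becomes a noticeable penalty.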
Contributor:

re: bandwidth above, recovery/reallocation is typically the thing that consumes the bandwidth, and a lack of bandwidth might go unnoticed until the customer makes cluster changes, upgrades, or has a node failure. Perhaps mentioning something with respect to time to recovery makes sense.

Contributor Author:

Good point, thanks. Added a sentence at the end of this paragraph about recovery time in b0fae80.


If you have divided your cluster into zones then typically the network
connections within each zone are of higher quality than the connections between
the zones. You must make sure that the network connections between zones are of
sufficiently high quality. You will see the best results by locating all your
Contributor:

Suggested change
sufficiently high quality. You will see the best results by locating all your
sufficiently high quality. You will see the highest performance by locating all your

Contributor Author:

I'd rather be slightly more vague here: it's not just about performance, reliability is also a big deal.

zones within a single data center with each zone having its own independent
power supply and other supporting infrastructure. You can also _stretch_ your
cluster across nearby data centers as long as the network interconnection
between each pair of data centers is good enough.

[[high-availability-cluster-design-min-network-perf]] There is no specific
minimum network performance required to run a healthy {es} cluster. In theory a
cluster will work correctly even if the round-trip latency between nodes is
several hundred milliseconds. In practice if your network is that slow then the
cluster performance will be very poor. In addition, slow networks are often
Contributor:

Suggested change
cluster performance will be very poor. In addition, slow networks are often
cluster performance will likely be at unacceptable levels. In addition, slow networks are often

Contributor Author:

I started with something like that but then I figured we'd have to dive into what "unacceptable" means and how you'd determine what is or isn't acceptable. I saw someone running a very stretched cluster over satellite links once. Its performance was terrible in an absolute sense, and yet it was still acceptable to them. There's certainly a place for that sort of discussion but it's not here.

unreliable enough to cause network partitions that will lead to periods of
unavailability.

If you want your data to be available in multiple data centers that are further
apart or not well connected, deploy a separate cluster in each data center and
use <<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
clusters together. These features are designed to perform well even if the
cluster-to-cluster connections are less reliable or slower than the network
cluster-to-cluster connections are less reliable or performant than the network
within each cluster.
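Linking clusters this way starts from registering each cluster as a remote of the other. A minimal sketch, assuming an illustrative alias `dc2-cluster` and seed address `192.0.2.1:9300`:

```yaml
# Remote cluster settings on the local cluster (elasticsearch.yml or
# persistent cluster settings). Alias and address are illustrative.
cluster:
  remote:
    dc2-cluster:
      seeds:
        - 192.0.2.1:9300
```

A {ccs} request can then address remote indices as `dc2-cluster:index-name`, and {ccr} can follow indices from the same remote.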

After losing a whole zone's worth of nodes, a properly-designed cluster may be