
[DOCS] CCR disaster recovery #91491

Merged (30 commits) on Apr 21, 2023

Commits
d3856eb  Add bi-directional disaster recovery (Leaf-Lin, Nov 10, 2022)
c213bc7  add ccr bi-directional disaster recovery image (Leaf-Lin, Nov 10, 2022)
f3b9c0a  add link to bi-directional disaster recovery (Leaf-Lin, Nov 10, 2022)
e7dc8a2  add image (Leaf-Lin, Nov 10, 2022)
e9833ed  add [source] (Leaf-Lin, Nov 10, 2022)
7c4d5b3  fix language (Leaf-Lin, Nov 10, 2022)
432c7a5  Update bi-directional-disaster-recovery.asciidoc (Leaf-Lin, Nov 10, 2022)
98dcdcd  Update bi-directional-disaster-recovery.asciidoc (Leaf-Lin, Nov 10, 2022)
0d78708  Update bi-directional-disaster-recovery.asciidoc (Leaf-Lin, Nov 10, 2022)
4343a5d  Apply suggestions from code review (Leaf-Lin, Nov 30, 2022)
f405129  Apply suggestions from code review (Leaf-Lin, Nov 30, 2022)
3475190  Apply suggestions from code review (Leaf-Lin, Nov 30, 2022)
67c97d8  Apply suggestions from code review (Leaf-Lin, Nov 30, 2022)
14f1e71  Apply suggestions from code review (Leaf-Lin, Nov 30, 2022)
c1bb454  add test (Leaf-Lin, Nov 30, 2022)
297f6d4  Update docs/reference/ccr/bi-directional-disaster-recovery.asciidoc (Leaf-Lin, Nov 30, 2022)
1b94503  Add test (Leaf-Lin, Nov 30, 2022)
ba54b43  Add uni-directional DR doc (Leaf-Lin, Nov 30, 2022)
41f7aed  Add uni-directional image (Leaf-Lin, Nov 30, 2022)
01b9c20  add uni-directional doc reference (Leaf-Lin, Nov 30, 2022)
ad3b549  Update docs/reference/ccr/uni-directional-disaster-recovery.asciidoc (Leaf-Lin, Mar 9, 2023)
b92270a  Apply suggestions from code review (Leaf-Lin, Mar 9, 2023)
b446106  Apply suggestions from code review (Leaf-Lin, Mar 9, 2023)
67c14d3  Pushing up minor edits to restart build. Previous build failure 'Coul… (amyjtechwriter, Apr 5, 2023)
5123f3f  Merge branch 'main' into Leaf-Lin-bi-directional-DR (elasticmachine, Apr 6, 2023)
82d8cbb  Apply suggestions from code review (Leaf-Lin, Apr 14, 2023)
908d284  Merge branch 'main' into Leaf-Lin-bi-directional-DR (elasticmachine, Apr 17, 2023)
892207d  Merge branch 'main' into Leaf-Lin-bi-directional-DR (elasticmachine, Apr 17, 2023)
8a31894  Tip formatting and renaming follwer index to _copy in uni-direction (amyjtechwriter, Apr 18, 2023)
a3acfac  Fix failing CI doc check (abdonpijpelink, Apr 18, 2023)
256 changes: 256 additions & 0 deletions docs/reference/ccr/bi-directional-disaster-recovery.asciidoc
@@ -0,0 +1,256 @@
[role="xpack"]
[[ccr-disaster-recovery-bi-directional-tutorial]]
=== Tutorial: Disaster recovery based on bi-directional {ccr}
++++
<titleabbrev>Bi-directional disaster recovery</titleabbrev>
++++

Learn how to set up disaster recovery between two clusters based on
bi-directional {ccr}. The following tutorial is designed for data streams that support
<<{ref}/use-a-data-stream.html#update-docs-in-a-data-stream-by-query,update_by_query>> and
<<{ref}/use-a-data-stream.html#delete-docs-in-a-data-stream-by-query,delete_by_query>>.
You can only perform these actions on the leader index.

This tutorial works with Logstash as the source of ingestion. It takes
advantage of a Logstash feature where <<{logstash-ref}/plugins-outputs-elasticsearch,the output can be load balanced
across an array of specified hosts>>. Beats and agents currently do not
support multiple outputs. It should also be possible to set up a proxy
(load balancer) to redirect traffic without Logstash in this tutorial.

This tutorial covers:

* Setting up a remote cluster on `clusterA` and `clusterB`.
* Setting up bi-directional cross-cluster replication with exclusion patterns.
* Setting up Logstash with multiple hosts to allow automatic load balancing and switching during disasters.

image::images/ccr-bi-directional-disaster-recovery.png[Bi-directional cross cluster replication failover and failback]

==== Initial setup
. Set up a remote cluster on both clusters.
+
[source,console]
----
### On cluster A ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterB": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clusterb.es.australia-southeast1.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clusterb.es.australia-southeast1.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}

### On cluster B ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clustera.es.australia-southeast1.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clustera.es.australia-southeast1.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
----
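+
TIP: You can confirm that each cluster can reach the other with the remote info API:
+
[source,console]
----
GET _remote/info
----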

. Set up bi-directional cross-cluster replication.
+
[source,console]
----
### On cluster A ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterB",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns": "{{leader_index}}-replicated_from_clustera",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clusterb"
}

### On cluster B ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterA",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns": "{{leader_index}}-replicated_from_clusterb",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clustera"
}
----
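+
TIP: You can verify which auto-follow patterns are in place on each cluster with the get auto-follow pattern API:
+
[source,console]
----
GET /_ccr/auto_follow
----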
+
IMPORTANT: Existing data on the cluster will not be replicated by
`_ccr/auto_follow` even though the patterns may match. This function will only
replicate newly created backing indices (as part of the data stream).
+
IMPORTANT: Be sure to set `leader_index_exclusion_patterns` to avoid recursion.
+
TIP: `follow_index_pattern` allows lowercase characters only.
+
TIP: This step cannot be executed via the Kibana UI because the UI lacks an exclusion
pattern field. Use the API for this step.

. Set up the Logstash configuration file.
+
The following example uses the `generator` input plugin to demonstrate the document
count in the clusters. Reconfigure this section
to suit your own use case.
+
[source,logstash]
----
### On Logstash server ###
### This is a logstash config file ###
input {
  generator {
    message => 'Hello World'
    count => 100
  }
}
output {
  elasticsearch {
    hosts => ["https://clustera.es.australia-southeast1.gcp.elastic-cloud.com:9243","https://clusterb.es.australia-southeast1.gcp.elastic-cloud.com:9243"]
    user => "logstash-user"
    password => "same_password_for_both_clusters"
  }
}
----

Review comment (Contributor): Would Logstash not be a single point of failure? I think the original blog post would index data locally in the current DC to the local cluster, avoiding this issue to some degree.

Reply (Author): Exactly. Unfortunately, the original blog post, with its DC locality isolation, does not satisfy most of the bi-directional CCR use cases, where users have a single source of ingestion going into two separate clusters. That's part of the reason I was working on this tutorial: to highlight this use case. There are other possibilities for avoiding Logstash being the single point of failure, but they are outside the scope of this tutorial.
+
IMPORTANT: The key point is that when `clusterA` is down, all traffic will be
automatically redirected to `clusterB`. Once `clusterA` comes back, traffic is
automatically redirected back to it again. This is achieved by the
`hosts` option, where multiple ES cluster endpoints are specified in the
array `[clusterA, clusterB]`.
+
TIP: Set up the same password for the same user on both clusters to use this load-balancing feature.
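+
For example, you could create the `logstash-user` on both clusters with the create user API (a sketch; `superuser` is only a placeholder, so grant whatever role your pipeline actually needs):
+
[source,console]
----
### On cluster A and cluster B ###
POST /_security/user/logstash-user
{
  "password": "same_password_for_both_clusters",
  "roles": [ "superuser" ]
}
----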

. Start Logstash with the above configuration file.
+
[source,sh]
----
### On Logstash server ###
bin/logstash -f multiple_hosts.conf
----

. Observe document counts in data streams
+
The setup above creates a data stream named `logs-generic-default`
on each of the clusters. When both clusters are alive, Logstash writes 50% of the
documents to `clusterA` and 50% of the documents to `clusterB`.
+
Bi-directional {ccr} will create one more data stream on each of the clusters
with the `-replicated_from_cluster{a|b}` suffix. At the end of this step,
you should see the following (a quick way to verify these counts is shown after the list):
+
* Data streams on `clusterA`:
** 50 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* Data streams on `clusterB`:
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 50 documents in `logs-generic-default`
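+
TIP: One way to spot-check these counts is the count API. For example, on `clusterA` (data stream names assume the default setup above):
+
[source,console]
----
### On cluster A ###
GET logs-generic-default/_count
GET logs-generic-default-replicated_from_clusterb/_count
----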

. Set up queries to search across both data streams.
+
If you perform a search on `logs*` on either of the clusters, you should see 100
hits in total.
+
[source,console]
----
GET logs*/_search?size=0
----


==== Failover when `clusterA` is down
. You can simulate this by shutting down either of the clusters. Let's shut down
`clusterA` in this tutorial.
. Start Logstash with the same configuration file. (This step is not required in real
use cases where Logstash ingests continuously.)
+
[source,sh]
----
### On Logstash server ###
bin/logstash -f multiple_hosts.conf
----

. Observe that all Logstash traffic is redirected to `clusterB` automatically.
+
TIP: You should also redirect all search traffic to the `clusterB` cluster during this time.

. Observe that the two data streams on `clusterB` now contain a different number
of documents (a spot check is shown after the list):
+
* Data streams on `clusterA` (dead):
** 50 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* Data streams on `clusterB` (alive):
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 150 documents in `logs-generic-default`
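+
TIP: As before, you can spot-check the counts on `clusterB` with the count API:
+
[source,console]
----
### On cluster B ###
GET logs-generic-default/_count
----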


==== Failback when `clusterA` comes back
. You can simulate this by turning `clusterA` back on.
. Observe that data ingested into `clusterB` during `clusterA`'s downtime is
automatically replicated (a way to watch the catch-up is shown after the list):
+
* Data streams on `clusterA`:
** 150 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* Data streams on `clusterB`:
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 150 documents in `logs-generic-default`
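+
TIP: To watch the follower indices catch up after `clusterA` rejoins, you can poll the {ccr} stats API:
+
[source,console]
----
GET /_ccr/stats
----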

. If you have Logstash running at this time, you will also observe that traffic is
sent to both clusters.

==== Perform update or delete by query
It is possible to update or delete documents, but you can only perform these actions on the leader index.

. First, identify which backing index contains the document you want to update.
+
[source,console]
----
### On either of the clusters ###
GET logs-generic-default*/_search?filter_path=hits.hits._index
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  }
}
----
+
* If the hit returns `"_index": ".ds-logs-generic-default-replicated_from_clustera-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `clusterA`.
* If the hit returns `"_index": ".ds-logs-generic-default-replicated_from_clusterb-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `clusterB`.
* If the hit returns `"_index": ".ds-logs-generic-default-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on the same cluster where you performed the search query.
. Perform the update (or delete) by query:
+
[source,console]
----
### On the cluster identified from the previous step ###
POST logs-generic-default/_update_by_query
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  },
  "script": {
    "source": "ctx._source.event.original = params.new_event",
    "lang": "painless",
    "params": {
      "new_event": "FOOBAR"
    }
  }
}
----

Review comment (Contributor): Would they not simply use individual updates or deletes instead (maybe through bulk)? I can imagine the query above giving out documents that are on both sides, and they'd have to partition the document _id to use on each cluster.

Reply (Author): Another common request from users is to perform updates or deletes in bi-directional CCR. The purpose of this section is to demonstrate the possibility of doing so. Users are still welcome to bulk delete/update on the leader index as they wish.
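For completeness, a delete by query follows the same pattern (a sketch reusing the example document above; run it on the cluster identified in the previous step):

[source,console]
----
### On the cluster identified from the previous step ###
POST logs-generic-default/_delete_by_query
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  }
}
----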
1 change: 1 addition & 0 deletions docs/reference/ccr/index.asciidoc
Expand Up @@ -343,3 +343,4 @@ include::getting-started.asciidoc[]
include::managing.asciidoc[]
include::auto-follow.asciidoc[]
include::upgrading.asciidoc[]
include::bi-directional-disaster-recovery.asciidoc[]