diff --git a/docs/reference/ccr/bi-directional-disaster-recovery.asciidoc b/docs/reference/ccr/bi-directional-disaster-recovery.asciidoc
new file mode 100644
index 0000000000000..614af8846230e
--- /dev/null
+++ b/docs/reference/ccr/bi-directional-disaster-recovery.asciidoc
@@ -0,0 +1,275 @@
+[role="xpack"]
+[[ccr-disaster-recovery-bi-directional-tutorial]]
+=== Tutorial: Disaster recovery based on bi-directional {ccr}
+++++
+Bi-directional disaster recovery
+++++
+
+////
+[source,console]
+----
+PUT _data_stream/logs-generic-default
+----
+// TESTSETUP
+
+[source,console]
+----
+DELETE /_data_stream/*
+----
+// TEARDOWN
+////
+
+Learn how to set up disaster recovery between two clusters based on
+bi-directional {ccr}. The following tutorial is designed for data streams that
+support <> and <>. You can only perform these actions on the leader index.
+
+This tutorial works with {ls} as the source of ingestion. It takes advantage
+of a {ls} feature where {logstash-ref}/plugins-outputs-elasticsearch.html[the
+{ls} output to {es}] can be load balanced across a specified array of hosts.
+{beats} and {agents} currently do not support multiple outputs. You can also
+set up a proxy (load balancer) in place of {ls} to redirect traffic.
+
+This tutorial covers:
+
+* Setting up a remote cluster on `clusterA` and `clusterB`.
+* Setting up bi-directional cross-cluster replication with exclusion patterns.
+* Setting up {ls} with multiple hosts to allow automatic load balancing and
+switching during disasters.
+
+image::images/ccr-bi-directional-disaster-recovery.png[Bi-directional cross cluster replication failover and failback]
+
+[[ccr-tutorial-initial-setup]]
+==== Initial setup
+. Set up a remote cluster on both clusters.
++
+[source,console]
+----
+### On cluster A ###
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster": {
+      "remote": {
+        "clusterB": {
+          "mode": "proxy",
+          "skip_unavailable": true,
+          "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
+          "proxy_socket_connections": 18,
+          "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
+        }
+      }
+    }
+  }
+}
+### On cluster B ###
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster": {
+      "remote": {
+        "clusterA": {
+          "mode": "proxy",
+          "skip_unavailable": true,
+          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
+          "proxy_socket_connections": 18,
+          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
+        }
+      }
+    }
+  }
+}
+----
+// TEST[setup:host]
+// TEST[s/"server_name": "clustera.es.region-a.gcp.elastic-cloud.com",//]
+// TEST[s/"server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",//]
+// TEST[s/"proxy_socket_connections": 18,//]
+// TEST[s/clustera.es.region-a.gcp.elastic-cloud.com:9400/\${transport_host}/]
+// TEST[s/clusterb.es.region-b.gcp.elastic-cloud.com:9400/\${transport_host}/]
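++
+Optionally, verify that each cluster can see the other before continuing. A
+quick check (shown here for `cluster A`; run the same request on `cluster B`)
+is the remote info API, which reports whether the remote is connected:
++
+[source,console]
+----
+### On cluster A ###
+GET _remote/info
+----
+// TEST[skip:verification step only]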
+
+. Set up bi-directional cross-cluster replication.
++
+[source,console]
+----
+### On cluster A ###
+PUT /_ccr/auto_follow/logs-generic-default
+{
+  "remote_cluster": "clusterB",
+  "leader_index_patterns": [
+    ".ds-logs-generic-default-20*"
+  ],
+  "leader_index_exclusion_patterns":"{{leader_index}}-replicated_from_clustera",
+  "follow_index_pattern": "{{leader_index}}-replicated_from_clusterb"
+}
+
+### On cluster B ###
+PUT /_ccr/auto_follow/logs-generic-default
+{
+  "remote_cluster": "clusterA",
+  "leader_index_patterns": [
+    ".ds-logs-generic-default-20*"
+  ],
+  "leader_index_exclusion_patterns":"{{leader_index}}-replicated_from_clusterb",
+  "follow_index_pattern": "{{leader_index}}-replicated_from_clustera"
+}
+----
+// TEST[setup:remote_cluster]
+// TEST[s/clusterA/remote_cluster/]
+// TEST[s/clusterB/remote_cluster/]
++
+IMPORTANT: Existing data on the cluster will not be replicated by
+`_ccr/auto_follow` even if the patterns match. Auto-follow only replicates
+newly created backing indices (as part of the data stream).
++
+IMPORTANT: Use `leader_index_exclusion_patterns` to avoid recursion.
++
+TIP: `follow_index_pattern` allows lowercase characters only.
++
+TIP: This step cannot be performed in the {kib} UI because the UI lacks an
+exclusion pattern field. Use the API for this step.
+
+. Set up the {ls} configuration file.
++
+This example uses the generator input to demonstrate the document count in
+the clusters. Reconfigure this section to suit your own use case.
++
+[source,logstash]
+----
+### On Logstash server ###
+### This is a logstash config file ###
+input {
+  generator {
+    message => 'Hello World'
+    count => 100
+  }
+}
+output {
+  elasticsearch {
+    hosts => ["https://clustera.es.region-a.gcp.elastic-cloud.com:9243","https://clusterb.es.region-b.gcp.elastic-cloud.com:9243"]
+    user => "logstash-user"
+    password => "same_password_for_both_clusters"
+  }
+}
+----
++
+IMPORTANT: The key point is that when `cluster A` is down, all traffic is
+automatically redirected to `cluster B`. Once `cluster A` comes back, traffic
+is automatically redirected back to `cluster A` again. This is achieved by the
+`hosts` option, where multiple {es} cluster endpoints are specified in the
+array `[clusterA, clusterB]`.
++
+TIP: Set up the same password for the same user on both clusters to use this
+load-balancing feature.
+
+. Start {ls} with the earlier configuration file.
++
+[source,sh]
+----
+### On Logstash server ###
+bin/logstash -f multiple_hosts.conf
+----
+
+. Observe the document counts in the data streams.
++
+The setup creates a data stream named `logs-generic-default` on each of the
+clusters. {ls} writes 50% of the documents to `cluster A` and 50% of the
+documents to `cluster B` when both clusters are up.
++
+Bi-directional {ccr} creates one more data stream on each of the clusters
+with the `-replicated_from_cluster{a|b}` suffix. At the end of this step:
++
+* data streams on cluster A contain:
+** 50 documents in `logs-generic-default-replicated_from_clusterb`
+** 50 documents in `logs-generic-default`
+* data streams on cluster B contain:
+** 50 documents in `logs-generic-default-replicated_from_clustera`
+** 50 documents in `logs-generic-default`
+
+. Set up queries to search across both data streams.
+A query on `logs*`, on either of the clusters, returns 100 hits in total.
++
+[source,console]
+----
+GET logs*/_search?size=0
+----
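+
+. Optionally, confirm that the auto-follow patterns were created. A quick
+check (run on either cluster; the response echoes the patterns defined in the
+earlier step) is the get auto-follow pattern API:
++
+[source,console]
+----
+GET /_ccr/auto_follow/logs-generic-default
+----
+// TEST[skip:verification step only]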
+
+
+==== Failover when `clusterA` is down
+. You can simulate this by shutting down either of the clusters. Let's shut
+down `cluster A` in this tutorial.
+. Start {ls} with the same configuration file. (This step is not required in
+real use cases where {ls} ingests continuously.)
++
+[source,sh]
+----
+### On Logstash server ###
+bin/logstash -f multiple_hosts.conf
+----
+
+. Observe that all {ls} traffic is redirected to `cluster B` automatically.
++
+TIP: You should also redirect all search traffic to the `clusterB` cluster
+during this time.
+
+. The two data streams on `cluster B` now contain a different number of
+documents.
++
+* data streams on cluster A (down)
+** 50 documents in `logs-generic-default-replicated_from_clusterb`
+** 50 documents in `logs-generic-default`
+* data streams on cluster B (up)
+** 50 documents in `logs-generic-default-replicated_from_clustera`
+** 150 documents in `logs-generic-default`
+
+
+==== Failback when `clusterA` comes back
+. You can simulate this by turning `cluster A` back on.
+. Data ingested into `cluster B` during the downtime of `cluster A` is
+automatically replicated.
++
+* data streams on cluster A
+** 150 documents in `logs-generic-default-replicated_from_clusterb`
+** 50 documents in `logs-generic-default`
+* data streams on cluster B
+** 50 documents in `logs-generic-default-replicated_from_clustera`
+** 150 documents in `logs-generic-default`
+
+. If you have {ls} running at this time, you will also observe that traffic
+is sent to both clusters.
+
+==== Perform update or delete by query
+It is possible to update or delete documents, but you can only perform these
+actions on the leader index.
+
+. First, identify which backing index contains the document you want to
+update.
++
+[source,console]
+----
+### On either cluster ###
+GET logs-generic-default*/_search?filter_path=hits.hits._index
+{
+  "query": {
+    "match": {
+      "event.sequence": "97"
+    }
+  }
+}
+----
++
+* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clustera--*"`, then proceed to the next step on `cluster A`.
+* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clusterb--*"`, then proceed to the next step on `cluster B`.
+* If the hits return `"_index": ".ds-logs-generic-default--*"`, then proceed to the next step on the same cluster where you performed the search query.
+
+. Perform the update (or delete) by query:
++
+[source,console]
+----
+### On the cluster identified from the previous step ###
+POST logs-generic-default/_update_by_query
+{
+  "query": {
+    "match": {
+      "event.sequence": "97"
+    }
+  },
+  "script": {
+    "source": "ctx._source.event.original = params.new_event",
+    "lang": "painless",
+    "params": {
+      "new_event": "FOOBAR"
+    }
+  }
+}
+----
++
+TIP: If a soft delete is merged away before it can be replicated to a
+follower, this process will fail due to incomplete history on the leader. See
+<> for more details.
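+
+The same approach works for deletes. A minimal sketch (reusing the
+`event.sequence` value from the example above) replaces the update with a
+delete by query:
+
+[source,console]
+----
+### On the cluster identified from the previous step ###
+POST logs-generic-default/_delete_by_query
+{
+  "query": {
+    "match": {
+      "event.sequence": "97"
+    }
+  }
+}
+----
+// TEST[skip:illustrative sketch only]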
diff --git a/docs/reference/ccr/images/ccr-bi-directional-disaster-recovery.png b/docs/reference/ccr/images/ccr-bi-directional-disaster-recovery.png
new file mode 100644
index 0000000000000..ad597160d3ce0
Binary files /dev/null and b/docs/reference/ccr/images/ccr-bi-directional-disaster-recovery.png differ
diff --git a/docs/reference/ccr/images/ccr-uni-directional-disaster-recovery.png b/docs/reference/ccr/images/ccr-uni-directional-disaster-recovery.png
new file mode 100644
index 0000000000000..ad6e19fa13812
Binary files /dev/null and b/docs/reference/ccr/images/ccr-uni-directional-disaster-recovery.png differ
diff --git a/docs/reference/ccr/index.asciidoc b/docs/reference/ccr/index.asciidoc
index e1b6e98ea5d87..f3180da1ae77e 100644
--- a/docs/reference/ccr/index.asciidoc
+++ b/docs/reference/ccr/index.asciidoc
@@ -343,3 +343,5 @@ include::getting-started.asciidoc[]
 include::managing.asciidoc[]
 include::auto-follow.asciidoc[]
 include::upgrading.asciidoc[]
+include::uni-directional-disaster-recovery.asciidoc[]
+include::bi-directional-disaster-recovery.asciidoc[]
diff --git a/docs/reference/ccr/uni-directional-disaster-recovery.asciidoc b/docs/reference/ccr/uni-directional-disaster-recovery.asciidoc
new file mode 100644
index 0000000000000..731fbc0b242c9
--- /dev/null
+++ b/docs/reference/ccr/uni-directional-disaster-recovery.asciidoc
@@ -0,0 +1,194 @@
+[role="xpack"]
+[[ccr-disaster-recovery-uni-directional-tutorial]]
+=== Tutorial: Disaster recovery based on uni-directional {ccr}
+++++
+Uni-directional disaster recovery
+++++
+
+////
+[source,console]
+----
+PUT kibana_sample_data_ecommerce
+----
+// TESTSETUP
+
+[source,console]
+----
+DELETE kibana_sample_data_ecommerce
+----
+// TEARDOWN
+////
+
+
+Learn how to fail over and fail back between two clusters based on
+uni-directional {ccr}. You can also visit <> to set up replicating data
+streams that automatically fail over and fail back without human intervention.
+
+This tutorial covers:
+
+* Setting up uni-directional {ccr} replicated from `clusterA` to `clusterB`.
+* Failover: if `clusterA` goes offline, `clusterB` needs to "promote" follower
+indices to regular indices to allow write operations. All ingestion needs to
+be redirected to `clusterB`; this is controlled by the clients ({ls}, {beats},
+{agents}, and so on).
+* Failback: when `clusterA` is back online, it assumes the role of a follower
+and replicates the leader indices from `clusterB`.
+
+image::images/ccr-uni-directional-disaster-recovery.png[Uni-directional cross cluster replication failover and failback]
+
+NOTE: {ccr-cap} provides functionality to replicate user-generated indices only.
+{ccr-cap} isn't designed for replicating system-generated indices or snapshot
+settings, and can't replicate {ilm-init} or {slm-init} policies across clusters.
+Learn more in {ccr} <>.
+
+==== Prerequisites
+Before completing this tutorial,
+<> to connect two clusters and configure a follower index.
+
+In this tutorial, `kibana_sample_data_ecommerce` is replicated from `clusterA`
+to `clusterB`.
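+
+If you don't have the {kib} sample data loaded on `clusterA`, an empty index
+with the same name works as a stand-in for following along (the index name
+comes from this tutorial; the sample data set itself is optional):
+
+[source,console]
+----
+### On clusterA ###
+PUT kibana_sample_data_ecommerce
+----
+// TEST[skip:stand-in for the {kib} sample data]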
+
+[source,console]
+----
+### On clusterB ###
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster": {
+      "remote": {
+        "clusterA": {
+          "mode": "proxy",
+          "skip_unavailable": true,
+          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
+          "proxy_socket_connections": 18,
+          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
+        }
+      }
+    }
+  }
+}
+----
+// TEST[setup:host]
+// TEST[s/"server_name": "clustera.es.region-a.gcp.elastic-cloud.com",//]
+// TEST[s/"proxy_socket_connections": 18,//]
+// TEST[s/clustera.es.region-a.gcp.elastic-cloud.com:9400/\${transport_host}/]
+// TEST[s/clusterA/remote_cluster/]
+
+[source,console]
+----
+### On clusterB ###
+PUT /kibana_sample_data_ecommerce2/_ccr/follow?wait_for_active_shards=1
+{
+  "remote_cluster": "clusterA",
+  "leader_index": "kibana_sample_data_ecommerce"
+}
+----
+// TEST[continued]
+// TEST[s/clusterA/remote_cluster/]
+
+IMPORTANT: Writes (such as ingestion or updates) should occur only on the
+leader index. Follower indices are read-only and reject any writes.
+
+
+==== Failover when `clusterA` is down
+
+. Promote the follower indices in `clusterB` into regular indices so that
+they accept writes. This involves the following steps:
+* First, pause index following for the follower index.
+* Next, close the follower index.
+* Unfollow the leader index.
+* Finally, open the follower index (which at this point is a regular index).
++
+[source,console]
+----
+### On clusterB ###
+POST /kibana_sample_data_ecommerce2/_ccr/pause_follow
+POST /kibana_sample_data_ecommerce2/_close
+POST /kibana_sample_data_ecommerce2/_ccr/unfollow
+POST /kibana_sample_data_ecommerce2/_open
+----
+// TEST[continued]
+
+. On the client side ({ls}, {beats}, {agent}), manually re-enable ingestion of
+`kibana_sample_data_ecommerce2` and redirect traffic to `clusterB`. You should
+also redirect all search traffic to the `clusterB` cluster during this time.
+You can simulate this by ingesting documents into this index. Notice that this
+index is now writable.
++
+[source,console]
+----
+### On clusterB ###
+POST kibana_sample_data_ecommerce2/_doc/
+{
+  "user": "kimchy"
+}
+----
+// TEST[continued]
+
+==== Failback when `clusterA` comes back
+
+When `clusterA` comes back, `clusterB` becomes the new leader and `clusterA`
+becomes the follower.
+
+. Set up the remote cluster `clusterB` on `clusterA`.
++
+[source,console]
+----
+### On clusterA ###
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster": {
+      "remote": {
+        "clusterB": {
+          "mode": "proxy",
+          "skip_unavailable": true,
+          "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
+          "proxy_socket_connections": 18,
+          "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
+        }
+      }
+    }
+  }
+}
+----
+// TEST[setup:host]
+// TEST[s/"server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",//]
+// TEST[s/"proxy_socket_connections": 18,//]
+// TEST[s/clusterb.es.region-b.gcp.elastic-cloud.com:9400/\${transport_host}/]
+// TEST[s/clusterB/remote_cluster/]
+
+. Existing data needs to be discarded before you can turn any index into a
+follower. Ensure the most up-to-date data is available on `clusterB` prior to
+deleting any indices on `clusterA`.
++
+[source,console]
+----
+### On clusterA ###
+DELETE kibana_sample_data_ecommerce
+----
+// TEST[skip:need dual cluster setup]
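++
+As a sanity check, a document count on `clusterB` shows what the new follower
+will replicate (a minimal check; any document count comparison between the
+clusters works):
++
+[source,console]
+----
+### On clusterB ###
+GET kibana_sample_data_ecommerce2/_count
+----
+// TEST[skip:verification step only]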
+
+. Create a follower index on `clusterA`, now following the leader index in
+`clusterB`.
++
+[source,console]
+----
+### On clusterA ###
+PUT /kibana_sample_data_ecommerce/_ccr/follow?wait_for_active_shards=1
+{
+  "remote_cluster": "clusterB",
+  "leader_index": "kibana_sample_data_ecommerce2"
+}
+----
+// TEST[continued]
+// TEST[s/clusterB/remote_cluster/]
+
+. The index on the follower cluster now contains the updated documents.
++
+[source,console]
+----
+### On clusterA ###
+GET kibana_sample_data_ecommerce/_search?q=kimchy
+----
+// TEST[continued]
++
+TIP: If a soft delete is merged away before it can be replicated to a
+follower, this process will fail due to incomplete history on the leader. See
+<> for more details.
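++
+To watch the new follower catch up, the follower stats API reports replication
+progress (a quick optional check):
++
+[source,console]
+----
+### On clusterA ###
+GET /kibana_sample_data_ecommerce/_ccr/stats
+----
+// TEST[skip:verification step only]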