[docs][yba] Update On-prem provider in 2.16 (#17715)
ddhodge authored Jun 8, 2023
1 parent ae7a347 commit 6320b2c
Showing 5 changed files with 286 additions and 325 deletions.
@@ -42,51 +42,51 @@ YugabyteDB replicates data across nodes (or fault domains) in order to tolerate

Replication of data in DocDB is achieved at the level of tablets, using tablet peers, with each table sharded into a set of tablets, as demonstrated in the following diagram:

<img src="/images/architecture/replication/tablets_in_a_docsb_table.png" style="max-width:750px;"/>
![Tablets in a table](/images/architecture/replication/tablets_in_a_docsb_table.png)

Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy to protect against node failures. The replication of data between the tablet peers is strongly consistent.
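
The arithmetic behind this is worth making explicit: with a replication factor of RF, a Raft majority is ⌊RF/2⌋ + 1 peers, so the tablet remains available as long as no more than RF minus that majority peers are lost. The following minimal Python sketch is an illustration only, not YugabyteDB code:

```python
# Illustration only: quorum arithmetic for a tablet's Raft group.
def majority(replication_factor: int) -> int:
    """Smallest number of tablet peers that forms a Raft majority."""
    return replication_factor // 2 + 1

def failures_tolerated(replication_factor: int) -> int:
    """Peers that can be lost while a majority remains reachable."""
    return replication_factor - majority(replication_factor)

for rf in (3, 5, 7):
    print(f"RF={rf}: majority={majority(rf)}, tolerates {failures_tolerated(rf)} peer failure(s)")
```

For the replication factor of 3 assumed in the examples that follow, a tablet keeps accepting writes through the loss of any single peer.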

The following diagram depicts three tablet peers that belong to a tablet called `tablet 1`. The tablet peers are hosted on different YB-TServers and form a Raft group for leader election, failure detection, and replication of the write-ahead logs.

![raft_replication](/images/architecture/raft_replication.png)
![RAFT Replication](/images/architecture/raft_replication.png)

### Raft replication

As soon as a tablet initiates, it elects one of the tablet peers as the tablet leader using the [Raft](https://raft.github.io/) protocol. The tablet leader becomes responsible for processing user-facing write requests by translating the user-issued writes into the document storage layer of DocDB. In addition, the tablet leader replicates among the tablet peers using Raft to achieve strong consistency. Setting aside the tablet leader, the remaining tablet peers of the Raft group are called tablet followers.

The set of DocDB updates depends on the user-issued write, and involves locking a set of keys to establish a strict update order, and optionally reading the older value to modify and update in case of a read-modify-write operation. The Raft log is used to ensure that the database state-machine of a tablet is replicated amongst the tablet peers with strict ordering and correctness guarantees even in the face of failures or membership changes. This is essential to achieving strong consistency.

Once the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. Once the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations.
After the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. After the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations.
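
A rough model of this commit rule, assuming an idealized in-memory peer group rather than the actual DocDB/Raft implementation, is sketched below: an entry counts as committed only once it has been persisted on a majority of the tablet peers.

```python
# Idealized sketch of majority commit; not YugabyteDB source code.
from dataclasses import dataclass, field

@dataclass
class TabletPeer:
    zone: str
    log: list = field(default_factory=list)   # persisted Raft log entries

def replicate(peers: list, entry: str, reachable: set) -> bool:
    """Append the entry on every reachable peer; commit only on a majority of acks."""
    acks = 0
    for peer in peers:
        if peer.zone in reachable:
            peer.log.append(entry)
            acks += 1
    return acks >= len(peers) // 2 + 1        # committed -> applied to DocDB, readable

peers = [TabletPeer("zone-a"), TabletPeer("zone-b"), TabletPeer("zone-c")]
print(replicate(peers, "SET k = v1", reachable={"zone-a", "zone-b"}))  # True: 2 of 3 acked
print(replicate(peers, "SET k = v2", reachable={"zone-a"}))            # False: no quorum
```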

## Replication in a cluster

The replicas of data can be placed across multiple fault domains. The following examples of a multi-zone deployment with three zones and the replication factor assumed to be 3 demonstrate how replication across fault domains is performed in a cluster.

### Multi-zone deployment

In the case of a multi-zone deployement, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablets leader, as per the following diagram:
In the case of a multi-zone deployment, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet's leader, as per the following diagram:

<img src="/images/architecture/replication/raft-replication-across-zones.png" style="max-width:750px;"/>
![Replication across zones](/images/architecture/replication/raft-replication-across-zones.png)

As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy, which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed:

<img src="/images/architecture/replication/optimal-tablet-leader-placement.png" style="max-width:750px;"/>
![Tablet leader placement](/images/architecture/replication/optimal-tablet-leader-placement.png)
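
The even spread of leaders shown above can be approximated with a simple round-robin assignment; this is only a sketch of what a placement policy aims for, not the actual placement algorithm:

```python
# Simplified sketch: spread tablet leaders evenly across zones, round-robin.
from collections import Counter

def place_leaders(num_tablets: int, zones: list) -> dict:
    """Assign each tablet's leader to a zone in round-robin order."""
    return {f"tablet-{i}": zones[i % len(zones)] for i in range(1, num_tablets + 1)}

leaders = place_leaders(num_tablets=12, zones=["zone-a", "zone-b", "zone-c"])
print(Counter(leaders.values()))   # four leaders per zone in the steady state
```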

### Tolerating a zone outage

As soon as a zone outage occurs, YugabyteDB assumes that all nodes in that zone become unavailable simultaneously. This results in one-third of the tablets (which have their tablet leaders in the zone that just failed) not being able to serve any requests. The other two-thirds of the tablets are not affected. The following illustration shows the tablet peers in the zone that failed:

<img src="/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png" style="max-width:750px;"/>
![Tablet peers in a failed zone](/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png)

For the affected one-third, YugabyteDB automatically performs a failover to instances in the other two zones. Once again, the tablets being failed over are distributed across the two remaining zones evenly, as per the following diagram:

<img src="/images/architecture/replication/automatic-failover-zone-outage.png" style="max-width:750px;"/>
![Automatic failover](/images/architecture/replication/automatic-failover-zone-outage.png)
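
That failover can be sketched the same way: every tablet whose leader sat in the failed zone gets a new leader from one of its surviving peers, alternating across the remaining zones (again a simplified model, not the actual election logic):

```python
# Simplified failover sketch: move leadership away from a failed zone,
# alternating the re-elected leaders across the surviving zones.
from collections import Counter
from itertools import cycle

def failover(leaders: dict, failed_zone: str, surviving_zones: list) -> dict:
    replacement = cycle(surviving_zones)
    return {tablet: (next(replacement) if zone == failed_zone else zone)
            for tablet, zone in leaders.items()}

leaders = {f"tablet-{i}": ["zone-a", "zone-b", "zone-c"][i % 3] for i in range(12)}
after = failover(leaders, failed_zone="zone-c", surviving_zones=["zone-a", "zone-b"])
print(Counter(after.values()))     # the failed zone's leaders split evenly across zone-a and zone-b
```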

### RPO and RTO on zone outage

The recovery point objective (RPO) for each of these tablets is 0, meaning no data is lost in the failover to another zone. The recovery time objective (RTO) is 3 seconds, which is the time window for completing the failover and becoming operational out of the new zones, as per the following diagram:

<img src="/images/architecture/replication/rpo-vs-rto-zone-outage.png" style="max-width:750px;"/>
![RPO vs RTO](/images/architecture/replication/rpo-vs-rto-zone-outage.png)

## Follower reads

@@ -92,7 +92,7 @@ For information on the Kubernetes Provider settings, refer to [Provider settings

To add service-level annotations, use the following [overrides](../kubernetes/#overrides):

```config
```yaml
serviceEndpoints:
- name: "yb-master-service"
type: "LoadBalancer"
@@ -115,19 +115,19 @@ serviceEndpoints:
To disable LoadBalancer, use the following overrides:
```configuration
```yaml
enableLoadBalancer: False
```
To change the cluster domain name, use the following overrides:
```configuration
```yaml
domainName: my.cluster
```
To add annotations at the StatefulSet level, use the following overrides:
```configuration
```yaml
networkAnnotation:
annotation1: 'foo'
annotation2: 'bar'
@@ -193,7 +193,7 @@ Depending on the cloud providers configured for your YugabyteDB Anywhere, you ca

To provision in AWS or GCP cloud, your overrides should include the appropriate `provider_type` and `region_codes` as an array, as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"provider_type": "gcp", # gcp for Google Cloud, aws for Amazon Web Service
@@ -203,7 +203,7 @@ To provision in AWS or GCP cloud, your overrides should include the appropriate

To provision in Kubernetes, your overrides should include the appropriate `provider_type` and `kube_provider` type, as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"provider_type": "kubernetes",
@@ -215,7 +215,7 @@ To provision in Kubernetes, your overrides should include the appropriate `provi

To override the number of nodes, include the `num_nodes` with the desired value, and then include this parameter along with other parameters for the cloud provider, as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"num_nodes": 4 # default is 3 nodes.
@@ -226,7 +226,7 @@ To override the number of nodes, include the `num_nodes` with the desired value,

To override the replication factor, include `replication` with the desired value, and then include this parameter along with other parameters for the cloud provider, as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"replication": 5,
@@ -240,7 +240,7 @@ To override the replication factor, include `replication` with the desired value

To override the volume settings, include `num_volumes` with the desired value, as well as `volume_size` with the volume size in GB for each of those volumes. For example, to have two volumes with 100GB each, overrides should be specified as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"num_volumes": 2,
@@ -252,7 +252,7 @@ To override the volume settings, include `num_volumes` with the desired value, a

To override the YugabyteDB software version to be used, include `yb_version` with the desired value, ensuring that this version exists in YugabyteDB Anywhere, as follows:

```configuration
```yaml
{
"universe_name": "cloud-override-demo",
"yb_version": "1.1.6.0-b4"
28 changes: 14 additions & 14 deletions docs/content/stable/architecture/docdb-replication/replication.md
@@ -2,7 +2,7 @@
title: Replication in DocDB
headerTitle: Synchronous replication
linkTitle: Synchronous
description: Learn how YugabyteDB uses the Raft consensus in DocDB to replicate data across multiple independent fault domains like nodes, zones, regions and clouds.
description: Learn how YugabyteDB uses the Raft consensus in DocDB to replicate data across multiple independent fault domains like nodes, zones, regions, and clouds.
headContent: Synchronous replication using the Raft consensus protocol
menu:
stable:
@@ -16,7 +16,7 @@ Using the Raft distributed consensus protocol, DocDB automatically replicates da

## Concepts

A number of concepts is central to replication.
A number of concepts are central to replication.

### Fault domains

@@ -26,7 +26,7 @@ A fault domain comprises a group of nodes that are prone to correlated failures.
* Regions or datacenters
* Cloud providers

Data is typically replicated across fault domains to be resilient to the outage of all nodes in that fault domain.
Data is typically replicated across fault domains to be resilient to the outage of all nodes in one fault domain.

### Fault tolerance

@@ -40,51 +40,51 @@ YugabyteDB replicates data across nodes (or fault domains) in order to tolerate

Replication of data in DocDB is achieved at the level of tablets, using tablet peers, with each table sharded into a set of tablets, as demonstrated in the following diagram:

<img src="/images/architecture/replication/tablets_in_a_docsb_table.png" style="max-width:750px;"/>
![Tablets in a table](/images/architecture/replication/tablets_in_a_docsb_table.png)

Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy on node failures. The replication of data between the tablet peers is strongly consistent.
Each tablet comprises of a set of tablet peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet peers for a tablet as the replication factor, and they form a Raft group. The tablet peers are hosted on different nodes to allow data redundancy to protect against node failures. The replication of data between the tablet peers is strongly consistent.

The following diagram depicts three tablet peers that belong to a tablet called `tablet 1`. The tablet peers are hosted on different YB-TServers and form a Raft group for leader election, failure detection, and replication of the write-ahead logs.

![raft_replication](/images/architecture/raft_replication.png)
![RAFT Replication](/images/architecture/raft_replication.png)

### Raft replication

As soon as a tablet initiates, it elects one of the tablet peers as the tablet leader using the [Raft](https://raft.github.io/) protocol. The tablet leader becomes responsible for processing user-facing write requests by translating the user-issued writes into the document storage layer of DocDB. In addition, the tablet leader replicates among the tablet peers using Raft to achieve strong consistency. Setting aside the tablet leader, the remaining tablet peers of the Raft group are called tablet followers.

The set of DocDB updates depends on the user-issued write, and involves locking a set of keys to establish a strict update order, and optionally reading the older value to modify and update in case of a read-modify-write operation. The Raft log is used to ensure that the database state-machine of a tablet is replicated amongst the tablet peers with strict ordering and correctness guarantees even in the face of failures or membership changes. This is essential to achieving strong consistency.

Once the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. Once the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations.
After the Raft log is replicated to a majority of tablet-peers and successfully persisted on the majority, the write is applied into the DocDB document storage layer and is subsequently available for reads. After the write is persisted on disk by the document storage layer, the write entries can be purged from the Raft log. This is performed as a controlled background operation without any impact to the foreground operations.

## Replication in a cluster

The replicas of data can be placed across multiple fault domains. The following examples of a multi-zone deployment with three zones and the replication factor assumed to be 3 demonstrate how replication across fault domains is performed in a cluster.

### Multi-zone deployment

In the case of a multi-zone deployement, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablets leader, as per the following diagram:
In the case of a multi-zone deployment, the data in each of the tablets in a node is replicated across multiple zones using the Raft consensus algorithm. All the read and write queries for the rows that belong to a given tablet are handled by that tablet's leader, as per the following diagram:

<img src="/images/architecture/replication/raft-replication-across-zones.png" style="max-width:750px;"/>
![Replication across zones](/images/architecture/replication/raft-replication-across-zones.png)

As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed:
As a part of the Raft replication, each tablet peer first elects a tablet leader responsible for serving reads and writes. The distribution of tablet leaders across different zones is determined by a user-specified data placement policy, which, in the preceding scenario, ensures that in the steady state, each of the zones has an equal number of tablet leaders. The following diagram shows how the tablet leaders are dispersed:

<img src="/images/architecture/replication/optimal-tablet-leader-placement.png" style="max-width:750px;"/>
![Tablet leader placement](/images/architecture/replication/optimal-tablet-leader-placement.png)

### Tolerating a zone outage

As soon as a zone outage occurs, YugabyteDB assumes that all nodes in that zone become unavailable simultaneously. This results in one-third of the tablets (which have their tablet leaders in the zone that just failed) not being able to serve any requests. The other two-thirds of the tablets are not affected. The following illustration shows the tablet peers in the zone that failed:

<img src="/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png" style="max-width:750px;"/>
![Tablet peers in a failed zone](/images/architecture/replication/tablet-leaders-vs-followers-zone-outage.png)

For the affected one-third, YugabyteDB automatically performs a failover to instances in the other two zones. Once again, the tablets being failed over are distributed across the two remaining zones evenly, as per the following diagram:

<img src="/images/architecture/replication/automatic-failover-zone-outage.png" style="max-width:750px;"/>
![Automatic failover](/images/architecture/replication/automatic-failover-zone-outage.png)

### RPO and RTO on zone outage

The recovery point objective (RPO) for each of these tablets is 0, meaning no data is lost in the failover to another zone. The recovery time objective (RTO) is 3 seconds, which is the time window for completing the failover and becoming operational out of the new zones, as per the following diagram:

<img src="/images/architecture/replication/rpo-vs-rto-zone-outage.png" style="max-width:750px;"/>
![RPO vs RTO](/images/architecture/replication/rpo-vs-rto-zone-outage.png)

## Follower reads
