Rewriting Horizon scaling documentation #659

Merged (1 commit) on Jun 7, 2024
2 changes: 1 addition & 1 deletion network/horizon/admin-guide/prerequisites.mdx
@@ -34,7 +34,7 @@ These specifications assume a 30-day retention window for data storage. For a lo

## Multiple Instance Deployment

To achieve high availability, redundancy, and high throughput, explore the [scaling](./scaling.mdx) strategy. It provides detailed prerequisites and guidelines to determine the appropriate [number of Horizon instances](./configuring.mdx#multiple-instance-deployment) to deploy.
To achieve high availability, redundancy, and high throughput, refer to the [scaling](./scaling.mdx) documentation. It provides a detailed overview of several different deployment strategies you can employ, depending on the SLA you need your Horizon instance to achieve.

## Network Access

51 changes: 34 additions & 17 deletions network/horizon/admin-guide/scaling.mdx
@@ -3,38 +3,55 @@ title: Scaling
sidebar_position: 70
---

As alluded to in the discussion in [Prerequisites](./prerequisites.mdx), Horizon encompasses different logical tiers that can be scaled independently for high throughput, isolation, and high availability. The following components can be independently scaled:
Horizon consists of distinct logical tiers that can be scaled independently to increase throughput, isolation, and availability. The following components can be independently scaled:

- Web service API (serving)
- Captive Core (ingestion and transaction submission)
- Database (storage)

As always, scaling encompasses a spectrum. A few common scaling architectures follow.
## Single Instance Deployment

## Single VM
We recommend starting with a [single instance deployment](./prerequisites.mdx) and scaling up based on the needs of your particular use case.

As a starting point, for development purposes or low load environments with limited history retention (e.g. a few ledger entries), a single VM would suffice.
This [deployment](./configuring.mdx#single-instance-deployment) is intended for minimal history retention (30 days or less) and low request volume.
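Horizon's history retention is configured as a number of ledgers rather than days (the `HISTORY_RETENTION_COUNT` setting in recent releases; confirm the name in [configuring](./configuring.mdx) for your version), so it helps to convert the 30-day guideline. A minimal sketch, assuming an average ledger close time of about 5 seconds; measure the actual rate on your network before relying on the result:

```python
# Rough conversion from a retention window in days to a ledger count.
# ASSUMPTION: ledgers close about every 5 seconds on average.
SECONDS_PER_DAY = 24 * 60 * 60


def retention_ledgers(days: int, avg_close_seconds: float = 5.0) -> int:
    """Approximate number of ledgers closed during `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return int(days * SECONDS_PER_DAY / avg_close_seconds)


for days in (1, 30, 90):
    print(f"{days:>3} days ~ {retention_ledgers(days):,} ledgers")
```

Under this assumption, 30 days works out to roughly 518,400 ledgers.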

![](/assets/horizon-scaling/Topology-1VM.png)
In this setup, a single Horizon instance performs all three [roles](./configuring.mdx#multiple-instance-deployment): ingestion, transaction submission, and serving end-user API requests.
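Every topology in this guide is a different assignment of the same three roles to instances. The hypothetical model below (the `Role` enum and the topology dictionaries are illustrative, not part of Horizon) checks that a planned topology still covers every role, which is easy to overlook once roles are split across instances:

```python
# Hypothetical model of Horizon topologies as role assignments.
# Role names mirror the three roles above; nothing here is Horizon code.
from enum import Enum


class Role(Enum):
    INGESTION = "ingestion"
    TX_SUBMISSION = "transaction submission"
    API = "API requests"


def uncovered_roles(topology: dict[str, set[Role]]) -> set[Role]:
    """Return the roles that no instance in the topology performs."""
    covered: set[Role] = set()
    for roles in topology.values():
        covered |= roles
    return set(Role) - covered


single = {"horizon-1": {Role.INGESTION, Role.TX_SUBMISSION, Role.API}}
isolated = {
    "ingest-1": {Role.INGESTION, Role.TX_SUBMISSION},
    "api-1": {Role.API},
    "api-2": {Role.API},
}
print(uncovered_roles(single))  # set()
# An API-only deployment leaves two roles unassigned.
print(sorted(r.value for r in uncovered_roles({"api-1": {Role.API}})))
# ['ingestion', 'transaction submission']
```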

## Low to Medium Load
![](/assets/horizon-scaling/Topology-single.png)

For low to medium load environments with up to 30-90 days of data history retention and modest API request traffic, this configuration isolates the database instance from the API service and ingestion process.
## Scaling to Multiple Instances

![](/assets/horizon-scaling/Topology-2VMs.png)
There are several reasons you may choose to scale to multiple Horizon instances:

## Enterprise _n_-Tier
- Horizontal scaling lets you serve more API requests, and serve them faster
- Redundancy avoids downtime when a Horizon upgrade would otherwise require it (migrations, state rebuilds, etc.)
- Multiple instances protect against ingestion lag, which can result in downtime for end users
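To size a horizontally scaled API tier, a back-of-the-envelope calculation is often enough. The sketch below assumes you have measured how many requests per second a single API instance sustains (`per_instance_rps`) with your own load tests; the 30% headroom and the floor of two instances (to keep the redundancy described above) are illustrative choices, not Horizon recommendations:

```python
import math


def api_instances_needed(
    peak_rps: float,
    per_instance_rps: float,
    headroom: float = 0.3,
    minimum: int = 2,
) -> int:
    """Instances needed to serve `peak_rps` with spare capacity."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    # Scale the measured peak by the headroom factor, then round up.
    needed = math.ceil(peak_rps * (1 + headroom) / per_instance_rps)
    # Never drop below the floor that keeps a redundant instance around.
    return max(minimum, needed)


print(api_instances_needed(peak_rps=400, per_instance_rps=100))  # 6
print(api_instances_needed(peak_rps=10, per_instance_rps=100))   # 2
```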

This architecture services high request and data processing throughput with isolation and redundancy for each component. Scale the API service horizontally by adding a load balancer in front of multiple API service instances, each only limited by the database I/O limit. If necessary, use ALB routing to direct specific endpoints to specific request-serving instances, which are tied to a specific, dedicated DB. Now, if an intense endpoint gets clobbered, all other endpoints are unaffected.
Multiple Horizon instances can be configured to point to the same database; in that case, the ingestion process does not perform redundant work.
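The sketch below is a simplified simulation, not Horizon's implementation: threads stand in for Horizon instances, and a `threading.Lock` plus a shared cursor stand in for the database-level coordination that lets instances sharing one database avoid duplicate ingestion. It illustrates the property this paragraph relies on: each ledger is ingested exactly once, however many instances run.

```python
# Illustrative simulation only. ASSUMPTION: threading.Lock stands in for
# whatever database-level locking the real deployment relies on.
import threading


class SharedDatabase:
    def __init__(self) -> None:
        self.lock = threading.Lock()  # stand-in for a DB-level lock
        self.last_ingested = 0        # shared ingestion cursor
        self.ingested_by: dict[int, str] = {}


def ingest_until(db: SharedDatabase, name: str, target: int) -> None:
    while True:
        with db.lock:
            nxt = db.last_ingested + 1
            if nxt > target:
                return
            # Only the lock holder advances the cursor, so no ledger is
            # ingested twice even with several instances running.
            db.ingested_by[nxt] = name
            db.last_ingested = nxt


db = SharedDatabase()
workers = [
    threading.Thread(target=ingest_until, args=(db, f"horizon-{i}", 100))
    for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(db.last_ingested, len(db.ingested_by))  # 100 100
```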

Database instances can be scaled when the I/O limit is reached by using read-only replicated copies that stay in sync and a read/write instance connected to Captive Core. Each DB replica can support a set of request servers to support additional horizontal scaling.
When running multiple instances, disable Horizon's built-in [rate limiting](../api-reference/structure/rate-limiting.mdx) and enforce rate limits in your infrastructure instead. Horizon tracks rate limits in memory on each instance, so the limits are not shared and do not work correctly across multiple instances.
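To see why in-memory limits break down, the sketch below simulates hypothetical instances that each count requests per client IP in their own memory, placed behind a round-robin load balancer. It is not Horizon's limiter; it only models the per-instance counting described above. (In recent Horizon releases the built-in limiter is controlled by `PER_HOUR_RATE_LIMIT`, where `0` disables it; confirm against the configuration reference for your version.)

```python
# Simulates per-instance, in-memory rate limiting behind a load balancer.
# Each hypothetical instance keeps its own counters; none are shared.
from collections import Counter
from itertools import cycle


class Instance:
    def __init__(self, limit_per_window: int) -> None:
        self.limit = limit_per_window
        self.seen: Counter[str] = Counter()  # in-memory, per instance

    def handle(self, client_ip: str) -> bool:
        """Return True if this instance allows the request."""
        if self.seen[client_ip] >= self.limit:
            return False
        self.seen[client_ip] += 1
        return True


def allowed_requests(instances: int, limit: int, attempts: int) -> int:
    pool = [Instance(limit) for _ in range(instances)]
    balancer = cycle(pool)  # round-robin load balancer
    return sum(next(balancer).handle("203.0.113.7") for _ in range(attempts))


print(allowed_requests(instances=1, limit=100, attempts=1000))  # 100
print(allowed_requests(instances=3, limit=100, attempts=1000))  # 300
```

With three instances, one client receives three times the intended allowance, which is why enforcement belongs in shared infrastructure.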

Additionally, a second Captive Core instance shares ingestion load and serves as a backup in case of an instance failure.
![](/assets/horizon-scaling/Topology-multiple.png)

![](/assets/horizon-scaling/Topology-Enterprise.png)
## Logically Isolating Ingestion

### Redundant Hot Backup
Ingestion is the process by which new ledgers are propagated into Horizon's database. Its health is critical: degradations in performance can cause Horizon to fall behind the last closed ledger, leaving your end users unaware of the current state of the network and unable to successfully submit new transactions. Any ingestion lag would likely be considered downtime for your service.
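One way to watch for this lag is to compare the latest ledger Horizon has ingested, reported as `history_latest_ledger` on its root endpoint, with a reference ledger sequence from a source you trust. A minimal sketch that works on an already-parsed root response; the 5-second close time and the 2-ledger alert threshold are assumptions to tune for your SLA:

```python
# Hypothetical ingestion-lag check on a parsed Horizon root response.
# ASSUMPTIONS: `history_latest_ledger` matches your Horizon version's
# root response, and ledgers close about every 5 seconds.
AVG_CLOSE_SECONDS = 5.0


def ingestion_lag(root: dict, reference_ledger: int) -> tuple[int, float]:
    """Return (ledgers behind, approximate seconds behind)."""
    behind = max(0, reference_ledger - int(root["history_latest_ledger"]))
    return behind, behind * AVG_CLOSE_SECONDS


def is_unhealthy(root: dict, reference_ledger: int, max_ledgers: int = 2) -> bool:
    behind, _ = ingestion_lag(root, reference_ledger)
    return behind > max_ledgers


sample_root = {"history_latest_ledger": 1_000_000}
print(ingestion_lag(sample_root, 1_000_003))  # (3, 15.0)
print(is_unhealthy(sample_root, 1_000_003))   # True
```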

The entire architecture can be replicated to a second cluster. The backup cluster can be upgraded independently or fail-overed to with no downtime. Additionally, capacity can be doubled in an emergency if needed. This is synonymous with the [Blue/Green deployment model](https://en.wikipedia.org/wiki/Blue%E2%80%93green_deployment).
Horizon allows you to independently configure the different [roles](./configuring.mdx#multiple-instance-deployment) that it performs, including ingestion. The diagram below illustrates how you can separate the instances serving API requests from the instances performing ingestion, and add a read-only replica database to further isolate these components. This setup has several advantages:

![](/assets/horizon-scaling/Topology-Enterprise-HotBackup.png)
- Each role that Horizon performs can be scaled independently
- API instances have significantly lighter hardware requirements, since they do not need to run Captive Core
- API instances can be scaled horizontally or dynamically, based on your end users' needs
- Ingestion and its performance are isolated from API activity, so bursts in user traffic cannot degrade it and cause ingestion lag

The Horizon API role needs only read-only database permissions for everything it does. However, API instances must delegate all transaction submission requests to an instance that runs Captive Core. You can add more database replicas if needed to support more requests.

![](/assets/horizon-scaling/Topology-ingestion-isolation.png)
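A sketch of how the two roles in this topology might be configured with environment variables. `INGEST`, `DATABASE_URL`, `RO_DATABASE_URL`, and `STELLAR_CORE_URL` are names used by recent Horizon releases, but treat the exact combination each role needs as an assumption and confirm it in the [configuring](./configuring.mdx#multiple-instance-deployment) guide. Hostnames, credentials, and port 11626 are placeholders.

```python
# Sketch of per-role environment settings for the isolated topology.
# ASSUMPTIONS: hostnames, credentials, and port 11626 (used here for the
# ingesting node's core HTTP endpoint) are placeholders.
PRIMARY_DB = "postgres://horizon@db-primary:5432/horizon"
REPLICA_DB = "postgres://horizon_ro@db-replica:5432/horizon"
INGEST_HOST = "horizon-ingest-1"  # hypothetical ingesting instance


def ingestion_env() -> dict[str, str]:
    # Runs Captive Core, writes ledgers to the primary database, and
    # accepts transaction submissions forwarded by API instances.
    return {"INGEST": "true", "DATABASE_URL": PRIMARY_DB}


def api_env() -> dict[str, str]:
    return {
        "INGEST": "false",              # no Captive Core on this host
        "DATABASE_URL": PRIMARY_DB,
        "RO_DATABASE_URL": REPLICA_DB,  # serve reads from the replica
        # Forward transaction submission to a node that runs core.
        "STELLAR_CORE_URL": f"http://{INGEST_HOST}:11626",
    }


for name, env in (("ingest", ingestion_env()), ("api", api_env())):
    print(name, " ".join(f"{k}={v}" for k, v in sorted(env.items())))
```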

## Logically Isolating Transaction Submission

In the previous setup, ingestion is isolated from read-only API traffic, which has historically made up the large majority of requests. However, transaction submission still has to be served by a Stellar Core instance, so API instances must pass their transaction submission requests through to an ingesting instance.

The diagram below illustrates how you can further isolate (and scale) transaction submission by using core watcher instances instead of Horizon instances running Captive Core. This further protects ingestion from downtime and lag, and makes it possible to scale transaction submission horizontally, independent of the rest of the API traffic.

![](/assets/horizon-scaling/Topology-txsub.png)
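One way to implement the transaction submission isolation described above is path-based routing at the load balancer. In this hypothetical sketch, `POST /transactions` requests go to a dedicated submission pool (for example, instances whose `STELLAR_CORE_URL` points at core watchers) and all other requests go to the general API pool. Pool member names are placeholders, and a real deployment would express this as load balancer rules rather than application code.

```python
# Hypothetical path-based routing with round-robin within each pool.
from itertools import cycle

POOLS = {
    "txsub": cycle(["txsub-1", "txsub-2"]),
    "api": cycle(["api-1", "api-2", "api-3"]),
}


def route(method: str, path: str) -> str:
    """Pick a backend for one request."""
    pool = "txsub" if method == "POST" and path == "/transactions" else "api"
    return next(POOLS[pool])


requests = [("GET", "/ledgers"), ("POST", "/transactions"),
            ("GET", "/fee_stats"), ("POST", "/transactions")]
for method, path in requests:
    print(method, path, "->", route(method, path))
# GET /ledgers -> api-1
# POST /transactions -> txsub-1
# GET /fee_stats -> api-2
# POST /transactions -> txsub-2
```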
Binary file removed static/assets/horizon-scaling/Topology-1VM.png
Binary file removed static/assets/horizon-scaling/Topology-2VMs.png
Binary file removed static/assets/horizon-scaling/Topology-3VMs.png
Binary file added static/assets/horizon-scaling/Topology-single.png
Binary file added static/assets/horizon-scaling/Topology-txsub.png