Partner Experience: horizon upgrade best practices #241

Merged (6 commits) on Sep 26, 2023
docs/run-platform-server/installing.mdx (8 additions & 3 deletions)
To install Horizon in production or non-development environments, we recommend the following:

### Containerized

- Non-Orchestrated, if the target deployment environment does not include a container orchestrator such as Kubernetes, then this means you intend to run the horizon release image from [dockerhub.com/stellar/stellar-horizon](https://hub.docker.com/r/stellar/stellar-horizon) as a container directly with the docker daemon on the host. Choose the tag of the horizon image for the specific release version, then pull the image with `docker pull stellar/stellar-horizon:<tag_version>` to get it locally onto the host (a sketch follows this list).
- Orchestrated, when the target environment has container orchestration, such as a Kubernetes cluster, we recommend using the [Horizon Helm chart](https://github.com/stellar/helm-charts/tree/main/charts/horizon) to manage the installation and deployment life cycle of the horizon image as container(s) on the cluster.
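
As a sketch of the non-orchestrated option (the tag, container name, and env-file path below are placeholder assumptions; horizon reads its configuration from environment variables):

<CodeExample>

```bash
# pull a specific release tag locally (tag shown is only an example)
docker pull stellar/stellar-horizon:2.26.1

# run it directly under the docker daemon, publishing horizon's default HTTP
# port and passing configuration through an env file (path is a placeholder)
docker run -d --name horizon \
  -p 8000:8000 \
  --env-file ./horizon.env \
  stellar/stellar-horizon:2.26.1
```

</CodeExample>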

For installation in development environments, please refer to the [Horizon README](https://github.com/stellar/go/blob/master/services/horizon/README.md#try-it-out) from the source code repo for options to use in a development context.

#### Package Manager

SDF publishes new releases to its custom Ubuntu repositories. Follow [this guide](https://github.com/stellar/packages/blob/master/docs/adding-the-sdf-stable-repository-to-your-system.md#adding-the-sdf-stable-repository-to-your-system) to add the stable SDF repository to your host system. If you are interested in installing release candidate versions of software that have not yet reached stable, refer to [using the 'testing' SDF repository](https://github.com/stellar/packages/blob/master/docs/adding-the-sdf-stable-repository-to-your-system.md#adding-the-bleeding-edge-testing-repository). Lastly, [Install package](https://github.com/stellar/packages/blob/master/docs/installing-individual-packages.md#installing-individual-packages) outlines the various commands that these packages make available.


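As a rough sketch of what the linked guide covers (the commands below are paraphrased assumptions; treat the guide as canonical):

<CodeExample>

```bash
# add SDF's package signing key and apt repository (paraphrased; see the guide)
wget -qO- https://apt.stellar.org/SDF.asc | sudo apt-key add -
echo "deb https://apt.stellar.org $(lsb_release -cs) stable" |
  sudo tee -a /etc/apt/sources.list.d/SDF.list
```

</CodeExample>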

To proceed with installation:

<CodeExample>
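```bash
# (contents collapsed in this diff view) based on the surrounding text, the
# installation amounts to installing the stellar-horizon package:
sudo apt update
sudo apt install stellar-horizon
```

</CodeExample>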

Some shells (such as [zsh](https://www.zsh.org/)) cache PATH lookups. You may need to clear this cache (for example, by starting a new shell session) before you can use a newly installed binary.

#### Helm Chart Installation

If the deployment can be done on Kubernetes, there is a [horizon helm chart](https://github.com/stellar/helm-charts/blob/main/charts/horizon) available. Install the [Helm cli tool](https://helm.sh/docs/intro/install/) (minimum version 3) on your workstation if you haven't already. Next, add the Stellar repo to the helm client's list of repos and confirm that you can view the list of available chart versions for the repo:

<CodeExample>
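```bash
# (contents collapsed in this diff view) a sketch, assuming the repo URL
# documented in the stellar/helm-charts repository:
helm repo add stellar https://helm.stellar.org/charts
helm repo update
helm search repo stellar/horizon --versions
```

</CodeExample>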

docs/run-platform-server/running.mdx (5 additions & 1 deletion)
If you chose containerized installation, you don't execute the horizon binary directly; the container runs it for you. For example, with a helm chart installation on Kubernetes:

<CodeExample>

```bash
helm install my-horizon stellar/horizon \
  --namespace my-horizon-namespace-on-cluster \
  --set ingest.persistence.enabled=true \
  --set web.replicaCount=1 \
  --set web.enabled=true \
  --set ingest.enabled=true \
  --set ingest.replicaCount=1 \
  --set web.existingSecret=my-db-secret \
  --set global.image.horizon.tag=2.26.1 \
  --set global.network=testnet
```

</CodeExample>
This example of helm chart usage highlights some key aspects:

- uses the `global.network=[testnet|pubnet]` parameter; this automates generation of all the horizon configuration parameters specific to the network, such as archive urls and captive core config, as well as other parameters mentioned in [Configuring](./configuring.mdx).
- `global.image.horizon.tag` should be set to one of the docker hub tags published on [stellar/stellar-horizon](https://hub.docker.com/r/stellar/stellar-horizon).
- enables all roles on the deployment instance: ingesting and web api (which includes transaction submission). If you choose a multi-instance deployment with each instance performing a single role of just web api or ingestion, then you will do two helm installations, one for each role: `my-horizon-ingestion-installation` and `my-horizon-api-installation`. Each of these helm installations will set `ingest.enabled`, `web.enabled`, `ingest.replicaCount`, and `web.replicaCount` respectively for the role it performs.
- to customize further, the best approach is to download the [horizon helm chart values.yaml](https://github.com/stellar/helm-charts/blob/main/charts/horizon/values.yaml), update the settings in your local copy of values.yaml, and pass it to helm install, rather than having many individual `--set` flags on helm install:

<CodeExample>
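```bash
# (contents collapsed in this diff view) a sketch of passing a local values
# file instead of many individual --set flags:
helm install my-horizon stellar/horizon \
  --namespace my-horizon-namespace-on-cluster \
  --values values.yaml
```

</CodeExample>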
docs/run-platform-server/scaling.mdx (2 additions & 8 deletions)
For low to medium load environments with up to 30-90 days of data history retention:

![](/assets/horizon-scaling/Topology-2VMs.png)

### Extension: Isolating Captive Core (removed by this PR)

Additionally, Captive Core can be further isolated into its own VM, especially for high-throughput historical catch-up with parallel workers, leaving it unaffected by API request servicing load.

![](/assets/horizon-scaling/Topology-3VMs.png)

## Enterprise _n_-Tier

This architecture services high request and data processing throughput with isolation and redundancy for each component. Scale the API service horizontally by adding a load balancer in front of multiple API service instances, each only limited by the database I/O limit. If necessary, use ALB routing to direct specific endpoints to specific request-serving instances, which are tied to a specific, dedicated DB. Now, if an intense endpoint gets clobbered, all other endpoints are unaffected.
Additionally, a second Captive Core instance shares ingestion load and serves as a redundant backup.

![](/assets/horizon-scaling/Topology-Enterprise.png)

### Redundant Hot Backup

The entire architecture can be replicated to a second cluster. The backup cluster can be upgraded independently or failed over to with no downtime. Additionally, capacity can be doubled in an emergency if needed. This is synonymous with the [Blue/Green deployment model](https://en.wikipedia.org/wiki/Blue%E2%80%93green_deployment).

![](/assets/horizon-scaling/Topology-Enterprise-HotBackup.png)
docs/run-platform-server/upgrading.mdx (new file: 142 additions)
---
title: Upgrading
sidebar_position: 80
---

import { Alert } from "@site/src/components/Alert";
import { CodeExample } from "@site/src/components/CodeExample";

We'll describe the recommended steps for upgrading a Horizon 2.x installation.

### Prerequisites

- An existing Horizon deployment consisting of one or more instances of Horizon.
- All instances are on the same 2.x version to begin.
- If [bare metal](./installing.mdx#bare-metal) install, you have shell or command line access to each host having a Horizon installation.
- If [deployed directly on docker daemon](./installing.mdx#containerized), you have command line access to the host that is running the docker daemon.
- If [deployed on Kubernetes with Helm chart](./installing.mdx#helm-chart-installation), you have kubectl and helm command line tools on your workstation and a user login with appropriate access levels to change resources in the target namespace of the Horizon deployment on the cluster.

### Assess current installation

- Identify the list of all instances of Horizon that need to be upgraded.

  - For bare metal installations, the list of hosts is managed by you.
  - For docker daemon deployments, the list of hosts and running containers is managed by you.
  - For kubernetes deployments, get the list of pods deployed from your prior helm installation; they will carry a `release=your_helm_horizon_installation_name` label:

<CodeExample>

```bash
kubectl get pods -l release=your_helm_horizon_installation_name -n <your_horizon_namespace>
```

</CodeExample>

- Identify your current Horizon software version:

  - Obtain command line access to the operating system of each horizon instance:
    - For bare metal installations, this is typically ssh on linux or powershell on windows.
    - For docker daemon deployments, use `docker exec -it <containerid> /bin/bash`
    - For kubernetes deployments, use `kubectl exec -it <pod_name> -n <horizon_namespace> -- /bin/bash`
  - On the command line of each instance, run `stellar-horizon version`

- All instances should report the same version. If not, the system may be inconsistent; use this upgrade as an opportunity to establish consistency and get them all on the same version.

### Determine the target version for upgrade

Now that you know your current horizon version, visit [Horizon Releases](https://github.com/stellar/go/releases) and choose the next greater version above your current one as the upgrade target. Follow the steps [recommended by GitHub to compare releases](https://docs.github.com/en/repositories/releasing-projects-on-github/comparing-releases): click on the `Compare` dropdown of the chosen release and select your current release, and GitHub will display the differences between the versions. Select the `Files changed` tab and go to `services/horizon/CHANGELOG.md`; it will highlight the release notes for changes that have occurred between your current version and the new version you selected. Review these and look for any `Breaking Changes`, `State Rebuild`, and `DB Schema Migration` sections, as the latter two will also mention the expected time for the state rebuild or db migration to apply.

### Install the new version

Now that you have identified the new version and are aware of the potential impacts of upgrading to it, such as state rebuilds and db migrations noted in the release notes, you are informed and ready to proceed with the upgrade.

Upgrading production deployments should leverage a secondary, hot-backup deployment, also known as a [blue/green model](./scaling.mdx#redundant-hot-backup), and perform the upgrade on the inactive deployment first. This avoids downtime for your external users, as the upgrade takes place on the inactive deployment.

A good strategy for upgrading horizon, applicable to single or multi-instance deployments: shut all instances down, then install the new horizon version on one of the ingesting instances first. The reason is that horizon will only initiate the `State Rebuild` and `DB Schema Migration` actions related to an upgrade on an instance where it detects that ingestion has been enabled with the configuration parameter `INGEST=true`. This lowers complexity during the upgrade, as you only need to focus on one instance, and it avoids potential concurrent horizon ingestion processes attempting the same upgrade on the database.

- For bare metal installations, stop the horizon process on all instances first, then shell into the one instance that is configured for ingestion and use the apt package manager on linux:

<CodeExample>

```bash
sudo apt update
sudo apt install stellar-horizon=new_horizon_debian_pkg_version
```

</CodeExample>

Restart horizon using the configuration already in place, but include the `APPLY_MIGRATIONS=true` environment variable; this will trigger horizon to automatically run any db migrations that it detects are needed.
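
For example, a minimal sketch assuming the debian package's systemd service and an environment file at `/etc/default/stellar-horizon` (both are assumptions; adjust for how you run horizon):

<CodeExample>

```bash
# add the one-time migration flag to horizon's environment (path is an assumption)
echo "APPLY_MIGRATIONS=true" | sudo tee -a /etc/default/stellar-horizon

# restart the service so horizon applies any pending db migrations on startup
sudo systemctl restart stellar-horizon

# follow the logs to watch migration progress
sudo journalctl -fu stellar-horizon
```

</CodeExample>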

- For docker daemon deployments, stop all docker containers first, then choose one container that has ingestion enabled, set the new image tag based on the release published on dockerhub - [stellar/stellar-horizon](https://hub.docker.com/r/stellar/stellar-horizon/tags) - and restart the container, adding the `APPLY_MIGRATIONS=true` environment variable to the container environment; this will trigger horizon to automatically run any db migrations that it detects are needed. For example:
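
A sketch of that container restart (the container name, env file, and tag are placeholder assumptions):

<CodeExample>

```bash
# stop and remove the old ingesting container (name is a placeholder)
docker stop horizon-ingest && docker rm horizon-ingest

# start it again on the new release tag, adding the one-time migration flag
docker run -d --name horizon-ingest \
  --env-file ./horizon.env \
  -e APPLY_MIGRATIONS=true \
  stellar/stellar-horizon:new_horizon_release_tag
```

</CodeExample>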
- For helm installations on kubernetes, first use your helm cli tool to stop all horizon instances by scaling the horizon installations (for ingest and web) that you created earlier in the [run steps](./running.mdx) down to 0 replicas:

<CodeExample>

```bash
# helm upgrade requires the chart argument; --reuse-values keeps previously set values
helm upgrade all-my-horizon-installations stellar/horizon \
  --namespace my-horizon-namespace-on-cluster \
  --reuse-values \
  --set ingest.replicaCount=0 \
  --set web.replicaCount=0
```

</CodeExample>

Now, use helm to start just a single horizon instance from the helm installation that has ingestion enabled on the kubernetes cluster, setting `global.image.horizon.tag` to the release tag published on [stellar/stellar-horizon](https://hub.docker.com/r/stellar/stellar-horizon/tags):

<CodeExample>

```bash
helm upgrade my-horizon stellar/horizon \
  --namespace my-horizon-namespace-on-cluster \
  --reuse-values \
  --set global.image.horizon.tag=new_horizon_release_number \
  --set ingest.horizonConfig.applyMigrations=True \
  --set ingest.replicaCount=1
```

</CodeExample>

### Confirming the upgrade on the single ingestion instance first

If you have [monitoring](./monitoring.mdx) infrastructure in place, then you have two options for assessing the upgrade status:

- View metrics outputs using grafana dashboards that leverage queries on the [horizon metrics data model](./monitoring.mdx#data-model) to check that key stats, like ingestion and network ledgers, are advancing and in step.

- View the horizon web server 'status' url path on the upgraded instance:

<CodeExample>

```bash
curl http://localhost:8000/
```

</CodeExample>

The response will be HTTP status code 200, and the body will be a text-based json data structure with diagnostic info on the current horizon software version and the ledger numbers for ingestion and the network. Refresh the url every 5 seconds or so, and you should see the ingestion and network ledger numbers advancing and in step, indicating a good connection to the network and healthy ingestion.
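
As a quick sketch for watching those values (the grepped field names are assumptions and may vary by horizon version):

<CodeExample>

```bash
# poll the status url every 5 seconds and pull out version and ledger fields
watch -n 5 "curl -s http://localhost:8000/ | grep -E 'version|latest_ledger'"
```

</CodeExample>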

If metrics and/or the horizon 'status' url responses don't indicate healthy status based on advancing ledger ingestion, there are two steps to triage further:

- A delay in horizon achieving healthy status after an upgrade is expected and legitimate for any upgrade where `State Rebuild` or `DB Migration` was noted in the release delta during the prior [Determine the target version for upgrade](#determine-the-target-version-for-upgrade) step. Typically the notes will also mention relative timeframe expectations for those to complete, which can be factored into how long to wait.
- Check the logs from the upgraded instance to confirm what's going on. Any `State Rebuild` or `DB Migration` initiated will be mentioned. For example, a db migration will be noted in the logs with the following lines for start and finish:
```
2023/09/22 18:27:01 Applying DB migrations...
2023/09/22 18:27:01 successfully applied 5 horizon migrations
```

### Upgrade all remaining instances

At this point, you have upgraded one ingesting instance to the new horizon version, it has automatically updated the database if required, and the instance is running with healthy status. Now, install the same horizon software version on the remainder of the instances, restarting each after the upgrade. For bare-metal and docker daemon installations, doing this for the remaining instances is likely self explanatory; for helm chart installations, run the helm upgrade again, setting the image tag and also restoring the original `replicaCount`s:

<CodeExample>

```bash
helm upgrade all-my-horizon-installations stellar/horizon \
  --namespace my-horizon-namespace-on-cluster \
  --reuse-values \
  --set ingest.replicaCount=1 \
  --set web.replicaCount=1 \
  --set global.image.horizon.tag=new_horizon_release_number \
  --set ingest.horizonConfig.applyMigrations=False
```

</CodeExample>

For production deployments following the hot backup or blue/green model, this is the opportunity to confirm that the inactive deployment has taken the upgrade correctly and is stable. At that point, switch the load balancers to forward traffic to the upgraded deployment, making it the active deployment. You can then take time to perform the same upgrade on the other deployment, which is now inactive.