exp/services/ledgerexporter: Updated README with step by step guide to installing and running ledger exporter
Showing 5 changed files with 207 additions and 81 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,101 +1,184 @@ | ||
# Ledger Exporter (Work in Progress) | ||
## Ledger Exporter: Installation and Usage Guide | ||
|
||
The Ledger Exporter is a tool designed to export ledger data from a Stellar network and upload it to a specified destination. It supports both bounded and unbounded modes, allowing users to export a specific range of ledgers or continuously export new ledgers as they arrive on the network. | ||
This guide provides step-by-step instructions on installing and using the Ledger Exporter, a tool that helps you export Stellar network ledger data to a Google Cloud Storage (GCS) bucket for efficient analysis and storage. | ||
|
||
Ledger Exporter currently uses captive-core as the ledger backend and GCS as the destination data store. | ||
|
||
# Exported Data Format | ||
The tool allows for the export of multiple ledgers in a single exported file. The exported data is in XDR format and is compressed using zstd before being uploaded. | ||
**Table of Contents** | ||
|
||
```go | ||
type LedgerCloseMetaBatch struct { | ||
StartSequence uint32 | ||
EndSequence uint32 | ||
LedgerCloseMetas []LedgerCloseMeta | ||
} | ||
``` | ||
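The batch layout above can be exercised with a small consistency check. This is a minimal sketch, not the exporter's own code: the `LedgerCloseMeta` stand-in, its `Sequence` field, and the `Validate` method are hypothetical, and the rule that a batch holds exactly one entry per ledger in its inclusive range is an assumption drawn from the field names.

```go
package main

import (
	"errors"
	"fmt"
)

// LedgerCloseMeta is a stand-in for the XDR type of the same name.
// The Sequence field is hypothetical and exists only for illustration.
type LedgerCloseMeta struct {
	Sequence uint32
}

// LedgerCloseMetaBatch mirrors the struct shown in the README.
type LedgerCloseMetaBatch struct {
	StartSequence    uint32
	EndSequence      uint32
	LedgerCloseMetas []LedgerCloseMeta
}

// Validate checks the assumed invariant that the batch contains one entry
// for every ledger in the inclusive range [StartSequence, EndSequence].
func (b LedgerCloseMetaBatch) Validate() error {
	if b.EndSequence < b.StartSequence {
		return errors.New("end sequence precedes start sequence")
	}
	want := int(b.EndSequence-b.StartSequence) + 1
	if len(b.LedgerCloseMetas) != want {
		return fmt.Errorf("expected %d ledgers, got %d", want, len(b.LedgerCloseMetas))
	}
	return nil
}

func main() {
	batch := LedgerCloseMetaBatch{
		StartSequence:    64,
		EndSequence:      65,
		LedgerCloseMetas: []LedgerCloseMeta{{Sequence: 64}, {Sequence: 65}},
	}
	fmt.Println("valid:", batch.Validate() == nil)
}
```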
* [Prerequisites](#prerequisites) | ||
* [Installation Steps](#installation-steps) | ||
* [Set Up GCP Credentials](#set-up-gcp-credentials) | ||
* [Create a GCS Bucket](#create-a-gcs-bucket) | ||
* [Configuration](#configuration) | ||
* [Create a Configuration File (`config.toml`)](#create-a-configuration-file-configtoml) | ||
* [Running the Ledger Exporter](#running-the-ledger-exporter) | ||
* [Pull the Docker Image](#1-pull-the-docker-image) | ||
* [Run the Ledger Exporter](#2-run-the-ledger-exporter) | ||
* [CLI Commands](#cli-commands) | ||
1. [scan-and-fill](#1-scan-and-fill) | ||
2. [append](#2-append) | ||
|
||
## Getting Started | ||
## Prerequisites | ||
|
||
### Installation (coming soon) | ||
* **Google Cloud Platform (GCP) Account:** You'll need a GCP account to create a GCS bucket for storing the exported data. | ||
* **Docker:** Allows you to run the Ledger Exporter in a self-contained environment. The official Docker installation guide: [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/) | ||
|
||
### Command Line Options | ||
## Installation Steps | ||
|
||
#### Scan and Fill Mode: | ||
Exports a specific range of ledgers, defined by --start and --end. A ledger is exported to the remote datastore only if it is not already present there. | ||
```bash | ||
ledgerexporter scan-and-fill --start <start_ledger> --end <end_ledger> --config-file <config_file_path> | ||
``` | ||
### Set Up GCP Credentials | ||
|
||
#### Append Mode: | ||
Exports ledgers, initially searching from --start for the next absent ledger sequence number following --start on the data store. If an absence is detected, the export range is narrowed to `--start <absent_ledger_sequence>`. | ||
This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder. | ||
Create application default credentials for your Google Cloud Platform (GCP) project by following these steps: | ||
1. Download the [SDK](https://cloud.google.com/sdk/docs/install). | ||
2. Install and initialize the [gcloud CLI](https://cloud.google.com/sdk/docs/initializing). | ||
3. Create [application authentication credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#google-idp) and store them in a secure location on your system, such as `$HOME/.config/gcloud/application_default_credentials.json`. | ||
|
||
In this mode, the --end ledger can be provided to stop the process once the export has reached that ledger; if it is absent or 0, the exporter continuously exports new ledgers emitted from the network. | ||
For detailed instructions, refer to the [Providing Credentials for Application Default Credentials (ADC) guide.](https://cloud.google.com/docs/authentication/provide-credentials-adc) | ||
|
||
It’s guaranteed that ledgers exported during `append` mode from `start` and up to the last logged ledger file `Uploaded {ledger file name}` were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between. | ||
```bash | ||
ledgerexporter append --start <start_ledger> --config-file <config_file_path> | ||
``` | ||
### Create a GCS Bucket | ||
|
||
### Configuration (toml): | ||
The `stellar_core_config` supports two ways for configuring captive core: | ||
- use prebuilt captive core config toml, archive urls, and passphrase based on `stellar_core_config.network = testnet|pubnet`. | ||
- manually set the captive core config by supplying these core parameters, which override any defaults even when `stellar_core_config.network` is also present: | ||
`stellar_core_config.captive_core_toml_path` | ||
`stellar_core_config.history_archive_urls` | ||
`stellar_core_config.network_passphrase` | ||
1. Go to the GCP Console's Storage section ([https://console.cloud.google.com/storage](https://console.cloud.google.com/storage)) and create a new bucket. | ||
2. Choose a descriptive name for the bucket, such as `stellar-ledger-data`. | ||
3. **Note down the bucket name** as you'll need it later in the configuration process. | ||
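The bucket name noted in step 3 later becomes the first segment of the `bucket_path` setting, with any optional subpaths following it. As a rough illustration of that layout, the sketch below splits such a value into its bucket and prefix parts. `SplitBucketPath` is a hypothetical helper written for this example; it is not part of the Ledger Exporter.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitBucketPath separates a value such as "bucket/sub1/sub2/" into the
// bucket name and the remaining object prefix. Leading and trailing slashes
// are ignored. Hypothetical helper, for illustration only.
func SplitBucketPath(path string) (bucket, prefix string) {
	trimmed := strings.Trim(path, "/")
	bucket, prefix, _ = strings.Cut(trimmed, "/")
	return bucket, prefix
}

func main() {
	bucket, prefix := SplitBucketPath("stellar-ledger-data/testnet/v1/")
	fmt.Printf("bucket=%q prefix=%q\n", bucket, prefix)
}
```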
|
||
Ensure you have stellar-core installed and set `stellar_core_config.stellar_core_binary_path` to its path on the OS. | ||
## Configuration | ||
|
||
Enable a web service that is bound to a localhost port and publishes metrics by including `admin_port = {port}`. | ||
### Create a Configuration File (`config.toml`) | ||
|
||
The configuration file specifies details about your GCS bucket, the Stellar network, and other settings. | ||
|
||
Replace the placeholder values in the [sample file](config.example.toml) with your specific information: | ||
|
||
<details> | ||
<summary> Sample TOML Configuration (config.toml) </summary> | ||
|
||
An example config, demonstrating preconfigured captive core settings and gcs data store config. | ||
```toml | ||
# Admin port configuration | ||
# Specifies the port number for hosting the web service locally to publish metrics. | ||
admin_port = 6061 | ||
|
||
[datastore_config] | ||
# Datastore Configuration | ||
[datastore] | ||
# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported. | ||
type = "GCS" | ||
|
||
[datastore_config.params] | ||
destination_bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/" | ||
[datastore.parameters] | ||
# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization. | ||
bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/" | ||
|
||
[datastore.schema] | ||
# Configuration for ledger and file storage. | ||
ledgers_per_file = 64 # Number of ledgers stored in each file. | ||
files_per_partition = 10 # Number of files per partition directory. | ||
|
||
[datastore_config.schema] | ||
ledgers_per_file = 64 | ||
files_per_partition = 10 | ||
# Stellar-core Configuration | ||
[stellar_core] | ||
# Use default captive-core config based on network | ||
# Options are "testnet" for the test network or "pubnet" for the public network. | ||
network = "testnet" | ||
|
||
[stellar_core_config] | ||
network = "testnet" | ||
stellar_core_binary_path = "/my/path/to/stellar-core" | ||
captive_core_toml_path = "my-captive-core.cfg" | ||
history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] | ||
network_passphrase = "test" | ||
# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set). | ||
|
||
# Path to the captive-core configuration file. | ||
#captive_core_config_path = "my-captive-core.cfg" | ||
|
||
# URLs for Stellar history archives, with multiple URLs allowed. | ||
#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] | ||
|
||
# Network passphrase for the Stellar network. | ||
#network_passphrase = "Test SDF Network ; September 2015" | ||
|
||
# Path to stellar-core binary | ||
# Not required when running in a Docker container as it has the stellar-core installed and path is set. | ||
# When running outside of Docker, it will look for stellar-core in the OS path if it exists. | ||
#stellar_core_binary_path = "/my/path/to/stellar-core" | ||
``` | ||
</details> | ||
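The sample configuration notes that manually set captive-core parameters override the defaults implied by `network`. The sketch below illustrates that precedence for the network passphrase only. It is an assumption-laden example rather than the exporter's implementation: `StellarCoreConfig`, `defaultPassphrases`, and `ResolvePassphrase` are hypothetical names, and the pubnet passphrase is the publicly documented value, which does not appear in this README.

```go
package main

import "fmt"

// StellarCoreConfig holds the subset of [stellar_core] keys used in this
// example. Hypothetical type, for illustration only.
type StellarCoreConfig struct {
	Network           string // "testnet" or "pubnet"
	NetworkPassphrase string // optional manual override
}

// defaultPassphrases maps each supported network name to its passphrase.
var defaultPassphrases = map[string]string{
	"testnet": "Test SDF Network ; September 2015",
	"pubnet":  "Public Global Stellar Network ; September 2015",
}

// ResolvePassphrase returns the manual value when one is set and otherwise
// falls back to the default for the configured network.
func ResolvePassphrase(cfg StellarCoreConfig) (string, error) {
	if cfg.NetworkPassphrase != "" {
		return cfg.NetworkPassphrase, nil
	}
	if p, ok := defaultPassphrases[cfg.Network]; ok {
		return p, nil
	}
	return "", fmt.Errorf("unknown network %q and no network_passphrase set", cfg.Network)
}

func main() {
	p, err := ResolvePassphrase(StellarCoreConfig{Network: "testnet"})
	fmt.Println(p, err)
}
```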
|
||
### Exported Files | ||
## Running the Ledger Exporter | ||
|
||
#### File Organization: | ||
- Ledgers are grouped into files, with the number of ledgers per file set by `ledgers_per_file`. | ||
- Files are further organized into partitions, with the number of files per partition set by `files_per_partition`. | ||
### 1. Pull the Docker Image | ||
|
||
#### Filename Structure: | ||
- Filenames indicate the ledger range they contain, e.g., `0-63.xdr.zstd` holds ledgers 0 to 63. | ||
- Partition directories group files, e.g., `/0-639/` holds files for ledgers 0 to 639. | ||
Open a terminal window and run the following command to download the Stellar Ledger Exporter Docker image: | ||
|
||
#### Example: | ||
with `ledgers_per_file = 64` and `files_per_partition = 10`: | ||
- Partition names: `/0-639`, `/640-1279`, ... | ||
- Filenames: `/0-639/0-63.xdr.zstd`, `/0-639/64-127.xdr.zstd`, ... | ||
```bash | ||
docker pull stellar/ledger-exporter | ||
``` | ||
|
||
### 2. Run the Ledger Exporter | ||
|
||
#### Special Cases: | ||
The following command demonstrates how to run the Ledger Exporter: | ||
|
||
- If `ledgers_per_file` is set to 1, filenames will only contain the ledger number. | ||
- If `files_per_partition` is set to 1, filenames will not contain the partition. | ||
```bash | ||
docker run --platform linux/amd64 -d \ | ||
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ | ||
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ | ||
-v "${PWD}/config.toml":/config.toml \ | ||
stellar/ledger-exporter <command> [options] | ||
``` | ||
|
||
#### Note: | ||
- Avoid changing `ledgers_per_file` and `files_per_partition` after configuration for consistency. | ||
**Explanation:** | ||
|
||
#### Retrieving Data: | ||
- To locate a specific ledger sequence, calculate the partition name and ledger file name using `files_per_partition` and `ledgers_per_file`. | ||
- The `GetObjectKeyFromSequenceNumber` function automates this calculation. | ||
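The calculation described above can be sketched as follows. This is an independent illustration of the naming rules in this section, not the source of `GetObjectKeyFromSequenceNumber`: the `Schema` type and `ObjectKey` method are hypothetical names, and both settings are assumed to be at least 1.

```go
package main

import "fmt"

// Schema carries the two settings that determine object names.
// Hypothetical type; both fields are assumed to be >= 1.
type Schema struct {
	LedgersPerFile    uint32
	FilesPerPartition uint32
}

// ObjectKey returns the partition directory (if any) and file name that
// would hold the given ledger sequence under the rules described above.
func (s Schema) ObjectKey(seq uint32) string {
	key := ""
	// A partition directory is only used when it groups more than one file.
	if s.FilesPerPartition > 1 {
		partitionSize := s.LedgersPerFile * s.FilesPerPartition
		partitionStart := (seq / partitionSize) * partitionSize
		key = fmt.Sprintf("%d-%d/", partitionStart, partitionStart+partitionSize-1)
	}
	fileStart := (seq / s.LedgersPerFile) * s.LedgersPerFile
	key += fmt.Sprintf("%d", fileStart)
	// With a single ledger per file, the name is just the ledger number.
	if s.LedgersPerFile > 1 {
		key += fmt.Sprintf("-%d", fileStart+s.LedgersPerFile-1)
	}
	return key + ".xdr.zstd"
}

func main() {
	s := Schema{LedgersPerFile: 64, FilesPerPartition: 10}
	fmt.Println(s.ObjectKey(100)) // 0-639/64-127.xdr.zstd
	fmt.Println(s.ObjectKey(700)) // 640-1279/640-703.xdr.zstd
}
```

With `ledgers_per_file = 64` and `files_per_partition = 10`, ledger 100 falls in file `64-127` of partition `0-639`, matching the example names listed earlier in this section.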
* `--platform linux/amd64`: Specifies the platform architecture (adjust if needed for your system). | ||
* `-d`: Runs the container in detached mode (background process). | ||
* `-v`: Mounts volumes to map your local GCP credentials and config.toml file to the container: | ||
* `$HOME/.config/gcloud/application_default_credentials.json`: Your local GCP credentials file. | ||
* `${PWD}/config.toml`: Your local configuration file. | ||
* `-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json`: Sets the environment variable for credentials within the container. | ||
* `stellar/ledger-exporter`: The Docker image name. | ||
* `<command>`: The Stellar Ledger Exporter command (e.g., [scan-and-fill](#1-scan-and-fill), [append](#2-append)) | ||
|
||
## CLI Commands | ||
|
||
The Ledger Exporter offers two primary commands to manage ledger data export: | ||
|
||
### 1. scan-and-fill | ||
|
||
**Purpose:** | ||
Exports a specific range of Stellar ledgers, defined by the `--start` and `--end` options. | ||
|
||
**Behavior:** | ||
- Scans the specified ledger sequence range. | ||
- Exports only missing ledgers to the remote datastore (GCS bucket). | ||
- Avoids unnecessary exports if data is already present. | ||
|
||
**Usage:** | ||
|
||
```bash | ||
docker run --platform linux/amd64 -d \ | ||
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ | ||
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ | ||
-v "${PWD}/config.toml":/config.toml \ | ||
stellar/ledger-exporter \ | ||
scan-and-fill --start <start_ledger> --end <end_ledger> [--config <config_file>] | ||
``` | ||
|
||
Arguments: | ||
- `--start <start_ledger>` (required): The starting ledger sequence number in the range to export. | ||
- `--end <end_ledger>` (required): The ending ledger sequence number in the range. | ||
- `--config <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory. | ||
|
||
### 2. append | ||
|
||
**Purpose:** | ||
Exports ledgers starting from `--start`, searching for the next missing ledger sequence number in the datastore. If a missing ledger is found, the export begins from that missing ledger. | ||
|
||
**Behavior:** | ||
- Starts searching from the provided `--start` ledger and identifies the first missing ledger sequence number after `--start` in the remote datastore (GCS bucket). | ||
- Narrows the export range to include only missing ledgers from that point onwards. | ||
- If the `--end` ledger is provided, it will stop the process once export has reached that ledger. If the `--end` ledger is absent or set to 0, the exporter will continuously export new ledgers as they appear on the network. | ||
|
||
**Usage:** | ||
|
||
```bash | ||
docker run --platform linux/amd64 -d \ | ||
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ | ||
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ | ||
-v "${PWD}/config.toml":/config.toml \ | ||
stellar/ledger-exporter \ | ||
append --start <start_ledger> [--end <end_ledger>] [--config <config_file>] | ||
``` | ||
|
||
Arguments: | ||
- `--start <start_ledger>` (required): The starting ledger sequence number for the export process. | ||
- `--end <end_ledger>` (optional): The ending ledger sequence number. If omitted or set to 0, the exporter will continuously export new ledgers as they appear on the network. | ||
- `--config <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
|
||
# Sample TOML Configuration | ||
|
||
# Admin port configuration | ||
# Specifies the port number for hosting the web service locally to publish metrics. | ||
admin_port = 6061 | ||
|
||
# Datastore Configuration | ||
[datastore] | ||
# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported. | ||
type = "GCS" | ||
|
||
[datastore.parameters] | ||
# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization. | ||
bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/" | ||
|
||
[datastore.schema] | ||
# Configuration for ledger and file storage. | ||
ledgers_per_file = 64 # Number of ledgers stored in each file. | ||
files_per_partition = 10 # Number of files per partition directory. | ||
|
||
# Stellar-core Configuration | ||
[stellar_core] | ||
# Use default captive-core config based on network | ||
# Options are "testnet" for the test network or "pubnet" for the public network. | ||
network = "testnet" | ||
|
||
# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set). | ||
|
||
# Path to the captive-core configuration file. | ||
#captive_core_config_path = "my-captive-core.cfg" | ||
|
||
# URLs for Stellar history archives, with multiple URLs allowed. | ||
#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] | ||
|
||
# Network passphrase for the Stellar network. | ||
#network_passphrase = "Test SDF Network ; September 2015" | ||
|
||
# Path to stellar-core binary | ||
# Not required when running in a Docker container as it has the stellar-core installed and path is set. | ||
# When running outside of Docker, it will look for stellar-core in the OS path if it exists. | ||
# If you want to override the path, you can do so here. | ||
#stellar_core_binary_path = "/my/path/to/stellar-core" | ||
|