From 2f8b0ee441975d70036f920df9b035ea35314eba Mon Sep 17 00:00:00 2001 From: Urvi Date: Thu, 20 Jun 2024 13:56:29 -0700 Subject: [PATCH] exp/services/ledgerexporter: Updated README with step by step guide to installing and running ledger exporter --- exp/services/ledgerexporter/README.md | 221 ++++++++++++------ .../ledgerexporter/config.example.toml | 44 ++++ exp/services/ledgerexporter/config.toml | 1 - exp/services/ledgerexporter/internal/main.go | 8 +- .../ledgerexporter/internal/main_test.go | 14 +- 5 files changed, 207 insertions(+), 81 deletions(-) create mode 100644 exp/services/ledgerexporter/config.example.toml diff --git a/exp/services/ledgerexporter/README.md b/exp/services/ledgerexporter/README.md index 57757508e1..bf24e74131 100644 --- a/exp/services/ledgerexporter/README.md +++ b/exp/services/ledgerexporter/README.md @@ -1,101 +1,184 @@ -# Ledger Exporter (Work in Progress) +## Ledger Exporter: Installation and Usage Guide -The Ledger Exporter is a tool designed to export ledger data from a Stellar network and upload it to a specified destination. It supports both bounded and unbounded modes, allowing users to export a specific range of ledgers or continuously export new ledgers as they arrive on the network. +This guide provides step-by-step instructions on installing and using the Ledger Exporter, a tool that helps you export Stellar network ledger data to a Google Cloud Storage (GCS) bucket for efficient analysis and storage. -Ledger Exporter currently uses captive-core as the ledger backend and GCS as the destination data store. -# Exported Data Format -The tool allows for the export of multiple ledgers in a single exported file. The exported data is in XDR format and is compressed using zstd before being uploaded. +**Table of Contents** -```go -type LedgerCloseMetaBatch struct { - StartSequence uint32 - EndSequence uint32 - LedgerCloseMetas []LedgerCloseMeta -} -``` +* [Prerequisites](#prerequisites) +* [Installation Steps](#installation-steps) + * [Set Up GCP Credentials](#set-up-gcp-credentials) + * [Create a GCS Bucket](#create-a-gcs-bucket) +* [Configuration](#configuration) + * [Create a Configuration File (`config.toml`)](#create-a-configuration-file-configtoml) +* [Running the Ledger Exporter](#running-the-ledger-exporter) + * [Pull the Docker Image](#1-pull-the-docker-image) + * [Run the Ledger Exporter](#2-run-the-ledger-exporter) +* [CLI Commands](#cli-commands) + 1. [scan-and-fill](#1-scan-and-fill) + 2. [append](#2-append) -## Getting Started +## Prerequisites -### Installation (coming soon) +* **Google Cloud Platform (GCP) Account:** You'll need a GCP account to create a GCS bucket for storing the exported data. +* **Docker:** Allows you to run the Ledger Exporter in a self-contained environment. The official installation guide: [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/) -### Command Line Options +## Installation Steps -#### Scan and Fill Mode: -Exports a specific range of ledgers, defined by --start and --end. Will only export to remote datastore if data is absent. -```bash -ledgerexporter scan-and-fill --start --end --config-file -``` +### Set Up GCP Credentials -#### Append Mode: -Exports ledgers initially searching from --start, looking for the next absent ledger sequence number proceeding --start on the data store. If abscence is detected, the export range is narrowed to `--start `. 
-This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder. +Create application default credentials for your Google Cloud Platform (GCP) project by following these steps: +1. Download the [SDK](https://cloud.google.com/sdk/docs/install). +2. Install and initialize the [gcloud CLI](https://cloud.google.com/sdk/docs/initializing). +3. Create [application authentication credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#google-idp) and store it in a secure location on your system, such as $HOME/.config/gcloud/application_default_credentials.json. -In this mode, the --end ledger can be provided to stop the process once export has reached that ledger, or if absent or 0 it will result in continous exporting of new ledgers emitted from the network. +For detailed instructions, refer to the [Providing Credentials for Application Default Credentials (ADC) guide.](https://cloud.google.com/docs/authentication/provide-credentials-adc) - It’s guaranteed that ledgers exported during `append` mode from `start` and up to the last logged ledger file `Uploaded {ledger file name}` were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between. -```bash -ledgerexporter append --start --config-file -``` +### Create a GCS Bucket -### Configuration (toml): -The `stellar_core_config` supports two ways for configuring captive core: - - use prebuilt captive core config toml, archive urls, and passphrase based on `stellar_core_config.network = testnet|pubnet`. - - manually set the the captive core confg by supplying these core parameters which will override any defaults when `stellar_core_config.network` is present also: - `stellar_core_config.captive_core_toml_path` - `stellar_core_config.history_archive_urls` - `stellar_core_config.network_passphrase` +1. Go to the GCP Console's Storage section ([https://console.cloud.google.com/storage](https://console.cloud.google.com/storage)) and create a new bucket. +2. Choose a descriptive name for the bucket, such as `stellar-ledger-data`. +3. **Note down the bucket name** as you'll need it later in the configuration process. -Ensure you have stellar-core installed and set `stellar_core_config.stellar_core_binary_path` to it's path on o/s. +## Configuration -Enable web service that will be bound to localhost post and publishes metrics by including `admin_port = {port}` +### Create a Configuration File (`config.toml`) + +The configuration file specifies details about your GCS bucket, stellar network and other settings. + +Replace the placeholder values in the [sample file](config.example.toml) with your specific information: + +
+ Sample TOML Configuration (config.toml) -An example config, demonstrating preconfigured captive core settings and gcs data store config. ```toml +# Admin port configuration +# Specifies the port number for hosting the web service locally to publish metrics. admin_port = 6061 -[datastore_config] +# Datastore Configuration +[datastore] +# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported. type = "GCS" -[datastore_config.params] -destination_bucket_path = "your-bucket-name///" +[datastore.parameters] +# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization. +bucket_path = "your-bucket-name///" + +[datastore.schema] +# Configuration for ledger and file storage. +ledgers_per_file = 64 # Number of ledgers stored in each file. +files_per_partition = 10 # Number of files per partition directory. -[datastore_config.schema] -ledgers_per_file = 64 -files_per_partition = 10 +# Stellar-core Configuration +[stellar_core] +# Use default captive-core config based on network +# Options are "testnet" for the test network or "pubnet" for the public network. +network = "testnet" -[stellar_core_config] - network = "testnet" - stellar_core_binary_path = "/my/path/to/stellar-core" - captive_core_toml_path = "my-captive-core.cfg" - history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] - network_passphrase = "test" +# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set). + +# Path to the captive-core configuration file. +#captive_core_config_path = "my-captive-core.cfg" + +# URLs for Stellar history archives, with multiple URLs allowed. +#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] + +# Network passphrase for the Stellar network. +#network_passphrase = "Test SDF Network ; September 2015" + +# Path to stellar-core binary +# Not required when running in a Docker container as it has the stellar-core installed and path is set. +# When running outside of Docker, it will look for stellar-core in the OS path if it exists. +#stellar_core_binary_path = "/my/path/to/stellar-core ``` +
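The `ledgers_per_file` and `files_per_partition` values above control how exported objects are laid out in the bucket: ledgers are grouped into files, and files are grouped into partition directories (the exporter's `GetObjectKeyFromSequenceNumber` function performs this calculation). The sketch below is a rough illustration of that naming scheme for orientation only, not the exporter's actual implementation; with `ledgers_per_file = 64` and `files_per_partition = 10`, ledger 100 lands under `0-639/64-127.xdr.zstd`.

```go
package main

import "fmt"

// objectKey is an illustrative sketch of the object layout described in this
// README, not the exporter's actual GetObjectKeyFromSequenceNumber function.
// Ledgers are grouped into files of ledgersPerFile ledgers, and files are
// grouped into partition directories of filesPerPartition files.
func objectKey(ledgerSeq, ledgersPerFile, filesPerPartition uint32) string {
	var key string

	// Partition directory, e.g. "0-639/"; omitted when files_per_partition is 1.
	if filesPerPartition > 1 {
		partitionSize := ledgersPerFile * filesPerPartition
		partitionStart := (ledgerSeq / partitionSize) * partitionSize
		partitionEnd := partitionStart + partitionSize - 1
		key = fmt.Sprintf("%d-%d/", partitionStart, partitionEnd)
	}

	// File name, e.g. "64-127"; collapses to a single number when
	// ledgers_per_file is 1.
	fileStart := (ledgerSeq / ledgersPerFile) * ledgersPerFile
	fileEnd := fileStart + ledgersPerFile - 1
	key += fmt.Sprintf("%d", fileStart)
	if fileStart != fileEnd {
		key += fmt.Sprintf("-%d", fileEnd)
	}
	return key + ".xdr.zstd"
}

func main() {
	// With ledgers_per_file = 64 and files_per_partition = 10,
	// ledger 100 maps to partition 0-639, file 64-127.
	fmt.Println(objectKey(100, 64, 10)) // prints: 0-639/64-127.xdr.zstd
}
```

Keeping these two settings fixed after the first export keeps the object keys consistent, which is why they should not be changed once data has been written.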
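Before running the exporter, you can optionally confirm that your application default credentials can reach the bucket referenced by `bucket_path`. The following is a minimal, self-contained Go sketch (assuming the `cloud.google.com/go/storage` client library and the placeholder bucket name `your-bucket-name`); it is not part of the exporter itself.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	// Picks up Application Default Credentials, e.g. from
	// GOOGLE_APPLICATION_CREDENTIALS or the gcloud ADC file.
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatalf("creating GCS client: %v", err)
	}
	defer client.Close()

	// Replace with the bucket portion of bucket_path from config.toml.
	bucket := "your-bucket-name"
	attrs, err := client.Bucket(bucket).Attrs(ctx)
	if err != nil {
		log.Fatalf("cannot access bucket %q: %v", bucket, err)
	}
	fmt.Printf("bucket %s is reachable (location %s)\n", attrs.Name, attrs.Location)
}
```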
-### Exported Files +## Running the Ledger Exporter -#### File Organization: -- Ledgers are grouped into files, with the number of ledgers per file set by `ledgers_per_file`. -- Files are further organized into partitions, with the number of files per partition set by `files_per_partition`. +### 1. Pull the Docker Image -### Filename Structure: -- Filenames indicate the ledger range they contain, e.g., `0-63.xdr.zstd` holds ledgers 0 to 63. -- Partition directories group files, e.g., `/0-639/` holds files for ledgers 0 to 639. +Open a terminal window and run the following command to download the Stellar Ledger Exporter Docker image: -#### Example: -with `ledgers_per_file = 64` and `files_per_partition = 10`: -- Partition names: `/0-639`, `/640-1279`, ... -- Filenames: `/0-639/0-63.xdr.zstd`, `/0-639/64-127.xdr.zstd`, ... +```bash +docker pull stellar/ledger-exporter +``` + +### 2. Run the Ledger Exporter -#### Special Cases: +The following command demonstrates how to run the Ledger Exporter: -- If `ledgers_per_file` is set to 1, filenames will only contain the ledger number. -- If `files_per_partition` is set to 1, filenames will not contain the partition. +```bash +docker run --platform linux/amd64 -d \ + -v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ + -e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ + -v ${PWD}/config.toml:/config.toml \ + stellar/ledger-exporter [options] +``` -#### Note: -- Avoid changing `ledgers_per_file` and `files_per_partition` after configuration for consistency. +**Explanation:** -#### Retrieving Data: -- To locate a specific ledger sequence, calculate the partition name and ledger file name using `files_per_partition` and `ledgers_per_file`. -- The `GetObjectKeyFromSequenceNumber` function automates this calculation. +* `--platform linux/amd64`: Specifies the platform architecture (adjust if needed for your system). +* `-d`: Runs the container in detached mode (background process). +* `-v`: Mounts volumes to map your local GCP credentials and config.toml file to the container: + * `$HOME/.config/gcloud/application_default_credentials.json`: Your local GCP credentials file. + * `${PWD}/config.toml`: Your local configuration file. +* `-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json`: Sets the environment variable for credentials within the container. +* `stellar/ledger-exporter`: The Docker image name. +* ``: The Stellar Ledger Exporter command (e.g., [scan-and-fill](#1-scan-and-fill), [append](#2-append)) + +## CLI Commands + +The Ledger Exporter offers two primary commands to manage ledger data export: + +### 1. scan-and-fill + +**Purpose:** +Exports a specific range of Stellar ledgers, defined by the `--start` and `--end` options. + +**Behavior:** +- Scans the specified ledger sequence range. +- Exports only missing ledgers to the remote datastore (GCS bucket). +- Avoids unnecessary exports if data is already present. + +**Usage:** + +```bash +docker run --platform linux/amd64 -d \ + -v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ + -e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ + -v ${PWD}/config.toml:/config.toml \ + stellar/ledger-exporter \ + scan-and-fill --start --end [--config ] +``` + +Arguments: +- `--start ` (required): The starting ledger sequence number in the range to export. +- `--end ` (required): The ending ledger sequence number in the range. 
+- `--config ` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory. + +### 2. append + +**Purpose:** +Exports ledgers starting from `--start`, searching for the next missing ledger sequence number in the datastore. If a missing ledger is found, the export begins from that missing ledger. + +**Behavior:** +- Starts searching from the provided `--start` ledger and identifies the first missing ledger sequence number after `--start` in the remote datastore (GCS bucket). +- Narrows the export range to include only missing ledgers from that point onwards. +- If the `--end` ledger is provided, it will stop the process once export has reached that ledger. If the `--end` ledger is absent or set to 0, the exporter will continuously export new ledgers as they appear on the network. + +**Usage:** + +```bash +docker run --platform linux/amd64 -d \ + -v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \ + -e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \ + -v ${PWD}/config.toml:/config.toml \ + stellar/ledger-exporter \ + append --start [--end ] [--config ] +``` +Arguments: +- `--start ` (required): The starting ledger sequence number for the export process. +- `--end ` (optional): The ending ledger sequence number. If omitted or set to 0, the exporter will continuously export new ledgers as they appear on the network. +- `--config ` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory. \ No newline at end of file diff --git a/exp/services/ledgerexporter/config.example.toml b/exp/services/ledgerexporter/config.example.toml new file mode 100644 index 0000000000..84e65144bd --- /dev/null +++ b/exp/services/ledgerexporter/config.example.toml @@ -0,0 +1,44 @@ + +# Sample TOML Configuration + +# Admin port configuration +# Specifies the port number for hosting the web service locally to publish metrics. +admin_port = 6061 + +# Datastore Configuration +[datastore] +# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported. +type = "GCS" + +[datastore.parameters] +# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization. +bucket_path = "your-bucket-name///" + +[datastore.schema] +# Configuration for ledger and file storage. +ledgers_per_file = 64 # Number of ledgers stored in each file. +files_per_partition = 10 # Number of files per partition directory. + +# Stellar-core Configuration +[stellar_core] +# Use default captive-core config based on network +# Options are "testnet" for the test network or "pubnet" for the public network. +network = "testnet" + +# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set). + +# Path to the captive-core configuration file. +#captive_core_config_path = "my-captive-core.cfg" + +# URLs for Stellar history archives, with multiple URLs allowed. +#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"] + +# Network passphrase for the Stellar network. +#network_passphrase = "Test SDF Network ; September 2015" + +# Path to stellar-core binary +# Not required when running in a Docker container as it has the stellar-core installed and path is set. +# When running outside of Docker, it will look for stellar-core in the OS path if it exists. 
+# If you want to override the path, you can do so here. +#stellar_core_binary_path = "/my/path/to/stellar-core" + diff --git a/exp/services/ledgerexporter/config.toml b/exp/services/ledgerexporter/config.toml index c41d9376ac..c5c4519f0b 100644 --- a/exp/services/ledgerexporter/config.toml +++ b/exp/services/ledgerexporter/config.toml @@ -10,5 +10,4 @@ files_per_partition = 64000 [stellar_core_config] network = "testnet" - stellar_core_binary_path = "/usr/local/bin/stellar-core" diff --git a/exp/services/ledgerexporter/internal/main.go b/exp/services/ledgerexporter/internal/main.go index d1409eb89c..425ca5ac6e 100644 --- a/exp/services/ledgerexporter/internal/main.go +++ b/exp/services/ledgerexporter/internal/main.go @@ -39,7 +39,7 @@ func defineCommands() { RunE: func(cmd *cobra.Command, args []string) error { settings := bindCliParameters(cmd.PersistentFlags().Lookup("start"), cmd.PersistentFlags().Lookup("end"), - cmd.PersistentFlags().Lookup("config-file"), + cmd.PersistentFlags().Lookup("config"), ) settings.Mode = ScanFill return ledgerExporterCmdRunner(settings) @@ -52,7 +52,7 @@ func defineCommands() { RunE: func(cmd *cobra.Command, args []string) error { settings := bindCliParameters(cmd.PersistentFlags().Lookup("start"), cmd.PersistentFlags().Lookup("end"), - cmd.PersistentFlags().Lookup("config-file"), + cmd.PersistentFlags().Lookup("config"), ) settings.Mode = Append return ledgerExporterCmdRunner(settings) @@ -64,14 +64,14 @@ func defineCommands() { scanAndFillCmd.PersistentFlags().Uint32P("start", "s", 0, "Starting ledger (inclusive), must be set to a value greater than 1") scanAndFillCmd.PersistentFlags().Uint32P("end", "e", 0, "Ending ledger (inclusive), must be set to value greater than 'start' and less than the network's current ledger") - scanAndFillCmd.PersistentFlags().String("config-file", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.") + scanAndFillCmd.PersistentFlags().String("config", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.") viper.BindPFlags(scanAndFillCmd.PersistentFlags()) appendCmd.PersistentFlags().Uint32P("start", "s", 0, "Starting ledger (inclusive), must be set to a value greater than 1") appendCmd.PersistentFlags().Uint32P("end", "e", 0, "Ending ledger (inclusive), optional, setting to non-zero means bounded mode, "+ "only export ledgers from 'start' up to 'end' value which must be greater than 'start' and less than the network's current ledger. "+ "If 'end' is absent or '0' means unbounded mode, exporter will continue to run indefintely and export the latest closed ledgers from network as they are generated in real time.") - appendCmd.PersistentFlags().String("config-file", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.") + appendCmd.PersistentFlags().String("config", "config.toml", "Path to the TOML config file. 
Defaults to 'config.toml' on runtime working directory path.") viper.BindPFlags(appendCmd.PersistentFlags()) } diff --git a/exp/services/ledgerexporter/internal/main_test.go b/exp/services/ledgerexporter/internal/main_test.go index 4c9e5412f3..340ead2d03 100644 --- a/exp/services/ledgerexporter/internal/main_test.go +++ b/exp/services/ledgerexporter/internal/main_test.go @@ -29,12 +29,12 @@ func TestFlagsOutput(t *testing.T) { }{ { name: "no sub-command", - commandArgs: []string{"--start", "4", "--end", "5", "--config-file", "myfile"}, + commandArgs: []string{"--start", "4", "--end", "5", "--config", "myfile"}, expectedErrOutput: "Error: ", }, { name: "append sub-command with start and end present", - commandArgs: []string{"append", "--start", "4", "--end", "5", "--config-file", "myfile"}, + commandArgs: []string{"append", "--start", "4", "--end", "5", "--config", "myfile"}, expectedErrOutput: "", appRunner: appRunnerSuccess, expectedSettings: RuntimeSettings{ @@ -46,7 +46,7 @@ func TestFlagsOutput(t *testing.T) { }, { name: "append sub-command with start and end absent", - commandArgs: []string{"append", "--config-file", "myfile"}, + commandArgs: []string{"append", "--config", "myfile"}, expectedErrOutput: "", appRunner: appRunnerSuccess, expectedSettings: RuntimeSettings{ @@ -58,13 +58,13 @@ func TestFlagsOutput(t *testing.T) { }, { name: "append sub-command prints app error", - commandArgs: []string{"append", "--start", "4", "--end", "5", "--config-file", "myfile"}, + commandArgs: []string{"append", "--start", "4", "--end", "5", "--config", "myfile"}, expectedErrOutput: "test error", appRunner: appRunnerError, }, { name: "scanfill sub-command with start and end present", - commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config-file", "myfile"}, + commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config", "myfile"}, expectedErrOutput: "", appRunner: appRunnerSuccess, expectedSettings: RuntimeSettings{ @@ -76,7 +76,7 @@ func TestFlagsOutput(t *testing.T) { }, { name: "scanfill sub-command with start and end absent", - commandArgs: []string{"scan-and-fill", "--config-file", "myfile"}, + commandArgs: []string{"scan-and-fill", "--config", "myfile"}, expectedErrOutput: "", appRunner: appRunnerSuccess, expectedSettings: RuntimeSettings{ @@ -88,7 +88,7 @@ func TestFlagsOutput(t *testing.T) { }, { name: "scanfill sub-command prints app error", - commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config-file", "myfile"}, + commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config", "myfile"}, expectedErrOutput: "test error", appRunner: appRunnerError, },