Add admin-guide for hubble #669
Merged

Changes from all commits (9 commits):
- f8b138f Add admin-guide for hubble (chowbao)
- c99e0ea Update bad image link (chowbao)
- d6e0e29 Reformat hubble admin docs (chowbao)
- 81247fd Merge branch 'main' into hubble-admin-guide (chowbao)
- 1a97921 Address comments (chowbao)
- 6c1c5aa Fix link bug (chowbao)
- f93e246 Add more dbt instructions (chowbao)
- dfe4435 address comments (chowbao)
- 5986f51 Make image bigger (chowbao)

@@ -0,0 +1,10 @@
---
title: Admin Guide
sidebar_position: 15
---

import DocCardList from "@theme/DocCardList";

All you need to know about running a Hubble analytics platform.

<DocCardList />

@@ -0,0 +1,10 @@
---
title: Data Curation
sidebar_position: 20
---

import DocCardList from "@theme/DocCardList";

Running stellar-dbt-public to transform raw Stellar network data into analytics-ready datasets.

<DocCardList />

@@ -0,0 +1,22 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_dbt_arch from '/img/hubble/stellar_dbt_architecture.png';

## Architecture Overview

<img src={stellar_dbt_arch} width="300"/>

In general, stellar-dbt-public runs by:

* Selecting a dbt model to run
* Within the model run:
  * Sources are referenced and used to create staging tables
  * Staging tables then undergo various transformations and are stored in intermediate tables
  * Finishing touches and joins are done on the intermediate tables, which produce the final, analytics-friendly mart tables

We try to adhere to the best practices set by the [dbt docs](https://docs.getdbt.com/docs/build/projects).

More detailed information about stellar-dbt-public and examples can be found in the [stellar-dbt-public](https://github.com/stellar/stellar-dbt-public/tree/master) repo.
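
If you want to see this layering in action, one sketch is to build each layer separately with dbt's path-based selection; the folder names below assume the standard stellar-dbt-public project layout:

```
# Build each layer in order; path-based selection assumes models are
# organized under staging/, intermediate/, and marts/ directories
dbt build --select staging
dbt build --select intermediate
dbt build --select marts
```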

network/hubble/admin-guide/data-curation/getting-started.mdx
140 changes: 140 additions & 0 deletions

@@ -0,0 +1,140 @@
---
title: Getting Started
sidebar_position: 20
---

[stellar-dbt-public GitHub repository](https://github.com/stellar/stellar-dbt-public/tree/master)

[stellar/stellar-dbt-public docker images](https://hub.docker.com/r/stellar/stellar-dbt-public)

## Recommended Usage

### Docker Image

Generally, if you do not need to modify any of the stellar-dbt-public code, it is recommended that you use the [stellar/stellar-dbt-public docker images](https://hub.docker.com/r/stellar/stellar-dbt-public).

Example to run locally with docker:

```
docker run --platform linux/amd64 -ti stellar/stellar-dbt-public:latest <parameters>
```
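
As a hypothetical, fuller invocation, the sketch below mounts GCP application-default credentials for BigQuery access and passes a dbt command as the parameters. The model name and credential mount are illustrative assumptions, and whether the entrypoint expects a leading `dbt` depends on the image, so check the repo README:

```
# Illustrative only: mounts local gcloud credentials and asks the
# container to build a single model
docker run --platform linux/amd64 -ti \
  -v "$HOME/.config/gcloud:/root/.config/gcloud" \
  stellar/stellar-dbt-public:latest \
  dbt build --select history_assets
```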

### Import stellar-dbt-public as a dbt Package

Alternatively, if you need to build your own models on top of stellar-dbt-public, you can import stellar-dbt-public as a dbt package into a separate dbt project.

Example instructions:

* Create a new file `packages.yml` in your dbt project (not the stellar-dbt-public project) with the yml below:

```
packages:
  - git: "https://github.com/stellar/stellar-dbt-public.git"
    revision: v0.0.28
```
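
After adding `packages.yml`, pull the package into your project with dbt's standard package installation command:

```
# Download the packages listed in packages.yml into dbt_packages/
dbt deps
```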

* (Optional) Update your profiles.yml to include profile configurations for stellar-dbt-public:

```
new_project:
  target: test
  outputs:
    test:
      project: <project>
      dataset: <dataset>
      <other configurations>

stellar_dbt_public:
  target: test
  outputs:
    test:
      project: <project>
      dataset: <dataset>
      <other configurations>
```

* (Optional) Update your dbt_project.yml to include project configurations for stellar-dbt-public:

```
name: 'stellar_dbt'
version: '1.0.0'
config-version: 2

profile: 'new_project'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  new_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table

  stellar_dbt_public:
    staging:
      +materialized: ephemeral
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

* Models from the stellar-dbt-public package/repo will now be available in your new dbt project.
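
One way to sanity-check the import is dbt's `package:` selection method, which scopes a command to models owned by the imported package; the package name below is assumed to match the name declared in stellar-dbt-public's dbt_project.yml:

```
# List, then build, only the models provided by the imported package
dbt ls --select package:stellar_dbt_public
dbt build --select package:stellar_dbt_public
```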

## Building and Running Locally

### Clone the repo

```
git clone https://github.com/stellar/stellar-dbt-public
cd stellar-dbt-public
```

### Install required python packages

```
pip install --upgrade pip && pip install -r requirements.txt
```
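
You may want to isolate these dependencies in a virtual environment first; this is ordinary Python practice rather than a stellar-dbt-public requirement:

```
# Create and activate an isolated environment before installing
python3 -m venv .venv
source .venv/bin/activate
```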

### Install required dbt packages

```
dbt deps
```

### Running dbt

* There are many useful commands that come with dbt, which can be found in the [dbt documentation](https://docs.getdbt.com/reference/dbt-commands#available-commands)
* stellar-dbt-public is designed to use the `dbt build` command, which will `run` the model and `test` the model table output
* (Optional) run with the `--full-refresh` option

```
dbt build --full-refresh
```

* Subsequent runs can use incremental mode (which only inserts the newest data instead of rebuilding all of history every time)

```
dbt build
```

* You can also specify just a single model if you don't want to run all stellar-dbt-public models

```
dbt build --select <model name or tag>
```

Please see the [stellar-dbt-public/models/marts](https://github.com/stellar/stellar-dbt-public/tree/master/models/marts) directory for a full list of the available models that dbt can run.
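
For example, a hypothetical single-model run that also builds everything upstream of the target; the `+` graph operator is standard dbt selection syntax, and the model name is an assumption based on the enriched_history_operations table mentioned in the scheduling guide:

```
# Build enriched_history_operations plus all of its upstream models
dbt build --select +enriched_history_operations
```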

@@ -0,0 +1,15 @@
---
title: "Overview"
sidebar_position: 0
---

Data curation in Hubble is done through [stellar-dbt-public](https://github.com/stellar/stellar-dbt-public). stellar-dbt-public transforms raw Stellar network data from BigQuery datasets and tables into aggregates for more user-friendly analytics.

It is worth noting that most users will not need to stand up and run their own stellar-dbt-public instance. The Stellar Development Foundation provides public access to fully transformed Stellar network data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](https://developers.stellar.org/network/hubble/analyst-guide/connecting) section.

## Why Run stellar-dbt-public?

Running stellar-dbt-public within your own infrastructure provides a number of benefits. You can:

- Have full operational control without dependency on the Stellar Development Foundation for network data
- Run modified ETL/ELT pipelines that fit your individual business needs

network/hubble/admin-guide/scheduling-and-orchestration/README.mdx
10 changes: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
---
title: Scheduling and Orchestration
sidebar_position: 100
---

import DocCardList from "@theme/DocCardList";

Stitching all the components together.

<DocCardList />

network/hubble/admin-guide/scheduling-and-orchestration/architecture.mdx
18 changes: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_etl_airflow_arch from '/img/hubble/stellar_etl_airflow_architecture.png';

## Architecture Overview

<img src={stellar_etl_airflow_arch} width="300"/>

In general, stellar-etl-airflow runs by:

* Scheduling DAGs to run `stellar-etl` and upload the exported data to BigQuery
* Scheduling DAGs to run `stellar-dbt-public` using the data in BigQuery

We try to adhere to the best practices set by the [dbt docs](https://docs.getdbt.com/docs/build/projects).

More detailed information about stellar-etl-airflow can be found in the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow/tree/master) repo.

network/hubble/admin-guide/scheduling-and-orchestration/getting-started.mdx
87 changes: 87 additions & 0 deletions

@@ -0,0 +1,87 @@
---
title: Getting Started
sidebar_position: 20
---

import history_table_export from '/img/hubble/history_table_export.png';
import state_table_export from '/img/hubble/state_table_export.png';
import dbt_enriched_base_tables from '/img/hubble/dbt_enriched_base_tables.png';

[stellar-etl-airflow GitHub repository](https://github.com/stellar/stellar-etl-airflow/tree/master)

## GCP Account Setup

The Stellar Development Foundation runs Hubble in GCP using Composer and BigQuery. To follow the same deployment, you will need access to a GCP project. Instructions can be found in the [Get Started](https://cloud.google.com/docs/get-started) documentation from Google.

Note: BigQuery and Composer should be available by default. If they are not, you can find instructions for enabling them in the [BigQuery](https://cloud.google.com/bigquery?hl=en) or [Composer](https://cloud.google.com/composer?hl=en) Google documentation.
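
If either API is disabled on your project, a sketch for enabling both with the gcloud CLI (assuming a default project is set and you have the required IAM permissions):

```
# Enable the BigQuery and Cloud Composer APIs for the active project
gcloud services enable bigquery.googleapis.com composer.googleapis.com
```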

## Create GCP Composer Instance to Run Airflow

Instructions on bringing up a GCP Composer instance to run Hubble can be found in the [Installation and Setup](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#installation-and-setup) section of the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) repository.

:::note

Hardware requirements can vary widely depending on the Stellar network data you require. The default GCP settings may be higher or lower than you actually need.

:::

## Configuring GCP Composer Airflow

Two things are required to configure and set up GCP Composer Airflow:

* Upload DAGs to the Composer Airflow bucket
* Configure the Airflow variables for your GCP setup

For more detailed instructions, please see the [stellar-etl-airflow Installation and Setup](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#installation-and-setup) documentation.

### Uploading DAGs

Within the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) repo there is an [upload_static_to_gcs.sh](https://github.com/stellar/stellar-etl-airflow/blob/master/upload_static_to_gcs.sh) shell script that will upload all the DAGs and schemas into your Composer Airflow bucket.

This can also be done using the [gcloud CLI or console](https://cloud.google.com/storage/docs/uploading-objects) and manually selecting the DAGs and schemas you wish to upload.
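
For a manual upload, a hypothetical gcloud invocation could look like the sketch below; the bucket name is a placeholder for your Composer environment's bucket:

```
# Copy local DAG files into the Composer environment's dags folder
gcloud storage cp -r dags/* gs://<your-composer-bucket>/dags/
```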

### Configuring Airflow Variables

Please see the [Airflow Variables Explanation](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#airflow-variables-explanation) documentation for more information about which variables should, and which must, be configured.
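
Variables can be entered one at a time in the Airflow UI (Admin > Variables) or imported in bulk; a sketch using the Airflow 2 CLI with a hypothetical file name:

```
# Bulk-import Airflow variables from a local JSON file
airflow variables import airflow_variables.json
```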

## Running the DAGs

To run a DAG, all you have to do is toggle the DAG on/off as seen below:

![Toggle DAGs](/img/hubble/airflow_dag_toggle.png)

More information about each DAG can be found in the [DAG Diagrams](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#dag-diagrams) documentation.
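
The same toggle can be flipped from a terminal if you prefer the CLI; a sketch assuming Airflow 2 and an illustrative DAG id:

```
# Unpause (toggle on) a DAG by its id
airflow dags unpause history_table_export
```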

## Available DAGs

More information can be found in the [public DAGs](https://github.com/stellar/stellar-etl-airflow/blob/master/README.md#public-dags) section of the README.

### History Table Export DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/history_tables_dag.py):

- Exports the history sources (ledgers, operations, transactions, trades, effects, and assets) from Stellar using the data lake of LedgerCloseMeta files
  - Optionally, this can ingest data using captive-core, but that is neither ideal nor recommended for use with Airflow
- Inserts the exported data into BigQuery

<img src={history_table_export} width="300"/>

### State Table Export DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/state_table_dag.py):

- Exports accounts, account_signers, offers, claimable_balances, liquidity_pools, trustlines, contract_data, contract_code, config_settings, and ttl
- Inserts the exported data into BigQuery

<img src={state_table_export} width="300"/>

### DBT Enriched Base Tables DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/dbt_enriched_base_tables_dag.py):

- Creates the DBT staging views for models
- Updates the enriched_history_operations table
- Updates the current state tables
- (Optional) sends warnings and errors to Slack

<img src={dbt_enriched_base_tables} width="300"/>

network/hubble/admin-guide/scheduling-and-orchestration/overview.mdx
15 changes: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
---
title: "Overview"
sidebar_position: 0
---

Hubble uses [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) to schedule and orchestrate all its workflows. This includes the scheduling and running of stellar-etl and stellar-dbt.

It is worth noting that most users will not need to stand up and run their own Hubble. The Stellar Development Foundation provides public access to the data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](https://developers.stellar.org/network/hubble/connecting) section.

## Why Run stellar-etl-airflow?

Running stellar-etl-airflow within your own infrastructure provides a number of benefits. You can:

- Have full operational control without dependency on the Stellar Development Foundation for network data
- Run modified ETL/ELT pipelines that fit your individual business needs

network/hubble/admin-guide/source-system-ingestion/README.mdx
10 changes: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
---
title: Source System Ingestion
sidebar_position: 10
---

import DocCardList from "@theme/DocCardList";

Running stellar-etl for Stellar network data ingestion.

<DocCardList />

network/hubble/admin-guide/source-system-ingestion/architecture.mdx
25 changes: 25 additions & 0 deletions

Review comments on this file:

> wdyt about adding the High Level systems context diagram found here as well?

> Oh yeah I like that. We should update that diagram though

@@ -0,0 +1,25 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_arch from '/img/hubble/stellar_overall_architecture.png';
import stellar_etl_arch from '/img/hubble/stellar_etl_architecture.png';

## Architecture Overview

<img src={stellar_arch} width="600"/>

<img src={stellar_etl_arch} width="300"/>

In general, stellar-etl runs by:

* Reading raw data from the Stellar network
  * This can be done by running a stellar-etl export command to export data between a start and end ledger
  * stellar-etl has the ability to read from two different sources:
    * Captive-core directly, to get LedgerCloseMeta
    * A data lake of compressed LedgerCloseMeta files from Ledger Exporter
* Transforming the LedgerCloseMeta XDR into an easy-to-parse JSON format
* Optionally uploading the JSON files to GCS or any other cloud storage service

More detailed information about stellar-etl and examples can be found in the [stellar-etl](https://github.com/stellar/stellar-etl/tree/master) repo.
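
As a sketch of the export step, a hypothetical stellar-etl invocation over a fixed ledger range; the command and flag names follow the pattern in the stellar-etl README, but exact flags may differ by version:

```
# Export transactions for ledgers 1000-2000 into a local JSON file
stellar-etl export_transactions \
  --start-ledger 1000 \
  --end-ledger 2000 \
  --output exported_transactions.txt
```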

Review discussion:

> There are two options that we can recommend for running stellar-dbt-public. @harsha-stellar-data do you have an opinion on the above? Should we document both, or just one?

> Hmm I think it does make sense to document both. Below already does option 2. I'll add another section like "Advanced Usage - Importing as a dbt package" in this getting-started doc