Add admin-guide for hubble #669
@@ -0,0 +1,10 @@
---
title: Admin Guide
sidebar_position: 15
---

import DocCardList from "@theme/DocCardList";

All you need to know about running a Hubble analytics platform.

<DocCardList />
@@ -0,0 +1,10 @@
---
title: Data Curation
sidebar_position: 20
---

import DocCardList from "@theme/DocCardList";

Running stellar-dbt-public to transform raw Stellar network data into curated, analytics-friendly tables.

<DocCardList />
@@ -0,0 +1,22 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_dbt_arch from '/img/hubble/stellar_dbt_architecture.png';

## Architecture Overview

<img src={stellar_dbt_arch} width="300"/>

In general, stellar-dbt-public runs by:

* Selecting a dbt model to run
* Within the model run:
  * Sources are referenced and used to create staging tables
  * Staging tables then undergo various transformations and are stored in intermediate tables
  * Finishing touches and joins are applied to the intermediate tables, producing the final analytics-friendly mart tables
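
The staging, intermediate, and mart layers run in the right order because dbt builds a dependency graph from each model's `ref()` calls. The following is a minimal sketch of that idea. It uses Python's standard-library `graphlib` (3.9+). The model names and SQL are hypothetical stand-ins, not actual stellar-dbt-public models, and dbt's real parser handles far more than this regex does.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models standing in for the staging -> intermediate -> mart layers.
# Real stellar-dbt-public models have different names and far more SQL.
MODELS = {
    "stg_operations": "select * from {{ source('raw', 'history_operations') }}",
    "stg_transactions": "select * from {{ source('raw', 'history_transactions') }}",
    "int_operations_with_tx": (
        "select o.*, t.fee_charged "
        "from {{ ref('stg_operations') }} o "
        "join {{ ref('stg_transactions') }} t using (transaction_id)"
    ),
    "mart_enriched_operations": "select * from {{ ref('int_operations_with_tx') }}",
}

# Matches {{ ref('model_name') }} or {{ ref("model_name") }}; source() is ignored.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_order(models: dict) -> list:
    """Return model names so that every ref() target comes before its dependents."""
    # Map each model to the set of models it references (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for model in build_order(MODELS):
        print(model)
```

Staging models have no `ref()` dependencies, so they always come first. The mart comes last because it depends, indirectly, on everything else.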

We try to adhere to the best practices set by the [dbt docs](https://docs.getdbt.com/docs/build/projects).

More detailed information about stellar-dbt-public and examples can be found in the [stellar-dbt-public](https://github.com/stellar/stellar-dbt-public/tree/master) repo.
@@ -0,0 +1,140 @@
---
title: Getting Started
sidebar_position: 20
---

[stellar-dbt-public GitHub repository](https://github.com/stellar/stellar-dbt-public/tree/master)

[stellar/stellar-dbt-public docker images](https://hub.docker.com/r/stellar/stellar-dbt-public)

## Recommended Usage

### Docker Image

Generally, if you do not need to modify any of the stellar-dbt-public code, it is recommended that you use the [stellar/stellar-dbt-public docker images](https://hub.docker.com/r/stellar/stellar-dbt-public).

Example of running locally with Docker:

```
docker run --platform linux/amd64 -ti stellar/stellar-dbt-public:latest <parameters>
```
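
If you launch the image from a Python script or scheduler rather than a terminal, you can assemble the same command programmatically. The sketch below is minimal and assumes only that `<parameters>` are appended after the image name, exactly as in the shell example. Which parameters the image accepts is defined by the image itself and is not covered here. One design choice deserves a note: `-t` allocates a pseudo-TTY, which can fail when no terminal is attached, so the sketch adds it only for interactive runs.

```python
import shlex
import subprocess
import sys
from typing import List

IMAGE = "stellar/stellar-dbt-public:latest"


def docker_command(parameters: List[str], interactive: bool = True) -> List[str]:
    """Build the `docker run` command from the example above.

    `parameters` are passed through unchanged after the image name.
    """
    cmd = ["docker", "run", "--platform", "linux/amd64"]
    # -i keeps stdin open; -t allocates a pseudo-TTY, which needs a real terminal.
    cmd.append("-ti" if interactive else "-i")
    cmd.append(IMAGE)
    cmd.extend(parameters)
    return cmd


def run(parameters: List[str]) -> int:
    """Run the container, using -t only when stdin is a terminal."""
    cmd = docker_command(parameters, interactive=sys.stdin.isatty())
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```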

### Import stellar-dbt-public as a dbt Package

Alternatively, if you need to build your own models on top of stellar-dbt-public, you can import stellar-dbt-public as a dbt package into a separate dbt project.

Example instructions:

* Create a new file `packages.yml` in your dbt project (not the stellar-dbt-public project) with the YAML below

```
packages:
  - git: "https://github.com/stellar/stellar-dbt-public.git"
    revision: v0.0.28
```

* (Optional) Update your `profiles.yml` to include profile configurations for stellar-dbt-public

```
new_project:
  target: test
  outputs:
    test:
      project: <project>
      dataset: <dataset>
      <other configurations>

stellar_dbt_public:
  target: test
  outputs:
    test:
      project: <project>
      dataset: <dataset>
      <other configurations>
```

* (Optional) Update your `dbt_project.yml` to include project configurations for stellar-dbt-public

```
name: 'new_project'
version: '1.0.0'
config-version: 2

profile: 'new_project'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  # This key must match the project `name` above.
  new_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table

  # Configurations applied to models from the stellar-dbt-public package.
  stellar_dbt_public:
    staging:
      +materialized: ephemeral
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```
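
The `+materialized` settings determine what dbt creates in the warehouse for each layer:

* a `view` is a saved query;
* a `table` is physically written;
* an `ephemeral` model is never created in the warehouse at all. Instead, dbt inlines it as a CTE into whichever model references it.

The following is a minimal sketch of those three outcomes. It uses SQLite in place of BigQuery, and all table names are hypothetical. It illustrates the concept only and is not the SQL dbt actually generates.

```python
import sqlite3


def build_layers(conn: sqlite3.Connection) -> None:
    # Raw source table, standing in for a BigQuery source.
    conn.execute("create table raw_operations (id integer, type text, amount real)")
    conn.executemany(
        "insert into raw_operations values (?, ?, ?)",
        [(1, "payment", 10.0), (2, "payment", 5.0), (3, "create_account", 1.0)],
    )

    # staging (+materialized: view): a saved query over the source.
    conn.execute(
        "create view stg_operations as select id, type, amount from raw_operations"
    )

    # intermediate (+materialized: ephemeral): no database object is created;
    # its SQL is only inlined as a CTE into the downstream model.
    int_payments_sql = "select id, amount from stg_operations where type = 'payment'"

    # marts (+materialized: table): results are physically stored.
    conn.execute(
        "create table payment_totals as "
        f"with int_payments as ({int_payments_sql}) "
        "select count(*) as payment_count, sum(amount) as total_amount "
        "from int_payments"
    )


def object_types(conn: sqlite3.Connection) -> dict:
    """Map each database object name to its type ('table' or 'view')."""
    return dict(conn.execute("select name, type from sqlite_master").fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_layers(conn)
    print(object_types(conn))
    print(conn.execute("select * from payment_totals").fetchone())
```

Making intermediate models ephemeral keeps the warehouse free of objects nobody queries directly. The trade-off is that their SQL is recomputed inside every model that uses them.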

* After running `dbt deps`, models from the stellar-dbt-public package will be available in your new dbt project

## Building and Running Locally

### Clone the repo

```
git clone https://github.com/stellar/stellar-dbt-public
cd stellar-dbt-public
```

### Install required Python packages

```
pip install --upgrade pip && pip install -r requirements.txt
```

### Install required dbt packages

```
dbt deps
```

### Running dbt

* dbt ships with many useful commands, which are listed in the [dbt documentation](https://docs.getdbt.com/reference/dbt-commands#available-commands)
* Most stellar-dbt-public workflows use the `dbt build` command, which will `run` the models and `test` their output tables

* The first time you run stellar-dbt-public, run the following to create the tables:

```
dbt build --full-refresh
```

* Subsequent runs can use incremental mode, which only inserts the newest data instead of rebuilding all of history every time:

```
dbt build
```
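
Conceptually, `--full-refresh` rebuilds a model's table from all of history. An incremental run, by contrast, appends only rows newer than what the table already holds. The sketch below is minimal and illustrates that pattern with SQLite and a hypothetical `ledger_sequence` watermark column. stellar-dbt-public's actual incremental logic lives in its models and may filter on different columns, such as close time.

```python
import sqlite3


def full_refresh(conn: sqlite3.Connection) -> None:
    """Analogous to `dbt build --full-refresh`: drop and rebuild from all source rows."""
    conn.execute("drop table if exists ledgers_mart")
    conn.execute(
        "create table ledgers_mart as "
        "select ledger_sequence, tx_count from raw_ledgers"
    )


def incremental(conn: sqlite3.Connection) -> int:
    """Analogous to an incremental `dbt build`: append only rows past the current max.

    Returns the number of rows inserted.
    """
    cur = conn.execute(
        "insert into ledgers_mart (ledger_sequence, tx_count) "
        "select ledger_sequence, tx_count from raw_ledgers "
        "where ledger_sequence > "
        "(select coalesce(max(ledger_sequence), 0) from ledgers_mart)"
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table raw_ledgers (ledger_sequence integer, tx_count integer)")
    conn.executemany("insert into raw_ledgers values (?, ?)", [(1, 3), (2, 5)])
    full_refresh(conn)  # first run: builds the table from all history
    conn.executemany("insert into raw_ledgers values (?, ?)", [(3, 7), (4, 1)])
    print(incremental(conn))  # later run: picks up only ledgers 3 and 4
```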

* You can also build a single model (or all models with a given tag) if you don't want to run all stellar-dbt-public models:

```
dbt build --select <model name or tag>
```
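
If you drive dbt from your own Python tooling, a small helper can keep first-run and subsequent-run invocations consistent. This is a minimal sketch that shells out to the `dbt` CLI using only the flags shown above (`build`, `--full-refresh`, `--select`). The function names are illustrative, and `tag:<tag name>` is dbt's standard selector syntax for tags.

```python
import subprocess
from typing import List, Optional


def dbt_build_args(full_refresh: bool = False, select: Optional[List[str]] = None) -> List[str]:
    """Assemble a `dbt build` invocation using only the flags shown above."""
    args = ["dbt", "build"]
    if full_refresh:
        # First run: create every table from scratch.
        args.append("--full-refresh")
    if select:
        # Limit the run to specific models or tags (e.g. "tag:<tag name>").
        args += ["--select", *select]
    return args


def run_dbt_build(full_refresh: bool = False, select: Optional[List[str]] = None) -> None:
    """Run from the dbt project directory; raises CalledProcessError if the build fails."""
    subprocess.run(dbt_build_args(full_refresh, select), check=True)
```

For example, call `run_dbt_build(full_refresh=True)` once, then `run_dbt_build()` on each later schedule tick.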

See the [stellar-dbt-public/models/marts](https://github.com/stellar/stellar-dbt-public/tree/master/models/marts) directory for a full list of the available models that dbt can run.
@@ -0,0 +1,15 @@
---
title: "Overview"
sidebar_position: 0
---

Data curation in Hubble is done through [stellar-dbt-public](https://github.com/stellar/stellar-dbt-public). stellar-dbt-public transforms raw Stellar network data from BigQuery datasets and tables into aggregates for more user-friendly analytics.

It is worth noting that most users will not need to stand up and run their own stellar-dbt-public instance. The Stellar Development Foundation provides public access to fully transformed Stellar network data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](https://developers.stellar.org/network/hubble/analyst-guide/connecting) section.

## Why Run stellar-dbt-public?

Running stellar-dbt-public within your own infrastructure provides a number of benefits. You can:

- Have full operational control without dependency on the Stellar Development Foundation for network data
- Run modified ETL/ELT pipelines that fit your individual business needs
@@ -0,0 +1,10 @@
---
title: Scheduling and Orchestration
sidebar_position: 100
---

import DocCardList from "@theme/DocCardList";

Stitching all the components together.

<DocCardList />
@@ -0,0 +1,18 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_etl_airflow_arch from '/img/hubble/stellar_etl_airflow_architecture.png';

## Architecture Overview

<img src={stellar_etl_airflow_arch} width="300"/>

In general, stellar-etl-airflow runs by:

* Scheduling DAGs to run `stellar-etl` and upload its output data to BigQuery
* Scheduling DAGs to run `stellar-dbt-public` using the data in BigQuery

We try to adhere to the best practices set by the [dbt docs](https://docs.getdbt.com/docs/build/projects).

More detailed information about stellar-etl-airflow can be found in the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow/tree/master) repo.
@@ -0,0 +1,87 @@
---
title: Getting Started
sidebar_position: 20
---

import history_table_export from '/img/hubble/history_table_export.png';
import state_table_export from '/img/hubble/state_table_export.png';
import dbt_enriched_base_tables from '/img/hubble/dbt_enriched_base_tables.png';

[stellar-etl-airflow GitHub repository](https://github.com/stellar/stellar-etl-airflow/tree/master)

## GCP Account Setup

The Stellar Development Foundation runs Hubble in GCP using Composer and BigQuery. To follow the same deployment, you will need access to a GCP project. Instructions can be found in the [Get Started](https://cloud.google.com/docs/get-started) documentation from Google.

Note: BigQuery and Composer should be available by default. If they are not, you can find instructions for enabling them in the [BigQuery](https://cloud.google.com/bigquery?hl=en) or [Composer](https://cloud.google.com/composer?hl=en) Google documentation.

## Create GCP Composer Instance to Run Airflow

Instructions on bringing up a GCP Composer instance to run Hubble can be found in the [Installation and Setup](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#installation-and-setup) section of the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) repository.

:::note

Hardware requirements can vary widely depending on the Stellar network data you require. The default GCP settings may be higher or lower than what you actually need.

:::

## Configuring GCP Composer Airflow

There are two things required for the configuration and setup of GCP Composer Airflow:

* Upload DAGs to the Composer Airflow bucket
* Configure the Airflow variables for your GCP setup

For more detailed instructions, please see the [stellar-etl-airflow Installation and Setup](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#installation-and-setup) documentation.

### Uploading DAGs

Within the [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) repo there is an [upload_static_to_gcs.sh](https://github.com/stellar/stellar-etl-airflow/blob/master/upload_static_to_gcs.sh) shell script that will upload all the DAGs and schemas into your Composer Airflow bucket.

This can also be done using the [gcloud CLI or console](https://cloud.google.com/storage/docs/uploading-objects) by manually selecting the DAGs and schemas you wish to upload.
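
For the manual route, keep in mind that Composer loads DAG files from the `dags/` folder of the environment's Cloud Storage bucket. The sketch below is minimal. It builds one `gcloud storage cp` command per file so that each destination path is explicit. The bucket name is a placeholder. The sketch does not assume where the schema files belong; check `upload_static_to_gcs.sh` for that layout and pass it as `prefix`. For most setups the repo's script remains the simpler option.

```python
import subprocess
from pathlib import Path, PurePosixPath
from typing import List


def local_files(local_dir: str) -> List[str]:
    """Relative POSIX paths of all files under local_dir, skipping __pycache__."""
    root = Path(local_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and "__pycache__" not in p.parts
    )


def upload_commands(
    local_dir: str, files: List[str], bucket: str, prefix: str = "dags"
) -> List[List[str]]:
    """One `gcloud storage cp` command per file, keeping each file's relative path."""
    commands = []
    for relative in files:
        source = str(Path(local_dir) / relative)
        destination = f"gs://{bucket}/{PurePosixPath(prefix.strip('/'), relative)}"
        commands.append(["gcloud", "storage", "cp", source, destination])
    return commands


def upload_dir(local_dir: str, bucket: str, prefix: str = "dags") -> None:
    """Copy every file under local_dir into gs://<bucket>/<prefix>/."""
    for cmd in upload_commands(local_dir, local_files(local_dir), bucket, prefix):
        subprocess.run(cmd, check=True)
```

For example, `upload_dir("dags", "my-composer-bucket")` copies the local `dags/` tree into the hypothetical bucket's `dags/` folder.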

### Configuring Airflow Variables

Please see the [Airflow Variables Explanation](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#airflow-variables-explanation) documentation for more information about which variables should be configured and which are required.
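
Airflow can bulk-load variables from a JSON object of `{key: value}` pairs with `airflow variables import <file>`. On Composer, Airflow CLI commands can be run through `gcloud composer environments run`. The following is a minimal sketch of preparing such a file and checking it before import. The key names are hypothetical placeholders; the real names and meanings are in the Airflow Variables Explanation linked above.

```python
import json
from pathlib import Path
from typing import Dict, Set

# Hypothetical variable names for illustration only; see stellar-etl-airflow's
# "Airflow Variables Explanation" for the real list.
REQUIRED_KEYS = {"bq_project", "bq_dataset", "gcs_exported_data_bucket_name"}


def missing_keys(variables: Dict[str, object]) -> Set[str]:
    """Return required keys that are absent or set to an empty value."""
    return {key for key in REQUIRED_KEYS if variables.get(key) in (None, "")}


def write_variables_file(variables: Dict[str, object], path: str) -> None:
    """Write a {key: value} JSON object, the shape `airflow variables import` reads."""
    missing = missing_keys(variables)
    if missing:
        raise ValueError(f"missing Airflow variables: {sorted(missing)}")
    Path(path).write_text(json.dumps(variables, indent=2, sort_keys=True))
```

Validating before import surfaces a forgotten setting at deploy time rather than as a failed DAG run.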

## Running the DAGs

To run a DAG, toggle it on in the Airflow UI, as shown below:

![Toggle DAGs](/img/hubble/airflow_dag_toggle.png)

More information about each DAG can be found in the [DAG Diagrams](https://github.com/stellar/stellar-etl-airflow?tab=readme-ov-file#dag-diagrams) documentation.
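
Toggling a DAG on in the UI unpauses it. Airflow 2's stable REST API exposes the same switch as `PATCH /api/v1/dags/{dag_id}` with the body `{"is_paused": false}`, and the CLI equivalent is `airflow dags unpause <dag_id>`. The following is a minimal sketch that only builds the REST request. The base URL and DAG ID are placeholders. Authentication is deliberately left out because it depends on your deployment; on Composer it uses Google-issued credentials.

```python
import json
import urllib.parse
import urllib.request


def set_paused_request(base_url: str, dag_id: str, paused: bool = False) -> urllib.request.Request:
    """Build a PATCH request that sets is_paused on one DAG (Airflow 2 stable REST API)."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/dags/{urllib.parse.quote(dag_id, safe='')}"
        "?update_mask=is_paused"
    )
    body = json.dumps({"is_paused": paused}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

To send the request, add an `Authorization` header appropriate to your environment and pass it to `urllib.request.urlopen`.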

## Available DAGs

More information can be found in the [Public DAGs](https://github.com/stellar/stellar-etl-airflow/blob/master/README.md#public-dags) section of the stellar-etl-airflow README.

### History Table Export DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/history_tables_dag.py):

- Exports a subset of the source tables (ledgers, operations, transactions, trades, effects, and assets) from the data lake of LedgerCloseMeta files
- Optionally, data can instead be ingested using captive-core, but this is not ideal and is not recommended for use with Airflow
- Inserts into BigQuery

<img src={history_table_export} width="300"/>

### State Table Export DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/state_table_dag.py):

- Exports accounts, account_signers, offers, claimable_balances, liquidity_pools, trustlines, contract_data, contract_code, config_settings, and ttl
- Inserts into BigQuery

<img src={state_table_export} width="300"/>

### dbt Enriched Base Tables DAG

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/dbt_enriched_base_tables_dag.py):

- Creates the dbt staging views for models
- Updates the enriched_history_operations table
- Updates the current state tables
- (Optional) Sends warnings and errors to Slack

<img src={dbt_enriched_base_tables} width="300"/>
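
Optional Slack alerts like these are commonly delivered through a Slack incoming webhook, which accepts a JSON body with a `text` field. The following is a minimal sketch of summarizing dbt warnings and errors into such a payload. Each result is assumed to be shaped like an entry in dbt's `run_results.json`, and only `unique_id` and `status` are used. The webhook URL is a placeholder, and stellar-etl-airflow's own alerting code may work differently.

```python
import json
import urllib.request
from typing import List, Optional

# dbt result statuses worth alerting on; "success", "pass", and "skipped" are ignored.
ALERT_STATUSES = {"warn", "error", "fail", "runtime error"}


def slack_payload(dag_id: str, results: List[dict]) -> Optional[dict]:
    """Summarize non-passing dbt results as a Slack incoming-webhook payload.

    Returns None when there is nothing to report.
    """
    problems = [r for r in results if r.get("status") in ALERT_STATUSES]
    if not problems:
        return None
    lines = [f"*{dag_id}*: {len(problems)} dbt warning(s)/error(s)"]
    lines += [f"- {r['status']}: {r['unique_id']}" for r in problems]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload as JSON to a Slack incoming webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```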
@@ -0,0 +1,15 @@
---
title: "Overview"
sidebar_position: 0
---

Hubble uses [stellar-etl-airflow](https://github.com/stellar/stellar-etl-airflow) to schedule and orchestrate all its workflows. This includes the scheduling and running of stellar-etl and stellar-dbt.

It is worth noting that most users will not need to stand up and run their own Hubble. The Stellar Development Foundation provides public access to the data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](https://developers.stellar.org/network/hubble/connecting) section.

## Why Run stellar-etl-airflow?

Running stellar-etl-airflow within your own infrastructure provides a number of benefits. You can:

- Have full operational control without dependency on the Stellar Development Foundation for network data
- Run modified ETL/ELT pipelines that fit your individual business needs
@@ -0,0 +1,10 @@
---
title: Source System Ingestion
sidebar_position: 10
---

import DocCardList from "@theme/DocCardList";

Running stellar-etl for Stellar network data ingestion.

<DocCardList />
@@ -0,0 +1,22 @@
---
title: Architecture
sidebar_position: 10
---

import stellar_arch from '/img/hubble/stellar_overall_architecture.png';
import stellar_etl_arch from '/img/hubble/stellar_etl_architecture.png';

## Architecture Overview

<img src={stellar_arch} width="300"/>

<img src={stellar_etl_arch} width="300"/>

In general, stellar-etl runs by:

* Accepting an export command to export data between a start and end ledger
* Reading the LedgerCloseMeta files from the data lake created by the Ledger Exporter
* Transforming the LedgerCloseMeta XDR into an easy-to-parse JSON format
* Optionally uploading the JSON files to GCS or any other cloud storage service

More detailed information about stellar-etl and examples can be found in the [stellar-etl](https://github.com/stellar/stellar-etl/tree/master) repo.
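
Two pieces of the flow above can be sketched in Python:

* splitting a start/end ledger range into fixed-size batches, as a scheduler might before issuing several export commands;
* reading the JSON output, assuming it is newline-delimited with one record per line.

This is a minimal sketch under stated assumptions. The batch size, the inclusive range semantics, and the `sequence` field name are illustrative choices, not stellar-etl's documented behavior. The real command names, flags, and output schema are documented in the stellar-etl repo.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def ledger_batches(start: int, end: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (batch_start, batch_end) pairs covering [start, end]."""
    if size <= 0:
        raise ValueError("size must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    batch_start = start
    while batch_start <= end:
        batch_end = min(batch_start + size - 1, end)
        yield batch_start, batch_end
        batch_start = batch_end + 1


def read_records(lines: Iterable[str]) -> List[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    for batch in ledger_batches(1000, 1010, 4):
        print(batch)
    print(read_records(['{"sequence": 1000}\n', "\n", '{"sequence": 1001}\n']))
```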