This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

ETL (extract, transform and load) tools for ingesting Celo blockchain data to Google BigQuery and Pub/Sub


nansen-ai/celo-etl


Celo ETL

Overview

Celo ETL allows you to set up an ETL pipeline in Google Cloud Platform for ingesting Celo blockchain data into BigQuery and Pub/Sub. It comes with CLI tools for exporting Celo data into convenient formats such as CSV files and relational databases.
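
As a rough illustration of what a CSV export looks like, the sketch below serializes block records with Python's standard csv module. The field names in BLOCK_FIELDS and the sample record are hypothetical; they are not taken from the celo-etl schema or its CLI.

```python
import csv
import io

# Hypothetical block fields, for illustration only (not the celo-etl schema).
BLOCK_FIELDS = ["number", "hash", "timestamp", "transaction_count"]


def blocks_to_csv(blocks):
    """Serialize a list of block dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=BLOCK_FIELDS, lineterminator="\n")
    writer.writeheader()
    for block in blocks:
        # DictWriter raises ValueError if a record has keys outside BLOCK_FIELDS.
        writer.writerow(block)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"number": 1, "hash": "0xabc", "timestamp": 1600000000, "transaction_count": 0}]
    print(blocks_to_csv(sample), end="")
```

A fixed header order keeps files from different export runs directly comparable and easy to load into a relational table.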

Architecture

[Architecture diagram: celo_etl_architecture.svg]

  1. The Celo nodes run in a Kubernetes cluster.

  2. Airflow DAGs export and load Celo data to BigQuery daily. Refer to Celo ETL Airflow for deployment instructions.

  3. Celo data is polled periodically from the nodes and pushed to Google Pub/Sub. Refer to Celo ETL Streaming for deployment instructions.

  4. Celo data is pulled from Pub/Sub, transformed and streamed to BigQuery. Refer to Celo ETL Dataflow for deployment instructions.
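
The periodic polling in step 3 can be pictured as a loop that exports every block between the last synced block and the current chain head, then publishes each block. The following is a minimal Python sketch under stated assumptions, not the Celo ETL Streaming code: `publish` is a hypothetical callable standing in for a Pub/Sub publisher, and `lag` and `batch_size` are hypothetical parameters.

```python
def next_block_range(last_synced_block, chain_head, lag=0, batch_size=100):
    """Return the inclusive (start, end) block range to export next, or None.

    Hypothetical parameters: `lag` holds back the newest `lag` blocks,
    and `batch_size` caps how many blocks are exported per iteration.
    """
    target = chain_head - lag
    if target <= last_synced_block:
        return None  # nothing new to export yet
    start = last_synced_block + 1
    end = min(target, last_synced_block + batch_size)
    return start, end


def sync_once(last_synced_block, chain_head, publish, lag=0, batch_size=100):
    """Drain all available ranges, publishing one message per block.

    Returns the new last synced block, which a real streamer would need to
    persist (e.g. in last_synced_block.txt) so it can resume after a restart.
    """
    while (block_range := next_block_range(last_synced_block, chain_head, lag, batch_size)) is not None:
        start, end = block_range
        for number in range(start, end + 1):
            publish({"type": "block", "number": number})  # hypothetical message shape
        last_synced_block = end
    return last_synced_block


if __name__ == "__main__":
    published = []
    print(sync_once(10, 15, published.append, batch_size=2))
    print([message["number"] for message in published])
```

Calling this once per polling interval against the node's latest block number gives the "poll periodically, push to Pub/Sub" behavior; a `lag` greater than zero trades freshness for a smaller chance of publishing blocks that are later reorganized.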

Setting Up

  1. Follow the instructions in Celo ETL Airflow to deploy a Cloud Composer cluster for exporting and loading historical Celo data. It may take several days for the export DAG to catch up; during this time, the "load" and "verify_streaming" DAGs will fail.

  2. Follow the instructions in Celo ETL Streaming to deploy the Streamer component. In last_synced_block.txt, specify the last block number of the previous day. Once the previous day's data has been loaded, you can query it in BigQuery: SELECT number FROM crypto_celo.blocks ORDER BY number DESC LIMIT 1.

  3. Follow the instructions in Celo ETL Dataflow to deploy the Dataflow component. Monitor the "verify_streaming" DAG in the Airflow console; once the Dataflow job catches up with the latest block, the DAG will succeed.
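
Steps 2 and 3 can be partly scripted. The Python sketch below writes the result of the BigQuery query from step 2 into last_synced_block.txt and shows one possible freshness check for step 3. Assumptions: `run_query` is a hypothetical injection point (the `bigquery_run_query` adapter needs the google-cloud-bigquery package and default credentials), the file is assumed to contain just the block number, and `is_streaming_caught_up` with its 10-minute threshold is illustrative only, not the logic of the "verify_streaming" DAG.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Query from step 2 of the setup instructions.
LAST_BLOCK_QUERY = "SELECT number FROM crypto_celo.blocks ORDER BY number DESC LIMIT 1"


def write_last_synced_block(run_query, path="last_synced_block.txt"):
    """Step 2: run LAST_BLOCK_QUERY and write the block number to `path`.

    `run_query` (hypothetical) takes a SQL string and returns rows whose
    first column is the block number.
    """
    rows = list(run_query(LAST_BLOCK_QUERY))
    if not rows:
        raise RuntimeError("crypto_celo.blocks is empty; has the historical data been loaded?")
    block_number = int(rows[0][0])
    Path(path).write_text(str(block_number))
    return block_number


def bigquery_run_query(sql):
    """Adapter for the real BigQuery client (requires google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily so the rest runs without it

    return bigquery.Client().query(sql).result()


def is_streaming_caught_up(latest_block_time, now=None, max_lag=timedelta(minutes=10)):
    """Step 3 (hypothetical check): True if the newest streamed block is recent enough.

    `latest_block_time` must be a timezone-aware datetime; the 10-minute
    default is an assumed threshold.
    """
    now = now or datetime.now(timezone.utc)
    return now - latest_block_time <= max_lag
```

With credentials configured, `write_last_synced_block(bigquery_run_query)` would produce the file for the Streamer; injecting `run_query` keeps the function testable without access to BigQuery.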
