dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their warehouses simply by writing select statements.
Cloud Composer is a fully managed data workflow orchestration service that empowers you to author, schedule, and monitor pipelines.
This repository demonstrates how to use dbt to manage tables in BigQuery and how to use Cloud Composer to schedule dbt runs.
There are two sets of examples:
- Basic
  The basic example demonstrates the minimum configuration that you need to run dbt on Cloud Composer.
- Optimized
  The optimized example demonstrates optimizations such as splitting the dbt run per model, implementing incremental dbt models, and using the Airflow execution date to handle backfills.
These GCP services will be used in the example code:
- Cloud Composer
- BigQuery
- Google Cloud Storage (GCS)
- Cloud Build
- Google Container Registry (GCR)
- Cloud Source Repositories (CSR)
This diagram explains the example solution's flow:
- The flow starts from a dbt project stored in a repository (the example is under the [basic or optimized]/dbt-project folder).
- Any change to the dbt project triggers a Cloud Build run.
- Cloud Build creates/updates an image in GCR and exports the dbt docs to GCS.
- The Airflow DAG is deployed to Cloud Composer (the example is under the [basic or optimized]/dag folder).
- The dbt run is triggered using KubernetesPodOperator, which pulls the image from step #3 (see the DAG sketch after this list).
- At the end of the process, the BigQuery objects (i.e. datasets and tables) are created or updated.
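To illustrate the last two steps, here is a minimal sketch of a DAG that runs dbt through KubernetesPodOperator. It assumes an Airflow 2 environment with the cncf.kubernetes provider (the import path can differ between provider versions); the DAG id, schedule, namespace, and image path are placeholders rather than the repository's exact values.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="dbt_with_kubernetes",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        namespace="default",
        # The image built and pushed to GCR by the Cloud Build step (#3 above)
        image="gcr.io/[your-project-id]/dbt-builder:latest",  # placeholder image name
        cmds=["dbt"],
        arguments=["run", "--profiles-dir", ".dbt"],
    )
```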
To run the examples, prepare the following prerequisites:
- Cloud Composer environment
  https://cloud.google.com/composer/docs/how-to/managing/creating
- Set 3 ENVIRONMENT VARIABLES in Cloud Composer: AIRFLOW_VAR_BIGQUERY_LOCATION, AIRFLOW_VAR_RUN_ENVIRONMENT, and AIRFLOW_VAR_SOURCE_DATA_PROJECT (example command after this list)
  https://cloud.google.com/composer/docs/how-to/managing/environment-variables
- Cloud Source Repository (or any git provider)
  Store the code from dbt-project in this dedicated repository.
  The repository should contain the dbt_project.yml file (check the example code under the [basic or optimized]/dbt-project folder).
  Note that the dedicated dbt-project repository is not this example code repository (the GitHub repo).
- Cloud Build triggers
  Trigger builds from the dbt project repository (example command after this list).
  https://cloud.google.com/build/docs/automating-builds/create-manage-triggers
  Set the trigger's substitution variables _GCS_BUCKET and _DBT_SERVICE_ACCOUNT:
  - _GCS_BUCKET: a GCS bucket id for storing the dbt documentation files.
  - _DBT_SERVICE_ACCOUNT: a service account to run dbt from Cloud Build.
- BigQuery API enabled
- Service account to run dbt commands
- Kubernetes Secret bound to the service account (example command after this list)
  https://cloud.google.com/kubernetes-engine/docs/concepts/secret
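The commands below sketch how some of these prerequisites could be set up from the command line. They are illustrative only: the environment name, location, bucket, repository, secret name, key file, and service account are placeholders, and the flags should be verified against your installed gcloud and kubectl versions.

```bash
# Set the three Airflow variables as Cloud Composer environment variables
# (AIRFLOW_VAR_* variables are exposed to DAGs via Variable.get()).
gcloud composer environments update my-composer-env \
  --location us-central1 \
  --update-env-variables=AIRFLOW_VAR_BIGQUERY_LOCATION=US,AIRFLOW_VAR_RUN_ENVIRONMENT=remote,AIRFLOW_VAR_SOURCE_DATA_PROJECT=bigquery-public-data

# Create a Cloud Build trigger on the dbt-project repository with the two
# substitution variables used by cloudbuild.yaml.
gcloud builds triggers create cloud-source-repositories \
  --repo=dbt-project \
  --branch-pattern=".*" \
  --build-config=cloudbuild.yaml \
  --substitutions=_GCS_BUCKET=my-dbt-docs-bucket,_DBT_SERVICE_ACCOUNT=dbt-sa@my-project.iam.gserviceaccount.com

# Store the dbt service account key as a Kubernetes Secret in the Composer
# GKE cluster so the pod can mount it (e.g. at /var/secrets/google/key.json).
kubectl create secret generic dbt-sa-secret \
  --from-file=key.json=./dbt-sa-key.json
```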
Check /dbt-project/.dbt/profiles.yml and you will find two options for running dbt (a sketch of such a file follows these options):
- local
  You can run the dbt project from your local machine or Cloud Shell. To do that, run:
  gcloud auth application-default login
  Then trigger the dbt run with this command:
  dbt run --vars '{"project_id": [Your Project id], "bigquery_location": "us", "execution_date": "1970-01-01", "source_data_project": "bigquery-public-data"}' --profiles-dir .dbt
- remote
  This option is for running dbt with a service account, for example from Cloud Build or Cloud Composer.
  Check cloudbuild.yaml and dag/dbt_with_kubernetes.py to see how this option is used.
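For reference, here is a hedged sketch of what such a profiles.yml could look like with a BigQuery profile and the two targets above; the profile name, dataset, and exact fields are assumptions, not the repository's actual file.

```yaml
dbt_bigquery_example:             # must match the `profile` entry in dbt_project.yml
  target: local                   # default target
  outputs:
    local:
      type: bigquery
      method: oauth               # uses `gcloud auth application-default login`
      project: "{{ var('project_id') }}"
      dataset: dbt_example        # placeholder dataset name
      location: "{{ var('bigquery_location') }}"
      threads: 1
    remote:
      type: bigquery
      method: service-account     # key file mounted into the pod / Cloud Build step
      keyfile: "{{ var('key_file_dir') }}"
      project: "{{ var('project_id') }}"
      dataset: dbt_example
      location: "{{ var('bigquery_location') }}"
      threads: 1
```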
After all the prerequisites are prepared, you will have:
- A dbt-project repository
- An Airflow DAG to run dbt
Here are the follow-up steps for running the code:
- Push the code to the dbt-project repository and make sure Cloud Build is triggered and successfully creates the Docker image.
- In the Cloud Composer UI, run the DAG (e.g. the DAG from dbt_with_kubernetes.py).
- If successful, check the BigQuery console to verify the tables (or use the bq CLI as shown below).
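If you prefer the command line to the console, the created objects can also be checked with the bq CLI; the dataset name below is a placeholder for whatever your dbt models write to.

```bash
# List datasets in the project, then list the tables in the dataset created by dbt
bq ls --project_id=[Your Project id]
bq ls [Your Project id]:[your_dbt_dataset]
```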
With this mechanism, you have two independent runs:
- Updating the dbt-project (models, schemas, and configurations) triggers Cloud Build to create the Docker image.
- The DAG, acting as the dbt scheduler, runs the dbt-project from the latest Docker image available.

You can pass variables from Cloud Composer to the dbt run. As an example, this code configures the BigQuery dataset location (US) as part of the DAG:
```python
default_dbt_vars = {
    "project_id": project,
    # Example of passing a Cloud Composer variable to dbt
    "bigquery_location": Variable.get("bigquery_location"),
    "key_file_dir": '/var/secrets/google/key.json',
    "source_data_project": Variable.get("source_data_project")
}
```
In the dbt script, you can use the variable like this:

```
location: "{{ var('bigquery_location') }}"
```
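To hand these values to dbt, the DAG can serialize the dict into the JSON string expected by dbt's --vars flag. A minimal sketch follows; adding the execution date via the "{{ ds }}" macro and the exact argument wiring are assumptions, not necessarily the repository's code.

```python
import json

# Add the Airflow execution date so incremental models can filter on it; the
# "{{ ds }}" macro is rendered by Airflow when passed through a templated
# operator field such as KubernetesPodOperator's `arguments`.
dbt_vars = dict(default_dbt_vars, execution_date="{{ ds }}")

# Serialize the dict into the single JSON string that `dbt run --vars` expects.
dbt_arguments = ["run", "--vars", json.dumps(dbt_vars), "--profiles-dir", ".dbt"]
```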