diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 9d8728cc..8c97f833 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -12,6 +12,7 @@
 * [Metrics](features/metrics.md)
 * [Running SQL queries](features/running-sql-queries.md)
+* [Jinja2 templating](features/jinja2-templating.md)
 
 ## Customization
 
diff --git a/docs/features/jinja2-templating.md b/docs/features/jinja2-templating.md
new file mode 100644
index 00000000..a502f926
--- /dev/null
+++ b/docs/features/jinja2-templating.md
@@ -0,0 +1,58 @@
+# Jinja2 templating
+
+{% hint style="info" %}
+**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
+{% endhint %}
+
+All metrics calculations and executed SQL queries will be passed through the Jinja2 engine, so any basic Jinja2 templating, as you might expect, is supported. If you're not familiar with Jinja2, a basic example is shown below.
+
+```text
+{% set table = "census.public_data" %}
+select count(*) from {{ table }}
+```
+
+## Reusable templates
+
+On top of this, we provide support for **reusable templates**, which should be saved in the `~/.whale/templates` folder and **named after the name of the warehouse connection that you would like to use this template for**. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.
+
+```text
+.whale
+└── templates
+    └── warehouse-connection-name.sql
+```
+
+For example, consider the following BigQuery connection setup:
+
+```text
+---
+name: bq-1
+metadata_source: Bigquery
+key_path: ~
+project_credentials: ~
+project_id: my-bigquery-project
+```
+
+The name of the connection here is `bq-1`, so you'll need to create a file as follows:
+
+```text
+.whale
+└── templates
+    └── bq-1.sql
+```
+
+And the template within will automatically be pre-pended to queries against this connection.
+
+## Template example
+
+The following snippet enables the value `{{ last_day }}` to be used to performantly get data from the latest partition in BigQuery.
+
+```text
+{% set last_day = "_PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)" %}
+```
+
+The following query, then, could be run by `whale`:
+
+```text
+select count(*) from table.schema where {{ last_day }}
+```
+
diff --git a/docs/features/metrics.md b/docs/features/metrics.md
index 62557dbd..2e5b2110 100644
--- a/docs/features/metrics.md
+++ b/docs/features/metrics.md
@@ -1,6 +1,10 @@
 # Metrics
 
-Whale supports automatic barebones metric definition and scheduled calculation. Metrics are defined by creating a ```````metrics```` block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job.
+{% hint style="info" %}
+**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
+{% endhint %}
+
+Whale supports automatic barebones metric definition and scheduled calculation. Metrics are defined by creating a ```````metrics```` block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).
 
 ## Basic usage
 
@@ -12,8 +16,8 @@ metric-name:
   sql: |
     select statement
 ```
-```
+```text
 
 For example, below two metrics, `null-registrations` and `distinct-registrations` are defined:
 
 ```text
@@ -31,8 +35,8 @@ distinct-registrations:
     from mart.user_signups
     where user_id is not null
 ```
-```
+```text
 
 The same block is shown within the context of a full table stub, below:
 
 ```text
@@ -59,8 +63,8 @@ distinct-registrations:
     from mart.user_signups
     where user_id is not null
 ```
-```
+```text
 
 These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub.
An example is shown below:
 
 ```text
@@ -91,7 +95,8 @@ distinct-registrations:
     from mart.user_signups
     where user_id is not null
 ```
-```
+
+\`\`\`
 
 A full list of all historical values are saved in `~/.whale/metrics`.
 
diff --git a/docs/features/running-sql-queries.md b/docs/features/running-sql-queries.md
index 8f505f27..d39c97e5 100644
--- a/docs/features/running-sql-queries.md
+++ b/docs/features/running-sql-queries.md
@@ -1,5 +1,9 @@
 # Running SQL queries
 
+{% hint style="info" %}
+**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
+{% endhint %}
+
 Whale exposes a direct line into `SQLAlchemy` against connections defined in `~/.whale/config/connections.yaml` through the `wh run` command.
 
 ```text
@@ -12,5 +16,7 @@ If there are multiple warehouses with the same `warehouse_name` the credentials
 
 **Note:** this _only_ works for \(a\) direct connections to warehouses \(not the Hive metastore\) and \(b\) connections where permissions allow for query runs.
 
+`wh run` also supports Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).
+
 
 
diff --git a/docs/for-developers/file-structure-overview.md b/docs/for-developers/file-structure-overview.md
index c1018f45..93ef0da4 100644
--- a/docs/for-developers/file-structure-overview.md
+++ b/docs/for-developers/file-structure-overview.md
@@ -17,7 +17,8 @@ Whale installs all files within the `~/.whale` path. This path contains the foll
 ├── logs
 ├── manifests
 ├── metadata
-└── metrics
+├── metrics
+└── templates
 ```
 
 ## Subdirectories
@@ -64,3 +65,7 @@ The metadata directory stores all warehouse metadata. When typing `enter` on a s
 
 The metrics directory stores all calculated metrics \(along with a timestamp of when the metrics were calculated\).
The folder structure follows the same structure as the metadata folder, except the table name is used as a folder to house \(and prevent collisions over\) metric names: `warehouse_name/catalog.schema.table/metric-name.md`.
 
+### templates
+
+The templates directory is where users can add their own Jinja2 templates. When named in the form `warehouse-connection-name.sql`, these templates are pre-pended to any queries run against the warehouse with connection name `warehouse-connection-name`. See the [Jinja2 templating](../features/jinja2-templating.md) section for more details. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.
+
diff --git a/docs/setup/connection-configuration.md b/docs/setup/connection-configuration.md
index 65f37400..98b7a47a 100644
--- a/docs/setup/connection-configuration.md
+++ b/docs/setup/connection-configuration.md
@@ -33,6 +33,41 @@ project_id:
 
 Only one of `key_path` and `project_credentials` are required.
 
+## Cloud Spanner
+
+```text
+---
+name:
+metadata_source: spanner
+instance:
+database:
+project_id:
+```
+
+**To do:** Unlike BigQuery, we currently don't allow you to specify `key_path` or `project_credentials` explicitly.
+
+## Glue
+
+```text
+---
+name: whatever-you-want # Optional
+metadata_source: Glue
+```
+
+A `name` parameter will place all of your Glue documentation within a separate folder, as is done with the other extractors. But because Glue is already a metadata aggregator, this may not be optimal, particularly if you connect to other warehouses with whale directly. In this case, the `name` parameter can be omitted, and the table stubs will reside within subdirectories named after the underlying warehouse/instance.
+
+For example, with `name`, your files will be organized like this:
+
+```text
+your-name/my-instance/postgres_public_table
+```
+
+Without `name`, your files will be stored like this:
+
+```text
+my-instance/postgres_public_table
+```
+
 ## Hive metastore
 
 ```text