GitBook: [master] 12 pages modified

rsyi authored and gitbook-bot committed Nov 21, 2020
1 parent 6f14e4f commit 774e6c7
Showing 6 changed files with 116 additions and 6 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@

* [Metrics](features/metrics.md)
* [Running SQL queries](features/running-sql-queries.md)
* [Jinja2 templating](features/jinja2-templating.md)

## Customization

58 changes: 58 additions & 0 deletions docs/features/jinja2-templating.md
@@ -0,0 +1,58 @@
# Jinja2 templating

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

All metric calculations and executed SQL queries are passed through the Jinja2 engine, so basic Jinja2 templating is supported, as you might expect. If you're not familiar with Jinja2, a basic example is shown below.

```text
{% set table = census.public_data %}
select count(*) from {{ table }}
```
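Under the hood this is plain Jinja2, so the substitution can be reproduced with the `jinja2` Python package. Below is a minimal sketch, not whale's actual code; note the quotes around the table name, which plain Jinja2 needs in order to treat it as a string rather than an undefined variable:

```python
from jinja2 import Template

# Sketch of what the templating engine does to a query before execution.
# In plain Jinja2 an unquoted name like census.public_data would be looked
# up as a variable, so the table name is quoted here.
template = Template(
    '{% set table = "census.public_data" %}'
    "select count(*) from {{ table }}"
)

print(template.render())  # select count(*) from census.public_data
```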

## Reusable templates

On top of this, we provide support for **reusable templates**, which should be saved in the `~/.whale/templates` folder and **named after the warehouse connection that you would like to use the template for**. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.

```text
.whale
└── templates
└── warehouse-connection-name.sql
```

For example, consider the following BigQuery connection setup:

```text
---
name: bq-1
metadata_source: Bigquery
key_path: ~
project_credentials: ~
project_id: my-bigquery-project
```

The name of the connection here is `bq-1`, so you'll need to create a file as follows:

```text
.whale
└── templates
└── bq-1.sql
```

The template within it will automatically be prepended to any queries run against this connection.
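The prepending behavior itself can be sketched as follows. This is an illustrative helper, not whale's actual implementation; `apply_template` and its signature are hypothetical:

```python
from pathlib import Path

def apply_template(query: str, connection_name: str,
                   templates_dir: Path = Path.home() / ".whale" / "templates") -> str:
    # Hypothetical sketch of the documented behavior: if a template file
    # named after the connection exists, its contents are prepended to
    # the query before the combined text is rendered by Jinja2.
    template_path = templates_dir / f"{connection_name}.sql"
    if template_path.exists():
        return template_path.read_text() + "\n" + query
    return query
```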

## Template example

The following snippet defines a value `{{ last_day }}` that can be used to efficiently pull data from the latest partition in BigQuery.

```text
{% set last_day = "_PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)" %}
```

The following query could then be run by `whale`:

```text
select count(*) from table.schema where {{ last_day }}
```
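Because the template is prepended before rendering, the `{% set %}` in the template is visible to the query. A sketch of the combined rendering with the `jinja2` package, where the concatenation step stands in for whale's prepending:

```python
from jinja2 import Template

template = '{% set last_day = "_PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)" %}'
query = "select count(*) from table.schema where {{ last_day }}"

# Prepend the template to the query, then render the combined text.
rendered = Template(template + query).render()
print(rendered)
# select count(*) from table.schema where _PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
```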

15 changes: 10 additions & 5 deletions docs/features/metrics.md
@@ -1,6 +1,10 @@
# Metrics

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

Whale supports automatic barebones metric definition and scheduled calculation. Metrics are defined by creating a `metrics` block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).

## Basic usage

@@ -12,8 +16,8 @@ metric-name:
sql: |
select statement
```

For example, two metrics, `null-registrations` and `distinct-registrations`, are defined below:

```text
@@ -31,8 +35,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

The same block is shown within the context of a full table stub, below:

```text
@@ -59,8 +63,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub. An example is shown below:

```text
@@ -91,7 +95,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

A full list of all historical values is saved in `~/.whale/metrics`.

6 changes: 6 additions & 0 deletions docs/features/running-sql-queries.md
@@ -1,5 +1,9 @@
# Running SQL queries

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

Whale exposes a direct line into `SQLAlchemy` against connections defined in `~/.whale/config/connections.yaml` through the `wh run` command.

```text
@@ -12,5 +16,7 @@ If there are multiple warehouses with the same `warehouse_name` the credentials

**Note:** this _only_ works for (a) direct connections to warehouses (not the Hive metastore) and (b) connections where permissions allow for query runs.

`wh run` also supports Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).



7 changes: 6 additions & 1 deletion docs/for-developers/file-structure-overview.md
@@ -17,7 +17,8 @@ Whale installs all files within the `~/.whale` path. This path contains the foll
├── logs
├── manifests
├── metadata
├── metrics
└── templates
```

## Subdirectories
@@ -64,3 +65,7 @@ The metadata directory stores all warehouse metadata. When typing `enter` on a s

The metrics directory stores all calculated metrics (along with a timestamp of when each was calculated). The folder structure mirrors that of the metadata folder, except that the table name is used as a folder to house (and prevent collisions over) metric names: `warehouse_name/catalog.schema.table/metric-name.md`.

### templates

The templates directory is where users can add their own Jinja2 templates. When named in the form `warehouse-connection-name.sql`, these templates are prepended to any queries run against the warehouse with connection name `warehouse-connection-name`. See the [Jinja2 templating](../features/jinja2-templating.md) section for more details. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.

35 changes: 35 additions & 0 deletions docs/setup/connection-configuration.md
@@ -33,6 +33,41 @@ project_id:

Only one of `key_path` and `project_credentials` is required.

## Cloud Spanner

```text
---
name:
metadata_source: spanner
instance:
database:
project_id:
```

**To do:** Unlike BigQuery, we currently don't allow you to specify `key_path` or `project_credentials` explicitly.

## Glue

```text
---
name: whatever-you-want # Optional
metadata_source: Glue
```

A `name` parameter will place all of your glue documentation within a separate folder, as is done with the other extractors. But because Glue is already a metadata aggregator, this may not be optimal, particularly if you connect to other warehouses with whale directly. In this case, the `name` parameter can be omitted, and the table stubs will reside within subdirectories named after the underlying warehouse/instance.

For example, with `name`, your files will be organized like this:

```text
your-name/my-instance/postgres_public_table
```

Without `name`, your files will be stored like this:

```text
my-instance/postgres_public_table
```

## Hive metastore

```text
