GitBook: [master] 12 pages modified

rsyi authored and gitbook-bot committed Nov 21, 2020
1 parent 6f14e4f commit 774e6c7
Showing 6 changed files with 116 additions and 6 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@

* [Metrics](features/metrics.md)
* [Running SQL queries](features/running-sql-queries.md)
* [Jinja2 templating](features/jinja2-templating.md)

## Customization

58 changes: 58 additions & 0 deletions docs/features/jinja2-templating.md
@@ -0,0 +1,58 @@
# Jinja2 templating

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

All metric calculations and executed SQL queries are passed through the Jinja2 engine, so basic Jinja2 templating is supported, as you might expect. If you're not familiar with Jinja2, a basic example is shown below.

```text
{% set table = census.public_data %}
select count(*) from {{ table }}
```
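Under the hood this is plain Jinja2, so the substitution can be reproduced with the `jinja2` Python package. Below is a minimal sketch, not whale's actual code; note the quotes around the table name, which plain Jinja2 needs in order to treat it as a string rather than an undefined variable:

```python
from jinja2 import Template

# Sketch of what the templating engine does to a query before execution.
# In plain Jinja2 an unquoted name like census.public_data would be looked
# up as a variable, so the table name is quoted here.
template = Template(
    '{% set table = "census.public_data" %}'
    "select count(*) from {{ table }}"
)

print(template.render())  # select count(*) from census.public_data
```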

## Reusable templates

On top of this, we provide support for **reusable templates**, which should be saved in the `~/.whale/templates` folder and **named after the warehouse connection that you would like to use the template for**. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.

```text
.whale
└── templates
└── warehouse-connection-name.sql
```

For example, consider the following BigQuery connection setup:

```text
---
name: bq-1
metadata_source: Bigquery
key_path: ~
project_credentials: ~
project_id: my-bigquery-project
```

The name of the connection here is `bq-1`, so you'll need to create a file as follows:

```text
.whale
└── templates
└── bq-1.sql
```

The template within it will automatically be prepended to any queries run against this connection.
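The prepending behavior itself can be sketched as follows. This is an illustrative helper, not whale's actual implementation; `apply_template` and its signature are hypothetical:

```python
from pathlib import Path

def apply_template(query: str, connection_name: str,
                   templates_dir: Path = Path.home() / ".whale" / "templates") -> str:
    # Hypothetical sketch of the documented behavior: if a template file
    # named after the connection exists, its contents are prepended to
    # the query before the combined text is rendered by Jinja2.
    template_path = templates_dir / f"{connection_name}.sql"
    if template_path.exists():
        return template_path.read_text() + "\n" + query
    return query
```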

## Template example

The following snippet defines a value `{{ last_day }}` that can be used to efficiently pull data from the latest partition in BigQuery.

```text
{% set last_day = "_PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)" %}
```

The following query could then be run by `whale`:

```text
select count(*) from table.schema where {{ last_day }}
```
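Because the template is prepended before rendering, the `{% set %}` in the template is visible to the query. A sketch of the combined rendering with the `jinja2` package, where the concatenation step stands in for whale's prepending:

```python
from jinja2 import Template

template = '{% set last_day = "_PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)" %}'
query = "select count(*) from table.schema where {{ last_day }}"

# Prepend the template to the query, then render the combined text.
rendered = Template(template + query).render()
print(rendered)
# select count(*) from table.schema where _PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
```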

15 changes: 10 additions & 5 deletions docs/features/metrics.md
@@ -1,6 +1,10 @@
# Metrics

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

Whale supports automatic barebones metric definition and scheduled calculation. Metrics are defined by creating a `metrics` block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).

## Basic usage

@@ -12,8 +16,8 @@ metric-name:
sql: |
select statement
```

For example, two metrics, `null-registrations` and `distinct-registrations`, are defined below:

```text
@@ -31,8 +35,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

The same block is shown within the context of a full table stub, below:

```text
@@ -59,8 +63,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub. An example is shown below:

```text
@@ -91,7 +95,8 @@ distinct-registrations:
from mart.user_signups
where user_id is not null
```

A full list of all historical values is saved in `~/.whale/metrics`.

6 changes: 6 additions & 0 deletions docs/features/running-sql-queries.md
@@ -1,5 +1,9 @@
# Running SQL queries

{% hint style="info" %}
**Supported connections:** BigQuery, Postgres, Presto, Redshift, Snowflake
{% endhint %}

Whale exposes a direct line into `SQLAlchemy` against connections defined in `~/.whale/config/connections.yaml` through the `wh run` command.

```text
@@ -12,5 +16,7 @@ If there are multiple warehouses with the same `warehouse_name` the credentials

**Note:** this _only_ works for (a) direct connections to warehouses (not the Hive metastore) and (b) connections where permissions allow for query runs.

`wh run` also supports Jinja2 templating -- for more information on how to set this up, see [Jinja2 templating](jinja2-templating.md).



7 changes: 6 additions & 1 deletion docs/for-developers/file-structure-overview.md
@@ -17,7 +17,8 @@ Whale installs all files within the `~/.whale` path. This path contains the foll
├── logs
├── manifests
├── metadata
├── metrics
└── templates
```

## Subdirectories
@@ -64,3 +65,7 @@ The metadata directory stores all warehouse metadata. When typing `enter` on a s

The metrics directory stores all calculated metrics (along with a timestamp of when each was calculated). The folder structure mirrors that of the metadata folder, except that the table name is used as a folder to house (and prevent collisions over) metric names: `warehouse_name/catalog.schema.table/metric-name.md`.

### templates

The templates directory is where users can add their own Jinja2 templates. When named in the form `warehouse-connection-name.sql`, these templates are prepended to any queries run against the warehouse with connection name `warehouse-connection-name`. See the [Jinja2 templating](../features/jinja2-templating.md) section for more details. Connection names can be found by running `wh connections`, in the `name` field of each yaml block.

35 changes: 35 additions & 0 deletions docs/setup/connection-configuration.md
@@ -33,6 +33,41 @@ project_id:

Only one of `key_path` and `project_credentials` is required.

## Cloud Spanner

```text
---
name:
metadata_source: spanner
instance:
database:
project_id:
```

**To do:** Unlike BigQuery, we currently don't allow you to specify `key_path` or `project_credentials` explicitly.

## Glue

```text
---
name: whatever-you-want # Optional
metadata_source: Glue
```

A `name` parameter will place all of your glue documentation within a separate folder, as is done with the other extractors. But because Glue is already a metadata aggregator, this may not be optimal, particularly if you connect to other warehouses with whale directly. In this case, the `name` parameter can be omitted, and the table stubs will reside within subdirectories named after the underlying warehouse/instance.

For example, with `name`, your files will be organized like this:

```text
your-name/my-instance/postgres_public_table
```

Without `name`, your files will be stored like this:

```text
my-instance/postgres_public_table
```

## Hive metastore

```text
