Skip to content

A dbt SQL package for ensuring documentation and test coverage, with granular control.

License

Notifications You must be signed in to change notification settings

tnightengale/dbt-meta-testing

Repository files navigation

buildstatus

dbt Meta Testing

This dbt package contains macros to assert test and documentation coverage from dbt_project.yml configuration settings.

Table of Contents

Install

Include in packages.yml:

packages:
  - package: tnightengale/dbt_meta_testing
    version: 0.3.6

For latest release, see https://github.com/tnightengale/dbt-meta-testing/releases.

Configurations

This package features two meta configs that can be applied to a dbt project:

  1. +required_tests
  2. +required_docs

Read the dbt documentation here to learn more about model configurations in dbt.

Required Tests

To require test coverage, define the +required_tests configuration on a model path in dbt_project.yml:

# dbt_project.yml
...
models:
  project:
    +required_docs: true
    marts:
      +required_tests: {"unique.*|not_null": 1}
      model_2:
        +required_tests:
          "mocker.*|unique": 1
          "mock_schema_test": 1
          ".*data_test": 1

The +required_tests config must be None or a dict with str keys and int values. YAML dictionaries are accepted.

All the regular dbt configuration hierarchy rules apply. For example, individual model configs will override configs from the dbt_project.yml:

# /models/marts/core/your_model.sql

-- This overrides the config in dbt_project.yml, and this model will not require tests
{{ config(required_tests=None) }}

SELECT
...

New in Version 0.3.3

The keys of the config are evaluated against both data and schema tests (including any custom tests) using the re.fullmatch function.

Therefore, any test restriction which can be expressed in regex can be evaluated.

For example:

# dbt_project.yml
...
models:
  project:
    +required_docs: true
    # The following configuration on the `marts` model path requires
    # each model in that path to have at least one test that either:
    #
    #    1. starts with "unique" (note the ".*" regex suffix) OR (note the "|" regex)
    #    2. is an exact match for the "not_null" test.

    marts:
      +required_tests: {"unique.*|not_null": 1}

Schema tests are matched against their common names, (eg. not_null, accepted_values).

Data tests are matched against their macro name.

Custom schema tests are matched against their name, eg. mock_schema_test:

# models/schema.yml
...
    - name: model_2
      description: ""
      tests:
        - equal_rowcount:
            compare_model: ref('model_1')
      columns:
          - name: id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null
                - mock_schema_test

Models that do not meet their configured test minimums, because they either lack the tests or are not documented, will be listed in the error when validated via a run-operation:

usr@home dbt-meta-testing $ dbt run-operation required_tests
Running with dbt=0.20.0
Encountered an error while running operation: Compilation Error in macro required_tests (macros/required_tests.sql)
  Insufficient test coverage from the 'required_tests' config on the following models:
  Model: 'model_1' Test: 'not_null' Got: 1 Expected: 2
  Model: 'model_1' Test: 'mock_schema_test' Got: 0 Expected: 1

  > in macro _evaluate_required_tests (macros/utils/required_tests/evaluate_required_tests.sql)
  > called by macro required_tests (macros/required_tests.sql)
  > called by macro required_tests (macros/required_tests.sql)
usr@home dbt-meta-testing $

Required Docs

To require documentation coverage, define the +required_docs configuration on a model path in dbt_project.yml:

# dbt_project.yml
...
models:
    project:
        +required_docs: true

The +required_docs config must be a bool.

It also does not check ephemeral models. This is because it cannot leverage adapter.get_columns_in_relation() macro on ephemeral models, which it uses to fetch columns from the data warehouse and detect columns without documentation.

When applied to a non-ephemeral model, this config will ensure 3 things:

  1. The model has a non-empty description
  2. The columns in the model are specified in the model .yml
  3. The columns specified in the model .yml have non-empty descriptions

For example, the following configurations:

# models/schema.yml
version: 2

models:
    - name: model_1
      description: "A starter dbt model"
      columns:
          - name: id
            description: ""
            tests:
                - unique
                - not_null

    - name: model_2
      description: ""
      tests:
        - equal_rowcount:
            compare_model: ref('model_1')
      columns:
          - name: id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null

Where model_2 has a column new which is not defined in the .yml above:

-- models/example/model_2.sql
select
    *,
    'new' as new
from {{ ref('model_1') }}
where id = 1

And all models in the example path require docs:

# dbt_project.yml
...
models:
    project:
        example:
            +required_docs: true

Would result in the following error when validated via a run-operation:

usr@home dbt-meta-testing $ dbt run-operation required_docs
Running with dbt=0.20.0
Encountered an error while running operation: Compilation Error in macro required_docs (macros/required_docs.sql)
  The following models are missing descriptions:
   - model_2
  The following columns are missing from the model yml:
   - model_2.new
  The following columns are present in the model yml, but have blank descriptions:
   - model_1.id

  > in macro _evaluate_required_docs (macros/utils/required_docs/evaluate_required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)
usr@home dbt-meta-testing $

Usage

To assert either the +required_tests or +required_docs configuration, run the correpsonding macro as a run-operation within the dbt CLI.

By default the macro will check all models with the corresponding configuration. If any model does not meet the configuration, the run-operation will fail (non-zero) and display an appropriate error message.

To assert the configuration for only a subset of the configured models (eg. new models only in a CI) pass an argument, models, to the macro as a space delimited string of model names to use.

It's also possible to pass in the result of a dbt ls -m <selection_syntax> command, in order to make use of dbt node selection syntax. Use shell subsitution in a dictionary representation.

For example, to run only changed models using dbt's Slim CI feature:

dbt run-operation required_tests --args "{'models':'$(dbt list -m state:modified --state <filepath>)'}"

Alternatively, a space delimited string of model names will work as well:

dbt run-operation required_tests --args "{'models':'model1 model2 model3'}"

required_tests (source)

Validates that models meet the +required_tests configurations applied in dbt_project.yml. Typically used only as a run-operation in a CI pipeline.

Usage:

dbt run-operation required_tests [--args "{'models': '<space_delimited_models>'}"]

required_docs (source)

Validates that models meet the +required_docs configurations applied in dbt_project.yml. Typically used only as a run-operation in a CI pipeline.

Usage:

dbt run-operation required_docs [--args "{'models': '<space_delimited_models>'}"]

Note: Run this command after dbt run: only models that already exist in the warehouse can be validated for columns that are missing from the model .yml. By default, column names are assumed to be lower case in the DBT documentation, if this is not the case in your project, setting the variable convert_column_names_to_lower_case to false in dbt_project.yml will compare the column names in the case they appear.

Contributions

Feedback on this project is welcomed and encouraged. Please open an issue or start a discussion if you would like to request a feature change or contribute to this project.

Testing

The integration tests for this package are located at ./integration_tests/tests/.

To run the tests locally, ensure you have the correct environment variables set according to the targets in ./integration_tests/profiles.yml and use:

cd integration_tests
dbt test --data

Verified Data Warehouses

This package has been tested for the following data warehouses:

  • Snowflake