
test coverage #589

Closed
HarlanH opened this issue Nov 8, 2017 · 9 comments
Labels
  • dbt-docs: [dbt feature] documentation site, powered by metadata artifacts
  • good_first_issue: Straightforward + self-contained changes, good for new contributors!
  • stale: Issues that have gone stale

Comments

@HarlanH

HarlanH commented Nov 8, 2017

OK, this is wishful thinking... You know how lots of test frameworks have the ability to calculate the proportion of the code tested by a test suite? Coverage.py, for Python, and so forth? What if dbt had the ability to compute the proportion of models (or columns) that have schema or custom data tests against them?

I'm not even entirely sure I know what this would mean, and I certainly don't know how to do it. But I am sure it'd be useful in pushing dbt developers to write more tests...

@drewbanin

drewbanin commented Nov 8, 2017

@HarlanH I think this is a good tie-in with this issue: #375

Whatever we come up with for documentation should definitely support building some sort of code/table test coverage like this.

TBD exactly what that looks like, but thanks for opening this issue.

@tayloramurphy

As a first iteration I think it should just spit out basic percentages:

  • Percentage of tables with 1 or more tests
  • Percentage of columns with 1 or more tests

Then you could start to break it down by test type (null, unique, and then custom).
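
Not something dbt does today, but as a rough sketch of what that first iteration could compute, assuming we already had a mapping from each model's columns to the number of tests defined against them (the data structures and sample values below are purely illustrative, not dbt's internals):

    # Illustrative input: model name -> {column name -> number of tests}.
    # A model-level (custom) test could be recorded under a None key.
    model_tests = {
        "base_sfdc": {"id": 2, "created_at": 1, None: 1},
        "base_zendesk": {"id": 1, "status": 0},
        "fct_orders": {},
    }

    def coverage_percentages(model_tests):
        """Return (% of models with >= 1 test, % of columns with >= 1 test)."""
        models_with_tests = sum(1 for cols in model_tests.values() if any(cols.values()))
        columns = [
            (model, col)
            for model, cols in model_tests.items()
            for col in cols
            if col is not None
        ]
        covered_columns = sum(1 for model, col in columns if model_tests[model][col] > 0)

        model_pct = 100.0 * models_with_tests / len(model_tests) if model_tests else 0.0
        column_pct = 100.0 * covered_columns / len(columns) if columns else 0.0
        return model_pct, column_pct

    print(coverage_percentages(model_tests))  # roughly (66.7, 75.0) for the sample above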

Along with this, I would love to see a severity / type config for tests. We have some "freshness" tests on data we're pulling from Google Sheets, and I'd love for those tests to just emit a warning when they don't pass instead of failing the run.

@drewbanin let me know if this warrants a separate issue.

@ThomasLaPiana

@tayloramurphy I like that, but I'd take it one step further.

The main reason for test coverage statistics is to know where the most vulnerable parts of the code are. Summary statistics like "X% of tables have a test" are good, but I also need to know specifically which tests exist for which models.

For instance (with a very verbose option):

dbt test-coverage -vv

model        | schema | custom
base_sfdc    | 3      | 2
base_zendesk | 1      | 0

I can immediately understand that our base_sfdc model has 3 schema tests and 2 custom tests and that base_zendesk has 1 schema test and no custom tests.
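
A minimal sketch of printing such a report, assuming the per-model counts have already been collected (how they might be collected from the manifest is sketched further down in this thread); the counts dict here is a made-up stand-in:

    # Hypothetical per-model test counts, keyed by model name.
    counts = {
        "base_sfdc": {"schema": 3, "custom": 2},
        "base_zendesk": {"schema": 1, "custom": 0},
    }

    def print_coverage_table(counts):
        """Print a per-model breakdown of schema vs. custom tests to stdout."""
        width = max([len(name) for name in counts] + [len("model")])
        print(f"{'model':<{width}} | schema | custom")
        for model, c in sorted(counts.items()):
            print(f"{model:<{width}} | {c['schema']:>6} | {c['custom']:>6}")

    print_coverage_table(counts)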

I also agree with being able to break it down by type of test, as that will be vital for determining the quality of the tests.

Another pipe dream is a strict dbt mode where models without tests simply can't be run.

@drewbanin

Check out the strict option in the original v2 schema.yml spec: #790

This wasn't implemented in our first pass, but I think something like that is a really good idea, and we now have an obvious place for it.

I like the idea of warning vs. failure, and I've also thought about some sort of max_error config for fickle tests (that fail, e.g., due to data load latency). This is all really cool stuff -- maybe I can make a new issue to spec out all sorts of test improvements like these.

@ThomasLaPiana

I was thinking of strict in terms of dbt run --strict, where any models that didn't have tests (or didn't meet a user-defined level of testing, for instance requiring unique and not-null tests before a model can run) simply wouldn't be run.

@drewbanin sounds like a plan 👍

@ThomasLaPiana

For version 1 (the first iteration), the goal is just to add a --coverage flag to dbt test that outputs a table to stdout showing the breakdown of tests for each model; I'll definitely take some inspiration from pytest-cov. Using this flag will NOT run the tests.

@drewbanin

@ThomasLaPiana great!

You can find the code that runs tests here: https://github.com/fishtown-analytics/dbt/blob/dev/guion-bluford/dbt/node_runners.py#L423

I think you'll want to override the execute method to not run tests if the --coverage flag is provided. Further, you'll want to implement the after_run classmethod in the TestRunner. The after_run method accepts manifest as an argument, which you'll be able to use to generate a table of test coverage.

It's not super clear to me how best to pass the --coverage flag into the TestRunner. I'd start by looking at the TestTask class and tracing execution through to the TestRunner. CLI arguments are defined in main.py.
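
As a hedged sketch of what wiring up the flag might look like, assuming the CLI is built with argparse subcommands (the variable names below are illustrative, not dbt's actual argument-parsing code):

    import argparse

    # Illustrative only: dbt's real main.py builds its subparsers differently,
    # but adding a boolean flag to the `test` subcommand would look like this.
    parser = argparse.ArgumentParser(prog="dbt")
    subparsers = parser.add_subparsers(dest="command")

    test_parser = subparsers.add_parser("test", help="Run (or report on) tests")
    test_parser.add_argument(
        "--coverage",
        action="store_true",
        help="Print a test-coverage table instead of running the tests",
    )

    args = parser.parse_args(["test", "--coverage"])
    print(args.command, args.coverage)  # test True

However the flag ultimately gets parsed, it still needs to travel from the TestTask down to the TestRunner, as described above.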

Once you have access to a Manifest object, you can build a table of models and their associated tests.

The Manifest is an object that contains all the information we have about a project. I think you'll want to pluck out the test nodes from manifest.nodes, consulting the column_name attribute to determine which column the test applies to. You can use the depends_on attribute to understand which models a given test applies to.
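
A rough sketch of that traversal, using only the attributes named above (manifest.nodes, depends_on, column_name); the resource_type check and the exact node shapes are assumptions for illustration, not a verbatim copy of dbt's internals:

    from collections import defaultdict
    from types import SimpleNamespace

    def build_coverage(manifest):
        """Group test nodes in the manifest by the model(s) they depend on."""
        coverage = defaultdict(lambda: {"schema": 0, "custom": 0})
        for unique_id, node in manifest.nodes.items():
            # Assumption: test nodes are identifiable by a resource_type attribute.
            if getattr(node, "resource_type", None) != "test":
                continue
            # Assumption: schema tests carry a column_name, custom data tests don't.
            kind = "schema" if getattr(node, "column_name", None) else "custom"
            for model_id in node.depends_on.nodes:
                coverage[model_id][kind] += 1
        return coverage

    # Tiny stand-in manifest to show the shape of the traversal.
    fake_manifest = SimpleNamespace(nodes={
        "model.proj.base_sfdc": SimpleNamespace(resource_type="model"),
        "test.proj.unique_base_sfdc_id": SimpleNamespace(
            resource_type="test",
            column_name="id",
            depends_on=SimpleNamespace(nodes=["model.proj.base_sfdc"]),
        ),
    })
    print(dict(build_coverage(fake_manifest)))
    # {'model.proj.base_sfdc': {'schema': 1, 'custom': 0}}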

There's sort of a lot here, but I wanted to point you in the right direction! Let me know if you have any questions, and feel free to open a PR if you want to discuss some code inline :)

Thanks for taking this!

drewbanin added the good_first_issue label and removed the question label on Sep 19, 2018
@ThomasLaPiana

PR is here @drewbanin: https://github.com/ThomasLaPiana/dbt/pull/1/files
Still working through everything you pointed me to; I'll let you know if I have specific questions.

drewbanin added the dbt-docs label and removed the auto-generated docs label on Nov 12, 2018
@github-actions

github-actions bot commented Jan 9, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label on Jan 9, 2022