
test coverage #589

Closed
HarlanH opened this issue Nov 8, 2017 · 9 comments
Labels
  • dbt-docs: [dbt feature] documentation site, powered by metadata artifacts
  • good_first_issue: Straightforward + self-contained changes, good for new contributors!
  • stale: Issues that have gone stale

Comments

@HarlanH

HarlanH commented Nov 8, 2017

OK, this is wishful thinking... You know how lots of test frameworks have the ability to calculate the proportion of the code tested by a test suite? Coverage.py, for Python, and so forth? What if dbt had the ability to compute the proportion of models (or columns) that have schema or custom data tests against them?

I'm not even entirely sure I know what this would mean, and I certainly don't know how to do it. But I am sure it'd be useful in pushing dbt developers to write more tests...

@drewbanin

drewbanin commented Nov 8, 2017

@HarlanH I think this is a good tie-in with this issue: #375

Whatever we come up with for documentation should definitely support building some sort of code/table test coverage like this.

TBD exactly what that looks like, but thanks for opening this issue.

@tayloramurphy

As a first iteration I think it should just spit out basic percentages:

  • Percentage of tables with 1 or more tests
  • Percentage of columns with 1 or more tests

Then you could start to break it down by test type (null, unique, and then custom).
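
Not something dbt does today, but as a rough sketch of what that first iteration could compute, assuming we already had a mapping from each model's columns to the number of tests defined against them (the data structures and sample values below are purely illustrative, not dbt's internals):

    # Illustrative input: model name -> {column name -> number of tests}.
    # A model-level (custom) test could be recorded under a None key.
    model_tests = {
        "base_sfdc": {"id": 2, "created_at": 1, None: 1},
        "base_zendesk": {"id": 1, "status": 0},
        "fct_orders": {},
    }

    def coverage_percentages(model_tests):
        """Return (% of models with >= 1 test, % of columns with >= 1 test)."""
        models_with_tests = sum(1 for cols in model_tests.values() if any(cols.values()))
        columns = [
            (model, col)
            for model, cols in model_tests.items()
            for col in cols
            if col is not None
        ]
        covered_columns = sum(1 for model, col in columns if model_tests[model][col] > 0)

        model_pct = 100.0 * models_with_tests / len(model_tests) if model_tests else 0.0
        column_pct = 100.0 * covered_columns / len(columns) if columns else 0.0
        return model_pct, column_pct

    print(coverage_percentages(model_tests))  # roughly (66.7, 75.0) for the sample above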

Along with this, I would love to see a severity / type config for tests. We have some "freshness" tests on data we're pulling from Google Sheets, and I'd love for those tests to just emit a warning when they don't pass instead of failing the run.

@drewbanin let me know if this warrants a separate issue.

@ThomasLaPiana

@tayloramurphy I like that, but I'd take it one step further.

The main reason for test coverage statistics is to know where the most vulnerable parts of the code are. Summary statistics like "X% of tables have a test" are good, but I also need to know specifically which tests exist for which models.

For instance (with a very verbose option):

dbt test-coverage -vv

model        | schema | custom
base_sfdc    | 3      | 2
base_zendesk | 1      | 0

I can immediately understand that our base_sfdc model has 3 schema tests and 2 custom tests and that base_zendesk has 1 schema test and no custom tests.
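
A minimal sketch of printing such a report, assuming the per-model counts have already been collected (how they might be collected from the manifest is sketched further down in this thread); the counts dict here is a made-up stand-in:

    # Hypothetical per-model test counts, keyed by model name.
    counts = {
        "base_sfdc": {"schema": 3, "custom": 2},
        "base_zendesk": {"schema": 1, "custom": 0},
    }

    def print_coverage_table(counts):
        """Print a per-model breakdown of schema vs. custom tests to stdout."""
        width = max([len(name) for name in counts] + [len("model")])
        print(f"{'model':<{width}} | schema | custom")
        for model, c in sorted(counts.items()):
            print(f"{model:<{width}} | {c['schema']:>6} | {c['custom']:>6}")

    print_coverage_table(counts)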

I also agree with being able to break it down by type of test, as that will be vital for determining the quality of the tests.

Another pipe dream is a strict dbt mode where models without tests simply can't be run.

@drewbanin

Check out the strict option in the original v2 schema.yml spec: #790

This wasn't implemented in our first pass, but I think something like that is a really good idea, and we now have an obvious place for it.

I like the idea of warning vs. failure, and I've also thought about some sort of max_error config for fickle tests (that fail, e.g., due to data load latency). This is all really cool stuff -- maybe I can make a new issue to spec out all sorts of test improvements like these.

@ThomasLaPiana

I was thinking of strict in terms of dbt run --strict, where any models that didn't have tests (or didn't meet a user-defined level of testing, for instance requiring unique and not-null tests before a model can run) simply wouldn't be run.

@drewbanin sounds like a plan 👍

@ThomasLaPiana

For version 1 (the first iteration), the goal is just to add a --coverage flag to dbt test that outputs a table to stdout showing the breakdown of tests for each model; I'll definitely take some inspiration from pytest-cov. Using this flag will NOT run the tests.

@drewbanin

@ThomasLaPiana great!

You can find the code that runs tests here: https://github.com/fishtown-analytics/dbt/blob/dev/guion-bluford/dbt/node_runners.py#L423

I think you'll want to override the execute method to not run tests if the --coverage flag is provided. Further, you'll want to implement the after_run classmethod in the TestRunner. The after_run method accepts manifest as an argument, which you'll be able to use to generate a table of test coverage.

It's not super clear to me how best to pass the --coverage flag into the TestRunner. I'd start by looking at the TestTask class and tracing execution through to the TestRunner. CLI arguments are defined in main.py.
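
As a hedged sketch of what wiring up the flag might look like, assuming the CLI is built with argparse subcommands (the variable names below are illustrative, not dbt's actual argument-parsing code):

    import argparse

    # Illustrative only: dbt's real main.py builds its subparsers differently,
    # but adding a boolean flag to the `test` subcommand would look like this.
    parser = argparse.ArgumentParser(prog="dbt")
    subparsers = parser.add_subparsers(dest="command")

    test_parser = subparsers.add_parser("test", help="Run (or report on) tests")
    test_parser.add_argument(
        "--coverage",
        action="store_true",
        help="Print a test-coverage table instead of running the tests",
    )

    args = parser.parse_args(["test", "--coverage"])
    print(args.command, args.coverage)  # test True

However the flag ultimately gets parsed, it still needs to travel from the TestTask down to the TestRunner, as described above.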

Once you have access to a Manifest object, you can build a table of models and their associated tests.

The Manifest is an object that contains all the information we have about a project. I think you'll want to pluck out the test nodes from manifest.nodes, consulting the column_name attribute to determine which column the test applies to. You can use the depends_on attribute to understand which models a given test applies to.
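
A rough sketch of that traversal, using only the attributes named above (manifest.nodes, depends_on, column_name); the resource_type check and the exact node shapes are assumptions for illustration, not a verbatim copy of dbt's internals:

    from collections import defaultdict
    from types import SimpleNamespace

    def build_coverage(manifest):
        """Group test nodes in the manifest by the model(s) they depend on."""
        coverage = defaultdict(lambda: {"schema": 0, "custom": 0})
        for unique_id, node in manifest.nodes.items():
            # Assumption: test nodes are identifiable by a resource_type attribute.
            if getattr(node, "resource_type", None) != "test":
                continue
            # Assumption: schema tests carry a column_name, custom data tests don't.
            kind = "schema" if getattr(node, "column_name", None) else "custom"
            for model_id in node.depends_on.nodes:
                coverage[model_id][kind] += 1
        return coverage

    # Tiny stand-in manifest to show the shape of the traversal.
    fake_manifest = SimpleNamespace(nodes={
        "model.proj.base_sfdc": SimpleNamespace(resource_type="model"),
        "test.proj.unique_base_sfdc_id": SimpleNamespace(
            resource_type="test",
            column_name="id",
            depends_on=SimpleNamespace(nodes=["model.proj.base_sfdc"]),
        ),
    })
    print(dict(build_coverage(fake_manifest)))
    # {'model.proj.base_sfdc': {'schema': 1, 'custom': 0}}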

There's sort of a lot here, but I wanted to point you in the right direction! Let me know if you have any questions, and feel free to open a PR if you want to discuss some code inline :)

Thanks for taking this!

drewbanin added the good_first_issue label and removed the question label on Sep 19, 2018
@ThomasLaPiana

PR is here @drewbanin: https://github.com/ThomasLaPiana/dbt/pull/1/files
Still working through everything you pointed me to; I'll let you know if I have specific questions.

drewbanin added the dbt-docs label and removed the auto-generated docs label on Nov 12, 2018
@github-actions

github-actions bot commented Jan 9, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label on Jan 9, 2022