[CT-3123] Make seed inputs optional for unit tests #8652

graciegoheen · 2023-09-15T20:13:55Z

Description

Our proposed unit testing spec, doesn't yet take into account if one of the inputs to the model you want to add a unit test to is a seed.

Current behavior: Let's say model_a depends on my_seed. If you want to add a unit test to model_a - you need to define an input for my_seed. That feels redundant since seeds don't often change and are already small data inputs.

Proposed solution:

Let's say you're unit testing my_favorite_model that depends on:

my_favorite_source (a source)
my_second_favorite_model (a model)
and my_special_seed (a seed)

You must supply mock input data for all of the direct parent models and sources.

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_favorite_source')
        format: csv
        fixture: my_favorite_source_fixture
      - input: ref('my_second_favorite_model')
        format: csv
        fixture: my_second_favorite_model_fixture

In this case, we just use your seed for my_special_seed input.

You may supply mock input data for direct parent seeds, if desired.

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_favorite_source')
        format: csv
        fixture: my_favorite_source_fixture
      - input: ref('my_second_favorite_model')
        format: csv
        fixture: my_second_favorite_model_fixture
      - input: ref('my_special_seed')
        format: csv
        fixture: my_special_seed_fixture

In this case, we use your fixture my_special_seed_fixture for my_special_seed input.

Acceptance Criteria

If no input is provided for a seed, we will use your seed (ignore pre and post hooks)
If an input is provided for a seed, we will use the input

Impact to other teams

None

Will backports be required?

No

Context

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-09-20T04:53:14Z

Maybe it "just works"?

Let's suppose my_upstream_model_a is a model (rather than a seed) and this is how we configure a fixture for it:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')
        format: csv
        fixture: my_upstream_model_a_fixture

Then we can imagine that the above should "just work" even if my_upstream_model_a were a seed instead -- the seed's CSV rows would just be replaced by the CSV rows within my_upstream_model_a_fixture, right?

Can we think of reasons why it wouldn't "just work"?

An additional thought

For the special case of my_upstream_model_a_fixture being a seed, what about the idea of being able to make format and fixture optional:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')

Or maybe even going the whole way and making the input: optional for upstream seeds:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      # presumably other `input`s are explicitly defined here, but at least `my_upstream_model_a` is implicitly defined -- since it's a seed, we already have all the data we need to mock it

graciegoheen · 2023-09-21T21:35:55Z

Updated the issue!! Thanks @dbeatty10 for your great ideas :)

Additional thought...

What would happen if you didn't care to supply mock data for one of the direct parent models (because it's not relevant for the logic that you're trying to test), where you want to instead use an "empty" mock data set or a one-row, all NULL data set?

would you not have to supply the input at all?

unit-tests:
  - name: test_my_model
    model: my_model
    given:

or would we require that you explicitly specify that behavior?

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')
        rows: {}

The second feels less magical, but I like that it's an intentional decision rather than the risk of "oops, I forgot to add mock data for my_upstream_model_a but didn't get any error"!

Food for thought... probably for a separate issue. cc: @MichelleArk

graciegoheen · 2023-09-25T15:17:06Z

Notes from refinement:

we should have a consistent limit for how large a seed & fixture can be (maybe if you seed file is larger than the size limit for fixture, you must define a fixture for your seed) -> let's open up a new issue for this!

graciegoheen · 2023-10-09T15:48:24Z

Notes from estimation:

seeds can have pre-hooks & post-hooks; can we just ignore these for unit testing? that's what we're doing for models

aranke · 2023-11-14T14:57:01Z

Hey @graciegoheen, can you please clarify how this scenario can be reached during unit testing?

seeds can have pre-hooks & post-hooks; can we just ignore these for unit testing? that's what we're doing for models

From what I can see, these only get triggered during dbt seed and not during dbt unit-test.

Furthermore, the spec labels this as an anti-goal:

Unit testing seeds, snapshots, analyses, or macros directly (they can be tested by making a passthrough model though!)

Given this, is it fair to cut pre- / post-hooks for seeds out of scope?

graciegoheen · 2023-11-14T15:10:43Z

Yes, unit testing seeds directly is an anti-goal - so we can cut pre- / post- hooks for seeds out of scope!

…fied in YAML config (#9064) Co-authored-by: Michelle Ark <[email protected]> Fix #8652: Use seed value if rows not specified

* Initial implementation of unit testing (from pr #2911) Co-authored-by: Michelle Ark <[email protected]> * 8295 unit testing artifacts (#8477) * unit test config: tags & meta (#8565) * Add additional functional test for unit testing selection, artifacts, etc (#8639) * Enable inline csv format in unit testing (#8743) * Support unit testing incremental models (#8891) * update unit test key: unit -> unit-tests (#8988) * convert to use unit test name at top level key (#8966) * csv file fixtures (#9044) * Unit test support for `state:modified` and `--defer` (#9032) Co-authored-by: Michelle Ark <[email protected]> * Allow use of sources as unit testing inputs (#9059) * Use daff for diff formatting in unit testing (#8984) * Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064) Co-authored-by: Michelle Ark <[email protected]> Fix #8652: Use seed value if rows not specified * Move unit testing to test and build commands (#9108) * Enable unit testing in non-root packages (#9184) * convert test to data_test (#9201) * Make fixtures files full-fledged members of manifest and enable partial parsing (#9225) * In build command run unit tests before models (#9273) --------- Co-authored-by: Michelle Ark <[email protected]> Co-authored-by: Michelle Ark <[email protected]> Co-authored-by: Emily Rockman <[email protected]> Co-authored-by: Jeremy Cohen <[email protected]> Co-authored-by: Kshitij Aranke <[email protected]>

graciegoheen added the enhancement New feature or request label Sep 15, 2023

github-actions bot changed the title ~~Handle input seeds in unit tests~~ [CT-3123] Handle input seeds in unit tests Sep 15, 2023

graciegoheen mentioned this issue Sep 15, 2023

[CT-2911] [Epic] Unit testing dbt models #8283

Closed

graciegoheen changed the title ~~[CT-3123] Handle input seeds in unit tests~~ [CT-3123] Make seed inputs optional for unit tests Sep 21, 2023

graciegoheen mentioned this issue Sep 25, 2023

[CT-3153] [implementation] limit the size of fixtures for unit testing #8708

Open

1 task

graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Oct 5, 2023

graciegoheen assigned gshank and unassigned gshank Oct 10, 2023

graciegoheen assigned aranke Oct 25, 2023

aranke mentioned this issue Nov 13, 2023

Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config #9064

Merged

5 tasks

aranke added a commit that referenced this issue Nov 13, 2023

Fix #8652: Use seed value if rows not specified

8b2948d

aranke mentioned this issue Nov 15, 2023

[CT-2967] [Feature] run dbt in --sample mode #8378

Open

3 tasks

aranke added a commit that referenced this issue Nov 16, 2023

Fix #8652: Use seed file from disk for unit testing if rows not speci…

3432436

…fied in YAML config (#9064) Co-authored-by: Michelle Ark <[email protected]> Fix #8652: Use seed value if rows not specified

graciegoheen closed this as completed Nov 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-3123] Make seed inputs optional for unit tests #8652

[CT-3123] Make seed inputs optional for unit tests #8652

graciegoheen commented Sep 15, 2023 •

edited

Loading

dbeatty10 commented Sep 20, 2023

graciegoheen commented Sep 21, 2023 •

edited

Loading

graciegoheen commented Sep 25, 2023 •

edited

Loading

graciegoheen commented Oct 9, 2023

aranke commented Nov 14, 2023

graciegoheen commented Nov 14, 2023

[CT-3123] Make seed inputs optional for unit tests #8652

[CT-3123] Make seed inputs optional for unit tests #8652

Comments

graciegoheen commented Sep 15, 2023 • edited Loading

Description

Acceptance Criteria

Impact to other teams

Will backports be required?

Context

dbeatty10 commented Sep 20, 2023

Maybe it "just works"?

An additional thought

graciegoheen commented Sep 21, 2023 • edited Loading

graciegoheen commented Sep 25, 2023 • edited Loading

graciegoheen commented Oct 9, 2023

aranke commented Nov 14, 2023

graciegoheen commented Nov 14, 2023

graciegoheen commented Sep 15, 2023 •

edited

Loading

graciegoheen commented Sep 21, 2023 •

edited

Loading

graciegoheen commented Sep 25, 2023 •

edited

Loading