Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2917] Add ability to use csv file fixtures for inputs/expected in addition to inline yaml dict/csv #8290

Closed
Tracked by #8283
gshank opened this issue Aug 2, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request user docs [docs.getdbt.com] Needs better documentation

Comments

@gshank
Copy link
Contributor

gshank commented Aug 2, 2023

Description

The initial implement of unit testing specifies input rows and expected rows via yaml dictionaries or inline csv. Some users of this feature would prefer to use csv files, particularly for tables with many columns. Enable a "fixture" specifier for format: csv and implement ingesting and running csv files as unit test inputs.

Fixtures are csv files in the tests/fixtures folder (or whatever you've configured for test-paths).

If I have a csv file - tests/fixtures/my_model_a_fixture.csv - this is the definition of my fixture named my_model_a_fixture (similar to how if I have a sql file models/my_model_a.sql that is the definition of my model named my_model_a), and I can reference that fixture in my yml with the fixture name:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        fixture: my_model_a_fixture
...

Note: this means fixture names must be unique, but they do not need to be the same name as the corresponding model

For test coverage purposes, we may want to store the column names somewhere. It may also be necessary or useful to specify the column data types.

Acceptance Criteria

  • Unit tests can be written with inputs as "fixtures" that reference csv files

Impact to Other Teams

None

Will backports be required?

No

Context

Note: This ticket will not handle data types, that will be covered in #8649

@github-actions github-actions bot changed the title Add ability to specify the format for fixtures and use csv files for inputs/expected in addition to yaml [CT-2917] Add ability to specify the format for fixtures and use csv files for inputs/expected in addition to yaml Aug 2, 2023
@MichelleArk
Copy link
Contributor

This capability should probably also be available for expect fixtures in addition to given input fixtures. We'll need to generalize the type of expect to support this (and unify it with the parsing of InuptFixtures, as it is currently simply a List[Dict[str, Any]]

Here's a first stab at what this spec could look like:

unit:
  - model: my_model
    tests:
      - name: test_my_model
        given:
          - input: ref('my_model_a')
            format: csv_file # optional
            rows: path/to/model_a_fixture.csv
          - input: ref('my_model_b')
            rows: #default 
              - {id: 1, b: 2}
              - {id: 2, b: 2}
        expect:
          format: csv #format can also be applied to expected
          rows: """
          c
          3
          """

But its worth considering whether the format config would be applied at the fixture (input or expected) level, as above --
or at the test (suite?) level, for example:

unit:
  - model: my_model
     config:
       format: csv_file #applies to all fixtures in tests for my_model
    tests:
      - name: test_my_model
        config:
          format: csv_file #applies to all fixtures in test_my_model
        given:
          - input: ref('my_model_a')
            rows: path/to/fixture.csv
          - input: ref('my_model_b')
            rows: path/to/fixture_b.csv
        expect:
          rows: path/to/expected_model.csv

either way, should a single test case be able to use multiple different input formats? for example, input: ref('my_model_a') uses format: dicts and input: ref('my_model_b') uses csv_file? i think that variability within a test case could reduce readability but there may be valid use cases to support it.

@graciegoheen
Copy link
Contributor

graciegoheen commented Sep 5, 2023

A few thoughts here, in somewhat of a random order:

The default for format is an "inline list of dictionaries" - what is the appropriate explicit name for this? I see @MichelleArk mentioned format: dicts as an option - I feel good about this, unless anyone has strong opinions otherwise.

I'm curious about the implications for using something other than rows for formats other than dicts - what if format: dicts used rows, and format: csv_file used path?

How do you feel about specifying the path to the csv file (ex: path: path/to/model_a_fixture.csv) vs. just specifying the name of the fixture file (ex: filename: model_a_fixture.csv) or even something akin to a docs block (ex: rows: {{ fixture('model_a_fixture') }}). Do we imagine a future format where we allow people to specify multiple fixtures in a single csv file?

I think there could be some confusion with specifying the file path - is it the relative path? the path from the root directory?

Should we havea unit-paths config in the dbt_project.yml similar to test-paths - by default maybe this is a new subfolder unit/, but folks could override it if they wanted to specify a different location for fixtures like tests/unit or fixtures/.

I think we should definitely support format: csv_file - format: csv for inline csvs seems less important to me, though curious other folks' opinions on this.

But its worth considering whether the format config would be applied at the fixture (input or expected) level, as above --
or at the test (suite?) level

In the same vein, is the format config something people would want to override at the dbt_project.yml file level?

#dbt_project.yml
unit:
    format: csv_file

I do think we should support variability of formats within a single unit test.

@graciegoheen
Copy link
Contributor

    format: csv
    fixture: my_model_fixture
    format: csv
    rows: """
          c
          3
          """

@MichelleArk
Copy link
Contributor

Split out the format: csv piece in #8626. Let's continue refining the fixture spec here.

@graciegoheen
Copy link
Contributor

graciegoheen commented Sep 14, 2023

From refinement:

If I have a csv file - tests/fixtures/my_model_a_fixture.csv - this is the definition of my fixture named my_model_a_fixture (similar to how if I have a sql file models/my_model_a.sql that is the definition of my model named my_model_a), and I can reference that fixture in my yml with the file name:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        fixture: my_model_a_fixture
...

This feels similar to how defined_in for model versions work:
Screenshot 2023-09-14 at 3 31 15 PM

Note: this means that fixture names must be unique.

@graciegoheen graciegoheen changed the title [CT-2917] Add ability to specify the format for fixtures and use csv files for inputs/expected in addition to yaml [CT-2917] Add ability to use csv file fixtures for inputs/expected in addition to inline yaml dict/csvs Sep 22, 2023
@graciegoheen graciegoheen changed the title [CT-2917] Add ability to use csv file fixtures for inputs/expected in addition to inline yaml dict/csvs [CT-2917] Add ability to use csv file fixtures for inputs/expected in addition to inline yaml dict/csv Sep 22, 2023
@graciegoheen
Copy link
Contributor

graciegoheen commented Sep 25, 2023

Notes from refinement:

  • should we add a limit to how large the csv files / inputs can be?

@emmyoop
Copy link
Member

emmyoop commented Nov 10, 2023

resolved by #9044

@emmyoop emmyoop closed this as completed Nov 10, 2023
@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
4 participants