[CT-3497] I want to add a description/label to each of the rows in my unit test to explicitly call out the edge cases I'm testing for #9283

Open
3 tasks done
Tracked by #8283
graciegoheen opened this issue Dec 13, 2023 · 2 comments
Labels
enhancement New feature or request unit tests Issues related to built-in dbt unit testing functionality

graciegoheen commented Dec 13, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When creating a unit test in my project:

unit_tests:
  - name: a # this is the unique name of the test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: [email protected],     email_top_level_domain: example.com}
          - {wizard_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true}
        - {wizard_id: 2, is_valid_email_address: false}
        - {wizard_id: 3, is_valid_email_address: false}
        - {wizard_id: 4, is_valid_email_address: false}

I want to optionally add descriptions/labels to each of my input rows to explain what each edge case is. Something like:

      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: [email protected],     email_top_level_domain: example.com}
             description: valid email
          - {wizard_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
             description: incorrect email domain
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
             description: no @ symbol
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
             description: no period

The spec needs more product/DX refinement. We should be able to add descriptions/labels regardless of which format: is used.
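
As written, the snippet above would not parse as YAML, because an indented description: key cannot follow a completed flow mapping within the same list item. A minimal sketch of one parseable shape is below. The `row` and `description` keys are hypothetical and not part of dbt's current unit test spec, and only rows 3 and 4 are shown.

```yaml
      - input: ref('stg_wizards')
        rows:
          # Hypothetical shape: each list item wraps the mock row under `row`
          # and carries an optional sibling `description`.
          - row: {wizard_id: 3, email: badgmail.com, email_top_level_domain: gmail.com}
            description: no @ symbol
          - row: {wizard_id: 4, email: missingdot@gmailcom, email_top_level_domain: gmail.com}
            description: no period
```

This shape only covers dict-style rows. Other values of format: would presumably need a different mechanism, which is part of the refinement still needed.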

Describe alternatives you've considered

I could just put a large block of text in the description: field of the unit test.

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@graciegoheen graciegoheen added enhancement New feature or request triage and removed triage labels Dec 13, 2023
@github-actions github-actions bot changed the title [Feature] I want to add a description to each of the rows in my unit test to explicitly call out the edge cases I'm testing for [CT-3497] [Feature] I want to add a description to each of the rows in my unit test to explicitly call out the edge cases I'm testing for Dec 13, 2023
@graciegoheen graciegoheen changed the title [CT-3497] [Feature] I want to add a description to each of the rows in my unit test to explicitly call out the edge cases I'm testing for [CT-3497] I want to add a description/label to each of the rows in my unit test to explicitly call out the edge cases I'm testing for Dec 13, 2023
@dbeatty10
Contributor

Good idea about describing each test case 🤩

Adding a description as an additional sub-item of each row might be tricky.

With the features that currently exist, here are several ways to describe individual test cases (I haven't actually tested any of them to confirm they work):

  1. One description to rule them all
  2. YAML comments
  3. Individual unit tests

How do you feel about the pros/cons of each? (Can't say I had the most fun writing out 3. 😂)

One description to rule them all

unit_tests:
  - name: a # this is the unique name of the test
    description: |
      There are four test cases:
      1. valid email
      2. incorrect email domain
      3. no @ symbol
      4. no period
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          ....

YAML comments

      - input: ref('stg_wizards')
        rows:
          # valid email
          - {wizard_id: 1, email: [email protected],     email_top_level_domain: example.com}
          # incorrect email domain
          - {wizard_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
          # no @ symbol
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          # no period
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}

Individual unit tests

unit_tests:

  - name: a_valid_email
    description: valid email
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: [email protected],     email_top_level_domain: example.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 1, is_valid_email_address: true}

  - name: a_incorrect_email_domain
    description: incorrect email domain
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 2, is_valid_email_address: false}

  - name: a_no_at_symbol
    description: no @ symbol
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 3, is_valid_email_address: false}

  - name: a_no_period
    description: no period
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 4, is_valid_email_address: false}

@alison985

FWIW, there would be value in printing the description of the test case in the test output to help with debugging. Individual unit tests aren't DRY. YAML comments wouldn't output when running the test.

Of the three above, I like description best. It may also be the easiest thing to add to test output. It also gives space for longer descriptions. It does mean whoever updates test cases has to remember to update the description though.
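
One way to soften that maintenance risk with the existing description: field might be to key each case by wizard_id instead of by position, so that reordering rows does not silently invalidate the text. This is only a sketch of the wording, not a dbt feature, and nothing here is checked against the rows automatically.

```yaml
unit_tests:
  - name: a
    description: |
      Test cases, keyed by wizard_id:
      - wizard_id 1: valid email
      - wizard_id 2: incorrect email domain
      - wizard_id 3: no @ symbol
      - wizard_id 4: no period
    model: dim_wizards
    # given: and expect: stay exactly as in the original example
```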

This isn't a great idea, because it depends on implied order, which again whoever updates the test cases would have to remember to keep in sync, but you could do:

unit_tests:
  - name: a # this is the unique name of the test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: [email protected],     email_top_level_domain: example.com}
          - {wizard_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
        description:
          - "valid email"
          - "incorrect email domain"
          - "no @ symbol"
          - "no period"
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true}
        - {wizard_id: 2, is_valid_email_address: false}
        - {wizard_id: 3, is_valid_email_address: false}
        - {wizard_id: 4, is_valid_email_address: false}

The following is probably slightly better, both for developer experience and for avoiding bugs caused by implied order. However, it may be worse if it requires more queries, or depending on how the last element of each row would have to flow through. I have no knowledge of unit_tests outside of this thread, so I can't guess.

    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true, 'valid email'}
        - {wizard_id: 2, is_valid_email_address: false, 'incorrect email domain'}
        - {wizard_id: 3, is_valid_email_address: false, 'no @ symbol'}
        - {wizard_id: 4, is_valid_email_address: false, 'no period'}
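
Under the YAML spec, a bare scalar inside a flow mapping should be read as a key with an empty value, so 'valid email' above would likely surface as an extra column name instead of a label. A sketch of the same idea with an explicit key is below. `_description` is a hypothetical reserved key, not an existing dbt field, and dbt would have to drop it before comparing rows.

```yaml
    expect: # the expected output given the inputs above
      rows:
        # `_description` is a hypothetical label key, not a model column
        - {wizard_id: 1, is_valid_email_address: true, _description: 'valid email'}
        - {wizard_id: 2, is_valid_email_address: false, _description: 'incorrect email domain'}
        - {wizard_id: 3, is_valid_email_address: false, _description: 'no @ symbol'}
        - {wizard_id: 4, is_valid_email_address: false, _description: 'no period'}
```

A reserved key like this could collide with a real column of the same name, so any actual spec would need to settle on a name that cannot be a column.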

@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Sep 20, 2024

3 participants