Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1590] [Feature] Improving dbt selection options #6365

Closed
3 tasks done
AGPapa opened this issue Dec 2, 2022 · 13 comments · Fixed by #6366
Closed
3 tasks done

[CT-1590] [Feature] Improving dbt selection options #6365

AGPapa opened this issue Dec 2, 2022 · 13 comments · Fixed by #6366
Assignees
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes
Milestone

Comments

@AGPapa
Copy link
Contributor

AGPapa commented Dec 2, 2022

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Our organization's code typically has tests that compare models with their dependencies. (Ex: if Model A contains $1 million in transactions, and Model B transforms that data, we should still have $1 million in transactions).

image

When building only part of the graph the existing selection modes do not provide good options for running the tests. For example, let's say we're making a change to Model B and want to make sure that it is working correctly. We run dbt build -s model_b

The "eager" mode will run Model B, Test AB and Test CB. But Test CB is likely to fail because we haven't built Model C yet!
The "cautious" mode will run Model B and no tests - so we're not sure if the change worked or not.

Instead we'd like a selection mode that will run only Model B and Test AB. Test AB should pass because Model A was required to have been already been built to run Model B.

In fact - the existing cautious mode will not run Test AB even if Model A is a source! Sources don't need to be built at all, so that test should always run.

Describe alternatives you've considered

Instead of adding a new selection mode we might want to consider changing the behavior of the cautious mode. I'm hesitant to suggest changing existing behavior - but I believe the suggested new mode would be preferable to cautious mode, and having only two modes is easier for users to understand. At the least cautious mode should be updated to allow tests that reference sources.

Who will this benefit?

This will benefit developers who are testing changes on a subset of the graph, rather than building the entire graph.

Are you interested in contributing this feature?

Yes - I have code changes working that add this new mode and am willing to make updates to it based on feedback.

Anything else?

No response

@AGPapa AGPapa added enhancement New feature or request triage labels Dec 2, 2022
@github-actions github-actions bot changed the title [Feature] Improving dbt selection options [CT-1590] [Feature] Improving dbt selection options Dec 2, 2022
@dbeatty10
Copy link
Contributor

@AGPapa This is a cool idea!

Thanks for a great write-up and including that diagram and walk-through -- it was crucial for me to be able to walk through selections of the DAG successfully along with you 🙌

Does the following additional markup faithfully represent the example you gave for multi-parent tests?

image

Are we interested in your feature?

I think so! We've given signals in the past that we're not stuck on bi-modal behavior and indeed open to a continuum:

It will come as no surprise that I think selection is really important, especially as projects get bigger, and complementary tools like deferral/cloning get more powerful. The right answer, ultimately, might be to support multiple configurations, or gradations of eagerness ¹

if we envision a future where there could be other modes ²

If we go with your initially-proposed verbiage, we'd have the following continuum:

  1. cautious
  2. semicautious [NEW]
  3. eager

Finalizing naming

Can you imagine any other selection behavior? Depending on if we can think of any other relevant modes that are somewhere between the existing cautious mode and your proposed semicautious, we may want to rename yours to semieager instead.

@jtcohen6 jtcohen6 added dbt tests Issues related to built-in dbt testing functionality node selection Functionality and syntax for selecting DAG nodes labels Dec 12, 2022
@jtcohen6
Copy link
Contributor

@dbeatty10 I think I agree with your take here!

Thanks for making the diagram above — just want to confirm the details:

  • model_b would be selected in all three modes
  • test_ab would be selected by "eager" + "semi-cautious" (the new one proposed here), but not by cautious
  • test_bc would be selected by "eager" only

semicautious
semieager

I would be interested in a more precise name here, if we can manage one. "Eager with parents, cautious with children"?

Given the extensible enum for IndirectSelection already, and the already open PR, I'm also going to label this one as help_wanted.

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed awaiting_response labels Dec 12, 2022
@AGPapa
Copy link
Contributor Author

AGPapa commented Dec 12, 2022

model_b would be selected in all three modes
test_ab would be selected by "eager" + "semi-cautious" (the new one proposed here), but not by cautious
test_bc would be selected by "eager" only

Yes that is correct!

On the naming issue - I also struggled to pick a good name. I like the idea of choosing something more precise.

@jtcohen6 jtcohen6 added the Refinement Maintainer input needed label Dec 12, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 12, 2022

Adding Refinement for a better name! Doug, any ideas? :)

I think we can move forward with functional/code review of the PR in the meantime; we should just wait to merge until we have a name we're happy with.

@dbeatty10
Copy link
Contributor

I would be interested in a more precise name here, if we can manage one. "Eager with parents, cautious with children"?

I heard once that the only thing that's easy in computer science is naming. Oh, er, forget that I said that!

Agreed that it would be nice to come up with a descriptive name. Can't say that any of the proposals so far are easy for me to quickly understand.

A brainstorm for the verbiage for selecting multi-parent tests as assertively as possible:

  • semi-cautious
  • semi-eager
  • eager with parents, cautious with children
  • goldilocks
  • best
  • balanced
  • reliable
  • dependable
  • buildable
  • doable
  • safe
  • attested
  • full consensus
  • full quorum
  • unanimous quorum
  • assertive
  • min-max

Do any of these spark joy or serve as kindling for better ideas?

@MichelleArk
Copy link
Contributor

Of the options above, 'buildable' feels like the most concise description of what expectations a user could have with this selection option, and also declaratively describes the selection methodology itself: dbt selects test nodes that can are runnable (doable?) assuming the model itself is buildable. Maybe there's a more precise word there than 'buildable' but I like the direction!

@dbeatty10
Copy link
Contributor

Thanks for your feedback @MichelleArk, especially given your prior experiences dealing with complicated selection criteria! 🧠

My understanding is that the new mode represents the maximal selection of nodes "guaranteed" to be ready and available during dbt build. Both assertive and bold, it's like a golden mean between the timidness of cautious and the risky impulses of eager.

Let's talk more about the name buildable. Through a proscriptive/imperative lens, it doesn't have the precision of "eager with parents, cautious with children". But through a descriptive/declarative lens, maybe we're on the right track 🤔

Q: Which multi-parent test nodes will be selected?
A: All the buildable ones!

How do you feel about the following continuum @jtcohen6 and @AGPapa? buildable or an alternative strike your fancy? Or shall we go back to the drawing board?

  1. cautious
  2. buildable [NEW]
  3. eager

@AGPapa
Copy link
Contributor Author

AGPapa commented Dec 12, 2022

I'm on board with buildable or something similar

@dbeatty10 dbeatty10 self-assigned this Dec 12, 2022
@jtcohen6
Copy link
Contributor

I like it! It takes a minute to internalize the meaning, but once done, it's declarative & memorable.

We'll want to clearly document it here; including a diagram like the one above would be particularly helpful, not just for the new buildable mode, but to illustrate the existing ones as well.

Removing Refinement, as I think this is good to go forward.

@dbeatty10
Copy link
Contributor

We'll want to clearly document it here; including a diagram like the one above would be particularly helpful, not just for the new buildable mode, but to illustrate the existing ones as well.

Opened up an issue here:

@AGPapa if you already opened up an issue in the dbt-labs/docs.getdbt.com, just let me know and we can merge them.

@AGPapa
Copy link
Contributor Author

AGPapa commented Dec 22, 2022

Hey @dbeatty10 - is there anything else needed from me for this to move forward? I know things tend to slow down this time of year, should I just wait until January for feedback on the code changes?

@dbeatty10
Copy link
Contributor

It will probably be January until one of the engineers will be able to provide feedback on your PR.

One thing you could do between now and then is to run changie new to enter the details for the changelog entry. Running changie will require installing the development dependencies in your local environment. I'll add this note to the PR as well.

@ChenyuLInx
Copy link
Contributor

@iknox-fa we might need to port this selection argument to new CLI?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants