[CT-1590] [Feature] Improving dbt selection options #6365

AGPapa · 2022-12-02T15:58:31Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Our organization's code typically has tests that compare models with their dependencies. (Ex: if Model A contains $1 million in transactions, and Model B transforms that data, we should still have $1 million in transactions).

When building only part of the graph, the existing selection modes do not provide good options for running the tests. For example, let's say we're making a change to Model B and want to make sure that it is working correctly. We run dbt build -s model_b:

The "eager" mode will run Model B, Test AB and Test CB. But Test CB is likely to fail because we haven't built Model C yet!
The "cautious" mode will run Model B and no tests - so we're not sure if the change worked or not.

Instead we'd like a selection mode that will run only Model B and Test AB. Test AB should pass because Model A must already have been built in order to run Model B.

In fact - the existing cautious mode will not run Test AB even if Model A is a source! Sources don't need to be built at all, so that test should always run.
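The three behaviours described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not dbt's implementation: the graph is assumed to be model_a -> model_b -> model_c (so Model C is downstream of Model B), the names `MODEL_PARENTS`, `TEST_PARENTS`, `ancestors` and `select_tests` are hypothetical, and "proposed" is a placeholder label for the requested mode.

```python
# Assumed toy DAG: model_a -> model_b -> model_c (hypothetical layout).
MODEL_PARENTS = {
    "model_a": set(),
    "model_b": {"model_a"},
    "model_c": {"model_b"},
}

# Each test maps to the set of nodes it references.
TEST_PARENTS = {
    "test_ab": {"model_a", "model_b"},
    "test_cb": {"model_c", "model_b"},
}


def ancestors(node, parents):
    """Return every node upstream of `node`, transitively."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def select_tests(selected, mode):
    """Pick tests for the selected models under one of three modes.

    "proposed" is a placeholder name for the mode requested in this issue.
    """
    if mode not in {"eager", "cautious", "proposed"}:
        raise ValueError(f"unknown mode: {mode!r}")

    # Everything that must already exist for the selected models to build.
    upstream = set()
    for node in selected:
        upstream |= ancestors(node, MODEL_PARENTS)

    chosen = set()
    for test, refs in TEST_PARENTS.items():
        if not refs & selected:
            continue  # the test does not touch any selected model
        if mode == "eager":
            chosen.add(test)  # any overlap is enough
        elif mode == "cautious" and refs <= selected:
            chosen.add(test)  # every reference must itself be selected
        elif mode == "proposed" and refs <= selected | upstream:
            chosen.add(test)  # references may also be ancestors of the selection
    return chosen


for mode in ("eager", "cautious", "proposed"):
    print(mode, sorted(select_tests({"model_b"}, mode)))
```

Under these assumptions, selecting only model_b gives both tests in eager mode, no tests in cautious mode, and only test_ab in the proposed mode, which matches the walk-through above.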

Describe alternatives you've considered

Instead of adding a new selection mode, we might want to consider changing the behavior of the cautious mode. I'm hesitant to suggest changing existing behavior - but I believe the suggested new mode would be preferable to cautious mode, and having only two modes is easier for users to understand. At the least, cautious mode should be updated to allow tests that reference sources.

Who will this benefit?

This will benefit developers who are testing changes on a subset of the graph, rather than building the entire graph.

Are you interested in contributing this feature?

Yes - I have code changes working that add this new mode and am willing to make updates to it based on feedback.

Anything else?

No response


dbeatty10 · 2022-12-10T01:07:47Z

@AGPapa This is a cool idea!

Thanks for a great write-up and including that diagram and walk-through -- it was crucial for me to be able to walk through selections of the DAG successfully along with you 🙌

Does the following additional markup faithfully represent the example you gave for multi-parent tests?

Are we interested in your feature?

I think so! We've given signals in the past that we're not stuck on bi-modal behavior and indeed open to a continuum:

It will come as no surprise that I think selection is really important, especially as projects get bigger, and complementary tools like deferral/cloning get more powerful. The right answer, ultimately, might be to support multiple configurations, or gradations of eagerness ¹

if we envision a future where there could be other modes ²

If we go with your initially-proposed verbiage, we'd have the following continuum:

cautious
semicautious [NEW]
eager

Finalizing naming

Can you imagine any other selection behavior? Depending on if we can think of any other relevant modes that are somewhere between the existing cautious mode and your proposed semicautious, we may want to rename yours to semieager instead.

jtcohen6 · 2022-12-12T17:22:12Z

@dbeatty10 I think I agree with your take here!

Thanks for making the diagram above — just want to confirm the details:

model_b would be selected in all three modes
test_ab would be selected by "eager" + "semi-cautious" (the new one proposed here), but not by cautious
test_bc would be selected by "eager" only

semicautious
semieager

I would be interested in a more precise name here, if we can manage one. "Eager with parents, cautious with children"?

Given the extensible enum for IndirectSelection already, and the already open PR, I'm also going to label this one as help_wanted.
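To illustrate why an enum makes a third mode cheap to add, here is a hedged sketch of a string-valued enum with a parser. It is a stand-in written for this thread, not dbt-core's actual IndirectSelection definition; the member names, the `parse_mode` helper, and the placeholder value "semicautious" (the working name at this point in the discussion) are all assumptions.

```python
from enum import Enum


class IndirectSelection(str, Enum):
    """Illustrative stand-in; not copied from dbt-core."""

    CAUTIOUS = "cautious"
    SEMICAUTIOUS = "semicautious"  # placeholder name for the proposed mode
    EAGER = "eager"


def parse_mode(raw: str) -> IndirectSelection:
    """Turn user input into an enum member, listing valid values on failure."""
    try:
        return IndirectSelection(raw.lower())
    except ValueError:
        valid = ", ".join(member.value for member in IndirectSelection)
        raise ValueError(
            f"unknown mode {raw!r}; expected one of: {valid}"
        ) from None


print(parse_mode("Semicautious"))
print([member.value for member in IndirectSelection])
```

With this shape, adding a mode is a one-line change to the enum, and the parser and its error message pick up the new value automatically.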

AGPapa · 2022-12-12T17:45:23Z

model_b would be selected in all three modes
test_ab would be selected by "eager" + "semi-cautious" (the new one proposed here), but not by cautious
test_bc would be selected by "eager" only

Yes that is correct!

On the naming issue - I also struggled to pick a good name. I like the idea of choosing something more precise.

jtcohen6 · 2022-12-12T17:46:04Z

Adding Refinement for a better name! Doug, any ideas? :)

I think we can move forward with functional/code review of the PR in the meantime; we should just wait to merge until we have a name we're happy with.

dbeatty10 · 2022-12-12T17:57:04Z

I would be interested in a more precise name here, if we can manage one. "Eager with parents, cautious with children"?

~~I heard once that the only thing that's easy in computer science is naming.~~ Oh, er, forget that I said that!

Agreed that it would be nice to come up with a descriptive name. Can't say that any of the proposals so far are easy for me to quickly understand.

A brainstorm for the verbiage for selecting multi-parent tests as assertively as possible:

semi-cautious
semi-eager
eager with parents, cautious with children
goldilocks
best
balanced
reliable
dependable
buildable
doable
safe
attested
full consensus
full quorum
unanimous quorum
assertive
min-max

Do any of these spark joy or serve as kindling for better ideas?

MichelleArk · 2022-12-12T19:57:08Z

Of the options above, 'buildable' feels like the most concise description of what expectations a user could have with this selection option, and also declaratively describes the selection methodology itself: dbt selects test nodes that are runnable (doable?) assuming the model itself is buildable. Maybe there's a more precise word there than 'buildable' but I like the direction!

dbeatty10 · 2022-12-12T21:14:44Z

Thanks for your feedback @MichelleArk, especially given your prior experiences dealing with complicated selection criteria! 🧠

My understanding is that the new mode represents the maximal selection of nodes "guaranteed" to be ready and available during dbt build. Both assertive and bold, it's like a golden mean between the timidness of cautious and the risky impulses of eager.

Let's talk more about the name buildable. Through a prescriptive/imperative lens, it doesn't have the precision of "eager with parents, cautious with children". But through a descriptive/declarative lens, maybe we're on the right track 🤔

Q: Which multi-parent test nodes will be selected?
A: All the buildable ones!

How do you feel about the following continuum @jtcohen6 and @AGPapa? buildable or an alternative strike your fancy? Or shall we go back to the drawing board?

cautious
buildable [NEW]
eager

AGPapa · 2022-12-12T21:34:11Z

I'm on board with buildable or something similar

jtcohen6 · 2022-12-13T11:46:58Z

I like it! It takes a minute to internalize the meaning, but once done, it's declarative & memorable.

We'll want to clearly document it here; including a diagram like the one above would be particularly helpful, not just for the new buildable mode, but to illustrate the existing ones as well.

Removing Refinement, as I think this is good to go forward.

dbeatty10 · 2022-12-13T14:41:40Z

We'll want to clearly document it here; including a diagram like the one above would be particularly helpful, not just for the new buildable mode, but to illustrate the existing ones as well.

Opened up an issue here:

buildable indirect selection mode docs.getdbt.com#2568

@AGPapa if you already opened up an issue in the dbt-labs/docs.getdbt.com, just let me know and we can merge them.

AGPapa · 2022-12-22T20:27:35Z

Hey @dbeatty10 - is there anything else needed from me for this to move forward? I know things tend to slow down this time of year, should I just wait until January for feedback on the code changes?

dbeatty10 · 2022-12-22T21:45:01Z

It will probably be January before one of the engineers is able to provide feedback on your PR.

One thing you could do between now and then is to run changie new to enter the details for the changelog entry. Running changie will require installing the development dependencies in your local environment. I'll add this note to the PR as well.

ChenyuLInx · 2023-01-10T21:52:19Z

@iknox-fa we might need to port this selection argument to the new CLI?

AGPapa added enhancement New feature or request triage labels Dec 2, 2022

github-actions bot changed the title ~~[Feature] Improving dbt selection options~~ [CT-1590] [Feature] Improving dbt selection options Dec 2, 2022

AGPapa mentioned this issue Dec 2, 2022

Adds 'buildable' selection mode #6366

Merged

6 tasks

dbeatty10 added awaiting_response and removed triage labels Dec 10, 2022

jtcohen6 added dbt tests Issues related to built-in dbt testing functionality node selection Functionality and syntax for selecting DAG nodes labels Dec 12, 2022

jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed awaiting_response labels Dec 12, 2022

github-actions bot added the triage label Dec 12, 2022

jtcohen6 added awaiting_response and removed triage labels Dec 12, 2022

github-actions bot added triage and removed awaiting_response labels Dec 12, 2022

jtcohen6 added the Refinement Maintainer input needed label Dec 12, 2022

dbeatty10 self-assigned this Dec 12, 2022

jtcohen6 removed triage Refinement Maintainer input needed labels Dec 13, 2022

dbeatty10 mentioned this issue Dec 13, 2022

buildable indirect selection mode dbt-labs/docs.getdbt.com#2568

Closed

1 task

ChenyuLInx closed this as completed in #6366 Jan 10, 2023

jtcohen6 added this to the v1.4 milestone Jan 11, 2023

dbeatty10 mentioned this issue May 21, 2023

Add diagram(s) to explain different indirect selection modes dbt-labs/docs.getdbt.com#3403

Closed

1 task
