[CT-1590] [Feature] Improving dbt selection options #6365
Comments
@AGPapa This is a cool idea! Thanks for a great write-up and including that diagram and walk-through -- it was crucial for me to be able to walk through selections of the DAG successfully along with you 🙌 Does the following additional markup faithfully represent the example you gave for multi-parent tests?
Are we interested in your feature?
I think so! We've given signals in the past that we're not stuck on bi-modal behavior and are indeed open to a continuum:
If we go with your initially-proposed verbiage, we'd have the following continuum:
Finalizing naming
Can you imagine any other selection behavior? Depending on whether we can think of any other relevant modes that are somewhere between the existing
@dbeatty10 I think I agree with your take here! Thanks for making the diagram above - just want to confirm the details:
I would be interested in a more precise name here, if we can manage one. "Eager with parents, cautious with children"? Given the extensible enum for
Yes that is correct! On the naming issue - I also struggled to pick a good name. I like the idea of choosing something more precise.
Adding I think we can move forward with functional/code review of the PR in the meantime; we should just wait to merge until we have a name we're happy with.
Agreed that it would be nice to come up with a descriptive name. Can't say that any of the proposals so far are easy for me to quickly understand. A brainstorm for the verbiage for selecting multi-parent tests as assertively as possible:
Do any of these spark joy or serve as kindling for better ideas?
Of the options above, 'buildable' feels like the most concise description of what expectations a user could have with this selection option, and it also declaratively describes the selection methodology itself: dbt selects test nodes that are runnable (doable?) assuming the model itself is buildable. Maybe there's a more precise word there than 'buildable' but I like the direction!
Thanks for your feedback @MichelleArk, especially given your prior experiences dealing with complicated selection criteria! 🧠 My understanding is that the new mode represents the maximal selection of nodes "guaranteed" to be ready and available during
Let's talk more about the name
Q: Which multi-parent test nodes will be selected?
How do you feel about the following continuum @jtcohen6 and @AGPapa?
I'm on board with
I like it! It takes a minute to internalize the meaning, but once done, it's declarative & memorable. We'll want to clearly document it here; including a diagram like the one above would be particularly helpful, not just for the new Removing
Opened up an issue here: @AGPapa if you already opened up an issue in the dbt-labs/docs.getdbt.com, just let me know and we can merge them.
Hey @dbeatty10 - is there anything else needed from me for this to move forward? I know things tend to slow down this time of year, should I just wait until January for feedback on the code changes?
It will probably be January before one of the engineers is able to provide feedback on your PR. One thing you could do between now and then is to run
@iknox-fa we might need to port this selection argument to the new CLI?
Is this your first time submitting a feature request?
Describe the feature
Our organization's code typically has tests that compare models with their dependencies. (For example, if Model A contains $1 million in transactions, and Model B transforms that data, we should still have $1 million in transactions.)
When building only part of the graph, the existing selection modes do not provide good options for running the tests. For example, let's say we're making a change to Model B and want to make sure that it is working correctly. We run
dbt build -s model_b
The "eager" mode will run Model B, Test AB and Test CB. But Test CB is likely to fail because we haven't built Model C yet!
The "cautious" mode will run Model B and no tests - so we're not sure if the change worked or not.
Instead, we'd like a selection mode that will run only Model B and Test AB. Test AB should pass because Model A must already have been built in order to run Model B.
In fact - the existing cautious mode will not run Test AB even if Model A is a source! Sources don't need to be built at all, so that test should always run.
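To make the three behaviors concrete, here is a minimal Python sketch of the selection rule described above. It is not dbt's implementation: the toy graph, the node names, the `select_tests` helper, and the mode label `"proposed"` are all hypothetical, and the sketch assumes Model A is a parent of Model B while Model C sits downstream of Model B.

```python
from typing import Dict, Set

# Hypothetical toy graph (an assumption for illustration): node -> direct parents.
PARENTS: Dict[str, Set[str]] = {
    "model_a": set(),
    "model_b": {"model_a"},
    "model_c": {"model_b"},
    "test_ab": {"model_a", "model_b"},
    "test_cb": {"model_c", "model_b"},
}
TESTS: Set[str] = {"test_ab", "test_cb"}


def ancestors(node: str) -> Set[str]:
    """Return every upstream node of `node` (parents, grandparents, ...)."""
    seen: Set[str] = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def select_tests(selected: Set[str], mode: str) -> Set[str]:
    """Pick the tests to run alongside `selected` under a given mode.

    eager:    any test touching a selected node.
    cautious: only tests whose parents are all selected.
    proposed: tests whose parents are all selected or upstream of a selected node.
    """
    allowed = set(selected)
    if mode == "proposed":
        for node in selected:
            allowed |= ancestors(node)

    chosen: Set[str] = set()
    for test in TESTS:
        parents = PARENTS[test]
        touches_selection = bool(parents & selected)
        if mode == "eager":
            if touches_selection:
                chosen.add(test)
        elif touches_selection and parents <= allowed:
            chosen.add(test)
    return chosen


if __name__ == "__main__":
    for mode in ("eager", "cautious", "proposed"):
        print(mode, sorted(select_tests({"model_b"}, mode)))
```

With only `model_b` selected, the eager rule picks both tests, the cautious rule picks none, and the proposed rule picks only `test_ab`, because `model_a` is upstream of the selected node while `model_c` is not.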
Describe alternatives you've considered
Instead of adding a new selection mode, we might want to consider changing the behavior of the cautious mode. I'm hesitant to suggest changing existing behavior, but I believe the suggested new mode would be preferable to cautious mode, and having only two modes is easier for users to understand. At the least, cautious mode should be updated to allow tests that reference sources.
Who will this benefit?
This will benefit developers who are testing changes on a subset of the graph, rather than building the entire graph.
Are you interested in contributing this feature?
Yes - I have code changes working that add this new mode and am willing to make updates to it based on feedback.
Anything else?
No response