Generalized dbt build command #2743
Comments
For me to verify: the goal of this is to: |
That's the idea. And since snapshots can also participate in the middle of a DAG ( |
@jtcohen6 are we thinking about making tests participate more naturally in the DAG here? Ie:
If Edit: basically just curious if #2234 is a part of this or a separate issue |
@drewbanin I'm thinking that's in scope here, yes. This command would be a more generalized version of the Both |
This is exactly the reason I was asking for clarification on the WHAT of this ticket. |
Hello, just to mention this post in #2234: #2234 (comment). |
@jtcohen6 I have been stuck on this idea that I just cannot shake! Wanted to mention it here. IF:
THEN:
I think there's some more formality / rigor to apply here, and I'm actually not 100% sure that this requires the existence of a To get more concrete, here are some of the examples I'm considering:
A table/incremental model only needs to be built when:
I think that we can get at a lot of this stuff with the
|
@drewbanin That's a really neat thought. We've talked about some kind of Node selection also doesn't have a clear conception of "already exists in the database," since that tends to live in the materialization logic, but the |
@drewbanin, teams that manage grants on objects via post-hooks may also want to re-build a model when a config (as opposed to just sql logic) changes. Our team has recently moved post-hook definitions from EDIT: it looks like the state method accounts for configs in general, but I'm not 100% sure that post-hooks are among them. |
This Slack thread from Nadya Hrebinka asked "is there a way to [...] combine commands into one to save a couple of minutes?". My mind went to this GitHub issue. Could Nadya and others experiencing long dbt project parse times use the |
Yes! By operating over multiple resource types in a single invocation, |
I just discovered that this command would mostly answer the problems we have had with minor issues in base models blocking the whole project from running. I would imagine that in a perfect situation I would have:
This way an error in a node, either in running or testing, would only block the models that are downstream from it. |
@kosti-hokkanen-supermetrics Cool to hear what you're hoping to do with it! The first cut of
That said, I believe all the right constructs are there. I bet you can combine several |
See also: #1054, #1227, #2234, this comment
Describe the feature
Each dbt node-resource type has a task-command associated with it:
dbt run
dbt test
dbt seed
dbt snapshot
dbt source snapshot-freshness
Additionally, there could be a generalized command, dbt build [1], that would step through a DAG of multiple resource types and "build" them accordingly.
What would this look like? I imagine an argument syntax similar to dbt ls, i.e.
[1] Name subject to change, though for the ultimate command of the data build tool, it'd be hard to think of one more apropos...
Example
Let's imagine we had model_a that depends on a source (my_source.table) and a seed (my_seed), a snapshot (my_snapshot) of model_a, and then model_b which selected from my_snapshot. Of course, we also have tests on many of them. Roughly:
Within a single invocation, dbt build would go through motions analogous to running the following dbt commands. It would only proceed to the next numerical steps if all upstream steps succeed:
1a. dbt seed my_seed
1b. dbt source snapshot-freshness --select my_source.table
2a. dbt test --models my_seed
2b. dbt test --models source:my_source.table
3. dbt run --models model_a
4. dbt test --models model_a
5. dbt snapshot --select my_snapshot
6. dbt test --models my_snapshot
7. dbt run --models model_b
8. dbt test --models model_b
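To make the ordering above concrete, here is a minimal Python sketch of the same walk. It is not dbt's implementation: the names DEPENDS_ON, build, and execute are hypothetical, and modeling each resource's tests as a node sitting between that resource and its downstream consumers is an assumption that mirrors the numbered steps. A node runs only if everything directly upstream of it succeeded; otherwise it is skipped, so a failure blocks only what depends on it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory version of the example DAG: node -> direct upstream nodes.
# Tests are treated as ordinary nodes placed between a resource and its
# downstream consumers (an assumption that matches steps 1a-8 above).
DEPENDS_ON = {
    "seed:my_seed": set(),
    "source:my_source.table": set(),
    "test:my_seed": {"seed:my_seed"},
    "test:my_source.table": {"source:my_source.table"},
    "model:model_a": {"test:my_seed", "test:my_source.table"},
    "test:model_a": {"model:model_a"},
    "snapshot:my_snapshot": {"test:model_a"},
    "test:my_snapshot": {"snapshot:my_snapshot"},
    "model:model_b": {"test:my_snapshot"},
    "test:model_b": {"model:model_b"},
}


def build(depends_on, execute):
    """Visit nodes in dependency order and record a status for each.

    `execute(node)` is a caller-supplied callable returning True on success.
    A node whose direct upstream did not succeed is marked "skipped", which
    in turn causes everything downstream of it to be skipped as well.
    """
    results = {}
    for node in TopologicalSorter(depends_on).static_order():
        if any(results[up] != "success" for up in depends_on[node]):
            results[node] = "skipped"
        else:
            results[node] = "success" if execute(node) else "error"
    return results


if __name__ == "__main__":
    # Simulate the tests on model_a failing: my_snapshot and model_b are skipped.
    for node, status in build(DEPENDS_ON, lambda n: n != "test:model_a").items():
        print(f"{status:8} {node}")
```

Because skipping is decided per node from its own upstream results, a failing seed would block model_a and everything after it while still letting the source freshness check and source tests run.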
Complexities
[...] (run, test, snapshot), some are not (seed, snapshot-freshness)
[...] dbt run --full-refresh vs. dbt seed --full-refresh
dbt test almost feels like an exception. Technically, dbt test operates on test nodes, but other node types can be passed into its selection syntax, with selector expansion as the last step, so it "feels" like you're testing a model or a snapshot. (Edit: this behavior may someday change.)
[...] dbt build? Should we be wary of creating one command to rule them all?
Describe alternatives you've considered
dbt run+test (as outlined in linked issues)
Who will this benefit?