Adds dbt bootstrap subcommand #1238
* Introduces new dependency on oyaml
* Needs tests
* Works on postgres
Started working on adding tests for this and am blocked because I'm not sure how to add the new dependency into the docker / tox build (so far I've tried to keep my nose out of that part of the dbt build world). Currently can't get the tests to run at all because they die on
Hey @mikekaminsky - thanks for making this PR! This is an interesting and novel feature for dbt. Whereas dbt typically only runs the code defined in a dbt project, this will make dbt generate code! That's cool, powerful, and oft-requested to be sure. At the same time, so much of the complexity involved here is not technical in nature, but instead is a reflection of all the variability in how dbt users write and run their code. There are a lot of questions to answer about how this should work:
These are all important questions, and they're the kinds of things I consider when speccing out new features. The good news is, I think we're well suited to answer them :) Can you pause on this PR so we can discuss some of the details? This is the first foray into a whole new class of functionality in dbt, so I want to be super sure that we get it right! Finally, while I'm super sold on the benefit of functionality like this, I'm not fully convinced that the code should live inside of dbt. Another way for this to work is by exposing an API that external scripts can consume. That would make it possible/easy to build whole suites of tools that specialize in exactly this type of code generation. Further, those types of tools can be iterated on with much greater frequency than we intend to practice with dbt. For my part, I'll have a deep think about where code like this belongs. Super happy to discuss when you have the time!
@drewbanin some quick hits on the easy questions
I'm not sure this is entirely true!
Maybe(?) I'm less convinced that this is super critical. One of the nice things about this feature is that it works with models you've created in DBT (so you're working on the code in your new This isn't just for getting started with "upstream" source tables. Edit (2019-01-14 5:55pm): To be clear, if we want DBT to be able to add the base models, I think that would go in a separate sub-command.
I'm not sure what you're asking here -- do you just want to change the subcommand name from
Fair point. I took a second to look around for a variable that would have the right file location in it but I couldn't find anything. Happy to update if we have a better way of identifying that location?
I punted on this. Dealing with trying to update the
That's what I've wanted to do in the past (and that's how I wrote the GH issue). Would be really easy to add a As to whether or not this should live in DBT ... that's obviously a tough question. I think yes, because this functionality is really tightly coupled with the particulars of DBT and fits well into the DBT workflow. Maybe what you're suggesting is that you want to factor the CLI / workflow components out of DBT and really have "DBT" only be the core model-running / testing code kernel. That's an interesting idea, but a pretty big departure from the paradigm today. If you wanted to move that direction, you'd probably pull Without doing extensive reflection, I'm not sure that's the right way to go insofar as it requires analysts / DBAs to learn two different tools for working with the database. However, maybe it's better to pull this apart sooner-rather-than-later in the interest of unix-style do-one-thing-really-well CLIs. HMU on slack if you want to chat!
Thanks for the comments - it's super clear to me that you gave this a lot of thought! I didn't enumerate my questions to pick apart your PR -- I'm super aware that this is a WIP and happy to work together on the specifics once we get the overall design sorted. Rather, I wanted to give you a sense for how I evaluate the complexity of features and indicate that I think the hard part of this PR is around UX and not technical feasibility. I think the only point I'd contest from your response is:
I actually don't think this is true, and it's a super important distinction (and the basis for my current opinion on this feature!). The The operative difference is that people have strong opinions on how source code should be written, whereas they tend to be less opinionated on things like compiled assets. Whereas no one has strong feelings about the mess of compiled code that's in the rendered It's good and reasonable that different teams have different preferences about how to structure these things, but I'm averse to the idea that dbt should 1) implement an opinion on the matter and 2) be the arbiter of that stylistic decision. This is in the class of PRs that I'm interested in reviewing exactly once, but I can totally imagine folks adding flags left and right for minor updates to this command's functionality. So: that's the big problem. I think that it's good and reasonable for folks to want to tweak this feature to their needs, but I'm opposed to having them implement those stylistic tweaks via dbt's PR process. Ok, so how do we proceed from here? I want to paint a different picture of how this feature could be implemented that's more closely aligned with the core feature set of dbt. Imagine you created a macro that can generate a schema.yml spec for a given model, and also imagine that dbt had some mechanism to invoke macros dynamically from the command line, passing along any supplied CLI flags as macro arguments. That would make it possible to implement this feature, as well as tons of other related features, without actually modifying dbt code. It would be pretty easy to extend this macro to sources or even to base models. Macros like these could be wrapped up in dbt packages, and folks could tweak them to suit their precise needs without needing to fork dbt. 
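As a rough illustration of the operation + macro idea, here is a minimal sketch in plain Python rather than a real dbt macro. Everything in it is an assumption made for illustration: `generate_schema_yml`, `run_operation`, the `MACROS` registry, and the `--model` / `--columns` flags are hypothetical names, not dbt APIs, and the YAML layout is only loosely modeled on a schema.yml file.

```python
# Hypothetical sketch: none of these names are part of dbt.
import argparse


def generate_schema_yml(model_name, columns):
    """Render a minimal schema.yml-style spec for one model as YAML text."""
    lines = [
        "version: 2",
        "models:",
        f"  - name: {model_name}",
        "    columns:",
    ]
    for column in columns:
        lines.append(f"      - name: {column}")
    return "\n".join(lines) + "\n"


# Registry standing in for "macros" that can be looked up by name.
MACROS = {"generate_schema_yml": generate_schema_yml}


def run_operation(argv):
    """Look up a macro by name and pass the CLI flags through as its arguments."""
    parser = argparse.ArgumentParser(prog="run-operation")
    parser.add_argument("macro", choices=sorted(MACROS))
    parser.add_argument("--model", required=True)
    parser.add_argument("--columns", default="")
    args = parser.parse_args(argv)
    # Turn "id,status" into ["id", "status"], ignoring empty entries.
    columns = [name for name in args.columns.split(",") if name]
    return MACROS[args.macro](args.model, columns)


if __name__ == "__main__":
    print(run_operation(["generate_schema_yml", "--model", "orders", "--columns", "id,status"]), end="")
```

The point of the sketch is the shape, not the details: the generator lives outside the core tool, so a team that wants a different layout edits or replaces the function instead of asking for a new flag on a built-in command.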
To be sure, there are still some challenges associated with making dbt do exactly the thing described here, but these changes are very much directionally aligned with the changes we want to make to dbt long-term. As such, I find them way more compelling than one-off features that accomplish the same end goal. Whereas the operation + macro approach represents a doubling down on dbt's position as a code compiler, a new top-level command essentially becomes a maintenance burden! I just threw a whole lot at you. I'm very curious to hear what you think about all of this both in regards to 1) the implementation of this particular feature and 2) the types of features that I think are well-suited to live in dbt. If you buy the approach, then I'm super happy to spec out the work that needs to happen for us to get there.
A meta-point
As dbt grows in popularity and complexity, it's going to be increasingly important for the Fishtown team to identify which issues we think can be picked up right away, and which ones should probably be discussed further before beginning implementation. I'm super glad you've been contributing code to dbt, and I definitely don't want to discourage that kind of behavior! For my part, I'll comb over our outstanding issues and tag the gnarlier ones with a
Closing this (potentially temporarily) while thinking through the future of dbt compare and dbt bootstrap (#1217)
Addresses #1082
Example: