Feature: dbt compare subcommand #1135
Comments
@mikekaminsky I think I'm into this idea. It would be convenient if the command would output a list of statements that could be run to drop the tables too. I think this would work really well as an operation. Maybe those should have access to the For anyone else coming to this thread: there are heaps of ways that dbt could get this "wrong". If a user forgets to run Thanks for the request @mikekaminsky! |
My 2 cents: it would be useful to have a config setting per model/model folder for how dbt should behave with regard to syncing the list of models and the list of db objects. E.g. value |
If dbt is going to provide features like this, then there needs to be something other than console output. |
Stumbled onto this after the tenth time of us deleting models but forgetting to drop them in the dbt-managed schema. It would be helpful to have a command like this, maybe |
Thanks for the bump @tayloramurphy - I feel better about building this these days than I have in the past! The thing that's changed is that dbt's core constructs have coalesced around a handful of resources (seeds, models, snapshots). I actually think the obviously better version of this (one which both enumerates the relations, but also supports dropping the deleted ones) is within our reach too :) My one hesitation is that adding a new subcommand ( You buy all of that? |
@drewbanin for sure I buy that! I don't have a strong opinion that this should be a separate command, I'm just not knowledgeable enough about the internal workings to know if it would make sense as a flag against run/test or via some other implementation. The key thing for me on this one is, if we make the assumption that dbt is an infra as code tool where we define what the state of our warehouse should be, then there needs to be a way to sync the warehouse to what the code says should exist. Basically - here's what the information schema should say exists, here's what the information schema does say exists, and here's how to sync them. |
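A minimal sketch of the "what should exist vs. what does exist" comparison described above. It is not dbt code: the `Relation` type and the `compare_relations` helper are hypothetical names, and it assumes the expected relations (from the project) and the actual relations (from the information schema) have already been collected as schema/name pairs.

```python
from typing import Dict, Iterable, List, NamedTuple


class Relation(NamedTuple):
    """Hypothetical minimal identifier for a table or view."""
    schema: str
    name: str


def normalize(rel: Relation) -> Relation:
    # Assumption: identifiers are compared case-insensitively, since many
    # warehouses fold unquoted identifiers to a single case.
    return Relation(rel.schema.lower(), rel.name.lower())


def compare_relations(
    expected: Iterable[Relation], actual: Iterable[Relation]
) -> Dict[str, List[Relation]]:
    """Return relations that exist but are not in the project ("orphaned")
    and relations the project defines but the warehouse lacks ("missing")."""
    expected_set = {normalize(r) for r in expected}
    actual_set = {normalize(r) for r in actual}
    return {
        "orphaned": sorted(actual_set - expected_set),
        "missing": sorted(expected_set - actual_set),
    }


if __name__ == "__main__":
    expected = [Relation("analytics", "orders"), Relation("analytics", "customers")]
    actual = [Relation("ANALYTICS", "orders"), Relation("analytics", "old_orders")]
    print(compare_relations(expected, actual))
```

Restricting `actual` to schemas that dbt manages for the current target would be one way to reduce false positives from the schema-renaming cases raised in the issue; that filtering is left out of this sketch.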
I'd be for pulling https://github.com/mikekaminsky/dbt-helper/blob/master/core/compare.py |
In thinking about this for an upcoming sprint, I agree with the comments above that:
To my mind, one way we could accomplish the
    dbt ls --orphaned # list database relations that do not map to current dbt models
    dbt ls --orphaned --execute-drop --dry-run # list DDL dbt will run
    dbt ls --orphaned --execute-drop # execute drop statements
I admit that this is different behavior from current |
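To illustrate the dry-run idea in the proposal above, here is a rough sketch of how drop statements might be generated and optionally executed. The flags shown in the comment are only a proposal from this thread, and nothing here reflects dbt internals: `drop_statements`, `drop_orphans`, and the `execute` callback are hypothetical, and the double-quoted identifier style is an assumption that will not suit every warehouse.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical shape of an orphaned relation: (schema, name, kind),
# where kind is "table" or "view".
Orphan = Tuple[str, str, str]


def drop_statements(orphans: Iterable[Orphan]) -> List[str]:
    """Build one DROP statement per orphaned relation."""
    statements = []
    for schema, name, kind in orphans:
        keyword = "VIEW" if kind.lower() == "view" else "TABLE"
        # Assumption: standard SQL double-quoted identifiers.
        statements.append(f'DROP {keyword} IF EXISTS "{schema}"."{name}"')
    return statements


def drop_orphans(
    orphans: Iterable[Orphan],
    execute: Callable[[str], None],
    dry_run: bool = True,
) -> List[str]:
    """Return the DDL; run it through `execute` only when dry_run is False."""
    statements = drop_statements(orphans)
    if not dry_run:
        for sql in statements:
            execute(sql)
    return statements


if __name__ == "__main__":
    orphans = [("analytics", "old_orders", "table"), ("analytics", "old_view", "view")]
    # Dry run: print the DDL without executing anything.
    for sql in drop_orphans(orphans, execute=print, dry_run=True):
        print(sql)
```

Defaulting `dry_run` to `True` means nothing is dropped unless the caller opts in, which seems in keeping with the caution expressed in the Risks section.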
For what it's worth, |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
I'm still interested, any chance this feature might get its big break? |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers. |
Add a compare subcommand for identifying stale/unmanaged relations.
Feature description
Similar to #615, but maybe less objectionable!
dbt compare should inspect the code repository and determine which views and tables are described therein. DBT should compare those relations to the system tables, and produce some output identifying any discrepancies. Something like
Who will this benefit?
This feature is useful for warehouse admins who want to identify and remove stale relations.
I frequently forget to clean up after myself if I've done some refactoring (e.g., renaming models or deleting old models that have been superseded by others). Since DBT has all of the requisite information to tell me about this, it seems like the right tool for the job.
Risks
DBT probably shouldn't be in the business of dropping relations, and there's some risk that this feature could be mis-used or the output misinterpreted (in particular, for people who have complex schema-renaming rules for different targets). @drewbanin can add some more detailed thoughts about where this might go wrong.