Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1698] [Feature] dbt should default to generate_schema_name_for_env behavior on new projects #6469

Closed
3 tasks done
PedramNavid opened this issue Dec 20, 2022 · 5 comments
Closed
3 tasks done
Labels
discussion enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day Refinement Maintainer input needed stale Issues that have gone stale

Comments

@PedramNavid
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I have setup many a dbt project, and in every single one, one thing I always end up doing is adding the following macro to macros/get_custom_schema.sql.

{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}

My understanding is that this overrides the default behavior found here.

This pattern comes from dbt's own docs:

https://docs.getdbt.com/docs/build/custom-schemas#an-alternative-pattern-for-generating-schema-names.

I would like to propose that this pattern be the default for new projects. The simplest method is to add the macro to the starter_project, which will prevent any breaking changes.

Describe alternatives you've considered

  • Manually adding this macro to my own project every time (not great)
  • Adding a command-line option to let the users decide whether or not they want this feature (I think this hurts the new user experience. I am just learning dbt and you want me to decide what exactly??)
  • Updating the global template and potentially breaking things for everyone (I do like this approach but that has more to do with my destructive tendencies than any reasonable approach to things)

Who will this benefit?

Both experienced dbt users who have to remember to add this functionality and new dbt users who end up with less-than-ideal first-user-experiences.

I do think this change also ties nicely into #1958

Are you interested in contributing this feature?

Absolutely

Anything else?

I polled 10000 users and they all said they were in favor of this change.

image

@PedramNavid PedramNavid added enhancement New feature or request triage labels Dec 20, 2022
@github-actions github-actions bot changed the title [Feature] dbt should default to generate_schema_name_for_env behavior on new projects [CT-1698] [Feature] dbt should default to generate_schema_name_for_env behavior on new projects Dec 20, 2022
@jtcohen6 jtcohen6 self-assigned this Jan 2, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 6, 2023

@PedramNavid Thanks for opening this issue — it was a fun read. Sorry for the delayed response, & hope you had a nice holiday break :)

I've had a thought similar to yours before: Why do our very own docs present this as a clearly recommended (even preferable!) pattern, without actually making it the default?

The new-ish user experience

It's worth saying that this decision was made a long long time ago (though I can still remember how that music used to make me smile), and it's remained unchanged since then. That's not a reason to keep it around forever—though perhaps it is reason enough to wait until dbt v2—but I also appreciated the prompt to reread the description in Drew's PR from >5 years ago: #522

By default, this macro just concatenates the schema defined in the active profile with the schema defined in the model config, if any.
To override this behavior, users can create a macro in their own project called generate_schema_name.
A more useful strategy uses the profile schema in development, and the custom model schema in production.

I stand by this syllogism today.

I think it would be more confusing to try adding a custom schema config for the first time, in development, and then see it have ... no result, the same as before. That's what would happen, if generate_schema_name_for_env were the out-of-the-box default. On the flip side, the most obvious default behavior—just use the schema as the schema, not as a suffix to target.schema—encourages the anti-pattern of users clobbering each others' models in development.

The current default is decidedly a compromise:

  • you configure the custom schema
  • you see it take a clear & obvious effect, so you know it's working
  • it's probably not the effect you were expecting
  • you read the docs to learn about why
  • you think about managing environments, which might be a new-ish concept!
  • and then... !!

You learn that you can override a subset of built-in dbt behavior, by defining a macro with the same name as a built-in macro. I don't have good data to back this up (we don't collect anonymous usage stats about macro overrides), but I'd guess that generate_x_name macros are one of the most common early overrides for new-ish users. To be clear, I wouldn't want to see everyone overriding everything, left right & center, but I do think it's important that dbt users learn to think of it as a set of sensible defaults that can, in many cases, be extended or customized.

I prefer that as a rite of passage, versus the experience of adding a custom schema for the first time and dbt running in development, to zero apparent effect. (Is dbt broken?? Should I post in Slack / open an issue?)

The options are...

I think you captured these well. If we did want to change this to be the default, I agree with your tactical proposal for how we'd do it.

I would like to propose that this pattern be the default for new projects. The simplest method is to add the macro to the starter_project, which will prevent any breaking changes.

I do like the cleanliness of this approach! (Bookkeeping: For a silly reason, we'd also need to make this change in the dbt-starter-project repo, and cut a new tag for [dbt version+], since that's what dbt Cloud actually clones when initializing projects in the IDE.)

Manually adding this macro to my own project every time (not great)

This is the status quo, and I'm going to argue that it's not such a big deal. Why?

# packages.yml
packages:
  - git: https://github.com/PedramNavid/dbt-pedrams-way
    revision: main
# dbt_project.yml
dispatch:
  - macro_namespace: dbt
    search_order: ['dbt_pedrams_way', 'dbt']
    # this will prefer 'dbt_pedrams_way.generate_schema_name' over the default 'dbt.generate_schema_name'
    # and do the same for all other global macros in the 'dbt' namespace

(ok, 3 + 3 = 6 lines of code)

Adding a command-line option to let the users decide whether or not they want this feature (I think this hurts the new user experience. I am just learning dbt and you want me to decide what exactly??)

Agreed! This would be really confusing! The behavior of generate_schema_name is already quite subtle; it's not an "advanced" feature, but perhaps an "intermediate" one, and definitely not something someone should need to think about on day one of their dbt journey.

Updating the global template and potentially breaking things for everyone (I do like this approach but that has more to do with my destructive tendencies than any reasonable approach to things)

:) some people just want to watch my dbt Slack DMs burn

Airing of grievances

Don't get me wrong — a lot about generate_schema_name does annoy me. I don't love the decision that exposed node to the generate_x_name macros (#1483), such that the same custom schema config input can actually land a model in a different schema based on model-level properties, instead of just environment-level ones. The macro might also be easier to read & reason, if it took target as an explicit argument, instead of just being pulled out of the global context as it is today.

{% macro default__generate_schema_name(custom_schema_name, node, target) -%}
    {% set schema =
        (target.schema + "_" + custom_schema_name|trim) if custom_schema_name
        else target.schema
    -%}
    {{ return(schema) }}
{%- endmacro %}
...
{% macro generate_schema_name_for_env(custom_schema_name, node, target) -%}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {{ return(custom_schema_name | trim) }}
    {%- else -%}
        {{ return(target.schema) }}
    {%- endif -%}
{%- endmacro %}

Users could still, of course, pull in {{ var() }} and {{ env_var() }} to write conditional logic to their heart's delight.

If I really had my druthers, I'd also cut node as the second argument in those macros, and force projects to have clear-cut rules about which schema inputs yield which schema outputs. That would need to be a v2 thing, if at all. It might be an easier sell in a future where orgs can deploy dbt via multiple smaller projects, with their own rules, rather than the monoliths of today...

@jtcohen6 jtcohen6 added discussion Refinement Maintainer input needed and removed triage labels Jan 6, 2023
@jtcohen6 jtcohen6 removed their assignment Jan 6, 2023
@PedramNavid
Copy link
Author

Thanks for that detailed response @jtcohen6 !

If most people are overriding the default behavior with a custom macro, then the default should be that overridden behavior. A frictionless, simpler user experience is preferable to one that forces the user to consider why something isn't what they expect.

For new users learning how to set up schemas, they would be in the docs already reading about how to specify them and where. A note explaining that the schema names, by default, only work in production might be enough?

But, given all that, I understand the points against not changing it too. Just thought I'd put my two cents in, but I don't feel too strongly either way!

@jtcohen6 jtcohen6 added the paper_cut A small change that impacts lots of users in their day-to-day label Feb 22, 2023
@christineberger
Copy link
Contributor

I'm going to leave my experiences here working with many projects: I think that "most" folks are not overriding it, but that it's a really good mix. Smaller projects tend to use and need only the defaults, while bigger analytics architecture tends to override these a lot. I'm in favor of keeping the current behavior default as I agree with Jeremy's points about how folks learn what can be done by a need vs. needing to do something they don't quite understand yet.

With that said, I do think that the schema override can be simpler. I'm not asking to be able to understand every single need for a custom schema override, just for the most basic one: overriding by environment. How that would be able to be overridden in a simpler way, I don't know - but I have gotten feedback from others that doing one thing from deployment and one thing from development should seem easier than a macro override. Perhaps this could just be a checkbox in Cloud for the deployment environment? (i.e, [ ] Prefix schema configurations with target schema)

Copy link
Contributor

github-actions bot commented Jan 2, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 2, 2024
Copy link
Contributor

github-actions bot commented Jan 9, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day Refinement Maintainer input needed stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

3 participants