Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom variables inside profiles.yml #2514

Closed
dalebradman opened this issue Jun 8, 2020 · 15 comments
Closed

Allow custom variables inside profiles.yml #2514

dalebradman opened this issue Jun 8, 2020 · 15 comments
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day stale Issues that have gone stale

Comments

@dalebradman
Copy link

Describe the feature

I would like to be able to create and set custom variables inside ~/.dbt/profiles.yml file and then use these variables within models.

Describe alternatives you've considered

  • Setting environment variables. However this would require further setup instructions to ask people to add variables to their .profile, .bash_rc, .zshrc etc. which requires additional command line knowledge. It would also be harder to phase more variables in and deprecate them.
  • Creating a macro that can parse a yaml file (i.e. with open('~/.dbt/profiles.yml') ... ) and setting variables from there. However this is not currently possibly.

Additional context

The goal is to make the initial setup process for installing dbt as simple as possible. A lot of the users will not have command line experience, so it's a big step to get them to update their ~/.dbt/profiles.yml file. Therefore asking them to also maintain their own environment variables is not ideal and we would be able to provision these variables within our GitHub repo (and deploy efficiently).

This would also allow greater flexibility for what we can conditionally do inside dbt_project.yml and each of the individual models.

Who will this benefit?

The driver behind this is for everyone to set a DBT_USER_ALIAS variable that be used as a prefix to table names (inside generate_alias_name) and therefore prevent conflicts from occurring.

It will benefit those without that much command line experience, and subsequently administrators who spend time supporting them.

@dalebradman dalebradman added enhancement New feature or request triage labels Jun 8, 2020
@jtcohen6
Copy link
Contributor

Hey @dalebradman, thanks for the writeup.

I don't think we're planning to add anything to ~/.dbt/profiles.yml that doesn't immediately relate to how dbt connects to the warehouse. In the general case, if you need more values that differ based on who or where is running dbt:

  • environment variable
  • a macro that, depending on a value from target, returns a conditional set of values

The driver behind this is for everyone to set a DBT_USER_ALIAS variable that be used as a prefix to table names (inside generate_alias_name) and therefore prevent conflicts from occurring.

This sounds like a reasonable use case! I want to understand better. Do all of your development users materialize their models in the same database + schema? What value does each person set for database and schema in profiles.yml?

I'm wondering if you could create a generate_alias_name that prefixes each model with the value of target.schema. We always target.schema as a way to give each development user their own sandbox environment, and to avoid namespace collisions.

@jtcohen6 jtcohen6 removed the triage label Jun 10, 2020
@vembloud
Copy link

I'm having a use case for this, too.

This is what I'm currently using to dynamically select database for my sources (in schema.yml):

sources:
  - name: extract
    database: "{{ env_var('SOURCE_EXTRACT_DB') }}"
    schema: extract
    tables: ...

I have to set env variable before running dbt and change them before running for a different profile/target, which can easily lead to errors. I'd appreciate if I could define custom variables for each target in profiles.yml and then use it like this:

sources:
  - name: extract
    database: "{{ target.source_extract_db }}"

@jtcohen6
Copy link
Contributor

@vembloud How many different environments are you running against? Have you considered conditional logic like:

sources:
  - name: extract
    database: "{{ 'extract_prod' if target.name == 'prod' else 'extract_dev' }}"
    schema: extract
    tables: ...

In general, we find a pretty smooth handoff point from "I'm running with a few environments and can write target-based conditional logic" to "I have a complex deployment with several environments, codified in an orchestration tool that can also reliably set environment variables."

@vembloud
Copy link

@jtcohen6 Thanks for your quick answer. I have considered the conditional logic, but that would require developers to follow the same target naming pattern. We'd probably stick to env variables, then :)

@jtcohen6
Copy link
Contributor

Fair enough!

I'm going to close this issue, since we're not planning to make the code change originally requested (custom target vars in profiles.yml).

@dalebradman
Copy link
Author

@jtcohen6 apologies for the late reply, this skipped my inbox!

I want to understand better. Do all of your development users materialize their models in the same database + schema? What value does each person set for database and schema in profiles.yml?

Yes all dev users materialise their tables in the same BigQuery project. We've got a like-for-like match between our prod and dev env which is bound to datasets so creating a project per user is out of the question really (for provisioning/maintenance reasons as well).

What we did for this is asked each user to set schema in profiles.yml to firstname_lastname i.e. dale_b and then modified the generate_alias_name macro to:

{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    
    {%- if custom_alias_name is none and (target.name.split('-')[0] == 'prod' or target.schema == 'landing') -%}
    
        {{ node.name }}
        
    {%- elif target.name.split('-')[0] == 'prod' or target.schema == 'landing' -%}
    
        {{ custom_alias_name | trim }}_{{ node.name }}
        
    {%- elif custom_alias_name is none -%}
    
        {{ target.schema | trim }}_{{ node.name }}
        
    {%- else -%}
    
        {{ target.schema | trim }}_{{ custom_alias_name | trim }}_{{ node.name }}

    {%- endif -%}
{%- endmacro %}

This has worked very well for us since.

@jtcohen6
Copy link
Contributor

What we did for this is asked each user to set schema in profiles.yml to firstname_lastname i.e. dale_b

This is in line with what we do in development, too. I don't totally follow how there can be namespace conflicts if everyone is building into their own dev schema.

All the same, I'm glad yo've been able to use generate_alias_name and find a solution that works well.

@ybressler
Copy link

@vembloud How many different environments are you running against? Have you considered conditional logic like:

sources:
  - name: extract
    database: "{{ 'extract_prod' if target.name == 'prod' else 'extract_dev' }}"
    schema: extract
    tables: ...

In general, we find a pretty smooth handoff point from "I'm running with a few environments and can write target-based conditional logic" to "I have a complex deployment with several environments, codified in an orchestration tool that can also reliably set environment variables."

I dislike this solution since it is not declarative – Ideally, a model can determine its state without dependencies of env vars, at least in the yaml. I would much rather a way of storing variables with a yaml – and in the case of sensitive values, have the keys point to env variables.

@cwelton
Copy link

cwelton commented Apr 27, 2022

Adding my voice to the others asking for this. Being able to set variables as part of a profile is a much cleaner solution than putting that configuration into conditional expressions or requiring additional setting of environment variables.

The choice to not implement this as a capability feels like the wrong decision.

@inakrin
Copy link

inakrin commented Jun 18, 2022

The choice to not implement this as a capability feels like the wrong decision.
I agree as well. This feature would be extremely handy for my current project

@adam-wrobel
Copy link

Would be nice to have this, I have extremely long -vars "{...."} in every invocation because I have multiple depending on the environment, but they dont fit into schema of profile.yml

Combined with #4314 it makes the life very hard

@franviera92
Copy link

I would also like this change, if I need to specify a vars per environment, how would I apply this change currently?

@jtcohen6
Copy link
Contributor

Heard!

I am sympathetic to the need for better per-env config options. Writing the same conditional expressions (Jinja-in-yaml) doesn't feel very good. Setting environment variables does feel like The Right Solution, but it's a bigger lift for simpler deployments.

I still believe this specific proposal would require overloading the profile/target, which is really just the set of necessary credentials & configuration for making & managing a data platform connection. It shouldn't be a means of setting arbitrary conditional configuration, which is what dbt makes possible with vars and env_vars.

But: There's no great way today to tie the value of vars or env_vars to the same "environments" that are already (loosely) defined as target names within profiles.yml. If anything, this makes me want to revisit the structure of profiles.yml (long overdue - #1958), in a way that might desire/require some breaking changes, and therefore land in the next major version (v2).

In the shorter term: We've been spending the last several months revisiting & revamping our configuration story, and there are several ideas proposed in #6207. Some of the options:

  • A dbt.env file for setting environment variables—in version control, to be shared by all users, if you so chose
  • A dbt_config.yml file that could live in the root directory, or the user's "home" directory (similar to profiles.yml). Should it be possible to set vars in there as well (instead of on the CLI)? That wouldn't support defining multiple different environments, though—and I'm not sure how we would support that, short of requiring an explicit flag be passed every time. (Because this file needs to be read before all other flags / env vars are resolved, we can't have a circular dependency of "read this config file to determine project/profile location" → "read the project file to determine the profile" → "read the profile to determine the default target" → "reread this config file now that we know the target.")

@jtcohen6 jtcohen6 added paper_cut A small change that impacts lots of users in their day-to-day and removed triage labels Jun 25, 2023
@jtcohen6 jtcohen6 reopened this Jun 25, 2023
Copy link
Contributor

github-actions bot commented Jan 4, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 4, 2024
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

8 participants