Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass parsed_node into generate_schema_name. #1463

Closed
nydnarb opened this issue May 14, 2019 · 2 comments · Fixed by #1483
Closed

Pass parsed_node into generate_schema_name. #1463

nydnarb opened this issue May 14, 2019 · 2 comments · Fixed by #1483

Comments

@nydnarb
Copy link
Contributor

nydnarb commented May 14, 2019

Feature description

Recently, I submitted a feature to allow dbt users to specify a generate_alias_name macro in their project. Therefore, for example, the name of the table that the model is being built in could be implemented as a function of context like the target of the build or tags on the model. When implementing that feature, we decided to pass a node object into the get_alias function. https://github.com/fishtown-analytics/dbt/blob/4b7bddb48189c926e2fc6b753bd5328ce4f953bc/core/dbt/parser/base.py#L213

We have a use case where we would like to use different schema depending on tags that might be included in a model (e.g., we might launch a model in production with a experimental tag). While investigating if I could update our generate_schema_name macro to access the model's tags, I realized that there is no model or node object in the available context.

My proposal is that we make some kind of node or model object available at time of rendering a model's custom schema. I think it will be as easy as updating get_schema to also accept parsed_node: https://github.com/fishtown-analytics/dbt/blob/4b7bddb48189c926e2fc6b753bd5328ce4f953bc/core/dbt/parser/base.py#L209
(Note, we would probably have to add it as a second input and not the first. If that's the case, we should reverse the order of the inputs in get_alias prior to release of custom aliases, for consistency).

Who will this benefit?

My organization! But also, anyone who might want to dynamically generate schema names based on things in the model config, like tags, sortkeys, distkeys, materialization, etc. Not just, for example, the target and schema in config.

@brandfocus
Copy link

+1

@drewbanin
Copy link
Contributor

@nydnarb thanks for making this issue!

The core challenge with operating in this part of dbt's codebase is that the state of the resources that dbt operates on are still in flux! The schema name for a model is determined during parsing, so there are still some attributes of the ParsedNode that haven't been finalized yet.

Check out the _update_parsed_node_info function. The following attributes of the ParsedNode are all finalized in this order:

  1. schema
  2. alias
  3. database
  4. tags
  5. config
  6. pre-/post-hooks

Looking at this code again, I think we're actually not doing the strictly correct thing when generating an alias name! Specifically, the database, tags, config, and hooks that are present in the parsed_node on Line 213 do not account for configs that were specified in the model, eg:

{{ config(tags=['abc', '123']) }}

select 1 as id

Only configs that are specified in dbt_project.yml are present in the parsed_node object at the time when get_alias is currently being called! If you try to log(parsed_node.tags) in the generate_alias_name macro, I think you'll find that it results in an empty list using the model shown above.

I think there's a quick fix though - we just need to wait until these other attributes are finalized before calling get_schema and get_alias. There is some fundamental problem here though: either one of get_schema or get_alias will need to happen first, so the one that happens last will potentially have the wrong value for the one that happens first. We either need to run dbt on quantum computers, or prevent folks from operating on these two values in their generate_schema_name and generate_alias_name macros (either via good docs, or something programatic).

Apologies for leading you astray in your generate_alias_name PR -- I think I recommended using parsed_node there, and I didn't realize that this would be a problem.

Overall though, I love the suggestion presented here, and think it will be a great addition to dbt if we can get the specifics right.

Note, we would probably have to add it as a second input and not the first. If that's the case, we should reverse the order of the inputs in get_alias prior to release of custom aliases, for consistency

Yep, totally agree! I didn't realize generate_alias_name wasn't released yet - let's definitely swap the order of the args in generate_alias_name before that code is released! Really good catch :)

cc @beckjake - did I miss anything here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants