[CT-2465] [Bug] meta field inheritance in sources works exactly the wrong way #7440

Closed
2 tasks done
Maayan-s opened this issue Apr 23, 2023 · 9 comments
Labels
  • bug: Something isn't working
  • stale: Issues that have gone stale
  • user docs: [docs.getdbt.com] Needs better documentation

Comments

@Maayan-s

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hi team 😁

According to the docs here:
"meta dictionaries are merged (a more specific key-value pair replaces a less specific value with the same key)"

This indeed happens for models and tests.
However, in sources the less specific value replaces the more specific one.

I played with configuring meta in different levels, and the priorities are:

  1. dbt_project.yml meta
  2. source level meta
  3. table level meta

Expected Behavior

Priorities should be:

  1. table level meta
  2. source level meta
  3. dbt_project.yml meta

(exactly the other way around from current behavior)
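
The merge rule quoted from the docs, applied in the expected order, can be sketched in a few lines of Python. This is only an illustration of the semantics, not dbt-core's actual implementation; `merge_meta` and the extra `team` key are hypothetical names introduced for the example.

```python
from typing import Any, Dict


def merge_meta(*levels: Dict[str, Any]) -> Dict[str, Any]:
    """Merge meta dicts passed in order from least to most specific."""
    merged: Dict[str, Any] = {}
    for level in levels:
        # A later (more specific) level replaces earlier values key by key.
        merged.update(level)
    return merged


# Values mirror the reproduction below; "team" is a made-up extra key
# to show that non-conflicting keys survive the merge.
project_meta = {"debug": "project_level", "team": "data"}
source_meta = {"debug": "source_level"}
table_meta = {"debug": "table_level"}

resolved = merge_meta(project_meta, source_meta, table_meta)
print(resolved)  # {'debug': 'table_level', 'team': 'data'}
```

With this ordering the table-level value wins for `debug`, while keys defined only at a less specific level are kept.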

Steps To Reproduce

I'm using dbt latest release (1.4.6).

In the dbt_project.yml I configured:

sources:
  +meta:
    debug: "project_level"

And for a specific source I configured:

sources:
  - name: source_name
    schema: source_schema
    meta:
      debug: "source_level"
    tables:
      - name: table_name
        meta:
          debug: "table_level"

When I extract the meta field from the graph, I get the project_level value under debug.
When I remove the config from dbt_project.yml and extract the meta field from the graph, I get the source_level value under debug.
When I remove the config from the source and extract the meta field from the graph, I get the table_level value under debug.

Relevant log output

No response

Environment

- OS: macOS 13.3.1
- Python: 3.9.15
- dbt: 1.4.6

Which database adapter are you using with dbt?

No response

Additional Context

No response

@Maayan-s Maayan-s added bug Something isn't working triage labels Apr 23, 2023
@github-actions github-actions bot changed the title [Bug] meta field inheritance in sources works exactly the wrong way [CT-2465] [Bug] meta field inheritance in sources works exactly the wrong way Apr 23, 2023
@dbeatty10 dbeatty10 self-assigned this Apr 25, 2023
@dbeatty10
Contributor

Hi @Maayan-s 😁

Something to try

What do the priorities look like if you try the following instead?

In the dbt_project.yml:

sources:

  # for sources, it appears okay to put config up here, but I don't think it's okay for models
  # +meta:
  #   debug: "top_level_config"

  your_dbt_project_name:
    +meta:
      debug: "project_level_config"

And for a specific source:

sources:
  - name: source_name
    schema: source_schema
    config:
      meta:
        debug: "source_level_config"
    tables:
      - name: table_name
        config:
          meta:
            debug: "table_level_config"

In analyses/my_analysis.sql (for examining all the details of each node):

{% set sources = [] -%}
{% for node in graph.sources.values() -%}
  {{ tojson(node) }}
{%- endfor %}

Compile and pretty-print the JSON for the source nodes:

dbt compile
jq . target/compiled/your_dbt_project_name/analyses/my_analysis.sql > source_nodes.json
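
As an alternative to the analysis file, the same fields can be read from the manifest artifact. The sketch below assumes the default `target/manifest.json` location and the `sources` key of the manifest; `summarize_source_meta` is a hypothetical helper name, and the three fields it reads (`meta`, `source_meta`, `config.meta`) are the ones visible in the node JSON in this thread.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_source_meta(manifest: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Collect the three places where a source node can carry meta."""
    summary: Dict[str, Dict[str, Any]] = {}
    for unique_id, node in manifest.get("sources", {}).items():
        summary[unique_id] = {
            "meta": node.get("meta", {}),                # table-level property
            "source_meta": node.get("source_meta", {}),  # source-level property
            "config_meta": node.get("config", {}).get("meta", {}),  # resolved config
        }
    return summary


if __name__ == "__main__":
    # Assumed default artifact path; adjust if your project overrides target-path.
    manifest_path = Path("target/manifest.json")
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
        print(json.dumps(summarize_source_meta(manifest), indent=2))
```

Comparing `config_meta` against the other two fields shows at a glance which level ended up in the resolved config.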

Maximum detail

To really see in detail how everything is affecting the node dictionary, I did the following:

In the dbt_project.yml:

sources:
  +meta:
    debug: "top_level"
  config:
    +meta:
      debug: "top_level_config"
  my_dbt_project:
    +meta:
      debug: "project_level"
    config:
      +meta:
        debug: "project_level_config"

And for a specific source:

sources:
  - name: source_name
    schema: source_schema

    meta:
      debug: "source_level"
    config:
      meta:
        debug: "source_level_config"

    tables:
      - name: table_name
        meta:
          debug: "table_level"
        config:
          meta:
            debug: "table_level_config"

Here's a subset of the JSON key/values:

{
  "database": "postgres",
  "schema": "dbt_dbeatty",
  "name": "sample_model",
  "resource_type": "source",
  "package_name": "my_dbt_project",
  "path": "models/_sources.yml",
  "original_file_path": "models/_sources.yml",
  "unique_id": "source.my_dbt_project.dbt_dbeatty.sample_model",
  "fqn": [
    "my_dbt_project",
    "dbt_dbeatty",
    "sample_model"
  ],
  "source_name": "dbt_dbeatty",
  "identifier": "sample_model",
  "meta": {
    "debug": "table_level"
  },
  "source_meta": {
    "debug": "source_level"
  },
  "config": {
    "enabled": true,
    "meta": {
      "debug": "table_level_config"
    }
  },
  "unrendered_config": {
    "meta": {
      "debug": "table_level_config"
    }
  },
  "relation_name": "\"postgres\".\"dbt_dbeatty\".\"sample_model\""
}

This looks like the priorities you were expecting, but let me know either way!

@dbeatty10 dbeatty10 removed their assignment Apr 25, 2023
@Maayan-s
Author

Hi @dbeatty10,
So I did what you suggested and learned that:

  1. Config is not valid in the dbt_project.yml:
    image

  2. When I placed meta under config I did get the priorities I expect:

image

Whereas without config I get this:
image

The thing is this is really inconsistent (and undocumented).
On models - the priorities work with just meta.
On sources - put meta under config.
Unless it's sources in dbt_project.yml, then don't use config.

@dbeatty10
Contributor

The thing is this is really inconsistent (and undocumented).

😅 This feedback is fair, @Maayan-s

Inconsistencies

I don't know for certain, but some or all of the inconsistency for sources might be covered by #3662

Documentation

At the very least, would some clear documentation within the docs for meta stating the following be helpful?

In order for meta dictionaries to merge with a more specific key-value pair replacing a less specific value with the same key:

  • For all* resource types in _properties.yml files:
    • put meta under config.
  • Within dbt_project.yml don't use config:
    • use a top-level meta key instead.

* For models - meta also works as a top-level key.
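
To make the rule above concrete, here is a rough sketch of a check that flags source and table entries in a properties file that set `meta` outside of `config`. It assumes the YAML has already been parsed into a plain dict; `find_top_level_meta` is a hypothetical helper, not part of dbt.

```python
from typing import Any, Dict, List


def find_top_level_meta(properties: Dict[str, Any]) -> List[str]:
    """Return paths of source/table entries that set `meta` outside `config`."""
    findings: List[str] = []
    for source in properties.get("sources", []):
        source_name = source.get("name", "<unnamed>")
        if "meta" in source:
            findings.append(f"sources.{source_name}.meta")
        for table in source.get("tables", []):
            if "meta" in table:
                table_name = table.get("name", "<unnamed>")
                findings.append(f"sources.{source_name}.tables.{table_name}.meta")
    return findings


# Parsed form of a properties file: the source uses a top-level meta,
# the table nests meta under config as recommended above.
properties = {
    "sources": [
        {
            "name": "source_name",
            "schema": "source_schema",
            "meta": {"debug": "source_level"},
            "tables": [
                {
                    "name": "table_name",
                    "config": {"meta": {"debug": "table_level_config"}},
                },
            ],
        }
    ]
}

print(find_top_level_meta(properties))  # ['sources.source_name.meta']
```

Only the source-level entry is reported, because the table already nests `meta` under `config`.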

Of course we'd also want to update all the examples within the meta docs to nest the meta key under the config key in all the right places.

@dbeatty10 dbeatty10 added awaiting_response user docs [docs.getdbt.com] Needs better documentation and removed triage labels Apr 25, 2023
@jtcohen6
Contributor

I agree this is quite inconsistent & confusing. The current complexity is there for two reasons:

  • Backward compatibility when we switched meta from being a "property" to a "configuration," back in v0.21
  • We only made that change for a subset of resource types, and sources were not one of them

https://docs.getdbt.com/reference/resource-configs/meta#definition

Depending on the resource you're configuring, meta may be available within the config property, or as a top-level key. (For backwards compatibility, meta is always supported as a top-level key, though without the capabilities of config inheritance.)

I would like to develop a general-purpose answer to the inconsistencies that exist today between "configurations" (which can be inherited & overridden) and "properties" (which cannot):

@Maayan-s
Author

Thanks @jtcohen6 and @dbeatty10!

Changing the docs is for sure the immediate solution, and I understand the planned solution and think it's solid and will solve a lot of confusion!

@alison985

If I should break this off into a separate bug or feature request issue please let me know.

In dbt-labs/docs.getdbt.com#3710, which came from this issue, @dbeatty10 says: "* For models only - meta also works as a top-level key (but better to use consistent syntax across resource types)."

The following in a schema.yml file in a dbt-core 1.4.6 project causes a dbt parse command to error out.

models: 
  meta:
    owner: "@Richard" 
    dbt_builder: "@Alison"
    
  - name: model_name_A
  - name: model_name_B

The error is:

    Raw Error:
    ------------------------------
    while parsing a block mapping
      in "<unicode string>", line 4, column 3
    did not find expected key
      in "<unicode string>", line 8, column 3

but the following is fine:

models: 
  - name: model_name_A
    meta:
      owner: "@Richard" 
      dbt_builder: "@Alison"
  - name: model_name_B
    meta:
      owner: "@Richard" 
      dbt_builder: "@Alison"

I should be able to (or at least want to be able to) specify metadata for an entire set of models without repeating the same metadata under each model in the set. That way, I can keep my code DRYer.

I also tried the following 2 versions and got the same dbt parse error.

models: 
  config:
    meta:
      owner: "@Richard" 
      dbt_builder: "@Alison"

  - name: model_name_A
    config:
      meta:
        owner: "@Richard Model" 
        dbt_builder: "@Alison Model"

models: 
  config:
    +meta:
      owner: "@Richard" 
      dbt_builder: "@Alison"

  - name: model_name_A
    config:
      +meta:
        owner: "@Richard Model" 
        dbt_builder: "@Alison Model"

@dbeatty10
Contributor

@alison985 I think this is just a misunderstanding due to me using the phrase "top-level" too loosely in combination with schema.yml and dbt_project.yml behaving differently from each other 😅

Inside of schema.yml files, the meta configuration must be repeated underneath each model name, as you saw.

dbt_project.yml is the mechanism to use if you want to configure a set of models. It requires organizing sets of models into a hierarchy of directories.

Example

Suppose you put all your Richard/Alison models into a folder named ra. Then your dbt_project.yml would look something like this:

name: "my_project"
profile: "my_profile"

models:
  my_project:

    +meta:
      owner: "@Default_Owner_Across_This_Project"
      dbt_builder: "@Default_dbt_Builder_Across_This_Project"

    # relative path: models/ra
    ra:
      +meta:
        owner: "@Richard"
        dbt_builder: "@Alison"

Explanation

In this example, all models within models/ra (and all its subdirectories) will have the @Richard / @Alison meta config (unless overridden by a schema.yml file or {{ config(...) }} within the model itself).

Otherwise, they will have the @Default... meta config that is directly under my_project:.
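
The directory-based resolution described above can be approximated as follows. This is a simplified sketch rather than dbt-core's real resolver: `resolve_meta` is a hypothetical function, the `staging` folder is made up for contrast, and overrides from schema.yml or `{{ config(...) }}` are deliberately left out.

```python
from typing import Any, Dict, List


def resolve_meta(models_config: Dict[str, Any], fqn: List[str]) -> Dict[str, Any]:
    """Walk the `models:` tree along a model's fqn, merging each `+meta` found."""
    merged: Dict[str, Any] = dict(models_config.get("+meta", {}))
    node: Any = models_config
    for part in fqn:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # no deeper config for this path (e.g. the model name itself)
        merged.update(node.get("+meta", {}))  # deeper directories win per key
    return merged


# Dict form of the `models:` block from the dbt_project.yml example above.
models_config = {
    "my_project": {
        "+meta": {
            "owner": "@Default_Owner_Across_This_Project",
            "dbt_builder": "@Default_dbt_Builder_Across_This_Project",
        },
        "ra": {"+meta": {"owner": "@Richard", "dbt_builder": "@Alison"}},
    }
}

in_ra = resolve_meta(models_config, ["my_project", "ra", "model_name_A"])
elsewhere = resolve_meta(models_config, ["my_project", "staging", "model_name_B"])
print(in_ra)      # {'owner': '@Richard', 'dbt_builder': '@Alison'}
print(elsewhere)  # falls back to the project-wide defaults
```

A model under models/ra picks up the folder-level values, while a model in any other folder keeps the defaults set directly under my_project.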

Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Feb 14, 2024
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Feb 22, 2024

4 participants