Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Inconsistent behaviour of 'defer' with the 'dbt compile' command #9991

Closed
2 tasks done
siljamardla opened this issue Apr 22, 2024 · 2 comments
Closed
2 tasks done
Labels
bug Something isn't working state Stateful selection (state:modified, defer) wontfix Not a bug or out of scope for dbt-core

Comments

@siljamardla
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Defer does not work with simple compile. This will compile to SQL that uses the dev schema tables:

dbt compile --defer --state prod-run-artifacts

However, if you need to compile production SQL for a specific table, this will work:

dbt compile --defer --state prod-run-artifacts --select model_a

Expected Behavior

Any dbt command executed with --defer should have the same behaviour of combining live data with data that exist in the dev environment.

Steps To Reproduce

  • I dropped my dev schema
  • I ran dbt compile --defer --state target/live (this is where I store the latest live manifest)
  • Checked the compiled output. All the refs use my personal schema name.
  • I recreated my dev schema. I checked that there are no tables in it.
  • I ran dbt compile --defer --state target/live again
  • Checked the compiled output, again all the refs use my personal schema name.
  • I ran dbt compile --defer --state target/live --select my_model_name > target/tmp_my_model_name.sql
  • Checked the output, in this one all the refs have been compiled to use the live schema names
  • Just in case, I deleted my target folder and went through the same routine again, still the same outcome

Relevant log output

No response

Environment

- OS: Mac OS
- Python: 3.11
- dbt: 1.7.9

Which database adapter are you using with dbt?

spark

Additional Context

dbt-labs/docs.getdbt.com#5162

@siljamardla siljamardla added bug Something isn't working triage labels Apr 22, 2024
@siljamardla siljamardla changed the title [Bug] Inconsistent behavoour of 'defer' with the 'dbt compile' command [Bug] Inconsistent behaviour of 'defer' with the 'dbt compile' command Apr 22, 2024
@dbeatty10 dbeatty10 added the state Stateful selection (state:modified, defer) label Apr 22, 2024
@dbeatty10 dbeatty10 self-assigned this Apr 22, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @siljamardla -- indeed state selection can be tricky to think about!

I don't think this is a bug. Let me explain why.

Project setup

Suppose you have the following simple models in your dbt project and you start with a set of production artifacts:

Project files

models/model_a.sql

-- model_a: {{ this }}

select 1 as id

models/model_b.sql

-- model_b: {{ this }}

select 2 as id

models/model_c.sql

-- model_c: {{ this }}
-- depends on:
--   model_a: {{ ref("model_a") }}
--   model_b: {{ ref("model_b") }}

select 1 as id
Production artifacts

`profiles.yml

duckdb:
  target: dev
  outputs:
    prod:
      type: duckdb
      path: 'db.db'
      schema: prod
    dev:
      type: duckdb
      path: 'db.db'
      schema: feature_456

Build production and examine the database table names:

dbt build --target prod
dbt --quiet ls --output json --output-keys database schema alias --target prod | jq .

Save the production state:

rm -rf prod-run-artifacts
mv target prod-run-artifacts
image

💡 Insight 1

When you do a dbt compile --target dev, it will:

  • only compile selected nodes in your project
  • use the dev target for all references

👉 Not specifying --select is equivalent to selecting all nodes.

💡 Insight 2

When you add --defer --favor-state --state prod-run-artifacts, then it will:

  • only use prod-run-artifacts for references to nodes that were not selected.

👉 --defer --state only comes into play for any un-selected nodes.

Example 1

These are all equivalent:

dbt compile --target dev --defer --favor-state --state prod-run-artifacts
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select "fqn:*"
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b model_a

And they all compile to this result (target/compiled/my_project/models/model_c.sql):

-- model_c: "db"."feature_456"."model_c"
-- depends on:
--   model_a: "db"."feature_456"."model_a"
--   model_b: "db"."feature_456"."model_b"

select 1 as id

Examples 2 and 3

Either of the examples below will compile with different output than above since some nodes are not selected.

Example 1

dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b

is the same as:

dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b model_a --exclude model_a

is the same as:

dbt compile --target dev --defer --favor-state --state prod-run-artifacts --exclude model_a

Compiled output:

-- model_c: "db"."feature_456"."model_c"
-- depends on:
--   model_a: "db"."prod"."model_a"
--   model_b: "db"."feature_456"."model_b"

select 1 as id

Example 2

dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c

Compiled output:

-- model_c: "db"."feature_456"."model_c"
-- depends on:
--   model_a: "db"."prod"."model_a"
--   model_b: "db"."prod"."model_b"

select 1 as id

Notes

  • All the commands above start with dbt compile --target dev --defer --favor-state --state prod-run-artifacts. The only thing that changes is what follows --select.
  • For sake of simplicity, I used the --favor-state flag in the examples above. But the general principles stay the same when that flag is not added.

Summary

I'm going to close this a not a bug, but please let me know I've missed something important and we can re-consider. 🙏

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 22, 2024
@dbeatty10 dbeatty10 removed the triage label Apr 22, 2024
@dbeatty10 dbeatty10 removed their assignment Apr 22, 2024
@dbeatty10 dbeatty10 added the wontfix Not a bug or out of scope for dbt-core label Apr 22, 2024
@siljamardla
Copy link
Author

siljamardla commented May 14, 2024

@dbeatty10
Thank you for the detailed examples, I get what you mean by defer never using production schema for those models that have been included in the --select clause.

I've actually come back to this issue because I was trying to use defer for testing. I guess I'm hitting the same concept here. I tried something like dbt test --select my_model --defer --state target/live. The model has a not null test defined for the order_key column.
I was hoping I could compile and run the test for live data. Because, well, it would be good to test my test before I deploy it to live. And testing my test would be much more convenient and appropriate on live data, not on dev data.
However, the compiled test reads:

select order_key
from dev_schema.my_model
where order_key is null

Is there a way to compile the test to run on live? If not, should I create a feature request? What could be a good way to achieve this functionality?

Edit:
Out of curiosity I've also defined a more complex (dbt-expectations) test that has a ref to another table in it:

    tests:
      - dbt_expectations.expect_table_aggregation_to_equal_other_table:
          expression: count(*)
          compare_model: source("source_name","source_table_for_my_model")
          compare_expression: count(*)
          group_by: [created_date]
          compare_group_by: [created_date]
          row_condition: 
            created_date = (SELECT max(created_date) from {{ref('my_other_model')}})
          compare_row_condition: >
            created_date = (SELECT max(created_date) from {{ref('my_other_model')}})

I was thinking that maybe this reference to another table will compile for the live schema because it's not included in the --select when I run the test on my_model. But no, it will still compile to dev_schema.my_other_model in the test script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working state Stateful selection (state:modified, defer) wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

2 participants