Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add full_refresh option to on_schema_change of incremental models #6412

Conversation

pol-defont-reaulx
Copy link

@pol-defont-reaulx pol-defont-reaulx commented Dec 9, 2022

resolves dbt-labs/dbt-adapters#154

Description

I added a full_refresh option on on_schema_change option for incremental models. Now, with this option and when finding a schema change on a model it will do a full refresh on this model

Checklist

@jtcohen6 jtcohen6 added Team:Adapters Issues designated for the adapter area of the code ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering labels Jan 2, 2023
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pol-defont-reaulx Thanks for the contribution! I know many folks have wanted this for a while, so props to you for rolling up your sleeves and starting to make it happen.

After some local testing, I've realized there's a more foundational issue at play here. Given a model like this one:

{{ config(
  materialized = 'incremental',
  on_schema_change = 'full_refresh'
) }}

with source_data as (
  
    select * from {{ ref('some_model') }}
    
    {% if is_incremental() %}
    
      -- always last 3 days (PostgreSQL)
      where date_day > (current_date - 3 days)
    
    {% endif %}
  
),

select ...

If the on_schema_change behavior is detected, and we need to move into "full refresh mode" — we actually need to re-compile the model's SQL, with is_incremental now returning false. Why? Otherwise, we'll be replacing the existing table with only the past 3 days of data, when what we really want to do is start the whole process over again, as if we had configured the model with full_refresh: true or passed in the --full-refresh flag.

I'm not sure of the right way to trigger that re-compilation, let alone how to do it from within the Jinja (materialization) context. These are two clearly separated steps today, and dbt expects a model's compiled SQL to be available before it starts doing anything materialization-related.

There's also a performance consideration: By the time we know there's been a schema change, we've already created the temp table with new data, which now needs to be thrown away in favor of the "full" version. We could try hacking around that, by using get_columns_in_query (where false limit 0) to detect schema changes, instead of actually running + saving the query, but there's always risk involved with that sort of proxy approach.

(cc @colin-rogers-dbt - this could be a fun problem to jam on, but definitely not trivial)


Some more tactical considerations, typed out before I started thinking about the point above:

  • Ultimately, we'll want automated testing for this. The existing functional tests for on_schema_change behavior in incremental models are defined here: https://github.com/dbt-labs/dbt-core/tree/main/tests/functional/incremental_schema_tests
  • In order to support this behavior on all our adapters, we'll need similar changes to the incremental materializations in dbt-snowflake, dbt-bigquery, and dbt-spark. Several of these don't use an "alter-rename-swap" mechanism, but instead just an atomic create or replace.

@@ -128,6 +128,9 @@

{% do exceptions.raise_compiler_error(fail_msg) %}

{% elif on_schema_change == 'full_refresh' %}
{{ return({"full_refresh": True}) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about some additional debug-level logging here? Something like:

      {% do log("Full refreshing " ~ target_relation ~ " on account of schema change") %}

Comment on lines +48 to +51
{% if dest_columns is mapping and dest_columns.get("full_refresh") %}
{% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
{% set need_swap = true %}
{% else %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need logic like this in every incremental materialization, including the modified versions in other adapter plugins we maintain:

@jtcohen6 jtcohen6 added Refinement Maintainer input needed and removed ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering labels Jan 5, 2023
@nbsmo4
Copy link

nbsmo4 commented May 31, 2023

Any progress on this?

@Fleid
Copy link
Contributor

Fleid commented May 31, 2023

It wasn't picked up on our side, so no.

@dbeatty10 do you think you could pick that up as part of the incremental effort?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 1, 2023

This will be quite tricky to implement! As described in my comment above - we to re-compile the model's SQL mid-materialization, now with is_incremental() returning False. I think that's possible, but it's not something for which we have an existing pattern, and it likely requires changes outside the scope of just the incremental materialization (Jinja) code.

@Fleid
Copy link
Contributor

Fleid commented Jun 2, 2023

Oh, by "pick that up" I meant the refinement, not the implementation!
Thanks for the reminder that it's not an easy task Jeremy :)

So no @nbsmo4, no progress on it sadly :/

@Stijn-Hoeke
Copy link

Would be amazing if this feature makes it through, won't be able to contribute myself unfortunately.

@dbeatty10 dbeatty10 added the community This PR is from a community member label Mar 22, 2024
@dbeatty10 dbeatty10 added dbt-adapters Needs migration to the dbt-adapters repo enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day labels Apr 9, 2024
@dbeatty10
Copy link
Contributor

Thanks for taking the time to open this PR @pol-defont-reaulx! Since opening, we've decoupled dbt Adapters from dbt Core, and this code now lives in a separate repo: dbt-adapters.

A consequence of the decoupling is that PR can't be merged anymore as is, so we're closing it. For more context see #9171.

The linked issue has already been transferred. Please don't hesitate to re-open your proposed code changes within this PR in the dbt-adapters repo.

@dbeatty10 dbeatty10 closed this Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes community This PR is from a community member dbt-adapters Needs migration to the dbt-adapters repo enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day Refinement Maintainer input needed Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support on_schema_change: full_refresh mode
6 participants