feat: add full_refresh option to on_schema_change of incremental models #6412
Conversation
@pol-defont-reaulx Thanks for the contribution! I know many folks have wanted this for a while, so props to you for rolling up your sleeves and starting to make it happen.
After some local testing, I've realized there's a more foundational issue at play here. Given a model like this one:
{{ config(
    materialized = 'incremental',
    on_schema_change = 'full_refresh'
) }}

with source_data as (

    select * from {{ ref('some_model') }}

    {% if is_incremental() %}
    -- always last 3 days (PostgreSQL)
    where date_day > current_date - interval '3 days'
    {% endif %}

)

select ...
If the `on_schema_change` behavior is detected, and we need to move into "full refresh mode" — we actually need to re-compile the model's SQL, with `is_incremental()` now returning `false`. Why? Otherwise, we'll be replacing the existing table with only the past 3 days of data, when what we really want to do is start the whole process over again, as if we had configured the model with `full_refresh: true` or passed in the `--full-refresh` flag.
I'm not sure of the right way to trigger that re-compilation, let alone how to do it from within the Jinja (materialization) context. These are two clearly separated steps today, and dbt expects a model's compiled SQL to be available before it starts doing anything materialization-related.
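To make the recompilation problem concrete, here is a minimal Python sketch (hypothetical, not dbt's real API) that mimics what Jinja compilation of the example model produces under the two values of `is_incremental()`. The `compile_model` function and its SQL strings are illustrative assumptions only:

```python
# Hypothetical sketch: why a schema-change-triggered full refresh needs the
# model's SQL re-compiled with is_incremental() returning false.
def compile_model(is_incremental: bool) -> str:
    """Mimic Jinja rendering of the example incremental model above."""
    sql = "select * from some_model"
    if is_incremental:
        # Incremental runs only pull the trailing 3-day window of data.
        sql += " where date_day > current_date - interval '3 days'"
    return sql

# The SQL dbt has already compiled before the materialization runs:
incremental_sql = compile_model(is_incremental=True)
# The SQL we would actually need once full-refresh mode is triggered:
full_refresh_sql = compile_model(is_incremental=False)

print(incremental_sql)
print(full_refresh_sql)
```

Replacing the table with `incremental_sql` would silently drop all but 3 days of history, which is exactly the failure mode described above.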
There's also a performance consideration: by the time we know there's been a schema change, we've already created the temp table with new data, which now needs to be thrown away in favor of the "full" version. We could try hacking around that by using `get_columns_in_query` (`where false limit 0`) to detect schema changes, instead of actually running and saving the query, but there's always risk involved with that sort of proxy approach.
(cc @colin-rogers-dbt - this could be a fun problem to jam on, but definitely not trivial)
Some more tactical considerations, typed out before I started thinking about the point above:
- Ultimately, we'll want automated testing for this. The existing functional tests for `on_schema_change` behavior in incremental models are defined here: https://github.com/dbt-labs/dbt-core/tree/main/tests/functional/incremental_schema_tests
- In order to support this behavior on all our adapters, we'll need similar changes to the incremental materializations in `dbt-snowflake`, `dbt-bigquery`, and `dbt-spark`. Several of these don't use an "alter-rename-swap" mechanism, but instead just an atomic `create or replace`.
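To illustrate the difference between the two replacement mechanisms mentioned in the last point, here is a hypothetical sketch (the `replace_table` helper and exact DDL are assumptions, not dbt's actual macros):

```python
# Hypothetical sketch of the two table-replacement mechanisms: an atomic
# "create or replace" vs. the multi-step "alter-rename-swap" pattern.
def replace_table(adapter: str, relation: str, tmp: str) -> list[str]:
    if adapter in ("snowflake", "bigquery"):
        # Atomic replacement: a single DDL statement, no intermediate names.
        return [f"create or replace table {relation} as select * from {tmp}"]
    # Alter-rename-swap: stage the new table, then swap names and clean up.
    backup = f"{relation}__backup"
    return [
        f"alter table {relation} rename to {backup}",
        f"alter table {tmp} rename to {relation}",
        f"drop table {backup}",
    ]

for stmt in replace_table("postgres", "my_table", "my_table__tmp"):
    print(stmt)
```

Because the atomic variant never has a "renamed but not yet swapped" state, the full-refresh branch of each adapter's incremental materialization would need its own handling.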
@@ -128,6 +128,9 @@

    {% do exceptions.raise_compiler_error(fail_msg) %}

{% elif on_schema_change == 'full_refresh' %}
    {{ return({"full_refresh": True}) }}
What do you think about some additional debug-level logging here? Something like:
{% do log("Full refreshing " ~ target_relation ~ " on account of schema change") %}
{% if dest_columns is mapping and dest_columns.get("full_refresh") %}
    {% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
    {% set need_swap = true %}
{% else %}
We'll need logic like this in every `incremental` materialization, including the modified versions in other adapter plugins we maintain.
Any progress on this?
It wasn't picked up on our side, so no. @dbeatty10 do you think you could pick that up as part of the incremental effort?
This will be quite tricky to implement! As described in my comment above, we need to re-compile the model's SQL mid-materialization, now with `is_incremental()` returning `false`.
Oh, by "pick that up" I meant the refinement, not the implementation! So no @nbsmo4, no progress on it sadly :/

Would be amazing if this feature makes it through, won't be able to contribute myself unfortunately.
Thanks for taking the time to open this PR @pol-defont-reaulx! Since opening, we've decoupled dbt Adapters from dbt Core, and this code now lives in a separate repo: dbt-adapters. A consequence of the decoupling is that this PR can't be merged anymore as-is, so we're closing it. For more context, see #9171. The linked issue has already been transferred. Please don't hesitate to re-open your proposed code changes in the dbt-adapters repo.
resolves dbt-labs/dbt-adapters#154
Description
I added a `full_refresh` option to the `on_schema_change` config for incremental models. Now, with this option set, when dbt detects a schema change on a model, it will do a full refresh of that model.
Checklist
- Run `changie new` to create a changelog entry